Stock Market Analysis
University of Calcutta, India — 2024
My Role
Project lead — ML code, model fitting, data management, and documentation.
Mentors
Prof. Shirsendu Mukherjee, Project Guide
Prof. Dhiman Dutta, Partial Guidance
Prof. Amlan Chakravarti, ML Guide
Timeline & Status
4 Months, Completed in June 2024
Overview
The following project was completed under the rules of the University of Calcutta, guided by the Department of Statistics, Asutosh College.

I made this page to demonstrate the whole forecasting process live. Though it may not be very interactive, it should be clearly understandable.

All predictions reach roughly 80% accuracy. The actual, test, and train plots almost coincide with one another.
HIGHLIGHTS
An end-to-end live demonstration of forecasting.
0.1 Google Prediction
IMAGE
0.2 Microsoft Prediction
IMAGE
0.3 IBM Prediction
IMAGE
0.4 Amazon Prediction
IMAGE
CONTEXT
A final year university project.
It was more than 85% accurate.
The goal was a journal-level project. Above all else, it was the right thing to do, and an opportunity to overdeliver.
LinkedIn Profile.
IMAGE
THE PROBLEM
It was not just analysis work; prediction was included as well.
A huge data set.
Data analysis on its own is routine. In my case, however, I wanted the work to be predictive as well.

The decision was made to write a large amount of code, which came with its own set of constraints and challenges:
A deadline of a few months, because the work and its documentation had to be finished before our external exams.
High GPU consumption. The system had to be powerful enough to train and predict correctly.
The data had to be clean. With such a large chunk of data, cleaning it was a huge pressure.
Unreliable data sources. Stock data is very sensitive, so collecting it was a huge task.
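The cleaning constraint above can be sketched with pandas. This is a minimal illustration on made-up rows; the column names (`Date`, `Close`, `Symbol`) are assumptions, not the project's actual NSE schema.

```python
import pandas as pd

# Hypothetical sample of NSE-style rows; the real data had far more
# columns and years of trading days.
raw = pd.DataFrame({
    "Date": ["2024-01-02", "2024-01-02", "2024-01-03", "2024-01-04"],
    "Close": [140.5, 140.5, None, 142.1],
    "Symbol": ["GOOG"] * 4,  # constant column, adds no signal
})
raw["Date"] = pd.to_datetime(raw["Date"])

# Keep only what the model needs; drop missing rows and duplicate days.
clean = (raw[["Date", "Close"]]
         .dropna()
         .drop_duplicates(subset="Date")
         .sort_values("Date")
         .reset_index(drop=True))
print(len(clean))  # 2 rows survive: one NaN and one duplicate removed
```

The same chained pattern scales to the full dataset, since every step is vectorised in pandas.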
THE CHALLENGE
Get reliable data, clean it, and predict with minimum time cost and maximum accuracy.
North Star Machine Learning principles:
01
Clean Data
Clear, to-the-point, and only the necessary unbiased data.
02
Less Time cost in Training
We need to fit a model that gives more accurate predictions at a lower time cost.
03
Least Error
Driving the RMSE down to get more accurate results.
UPDATE FLOW
Choosing the platform and making it the backbone.
You gotta start somewhere.
I chose Google Colab as the Jupyter notebook provider. A link to the GitHub repository is given below (click on Figure 3.0).
3.0 A preview of the project's README.md file on my GitHub.
IMAGE
Finding accuracy amidst the chaos of messy data.
Collecting the data from the NSE and following it all the way through to prediction takes a huge amount of effort.

As shown in Figure 3.1, the instructions are categorised into stages from beginning to end.

Additionally, overly technical terms were revised to better cater to a general audience.
Pinpointing the issues.
Clarifying, summarising, and then normalising the data under analysis were the most difficult parts of this whole work.
Unnecessary variables should be omitted.
Data causing bias in the analysis must be sorted out and deleted.
System out of memory — or is the error just vague?
The data-storage issue was solved by offloading storage to the notebook environment.
No validation that an action succeeded, and no indication of progress.
The test, train, and actual plots are almost collinear.
Getting the quick fixes in.
1. Data collected from the NSE, so no worry about bias. First problem fixed.

2. Pandas is imported as the analysis library, making data analysis smoother. Second problem fixed.

3. The GRU model is selected for fitting: less effort and more accuracy. Third problem fixed.

(Figure 3.3) — with the objective of fixing these problems.
National Stock Exchange data resolved.
Pandas DataFrame resolved.
Gated Recurrent Unit model resolved.
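To show why the GRU was an attractive choice, here is the single-step GRU update written in plain NumPy. The weights are random and the hidden size (4) is illustrative; this is a sketch of the gating mechanism, not the project's trained model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, U):
    """One GRU step; W and U hold the (z, r, h) input/recurrent weights."""
    z = sigmoid(W["z"] @ x_t + U["z"] @ h_prev)             # update gate
    r = sigmoid(W["r"] @ x_t + U["r"] @ h_prev)             # reset gate
    h_cand = np.tanh(W["h"] @ x_t + U["h"] @ (r * h_prev))  # candidate state
    return (1 - z) * h_prev + z * h_cand                    # blended state

# Toy random weights: 1 input feature (price), hidden size 4.
rng = np.random.default_rng(0)
W = {k: rng.normal(scale=0.1, size=(4, 1)) for k in "zrh"}
U = {k: rng.normal(scale=0.1, size=(4, 4)) for k in "zrh"}

h = np.zeros(4)
for price in [1.00, 1.02, 1.05]:  # toy normalised closing prices
    h = gru_step(np.array([price]), h, W, U)
print(h.shape)  # (4,)
```

With only two gates instead of the LSTM's three, the GRU has fewer parameters to train, which is exactly the "less effort" trade-off mentioned above.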
Garbage in the data — with a static solution?
High RMSE — with a static solution?
Data storage and a high-end device — once Jupyter notebooks came, now a revolution.
While facing the high-end-device and data-storage issues, I noticed that Google Colaboratory (click to go to Google Colab) exists for exactly this kind of work, and its hosted instances resolved my problem.
KEY DISCOVERY
Discovering Google Colaboratory fixed all my problems with storing the data and working on a low-end device.
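On Colab, the GPU runtime can be verified before training begins. This is a hedged sketch that degrades gracefully where TensorFlow is absent (it ships preinstalled on Colab, but not everywhere).

```python
# Check whether a GPU is visible to TensorFlow before starting training.
try:
    import tensorflow as tf
    gpu_count = len(tf.config.list_physical_devices("GPU"))
    print("GPUs visible:", gpu_count)
except ImportError:
    gpu_count = None
    print("TensorFlow is not installed in this environment")
```

On a Colab GPU runtime this typically reports one visible device; on a CPU runtime it reports zero.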
Seasonal Decomposition - Trend.
Google data : Seasonal Decompose.
For Google there is a very slow increasing trend until 2012, but after 2012 the trend rises exponentially.

Very high seasonality.
3.1 Google Decomposition
IMAGE


Microsoft data : Seasonal Decompose.
The Microsoft data is the same: a very slow increasing trend until 2012, but after 2012 the trend rises exponentially.
Very high seasonality.



3.2 Microsoft Decomposition
IMAGE

IBM data : Seasonal Decompose.
The IBM data is similar to Google and Microsoft; its trend spikes up from 2021.
3.3 IBM Decomposition
IMAGE
Amazon data : Seasonal Decompose.

The Amazon data is similar to Google's.
3.4 Amazon Decomposition
IMAGE
DATA PATTERNS
Google, Microsoft, IBM and Amazon - Individual
Analysing Data of Google — Normalising.
4.1 Google Normalise.
IMAGE
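The normalisation shown in Figures 4.1-4.4 maps each price series into [0, 1] before model fitting. A minimal sketch with scikit-learn's `MinMaxScaler`, on made-up prices:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy closing prices standing in for one company's series.
prices = np.array([[120.0], [135.0], [150.0], [180.0]])

# Scale into [0, 1]: (x - min) / (max - min).
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(prices)
print(scaled.ravel())  # [0.   0.25 0.5  1.  ]
```

The same fitted scaler's `inverse_transform` maps the model's predictions back to the original price scale, which is needed before plotting them against the actual data.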
Analysing Data of Microsoft — Normalise.
4.2 Microsoft Normalise
IMAGE
Analysing Data of IBM — Normalise.
4.3 IBM Normalise.
IMAGE
Analysing Data of Amazon — Normalise.
4.4 Amazon Normalise.
IMAGE
FINAL PREDICTION
A low-error forecast.
The plots almost coincide.
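How closely the plots coincide is quantified by the RMSE used throughout this project. A minimal sketch with made-up actual and predicted prices (the project's real values ranged from 5.23 to 12.23):

```python
import numpy as np

# Hypothetical actual vs. predicted closing prices for a few days.
actual = np.array([150.0, 152.0, 155.0, 153.0])
predicted = np.array([149.0, 153.5, 154.0, 151.0])

# Root mean squared error: sqrt of the mean squared deviation.
rmse = np.sqrt(np.mean((actual - predicted) ** 2))
print(round(rmse, 3))  # 1.436
```

A lower RMSE means the predicted curve hugs the actual curve more tightly, which is exactly what "almost coinciding" plots look like.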
Google Prediction : Forecasting.
Click to focus
7.1 Google Prediction
INTERACTIVE
Microsoft Prediction : Forecasting.
Click to Focus
7.2 Microsoft Prediction
INTERACTIVE
IBM Prediction : Forecasting.
Click to focus
7.3 IBM Prediction
INTERACTIVE
Amazon Prediction : Forecasting.
Click to focus
7.4 Amazon Prediction
INTERACTIVE
RETROSPECTIVE
A bittersweet ending.
A HUGE SUCCESS
The forecasting plots were generated with over 85% accuracy, beating the usual error bounds!
Project Takeaways:
01
Big brand Stock Market
Viewing and analysing the stock prices of unicorn brands in the tech industry.
02
Leveraging existing resources
It led me into the deep ocean of stock market and machine learning resources.
03
Breaking habits
The habit of defaulting to ARIMA and plain LSTM models for time-series forecasting is finally broken.
04
Satisfaction for over accuracy
In general the RMSE sits around 15-20, but in my case it was 5.23-12.23, which pushed the model to roughly 85% real-world accuracy.