Name: Hrithik Rai Saxena

Job Role: Full Stack Data Scientist

Experience: 4 Years 2 Months

Address: Wuppertal, Germany

Skills

Data Science and Visualization 95%
Machine Learning 90%
Data Engineering 70%
Statistical Analysis 95%
MLOps 80%

About

About Me

I have over 4 years of industrial experience in data science and machine learning, backed by a master's degree in Artificial Intelligence and Data Science. I am proficient in building production-level data science software, statistical analysis, research and hypothesis testing, time series analysis, deep learning and machine learning. I have a record of leading impactful projects and providing effective mentorship.

  • Profile: Data Science & Machine Learning
  • Domain: Autonomous Retail, Packaging and Supply Chain, Finance & Public Sector
  • Education: M.Sc. Artificial Intelligence and Data Science
  • Language: English, German, Hindi
  • Programming: Python, C, C++, JavaScript, SQL, HTML & CSS
  • Data Science: Descriptive analysis, Exploratory data analysis, Predictive analysis, Prescriptive analysis, Diagnostic analysis, Inferential analysis, Regression and Factor analysis
  • Machine Learning: Supervised and unsupervised learning, Deep learning, Reinforcement learning, Transformer-based models, Streaming-based learning, Continual learning, Generative models
  • Database Management: SQL (PostgreSQL, MySQL), NoSQL (MongoDB), Data migration pipelines, Data modeling, Data warehousing and ETL processes
  • Data Engineering: Apache Spark, Power BI, AWS S3, Glue, Kinesis, MongoDB, BigQuery, Data Warehouse (Amazon Redshift)
  • MLOps: Version control (Git), MLflow, YAML for MLOps configuration and deployment, Docker, GitLab CI (CI/CD pipelines and DevOps), Continuous monitoring (Grafana) and event/trigger-based retraining (whylogs), System diagnostics and testing, Deployment (Heroku, Google Cloud Run)
  • Cloud Computing: AWS, Google Cloud
  • Software Engineering: Procedural programming, Object-oriented programming, System architecture design and optimization, Unit testing, Logging, Debugging, Agile development, Jira ticketing, Confluence documentation, SRS and PRD, Parallel programming frameworks, Asynchronous calling frameworks, Caching
  • Interest: Traveling, Blogging, Music Composition

Resume

I am a full stack data scientist with over 4 years of industrial experience. I enjoy taking an idea from proof of concept to production, working collaboratively and with full commitment to deliver the best product possible within the given time and resources.

Experience


Aug 2023 - Present

Master Thesis Student/ Data Science Intern

Livello Technologies, Düsseldorf, Germany

Thesis title - A comparison of time series models for revenue and product demand forecasting in smart fridges. / Ein Vergleich von Zeitreihenmodellen für Umsatz- und Produktnachfrageprognosen bei intelligenten Kühlschränken.

  • Developed an end-to-end Revenue and Product demand forecasting system with a novel ‘storage-oriented’ architecture, thus bringing down the average response time from 8 seconds to 3.5 seconds.
  • Introduced efficient data migration strategies (MongoDB, BigQuery), cutting monthly bills by 42%.
  • Introduced exogenous variables and dynamic model parametrization components, bringing down the MAPE from 63% to 27%.
  • Implemented a ‘data-drift based’ model retraining strategy, eliminating the human in the loop.
  • Implemented loggers, test cases and memory profilers for better system diagnostics and monitoring.
  • Optimized queries and added caching for better payload handling.
  • Configured CI/CD pipeline – Docker Image build, Test, Security Check, Cloud Run, Deploy, Migrate.
  • Set up an API service and configured an NVIDIA Jetson Nano to run Google's MediaPipe library for a people counting, age and gender detection service. Created a real-time dashboard for it using Dash.
  • Achievements - Top Star Performer Award.
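
For readers unfamiliar with the MAPE metric quoted above, the following is a minimal sketch of how it is commonly computed. The function name and the sample figures are hypothetical and are not taken from the forecasting system itself; skipping periods with zero actual demand is one common convention, assumed here for illustration.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    # Assumption: periods with zero actual demand are skipped, because the
    # percentage error is undefined there.
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


# Hypothetical daily revenue figures, for illustration only.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
print(round(mape(actual, forecast), 2))  # prints 6.67
```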

Oct 2022 - Mar 2023

Artificial Intelligence Intern

Syskron GmbH (Krones), Regensburg, Germany

The digitalisation unit of the Krones group, implementing state-of-the-art digital filling and packaging lines.

  • Formulated strategies for model retraining of the environment model of the company's benchmark Reinforcement Learning based product – ContiloopAI.
  • Updated the previous data pipeline to feature engineer raw factory data coming from an end device (ReadyKit) via Share2Act. This removed the need for on-site manual data acquisition and field trips.
  • Developed strategies for model retraining via different kinds of performance-based and time-based triggers.
  • Wrote training articles on reinforcement learning, model retraining and monitoring for the AI team.
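
To illustrate what performance-based and time-based retraining triggers can look like, here is a minimal sketch. The function name, the 30-day age limit and the 1.2 tolerance factor are hypothetical values chosen for the example, not the ones used at Syskron.

```python
from datetime import datetime, timedelta


def should_retrain(last_trained, now, recent_error, baseline_error,
                   max_age=timedelta(days=30), tolerance=1.2):
    """Return the name of the trigger that fired, or None if none did."""
    # Time-based trigger: the model is older than the allowed age.
    if now - last_trained >= max_age:
        return "time"
    # Performance-based trigger: the recent error exceeds the baseline
    # error by more than the tolerance factor.
    if recent_error > tolerance * baseline_error:
        return "performance"
    return None


# Hypothetical example: a 9-day-old model whose error rose from 0.20 to 0.30.
trained = datetime(2023, 1, 1)
print(should_retrain(trained, datetime(2023, 1, 10), 0.30, 0.20))  # prints performance
```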

Sept 2018 - Jul 2021

Data Scientist/ Data Analyst

Government of Madhya Pradesh, Bhopal, India

SCHMP, Department of Food, Civil Supplies and Consumer Protection (Run by AS Foundation).

  • Developed end-to-end machine learning prototypes and scaled them to run in a production environment.
  • Created an SVM-based classifier with 92% accuracy to assign ration cards to consumers based on their income. This reduced the request processing time per consumer by 60%.
  • Derived actionable insights from massive data obtained from 52 districts of the state to facilitate better policy making decisions.
  • Analyzed old information architectures and contributed to the design and development of new ones.
  • Created easy-to-understand visual dashboards projecting growth and giving insights to facilitate the proper execution of government schemes.
  • Created a chatbot using IBM Watson for quick redressal of urban consumer grievances.



Education


2015-2019

Bachelor of Engineering - Information Technology

University Institute of Technology, RGPV

Grade: First class distinction.

2021-2024

Master of Science - Artificial Intelligence and Data Science

Deggendorf Institute of Technology, Germany/ University of South Bohemia, Czech Republic

Grade: 1.8

Projects

Below are some of the projects that I have worked on.

AMDB - Awesome Movie Database and Recommender System

A highly responsive movie database and content-based recommender system using cosine similarity.

Dataset used: MovieLens ratings (25 million rows).

  • Search for a movie by its full or partial title and get similar recommendations
  • View top-rated movies by genre
  • Find users in the database with a similar taste in movies
  • Recommend movies to a given user from the database
  • Get recommendations based on your choice of movies
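
As an illustration of the content-based approach described above, here is a minimal sketch of ranking movies by cosine similarity. The movie titles, genre vectors and function names are hypothetical and are not taken from the AMDB code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_u * norm_v)


def recommend(title, features, k=2):
    """Return the k titles whose feature vectors are most similar to `title`."""
    target = features[title]
    scored = [(cosine_similarity(target, vec), other)
              for other, vec in features.items() if other != title]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scored[:k]]


# Hypothetical genre indicators: [action, comedy, sci-fi, romance]
features = {
    "Movie A": [1, 0, 1, 0],
    "Movie B": [1, 0, 1, 1],
    "Movie C": [0, 1, 0, 1],
    "Movie D": [1, 0, 0, 0],
}
print(recommend("Movie A", features, k=2))  # prints ['Movie B', 'Movie D']
```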

Autoencoder as an end-to-end communication system

  • Trained an autoencoder for information transmission under different hyperparameter settings and presented it as a standalone end-to-end communication system.
  • Built the parts of the autoencoder so that they replace the components of a conventional communication system.
  • Performed hyperparameter tuning to ensure optimal reconstruction of the outputs.
  • Compared the performance of the autoencoder with different modulation schemes (QPSK/8PSK).

Parallelization of Energy calculation for a box of water molecules

  • Parallelized a large energy calculation using MPI and OpenMP.
  • Executed the project on a distributed computing infrastructure, MetaCentrum (Czech National Grid Organization).
  • Components parallelized: reading the large input data and the energy calculation loop.
  • Goal: parallelize the computation while keeping load imbalance to a minimum.


Bike sharing demand estimation

  • A bike sharing system functions as a sensor network that can be used to study mobility in a city. Here I combined historical usage patterns with weather data to forecast bike rental demand in the Capital Bikeshare program in Washington, D.C.
  • Predicted bike sharing demand from an hourly dataset (17,379 rows, 17 features).
  • Used a multiple linear regression model.
  • Checked the multiple linear regression assumptions: autocorrelation, multicollinearity, endogeneity and residual normality.
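
One of the assumption checks listed above, autocorrelation of the residuals, is often assessed with the Durbin-Watson statistic. The sketch below is a generic illustration with made-up residuals, not the code or data from this project. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Sum of squared differences between consecutive residuals.
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    # Sum of squared residuals.
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("all residuals are zero")
    return numerator / denominator


# Hypothetical residuals, for illustration only.
alternating = [1.0, -1.0, 1.0, -1.0]   # sign flips every step
persistent = [1.0, 1.0, -1.0, -1.0]    # neighbouring values tend to agree
print(durbin_watson(alternating))  # prints 3.0
print(durbin_watson(persistent))   # prints 1.0
```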

A collaborative filtering based movie recommender system using PySpark

  • Recommends movies to a given user based on collaborative filtering
  • Built using PySpark
  • Dataset used: MovieLens ratings (25 million rows).
  • Used Alternating Least Squares (ALS) matrix factorization.
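
To give an intuition for alternating least squares, here is a minimal pure-Python sketch restricted to a single latent factor per user and item. It is a simplified illustration under that assumption, not PySpark's implementation and not the code from this project; the function name, ratings and parameter values are hypothetical.

```python
def als_rank1(ratings, n_users, n_items, reg=0.01, iterations=50):
    """Fit rating(user, item) ~ u[user] * v[item] by alternating least squares.

    `ratings` maps (user, item) pairs to observed ratings.
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(iterations):
        # Fix the item factors and solve each user's one-dimensional
        # regularized least-squares problem in closed form.
        for i in range(n_users):
            seen = [(item, r) for (user, item), r in ratings.items() if user == i]
            num = sum(r * v[item] for item, r in seen)
            den = sum(v[item] ** 2 for item, _ in seen) + reg
            u[i] = num / den
        # Fix the user factors and solve for each item in the same way.
        for j in range(n_items):
            seen = [(user, r) for (user, item), r in ratings.items() if item == j]
            num = sum(r * u[user] for user, r in seen)
            den = sum(u[user] ** 2 for user, _ in seen) + reg
            v[j] = num / den
    return u, v


# Hypothetical ratings: user 1 has not rated item 2.
ratings = {
    (0, 0): 1.0, (0, 1): 2.0, (0, 2): 3.0,
    (1, 0): 2.0, (1, 1): 4.0,
}
u, v = als_rank1(ratings, n_users=2, n_items=3)
prediction = u[1] * v[2]  # predicted rating for the unseen (user 1, item 2) pair
print(round(prediction, 2))
```

Each update is a closed-form least-squares solve with the other side held fixed, which is what makes the method easy to distribute across users and items.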

Blogs

Since you have made it this far, I'd love to hear your views on some of the blogs that I've written.

The sheer beauty of Time Series Analysis

Understanding the core intuition behind time series analysis.

Stuff that every machine learning engineer should know.

Understanding benchmarks for creating a robust machine learning system.

The model retraining bible - Part 1

Post deployment model decay? No problem...


The model retraining bible - Part 2

Cause a model will only behave under supervision.

Reinforcement Learning, can’t get any easier than this...

Understanding the intuition behind reinforcement learning, policy and value functions and reward

Certifications

Here are some of the certifications that I earned along my data science journey.

Time Series Analysis, Forecasting, and Machine Learning.

Issuer - Udemy

ETS, Time Series Modelling - Statistical (ARIMA, SARIMAX), Deep Learning (ANN, CNN, RNNs, LSTMs, NBeats, NHits, PSTPatch), Vector models, GARCH, FB Prophet, Holt-Winters, TensorFlow 2, AWS Forecast

Google's Generative AI path.

Issuer - Google

Generative AI Studio, Image captioning models, Transformer-based models and BERT, Attention mechanism, Encoder-Decoder architecture, Image generation, Large Language Models (LLMs), Responsible AI

Data Science Math Skills

Issuer - Duke University

Set Theory, Functions and Graphs, Derivatives, Optimization (unconstrained, constrained - Lagrange multipliers, Linear Programming), Probability


CI/CD Pipelines and DevOps

Issuer - Udemy

Continuous Integration and Continuous Delivery (CI/CD) · DevOps · MLOps · GitLab · Docker · Unit Testing

Hypothesis Testing

Issuer - UpGrad

Types of hypothesis, types of test for the hypothesis, decision-making criteria for hypothesis, critical value and p-value method for testing

More projects on GitHub

I love to solve business problems & uncover hidden data stories


Contact

Contact Me

Below are the details to reach out to me!

Address

Wuppertal, Germany

Contact Number

+ 49 15172456126

Email Address

hrsaxena97@gmail.com
