Mohammad Junaid

About

I'm a recent MTech graduate from IIT Guwahati with one year of hands-on experience as a Data Science Trainee at iNeuron. Eager to apply my analytical and technical skills, I'm seeking roles in Data Analysis, Data Science, Data Engineering, Business Analysis, or Bioinformatics. Motivated, enthusiastic, and open to opportunities, I bring a strong foundation in Data Science and a commitment to excellence. Let's connect and explore how I can contribute to your team.

Skills

Python Programming 70%
Machine Learning 70%
Deep Learning 60%
Computer Vision | Deep Learning 60%
NLP | LLMs | Generative AI 50%
Time Series Analysis | Deep Learning 50%
Excel | Power Query | Power Pivot 70%
Power BI 70%
Tableau 60%
SQL (MySQL) 40%
NoSQL (MongoDB) 30%
Data Analysis and Data Visualisation with Python 70%
NLP
  • LSTM, BERT, GPT2, Google FLAN-T5
Generative AI
  • LangChain, ChromaDb, FAISS, Google PaLM, OpenAI GPT
Computer Vision DL
  • Classification, Detection, Segmentation, Tracking
Machine Learning, Deep Learning & Generative AI
LangChain ChromaDB PaLM TensorFlow Detectron2 YOLO Scikit-learn
Database & Analytics
MySQL MongoDB Tableau Power BI Excel
Visualization & Processing
Matplotlib Seaborn Plotly Pandas Numpy
Development & Deployment
AWS Azure Streamlit Flask
AIOps / MLOps
Linux Docker Git
Automation
BeautifulSoup Selenium ydata_profiling AutoViz D-tale
Programming
Python SAS Programming
Tools
PyCharm VS Code Jupyter

Resume

Education

Master of Technology in Biotechnology

2020 - 2022

Indian Institute of Technology, Guwahati, India

Bachelor of Technology in Biotechnology

2016 - 2020

Dr A. P. J. Abdul Kalam Technical University, Lucknow, Uttar Pradesh, India

National Level Examinations

BET: Biotechnology Eligibility Test (DBT JRF)

2021

Conducted by the Department of Biotechnology, Ministry of Science & Technology, Government of India, to grant fellowships to candidates who want to pursue a Ph.D. in biotechnology

  • Category - I

GATE: Graduate Aptitude Test in Engineering

Discipline: Biotechnology

2020
  • Rank: 497
  • Score: 515

UPSEE: Uttar Pradesh State Entrance Examination

Discipline: Biotechnology and B.Pharma

2016
  • Rank (Biotech): 608
  • Rank (B.Pharma): 400

Relevant Courses

Biostatistics

BTech | AKTU

Quantitative Biology

MTech | IIT Guwahati

Experience

Data Scientist

Feb 2024 - Present

Ministry of Rural Development

Skills: Python, Deep Learning for Computer Vision, FastAI, Data Analysis

Data Science Trainee

Sep 2022 - Sep 2023

PW Skills (Formerly: iNeuron.ai)

Skills: Python, Machine Learning, Deep Learning for Computer Vision, SQL, MongoDB, Excel, Tableau, Power BI, MLOps, Git, Docker, Linux

Teaching Assistant

Feb 2022 - Jun 2022

Indian Institute of Technology, Guwahati, India

Performed TA duties

Summer Research Trainee

Jun 2018 - Jul 2018

MRD LifeSciences Pvt Ltd, Lucknow

Skills: PCR, SDS-PAGE, Protein, DNA and RNA Isolation, Western Blotting, Antibiotic Sensitivity Test, Cloning, Minimal Inhibitory Concentration Test

Training

Full Stack Data Science

1 Year | Ongoing | iNeuron - PW Skills
  • Python Programming
  • Machine Learning
  • Deep Learning | Computer Vision

Applied Data Science With Python Specialization

20 Weeks | University of Michigan - Coursera

Five courses included

  • Introduction To Data Science In Python
  • Applied Plotting, Charting & Data Representation
  • Applied Machine Learning In Python
  • Applied Text Mining In Python
  • Applied Social Network Analysis In Python

Data Science & Machine Learning

8 Weeks | Consulting & Analytics Club, IIT Guwahati

Excel For Data Analytics and Data Visualization

14 Weeks | Macquarie University - Coursera

Three courses included

  • Excel Fundamentals For Data Analysis
  • Data Visualization in Excel
  • Excel Power Tools For Data Analysis
  • Skills: Power Query, Power Pivot, Power BI

Portfolio

Computer Vision | Deep Learning


Personal Data Versioning Code Versioning


This project involves 3D segmentation of brain tumors from the BraTS2020 dataset using TensorFlow and the 3D U-Net architecture. The BraTS2020 dataset is approximately 40 GB in size, comprising a training set (~30 GB) and a validation set (~10 GB). It provides multimodal brain scans in NIfTI format (.nii.gz), which is commonly used in medical imaging to store brain imaging data obtained with MRI.
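
A minimal sketch of the two core steps, loading a NIfTI volume with nibabel and assembling a single-level 3D U-Net style block in Keras, is shown below; the file name, 128-cubed crop, and tiny network are illustrative simplifications, not the project's actual pipeline.

```python
import nibabel as nib
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Load one modality from a NIfTI file (illustrative file name); BraTS volumes are 240x240x155
vol = nib.load("BraTS20_Training_001_flair.nii.gz").get_fdata()
vol = vol[56:184, 56:184, 13:141]                          # crop to 128x128x128
vol = (vol - vol.mean()) / (vol.std() + 1e-8)              # z-score normalisation
vol = vol[np.newaxis, ..., np.newaxis].astype("float32")   # add batch and channel dims

def conv_block(x, filters):
    """Two 3D convolutions, one level of a U-Net encoder/decoder."""
    x = layers.Conv3D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv3D(filters, 3, padding="same", activation="relu")(x)

inputs = tf.keras.Input(shape=(128, 128, 128, 1))
e1 = conv_block(inputs, 16)
p1 = layers.MaxPooling3D(2)(e1)                            # downsample
b = conv_block(p1, 32)                                     # bottleneck
u1 = layers.UpSampling3D(2)(b)                             # upsample
d1 = conv_block(layers.concatenate([u1, e1]), 16)          # skip connection
outputs = layers.Conv3D(4, 1, activation="softmax")(d1)    # background + 3 tumor sub-regions
model = tf.keras.Model(inputs, outputs)
model.summary()
```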

Personal Data Versioning Code Versioning


This project aims to detect coccidiosis, a common parasitic disease in poultry (chickens), using deep learning techniques. I used a transfer learning approach with the VGG16 architecture to classify input fecal images as either indicative of coccidiosis or not.
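
Below is a brief sketch of that transfer learning setup, assuming Keras, a frozen ImageNet-pretrained VGG16 base, and a hypothetical data/train/<class>/ folder layout; the image size and classification head are illustrative.

```python
import tensorflow as tf

base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(224, 224, 3))
base.trainable = False  # keep the ImageNet features frozen

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # coccidiosis vs. healthy
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Hypothetical folder layout: data/train/<class_name>/*.jpg
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(224, 224), batch_size=32, label_mode="binary")
model.fit(train_ds, epochs=5)
```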

Personal Data Versioning Code Versioning


The project offers a comprehensive solution for object detection or segmentation using YOLOv8, enabling precise localization and delineation of objects. It incorporates DeepSORT for object tracking with unique IDs, facilitating continuous monitoring and providing visual tracking trails for object movement analysis.
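
A condensed sketch of the detect-then-track loop follows; it uses the ultralytics and deep_sort_realtime packages, and the video path, model weights, and drawing code are illustrative rather than the project's exact implementation.

```python
import cv2
from ultralytics import YOLO
from deep_sort_realtime.deepsort_tracker import DeepSort

detector = YOLO("yolov8n.pt")          # illustrative weights
tracker = DeepSort(max_age=30)

cap = cv2.VideoCapture("input.mp4")    # illustrative video path
while True:
    ok, frame = cap.read()
    if not ok:
        break
    result = detector(frame, verbose=False)[0]
    detections = []
    for box in result.boxes:           # convert YOLO boxes to (ltwh, conf, class)
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        detections.append(([x1, y1, x2 - x1, y2 - y1], float(box.conf), int(box.cls)))
    for track in tracker.update_tracks(detections, frame=frame):
        if not track.is_confirmed():
            continue
        l, t, r, b = map(int, track.to_ltrb())
        cv2.rectangle(frame, (l, t), (r, b), (0, 255, 0), 2)
        cv2.putText(frame, f"ID {track.track_id}", (l, t - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) == 27:           # Esc quits
        break
cap.release()
cv2.destroyAllWindows()
```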

Personal Data Versioning Code Versioning


The system uses computer vision techniques to prevent spoofing attempts during face recognition, ensuring the accuracy and security of attendance records. Features include real-time face recognition with the face-recognition Python library, integration of the Silent-Face-Anti-Spoofing model for enhanced security, and a user-friendly graphical user interface (GUI) built with the tkinter library.
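
The recognition step itself is small; here is a minimal sketch with the face-recognition library, leaving out the anti-spoofing check and the tkinter GUI, with illustrative image paths.

```python
import face_recognition

# Encode a known face once (e.g. at registration time); paths are illustrative
known_image = face_recognition.load_image_file("registered/employee_01.jpg")
known_encoding = face_recognition.face_encodings(known_image)[0]

# Encode the face captured at attendance time
probe_image = face_recognition.load_image_file("capture.jpg")
probe_encodings = face_recognition.face_encodings(probe_image)

if probe_encodings:
    match = face_recognition.compare_faces([known_encoding], probe_encodings[0],
                                           tolerance=0.5)[0]
    print("Attendance marked" if match else "Face not recognised")
else:
    print("No face found in the captured frame")
```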

Machine Learning


Launch Personal Data Versioning Code Versioning S3 Bucket Modular Coding Custom Logger and Exception Handler Package Building Docker


Thyroid diseases, such as hyperthyroidism and hypothyroidism, affect a significant portion of the population and can lead to serious health complications. In this project, I developed a machine learning based multiclass classification model to predict the likelihood of a patient having a diseased state of the thyroid. I performed missing value handling, outlier handling, and feature selection. The original dataset has 30 independent features and 15 class labels; the class of interest with the smallest share makes up 0.021% of the data, and the one with the largest share 4.87%. Score cards on the dashboard UI display live scores for F1, ROC AUC, balanced accuracy, and log loss. From the dashboard, the user can view the data profiling report, EDA report, drift report, model performance reports, logs, all artifacts, and the deployed model. It also allows the user to view and modify the configuration and retrain the model from the dashboard itself.
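
As an illustration of the modelling core (not the project's full MLOps pipeline), the sketch below imputes missing values, one-hot encodes categoricals, and trains a class-weighted multiclass classifier scored with macro F1 and balanced accuracy; the CSV path, label column, and the choice of RandomForest are assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import balanced_accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.read_csv("thyroid.csv")                      # hypothetical path
X, y = df.drop(columns=["target"]), df["target"]     # hypothetical label column
num_cols = X.select_dtypes("number").columns
cat_cols = X.columns.difference(num_cols)

pre = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), num_cols),
    ("cat", Pipeline([("imp", SimpleImputer(strategy="most_frequent")),
                      ("ohe", OneHotEncoder(handle_unknown="ignore"))]), cat_cols),
])
clf = Pipeline([("pre", pre),
                ("rf", RandomForestClassifier(class_weight="balanced", random_state=42))])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("macro F1:", f1_score(y_te, pred, average="macro"))
print("balanced accuracy:", balanced_accuracy_score(y_te, pred))
```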

Launch Personal RandomForest LightGBM yDataProfiling VotingEnsemble Plotly Flask Render


I developed a web application for predicting forest fires (classification) and the Fire Weather Index, FWI (regression), using machine learning techniques. The dataset had high multicollinearity among important features such as Fine Fuel Moisture Code (FFMC), Drought Code (DC), Initial Spread Index (ISI), and Fire Weather Index (FWI). To address this, I applied feature selection and feature engineering to reduce multicollinearity, then trained and evaluated different models including Random Forest, Gradient Boosting Trees, LightGBM, Support Vector Regressor, and AdaBoost. I used scikit-learn's Voting Classifier and Voting Regressor ensembles to produce the final predictors, which power the Flask web application deployed on Render.
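
The ensembling step can be sketched as below; synthetic data stands in for the engineered, decorrelated features, and the base estimators are illustrative rather than the tuned models used in the app.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestClassifier,
                              RandomForestRegressor, VotingClassifier, VotingRegressor)
from sklearn.linear_model import LogisticRegression, Ridge

# Classification: fire / no fire
clf = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=42)),
                ("lr", LogisticRegression(max_iter=1000))],
    voting="soft")

# Regression: Fire Weather Index (FWI)
reg = VotingRegressor(
    estimators=[("rf", RandomForestRegressor(random_state=42)),
                ("gbr", GradientBoostingRegressor(random_state=42)),
                ("ridge", Ridge())])

# Synthetic data stands in for the engineered features
Xc, yc = make_classification(n_samples=300, n_features=8, random_state=0)
Xr, yr = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)
clf.fit(Xc, yc)
reg.fit(Xr, yr)
print("fire probability:", clf.predict_proba(Xc[:1])[0, 1])
print("predicted FWI:", reg.predict(Xr[:1])[0])
```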

Dash WebApp Personal BaggingClassifier Plotly Render Streamlit


The web app for making predictions about earnings is built using a Bagging Classifier, which has outperformed other algorithms in our testing. The model was trained and tested using the US Adult Census dataset from 1994.
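
A short sketch of that setup is given below, assuming the Adult dataset is fetched from OpenML; the preprocessing and hyperparameters are illustrative, not the tuned configuration behind the app.

```python
from sklearn.compose import make_column_transformer
from sklearn.datasets import fetch_openml
from sklearn.ensemble import BaggingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# 1994 US Adult Census data (assumed fetch route via OpenML)
X, y = fetch_openml("adult", version=2, as_frame=True, return_X_y=True)
cat_cols = X.select_dtypes(exclude="number").columns

pre = make_column_transformer(
    (make_pipeline(SimpleImputer(strategy="most_frequent"),
                   OneHotEncoder(handle_unknown="ignore")), cat_cols),
    remainder="passthrough")

model = make_pipeline(pre, BaggingClassifier(n_estimators=50, random_state=42))
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)
model.fit(X_tr, y_tr)
print("test accuracy:", round(model.score(X_te, y_te), 3))
```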

Launch Personal MultinomialNB BernoulliNB Plotly Streamlit Render


"I developed a model for predicting spam emails and SMS messages using a combination of Multinomial Naive Bayes and Bernoulli Naive Bayes. The model was able to accurately identify spam messages in most cases, with the exception of messages containing numerical values written in alphabetical form. In such cases, the Bernoulli Naive Bayes algorithm performed better. Additionally, the model was able to accurately identify legitimate emails from banks, stock brokers, and other investment partners, avoiding the potential for important messages to be flagged as spam. The final model is able to predict one of three categories: 'Spam', 'Not Spam', or 'Others: Be cautious' for emails that may require further scrutiny."

Gradient Boosting Biopython


As part of my MTech thesis, I completed a project that used Gradient Boosting algorithms to identify DNA and protein backbones based on their backbone coordinates. Using data from the PDB database, I parsed PDB files and extracted the coordinates of key atoms (P, O5, C5, C4, C3, O3 for DNA and N, C-alpha, C for proteins) to train the model.
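
The feature-extraction step can be sketched with Biopython as follows; the PDB file name is illustrative, and the descriptor computation and Gradient Boosting training that follow are omitted.

```python
from Bio.PDB import PDBParser

PROTEIN_BACKBONE = {"N", "CA", "C"}
DNA_BACKBONE = {"P", "O5'", "C5'", "C4'", "C3'", "O3'"}

parser = PDBParser(QUIET=True)
structure = parser.get_structure("example", "example.pdb")  # hypothetical file

coords = []
for atom in structure.get_atoms():
    if atom.get_name() in PROTEIN_BACKBONE | DNA_BACKBONE:
        x, y, z = atom.get_coord()
        coords.append((atom.get_parent().get_resname(), atom.get_name(), x, y, z))

print(f"extracted {len(coords)} backbone-atom coordinates")
```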

Natural Language Processing & Generative AI


Personal Render


Text summarization is a crucial task in natural language processing and information retrieval. This project demonstrates the implementation of a Transformer-based model for generating concise and coherent summaries from longer text documents.
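
As a minimal illustration, a pretrained summarization checkpoint can be called through the Hugging Face pipeline API; the checkpoint named below is an assumption and may differ from the model used in this project.

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
article = ("Text summarization condenses a long document into a short, coherent "
           "summary while preserving its key information. Transformer models such "
           "as BART and PEGASUS are commonly fine-tuned for this task.")
summary = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```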

Personal HuggingFace


This is an end-to-end LLM project based on Google PaLM, LangChain, and ChromaDB. The system can be queried in natural language against a MySQL database: the user asks a question in plain English, and the system answers it by converting the question into an SQL query and executing that query on the MySQL database.
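
A rough sketch of the natural-language-to-SQL flow is shown below. LangChain's module layout has changed across releases, so the imports follow the older (pre-1.0) layout and may need adjusting; the API key, connection URI, and database name are placeholders.

```python
from langchain.llms import GooglePalm
from langchain.utilities import SQLDatabase
from langchain_experimental.sql import SQLDatabaseChain

llm = GooglePalm(google_api_key="YOUR_API_KEY", temperature=0.1)             # placeholder key
db = SQLDatabase.from_uri("mysql+pymysql://user:password@localhost/store")   # placeholder URI
chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)

# The chain writes the SQL, runs it against MySQL, and phrases the answer
print(chain.run("How many items are currently in stock?"))
```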

Personal HuggingFace


Powered by OpenAI GPT, LangChain, and FAISS, this tool lets you load article URLs and ask questions about them. You can ask for a summary or pose other questions to get insights from the articles.
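
The retrieval flow can be sketched roughly as below, again following the older LangChain layout (imports may need updating for newer releases); the URL, chunking parameters, and chain choice are illustrative.

```python
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.document_loaders import UnstructuredURLLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Load and chunk the articles (example URL)
docs = UnstructuredURLLoader(urls=["https://example.com/article"]).load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000,
                                        chunk_overlap=100).split_documents(docs)

# Embed the chunks into a FAISS index and answer questions over it
index = FAISS.from_documents(chunks, OpenAIEmbeddings())
chain = RetrievalQAWithSourcesChain.from_llm(llm=OpenAI(temperature=0),
                                             retriever=index.as_retriever())
print(chain({"question": "Summarise the key points of the article."}))
```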

Python Programming


Launch Personal Biopython Plotly Streamlit Render


Developed a web app for calculating torsion angles and plotting Ramachandran plots. It uses Biopython for fetching and parsing protein PDB files (PDB files store atomic coordinates for the 3D structure of protein macromolecules) and Plotly for plotting the Ramachandran plots. It is a fully automated app that allows the user to upload PDB files or simply provide a list of PDB IDs (the unique identifier of each file), and to visualize the full protein molecule, each chain of the molecule separately, or only chosen amino acids on the plot. The app also generates a fully interactive plot as well as a table of the calculated angles, both of which can be downloaded by the user.
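
The angle calculation at the heart of the app can be sketched with Biopython's PPBuilder, which yields phi/psi per residue; the PDB ID, file path, and the reduced Plotly scatter below are illustrative.

```python
import math
import plotly.express as px
from Bio.PDB import PDBParser, PPBuilder

structure = PDBParser(QUIET=True).get_structure("1crn", "1crn.pdb")  # illustrative ID/file

phi_psi = []
for pp in PPBuilder().build_peptides(structure):
    for residue, (phi, psi) in zip(pp, pp.get_phi_psi_list()):
        if phi is not None and psi is not None:            # chain termini lack phi or psi
            phi_psi.append((residue.get_resname(),
                            math.degrees(phi), math.degrees(psi)))

fig = px.scatter(x=[p[1] for p in phi_psi], y=[p[2] for p in phi_psi],
                 hover_name=[p[0] for p in phi_psi],
                 labels={"x": "phi (degrees)", "y": "psi (degrees)"},
                 title="Ramachandran plot")
fig.show()
```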

Launch Personal BeautifulSoup MongoDB Flask


Scrapyy is a web application that allows users to scrape data from Amazon.in and store it in a MongoDB database. Scrapyy is based on BeautifulSoup and generates two types of dataframes:

  1. Detailed DataFrame: Product Category, ProductID, ProductName, Rating, Number of customers rated, Discounted Price, DiscountPercentage and Actual Price
  2. Reviews DataFrame: Product Category, ProductID, ProductName, Customer Name and Reviews
Users can either download already prepared data or search for a specific product and specify the range of pages to be scraped for search results and reviews.
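
A skeleton of the scrape-and-store loop is sketched below; the URL, CSS selectors, and database/collection names are placeholders, since Amazon's real markup differs and changes over time.

```python
import requests
from bs4 import BeautifulSoup
from pymongo import MongoClient

headers = {"User-Agent": "Mozilla/5.0"}            # many sites block default clients
html = requests.get("https://example.com/s?k=laptop", headers=headers, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

records = []
for card in soup.select("div.product-card"):       # placeholder selector
    name = card.select_one("h2")
    price = card.select_one(".price")
    records.append({"ProductName": name.get_text(strip=True) if name else None,
                    "Price": price.get_text(strip=True) if price else None})

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
if records:
    client["scrapyy"]["products"].insert_many(records)
print(f"stored {len(records)} products")
```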


Personal Selenium


Developed an image scraper that collects images using Google Chrome (ChromeDriver). In this project I used Selenium to automate the process. The user can search for images of interest and specify how many images to download; all images for a given query are stored in a single folder named after the query.
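
A compact sketch of the Selenium flow is given below: open an image search for a query, collect thumbnail URLs, and save them into a folder named after the query. The selector and the crude sleep-based wait are simplifications, and Google's markup changes over time, so treat it as illustrative only.

```python
import os
import time
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

query, n_images = "brown hen", 10                  # illustrative query and count
driver = webdriver.Chrome()                        # assumes ChromeDriver is available
driver.get(f"https://www.google.com/search?q={query}&tbm=isch")
time.sleep(2)                                      # crude wait for thumbnails to load

os.makedirs(query, exist_ok=True)
saved = 0
for img in driver.find_elements(By.CSS_SELECTOR, "img"):
    src = img.get_attribute("src")
    if src and src.startswith("http"):             # skip base64-encoded thumbnails
        with open(os.path.join(query, f"{saved}.jpg"), "wb") as f:
            f.write(requests.get(src, timeout=10).content)
        saved += 1
    if saved >= n_images:
        break
driver.quit()
print(f"saved {saved} images to ./{query}")
```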


Power BI


Demo Personal


Performed data transformation, cleaning, conditional columns, and measures in Power Query, and built a fully working HR analytics dashboard that shows different aspects of attrition analysis. The final cleaned and transformed data has 1,470 rows and 37 columns.


PowerPoint


Demo Personal


Performed data transformation, cleaning, conditional columns, and measures in Power Query, and built a fully working revenue analytics dashboard for Atliq Hotels.

PowerPoint

SQL


Personal

In this mini project, I performed an analysis of a music store dataset. I designed an enhanced entity-relationship (EER) diagram to visualize the relationships between the different tables. To improve code organization and reusability, I implemented stored procedures in MySQL. Additionally, I developed a Python script that efficiently pushed data into the MySQL database.
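
The data-loading part can be sketched with pandas and SQLAlchemy as below; the connection string, CSV file, and table name are placeholders, not the project's actual schema.

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder credentials/database; requires a MySQL driver such as pymysql
engine = create_engine("mysql+pymysql://user:password@localhost:3306/music_store")

df = pd.read_csv("tracks.csv")                     # hypothetical export of one table
df.to_sql("tracks", con=engine, if_exists="append", index=False, chunksize=1000)
print(f"inserted {len(df)} rows into tracks")
```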