6 Months Data Science

  • Statistics
  • Data visualization in python
  • EDA
  • Regression
  • Supervised Machine Learning
  • Unsupervised Machine Learning
  • Ensemble Techniques
  • Association Rule
  • Recommendation system
  • Artificial Neural Network
  • Introduction to NLP
  • Preprocessing of data
  • Feature extraction
  • Part-of-speech (POS) tagging
  • Named entity recognition (NER)
  • How to implement spam detection
  • How to implement sentiment analysis
  • How to implement an article spinner
  • How to implement text summarization
  • How to implement latent semantic indexing
  • How to implement topic modelling
  • Hugging Face Transformers
  • Assignments for assessment
  • Projects
  • Internship

Course Outline

Module 1: Python

Python is an essential skill for every data scientist. In this module, our instructors take you through the basics of Python and the areas where it is used. You will learn to work with widely used libraries such as NumPy, pandas, and Matplotlib. Module 1 includes –

  • Environment set-up
  • Jupyter overview
  • Python Numpy
  • Python Pandas
  • Python Matplotlib
  • Python Seaborn

Module 2: R

R is a programming language built for statistical computing and data analysis, and it is widely used in data science. This module teaches you how to explore data sets using R. Here you will learn –

  • An introduction to R
  • Data structures in R
  • Data visualization with R
  • Data analysis with R

Module 3: Statistics

A working knowledge of statistics is an essential skill for anyone who works with data. In this module, you will learn –

  • Important statistical concepts used in data science
  • Difference between population and sample
  • Types of variables
  • Measures of central tendency
  • Measures of variability
  • Coefficient of variation
  • Skewness and Kurtosis
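
As a small illustration of the measures listed above, the following is a minimal sketch that uses only Python's built-in `statistics` module. The function name `describe` and the sample values are hypothetical, chosen for this example only.

```python
import statistics


def describe(values):
    """Return a few descriptive statistics for a sample of numbers."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return {
        "mean": mean,                          # measure of central tendency
        "median": statistics.median(values),   # measure of central tendency
        "stdev": stdev,                        # measure of variability
        "cv": stdev / mean,                    # coefficient of variation
    }


# Hypothetical sample, used only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(sample)
print(summary)
```

The coefficient of variation divides the standard deviation by the mean, so it is only meaningful when the mean is not zero.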

Module 4: Inferential statistics

Inferential statistics is used to draw conclusions about a population from a sample drawn from it. It helps you analyze representative samples of large data sets. In this module, you will learn –

  • Normal distribution
  • Hypothesis testing
  • Central limit theorem
  • Confidence interval
  • T-test
  • ANOVA
  • Type I and II errors
  • Student’s T distribution
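
To make the t-test and confidence interval topics concrete, here is a minimal sketch using only the Python standard library. The function names and the sample are hypothetical. Note the stated simplification: the interval below uses the normal distribution for its critical value, which is an approximation; for small samples the Student's t distribution gives a wider interval.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic for testing whether the population mean equals mu0."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - mu0) / standard_error


def normal_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal critical value)."""
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error


# Hypothetical measurements, used only for illustration.
sample = [5.1, 4.9, 5.0, 5.2, 4.8]
print(one_sample_t(sample, mu0=4.0))
print(normal_confidence_interval(sample))
```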

Module 5: Regression and ANOVA

This lesson will help you understand how to model the relationship between two or more variables. Here you will learn –

  • Linear Regression
  • Logistic Regression
  • R square
  • Scatter Plot and Correlation
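
The idea behind linear regression and R square can be sketched in a few lines of plain Python. This is an illustrative least-squares fit for a single predictor, not the course's own implementation; the function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data, used only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, r_squared(xs, ys, slope, intercept))
```

An R square close to 1 means the line explains most of the variation in the data; a value near 0 means it explains very little.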

Module 6: Exploratory data analysis

In this lesson you will learn –

  • Data visualization
  • Missing value analysis
  • The correlation matrix
  • Outlier detection analysis
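
The checks listed above can be sketched with the Python standard library alone. The helper names below are hypothetical, and the 1.5 × IQR rule is one common convention for flagging outliers, assumed here for illustration. A correlation matrix is simply this pairwise coefficient computed for every pair of columns.

```python
import statistics


def count_missing(values):
    """Number of missing entries, represented here as None."""
    return sum(1 for v in values if v is None)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def iqr_outliers(values, k=1.5):
    """Values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical columns, used only for illustration.
print(count_missing([1, None, 3, None]))
print(pearson([1, 2, 3], [2, 4, 6]))
print(iqr_outliers([10, 12, 11, 13, 12, 11, 100]))
```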

Module 7: Supervised machine learning

This is a comprehensive module to help you understand how machines learn from labeled data to make predictions. You will learn –

  • Python scikit-learn library
  • Neural networks
  • Support vector machine
  • Decision tree classifier
  • Feature Engineering
  • Model Evaluation
  • Naive Bayes
  • Ensemble methods
  • KNN
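
To give a feel for one of the listed algorithms, here is a minimal sketch of k-nearest neighbours (KNN) classification in plain Python. It is not the scikit-learn implementation; the function name `knn_predict` and the training data are hypothetical.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k closest training points.

    train is a list of (features, label) pairs; query is a feature tuple.
    """
    def squared_distance(item):
        features, _ = item
        return sum((a - b) ** 2 for a, b in zip(features, query))

    neighbours = sorted(train, key=squared_distance)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical labeled data: two well-separated groups.
train = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((8, 8), "b"), ((9, 8), "b"), ((8, 9), "b"),
]
print(knn_predict(train, (2, 2)))
print(knn_predict(train, (9, 9)))
```

The labels in the training data are what make this supervised: the model only predicts classes it has already seen examples of.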

Module 8: Unsupervised machine learning

In this lesson, you will learn –

  • What is Unsupervised Learning
  • Clustering
  • Hierarchical Clustering
  • K-means Clustering
  • Association Rules
  • Recommendation Engines
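
K-means clustering, listed above, alternates between assigning points to the nearest centre and moving each centre to the mean of its points. The following is a minimal sketch for 2-D points in plain Python. The function names are hypothetical, and passing the starting centres in by hand is an assumption made here to keep the example deterministic.

```python
def assign(points, centroids):
    """Group each point with its nearest centroid (squared Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
        clusters[distances.index(min(distances))].append(p)
    return clusters


def kmeans(points, centroids, max_iterations=100):
    """Lloyd's algorithm with caller-supplied starting centroids."""
    centroids = list(centroids)
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        updated = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical points forming two groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

No labels are supplied anywhere, which is what distinguishes this from the supervised methods in Module 7.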

 

Module 9: Time Series Analysis

In this lesson, you will learn –

  • Trend and seasonality – Trend is a systematic linear or non-linear component of a time series that changes over time and does not repeat.
    Seasonality is a systematic linear or non-linear component of a time series that changes over time and repeats at regular intervals.
  • Decomposition – You will learn how to decompose time series data into its trend and seasonality components.
  • Smoothing (moving average) – You will learn how to apply moving-average smoothing to univariate data.
  • SES, Holt & Holt-Winters Model – Simple exponential smoothing (SES), Holt, and Holt-Winters are smoothing models, and you will learn how each of them works.
  • AR, Lag Series, ACF, PACF – You will learn about autoregressive (AR) models, lagged series, and the autocorrelation (ACF) and partial autocorrelation (PACF) functions used in time series analysis.
  • ADF, Random walk and Auto ARIMA – You will learn about the Augmented Dickey-Fuller (ADF) test, the random walk model, and automated ARIMA model selection (Auto ARIMA).
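
Two of the smoothing methods named above, the moving average and simple exponential smoothing (SES), can be sketched in plain Python. The function names and the series are hypothetical. Initializing SES with the first observation is one common choice, assumed here for simplicity.

```python
def moving_average(series, window):
    """Moving average over each full window; returns len(series) - window + 1 values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def simple_exponential_smoothing(series, alpha):
    """SES: each smoothed value blends the new observation with the previous smoothed value."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]  # assumption: start from the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical univariate series, used only for illustration.
series = [10, 20, 30, 25, 35]
print(moving_average(series, window=3))
print(simple_exponential_smoothing(series, alpha=0.5))
```

A larger window, or a smaller alpha, gives a smoother result that reacts more slowly to recent changes.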

Module 10: Tableau

Tableau is a sophisticated business intelligence tool used for data visualization. In this lesson, you will learn –

  • Working with Tableau
  • Deep dive into data and connections
  • Creating charts
  • Mapping data in Tableau
  • Dashboards and stories

Module 11: Machine learning on cloud

In this lesson, you will learn –

  • ML on cloud platform
  • ML on AWS
  • ML on Microsoft Azure

Assignments for assessment

Projects

Internship