# 3 Months Data Science


• Statistical Foundations
• Exploratory Data Analysis (EDA)
• Probability
• Inferential Statistics
• Regression- Linear Regression
• Regression- Logistic Regression
• Introduction to Machine Learning
• Supervised Learning – Classification I
• Supervised Learning – Classification II
• Model Selection and Boosting
• Unsupervised Learning
• Dimensionality Reduction
• Association Rules Mining & Recommendation Systems
• Time Series Analysis
• Introduction to Deep Learning – Single Layer Perceptron
• Understanding Multilayer Perceptron
• Convolutional Neural Network – I
• Convolutional Neural Network – II
• One Project
• Assignments

Course Outline

Statistical Foundations

In this module, you will learn the statistical methods used for decision making throughout this Data Science course.

• Probability distribution – A probability distribution is a function that gives the probability of each value (or range of values) that a random variable can take. This module will teach you about probability distributions and common types such as the Binomial, Poisson, and Normal distributions, using Python.
• Normal distribution – The Normal distribution is one of the most important probability distributions in Statistics. It is a symmetric, bell-shaped distribution that is fully described by its mean and standard deviation.
• Poisson distribution – The Poisson distribution is a probability distribution that models the number of times an event occurs within a specified time interval.
• Bayes’ theorem – Bayes’ theorem is a mathematical formula, named after Thomas Bayes, for computing conditional probability. Conditional probability is the probability of an outcome occurring given that another outcome has already occurred.
• Central limit theorem – This module will teach you the Central Limit Theorem (CLT), which states that, under mild conditions, the distribution of sample means approaches a normal distribution as the sample size grows, and how to use it to approximate sampling distributions.
• Hypothesis testing – This module will teach you about hypothesis testing in Statistics. Hypothesis testing is a procedure in Applied Statistics for deciding whether observed or surveyed data provide enough evidence to reject a stated assumption (the null hypothesis).
• One Sample T-Test – The One-Sample T-Test is a hypothesis testing method used in Statistics. In this module, you will learn to check whether an unknown population mean differs from a specific value using the One-Sample T-Test procedure.
• ANOVA and Chi-Square – Analysis of Variance, also known as ANOVA, is a statistical technique used in Data Science that splits the observed variance in data into components for additional analysis and tests.
Chi-Square is a hypothesis testing method used in Statistics that measures how the values expected under a model compare to the actual observed data.
This module will teach you how to identify significant differences between the means of two or more groups.
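
As a small illustration of the Bayes' theorem bullet above, the sketch below computes a conditional probability for a diagnostic-test scenario. The function name `posterior` and all the numbers (1% prevalence, 95% sensitivity, 5% false positive rate) are hypothetical and chosen only for this example.

```python
def posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """Return P(A | B) using Bayes' theorem.

    prior               = P(A)
    likelihood          = P(B | A)
    false_positive_rate = P(B | not A)
    """
    # Total probability of observing B (law of total probability).
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positives.
p_condition_given_positive = posterior(
    prior=0.01, likelihood=0.95, false_positive_rate=0.05
)
print(round(p_condition_given_positive, 3))  # prints 0.161
```

Even with a fairly accurate test, the probability of having the condition after a positive result stays low here because the prior is small.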

Exploratory Data Analysis (EDA)

This module of the 3 Months Data Science course will teach you Exploratory Data Analysis using Pandas, Seaborn, Matplotlib, and summary statistics.

• Pandas – Pandas is one of the most widely used Python libraries. Pandas is used to analyze and manipulate data. This module will give you a deep understanding of exploring data sets using Pandas.
• Summary statistics (mean, median, mode, variance, standard deviation) – In this module, you will learn about various statistical formulas and implement them using Python.
• Seaborn – Seaborn is also one of the most widely used Python libraries. Seaborn is a Matplotlib based data visualization library in Python. This module will give you a deep understanding of exploring data sets using Seaborn.
• Matplotlib – Matplotlib is another widely used Python library. Matplotlib is a library for creating static, animated, and interactive visualizations. This module will give you a deep understanding of exploring data sets using Matplotlib.
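
The summary statistics listed above can be sketched with Python's standard `statistics` module. This is a minimal example on a made-up list of values, not the course's own material; it uses the population variance and standard deviation, and in practice the same quantities are typically read off a Pandas DataFrame.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    # Population variance and standard deviation (divide by n, not n - 1).
    "variance": statistics.pvariance(values),
    "std_dev": statistics.pstdev(values),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

For a sample rather than a full population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead.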

Regression- Linear Regression

This module will get you comfortable with the techniques used in Linear and Logistic Regression.

• Multiple linear regression – Multiple Linear Regression is a supervised machine learning algorithm involving multiple data variables for analysis. It is used for predicting one dependent variable using various independent variables.
This module will drive you through all the concepts of Multiple Linear Regression used in Machine Learning.
• Fitted regression lines – A fitted regression line is the graph of the regression equation estimated from your data. It shows the relationship between a predictor variable (x-axis) and a response variable (y-axis), so that you can assess whether the model fits your data.
• AIC, BIC, Model Fitting, Training and Test Data – In this module, you will learn about model fitting, the AIC and BIC model selection criteria, and how to split data into training and test sets.
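
To make the fitted line and the AIC/BIC bullet more concrete, here is a minimal sketch that fits a one-predictor least-squares line and scores it. The data are invented, the helper names `fit_line` and `information_criteria` are hypothetical, and the AIC/BIC formulas are the common Gaussian-error forms up to an additive constant, with k counting only the slope and intercept (some conventions also count the error variance).

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def information_criteria(xs, ys, slope, intercept, k=2):
    """Return (AIC, BIC) from the residual sum of squares (RSS)."""
    n = len(xs)
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    fit_term = n * math.log(rss / n)
    aic = fit_term + 2 * k
    bic = fit_term + k * math.log(n)
    return aic, bic


# Hypothetical data lying close to the line y = 2x.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

slope, intercept = fit_line(xs, ys)
aic, bic = information_criteria(xs, ys, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f}")
print(f"AIC={aic:.2f} BIC={bic:.2f}")
```

Lower AIC or BIC indicates a better trade-off between fit and complexity when comparing candidate models fitted to the same data.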

Regression- Logistic Regression

• Introduction to Logistic regression, interpretation, odds ratio – Logistic Regression, like Linear Regression, is one of the most popular ML algorithms. It is a simple classification algorithm that predicts a categorical dependent variable from independent variables.
This module will take you through the Logistic Regression concepts used in Machine Learning, how to interpret the fitted model, and how to find the odds ratio.

• Misclassification, Probability, AUC, R-Square – This module will teach you how to work with Misclassification, Probability, AUC, and R-Square.
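
The ideas of predicted probability, odds ratio, and misclassification can be sketched as follows. The intercept and coefficient are hypothetical values assumed to come from an already fitted model, the data are invented, and the 0.5 cut-off is just one common choice.

```python
import math


def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(coefficient):
    """Multiplicative change in the odds for a one-unit increase in x."""
    return math.exp(coefficient)


def misclassification_rate(y_true, y_pred):
    """Fraction of observations whose predicted class is wrong."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


# Hypothetical fitted parameters and data, for illustration only.
intercept, coefficient = -3.0, 1.5
xs = [0, 1, 2, 3, 4]
y_true = [0, 0, 0, 1, 1]

probabilities = [sigmoid(intercept + coefficient * x) for x in xs]
y_pred = [1 if p >= 0.5 else 0 for p in probabilities]

print([round(p, 2) for p in probabilities])
print("odds ratio:", round(odds_ratio(coefficient), 2))
print("misclassification rate:", misclassification_rate(y_true, y_pred))
```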

Supervised Machine Learning

In this module, you will learn the Supervised Learning techniques used in Machine Learning.

• CART – CART, also known as Classification And Regression Trees, is a predictive machine learning model that predicts the value of an outcome variable based on the values of other variables.
You will learn how to use this predictive model in this module.
• KNN (classifier, distance metrics, KNN regression) – KNN or k-Nearest Neighbors algorithm is one of the most straightforward machine learning algorithms for solving regression and classification problems.
You will learn about the KNN classifier, distance metrics, and KNN regression in this module.
• Decision Trees (hyper parameter, depth, number of leaves) – Decision Tree is a Supervised Machine Learning algorithm used for both classification and regression problems. It is a hierarchical structure where internal nodes indicate the dataset features, branches represent the decision rules, and each leaf node indicates the result.
You will learn about hyperparameters such as depth and the number of leaves in this module.
• Naive Bayes – The Naive Bayes algorithm is used to solve classification problems using Bayes’ theorem. This module will teach you about the theorem and how to solve problems with it.
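
As an illustration of the KNN bullet above, the following is a minimal k-Nearest Neighbors classifier using Euclidean distance and a majority vote. The function name `knn_predict`, the toy points, and the labels are all hypothetical; a real project would more likely use an existing library implementation.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-class toy data.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (2, 2)))  # prints a
print(knn_predict(train_points, train_labels, (6, 5)))  # prints b
```

Swapping `math.dist` for another distance function changes the metric, and averaging the neighbors' numeric targets instead of voting would give KNN regression.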

Unsupervised Learning

In this module, you will learn the Unsupervised Learning techniques used in Machine Learning.

• Clustering – K-Means & Hierarchical – Clustering is an unsupervised learning technique involving the grouping of data. In this module, you will learn everything you need to know about the method and its types, like K-means clustering and hierarchical clustering.
K-means clustering is a popular unsupervised learning algorithm for solving clustering problems in Machine Learning or Data Science.
Hierarchical Clustering is an ML technique that builds a hierarchy, or tree-like structure, of clusters. For example, it is used to group unlabeled data points into nested clusters.
• Distance methods – Euclidean, Manhattan, Cosine, Mahalanobis – This module will teach you how to work with all the distance methods or measures such as Euclidean, Manhattan, Cosine, and Mahalanobis.
• Features of a Cluster – Labels, Centroids, Inertia – This module will drive you through all the features of a Cluster like Labels, Centroids, and Inertia.
• Eigenvectors and Eigenvalues – In this module, you will learn how to compute the eigenvectors and eigenvalues of a matrix.
• Principal component analysis – Principal Component Analysis is a technique to reduce the complexity of a model, for example by reducing the number of input variables for a predictive model to avoid overfitting.
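
The cluster features named above (labels, centroids, inertia) can be seen in a minimal K-means sketch. This version assumes Euclidean distance and takes the starting centroids as an argument so the run is deterministic; the function name `k_means` and the toy points are hypothetical.

```python
import math


def k_means(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point gets the index of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    # Inertia: sum of squared distances from each point to its centroid.
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Hypothetical data with two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, centroids, inertia = k_means(points, centroids=[(1, 1), (8, 8)])
print(labels)
print(centroids)
print(round(inertia, 4))
```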

Ensemble Techniques

In this module, we discuss the shortcomings of standalone supervised models and learn Ensemble techniques that help overcome them.

• Bagging & Boosting – Bagging, also known as Bootstrap Aggregation, is a meta-algorithm in machine learning used for enhancing the stability and accuracy of machine learning algorithms, which are used in statistical classification and regression.
Boosting is a meta-algorithm in machine learning that combines several weak classifiers into a strong classifier. Common variants include Gradient Boosting and AdaBoost (Adaptive Boosting).
• Random Forest – Random Forest is a popular supervised learning algorithm in machine learning. As the name indicates, it builds several decision trees on different subsets of the provided dataset. It then combines their predictions, by averaging or by majority vote, to improve predictive accuracy.
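
Bagging as described above can be sketched with very simple base learners. The example below trains one-dimensional decision stumps on bootstrap samples and combines them by majority vote; the helper names, the toy data, and the choice of 25 models with seed 0 are assumptions made only for illustration.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a 1-D decision stump that predicts 1 when x >= threshold."""
    candidates = sorted(set(xs)) + [float("inf")]

    def errors(threshold):
        return sum((x >= threshold) != bool(y) for x, y in zip(xs, ys))

    return min(candidates, key=errors)


def fit_bagging(xs, ys, n_models=25, seed=0):
    """Train one stump per bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        thresholds.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds


def bagging_predict(thresholds, x):
    """Aggregate the stumps' predictions by majority vote."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Hypothetical data: the label is 1 for x >= 5.
xs = list(range(10))
ys = [0] * 5 + [1] * 5

model = fit_bagging(xs, ys)
print([bagging_predict(model, x) for x in (0, 2, 9)])
```

A Random Forest follows the same pattern with full decision trees as base learners and additional random feature selection at each split.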

Association Rules Mining & Recommendation Systems

Association rule mining is the data mining process of finding rules that describe associations, correlations, or causal structures between sets of items.
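
Association rules are commonly scored with support, confidence, and lift. The sketch below computes these three measures for one candidate rule on a made-up list of shopping baskets; the function names and the transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to the consequent's baseline support."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )


# Hypothetical shopping baskets.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

print(round(support(transactions, {"bread", "butter"}), 3))       # prints 0.6
print(round(confidence(transactions, {"bread"}, {"butter"}), 3))  # prints 0.75
print(round(lift(transactions, {"bread"}, {"butter"}), 4))        # prints 0.9375
```

A lift below 1, as in this toy data, suggests the two items appear together slightly less often than they would if they were independent.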

Recommendation engines are a subclass of machine learning systems that generally deal with ranking or rating products or users. Loosely defined, a recommender system is a system that predicts the rating a user might give to a specific item. These predictions are then ranked and returned to the user.

Time Series Analysis

This block will teach you all the techniques involved in Time Series.

• Trend and seasonality – Trend is a systematic linear or non-linear component of a time series that changes over time and does not repeat.
Seasonality is a systematic linear or non-linear component of a time series that changes over time and repeats.
• Decomposition – This module will teach you how to decompose the time series data into Trend and Seasonality.
• Smoothing (moving average) – This module will teach you how to use this method for univariate data.
• SES, Holt & Holt-Winter Model – SES, Holt, and Holt-Winter Models are various Smoothing models, and you will learn everything you need to know about these models in this module.
• AR, Lag Series, ACF, PACF – In this module, you will learn about autoregressive (AR) models, lag series, and the autocorrelation (ACF) and partial autocorrelation (PACF) functions used in Time Series.
• ADF, Random walk and Auto ARIMA – In this module, you will learn about the ADF test, random walks, and Auto ARIMA techniques used in Time Series.
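
Two of the smoothing methods above can be sketched in a few lines: a simple moving average and simple exponential smoothing (SES). The series, the window size, and the smoothing factor `alpha` are hypothetical, and initializing SES with the first observation is just one common convention.

```python
def moving_average(series, window):
    """Average of each consecutive block of `window` observations."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def simple_exponential_smoothing(series, alpha):
    """SES: each smoothed value blends the new observation with the previous level."""
    smoothed = [series[0]]  # assumption: start from the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical univariate series with a steady upward trend.
series = [10, 12, 14, 16, 18]

print(moving_average(series, window=3))                 # prints [12.0, 14.0, 16.0]
print(simple_exponential_smoothing(series, alpha=0.5))  # prints [10, 11.0, 12.5, 14.25, 16.125]
```

SES lags behind a trending series, which is the limitation that the Holt and Holt-Winters models address by adding trend and seasonal components.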

Introduction to Deep Learning – Single Layer Perceptron

Artificial neural networks, usually simply called neural networks or neural nets, are computing systems inspired by the biological neural networks that constitute animal brains. An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain.
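
A single artificial neuron of the kind described above can be sketched as a single layer perceptron. The example below is a minimal illustration that learns the logical AND function with the classic perceptron update rule; the function names, zero initialization, learning rate, and epoch count are assumptions for this sketch.

```python
def predict(weights, bias, x):
    """Step activation: fire (1) when the weighted sum is positive."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, targets, epochs=20, learning_rate=1.0):
    """Perceptron rule: nudge weights by the prediction error on each sample."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            error = target - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Truth table for logical AND, which is linearly separable.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, targets)
print([predict(weights, bias, x) for x in samples])  # prints [0, 0, 0, 1]
```

A single perceptron can only separate classes with a straight line (or hyperplane), which is the motivation for multilayer perceptrons.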

Convolutional Neural Network

A convolutional neural network is a feed-forward neural network that is generally used to analyze visual images by processing data with grid-like topology. It’s also known as a ConvNet. A convolutional neural network is used to detect and classify objects in an image.
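
The core operation of a convolutional layer can be sketched as sliding a small kernel over a grid of values. The example below is a minimal pure-Python version with no padding and a stride of 1; like most deep learning libraries it does not flip the kernel, so strictly speaking it computes a cross-correlation. The function name `conv2d_valid`, the image, and the kernel are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 3x3 "image" and a 2x2 kernel that responds to diagonal change.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel = [
    [1, 0],
    [0, -1],
]

print(conv2d_valid(image, kernel))  # prints [[-4, -4], [-4, -4]]
```

In a trained network the kernel values are learned from data rather than fixed by hand.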

• One Project
• Assignments