**Data Science Master Program :-**

**Python | AI | ML | DL | NLP | Spark | Scala | R | TensorFlow | Tableau | Statistics**

Become a Data Scientist within 6 Months – Flexible Timings and Highly Practical Training

The program is designed for both Freshers & Experienced candidates

Weekdays Program :- Daily 2 Hours – 6 Months – 320 Hours

Weekend Program: 8 months

Projects Covered : 5

POCs : 2

Assignments : 300+

Job Oriented Scenarios :- 40 +

Interview Calls :- Unlimited – Minimum 5 and Maximum until you get placed

**What You Get :-**

This program is intended for those who love programming and mathematics and want to dig into the complexity of learning programming languages, and for aspirants who want to become data scientists. It is suitable for both IT and non-IT candidates.

A well-organized program with job-oriented POCs, real-time projects, and well-organized assignments with solutions. We cover 300+ job-oriented scenarios and solve complex assignments, supported by unlimited interview calls until you get placed in your chosen field.

**Experience Our Expertise** :-

Experience the high standards we maintain in our training programs. Our trainers are highly experienced professionals with real-time industry experience, and they will guide you until you find a suitable job in Data Science and related technologies. Trainers have you work on real-time projects, POCs, and job-oriented scenarios, which makes you industry ready.

**Practicals** :- Set up the complete project on your own laptop and refer to it whenever required in the future. We create a complete live work environment throughout the training, so you feel that you are working on real-time problems and delivering the complete lifecycle of the project you are engaged in.

**Why Radical Technologies :-**

Radical is a project-based IT training and certification center. We have trained 5000+ candidates in Data Science and related technologies, and interestingly, 50% of them are from a non-technical background. Our job-oriented scenarios made them capable of working in the industry as job-ready candidates. 1200+ professionals have been placed in Data Scientist and Data Analyst roles with salary packages of 3.5 Lakh to 10 Lakh.

**What we Promise :-**

A minimum of 5 interview calls, continuing until you find a job. This is a job-guarantee program for the right candidate with great ambitions.

**Our History :-**

Started 10 years ago by a group of technical aspirants, Radical has now grown to 5 branches in Pune and one branch in Bangalore. We work with all major MNCs in Pune. We deliver more than 250 courses all over the world through online and classroom sessions. Our expansion plan is 50+ branches all over India by 2025.

**Our Trainers :-**

We have a pool of 250+ consultants cum trainers with 5+ to 20+ years of experience. This team is ready to help our candidates, moulding them up to the industry level and enabling them to work from the first day of their career in their respective technology.

**Data Science, Deep Learning, Machine Learning, Python & R Language With Live Python, Machine Learning & Deep Learning Projects**

**Project 1** Build your own image recognition model with TensorFlow

**Project 2** Predict fraud with data visualization & predictive modelling!

**Project 3** Spam Detection

**Project 4** Build your own Recommendation System

**Project 5** Build your own Python predictive modelling, regression analysis & machine learning model

**Key Benefits of Our programs:-**

I ) Placements :- We provide each candidate with a minimum of 5 interview calls, and there is no maximum limit. Selected candidates can appear for more interview calls depending on the technology they are interested in working with.

II) 3rd year and 4th year students, freshers, and candidates with a career gap can enroll for this program depending on their subject of interest. This program helps equip you with current industry knowledge and global certifications. Project-based training and certification is a key factor in selection during campus interviews.

III) Benefits for 3rd Year Degree Students and 2nd Year Diploma Students :- You can enroll for this program before completing your academics, and by the time you complete the degree / diploma, you will have gained knowledge equivalent to at least 3 to 6 years of experience in the technology you select. BE/ME/BTech/BSc/MSc/BCA/MCA/

IV) Benefits for Candidates Having a Career Gap or Those Not Placed in Any Campus Interview :- Radical will take care of you end to end and mould you into an experienced professional with knowledge equivalent to at least 3 to 6 years of experience in the technology you select. BE/ME/BTech/BSc/MSc/BCA/MCA/

V ) Benefits for Candidates From Non-IT Background :- We suggest you go for a non-programming program such as the IMS Master programs or the Business Intelligence | BA Master programs. We will bring you up to industry standard and support you until you find a suitable job in the IT industry.

VI) Our job-oriented master programs cater both to those who are interested in programming languages and to those who dislike programming.

VII ) Each topic is covered with 5 real-time projects, multiple job-oriented real-time scenarios, POCs, real-time case studies, daily assignments, etc.

VIII) If you miss any class, you can repeat the session at our Aundh / Kharadi / Kothrud / Pimple Saudagar / Hinjewadi / Bangalore facilities.

IX ) We start each technology from the very basic to the advanced level.

X ) We give you the chance to appear for global certification exams through the Radical exam facility. Our key exam vendors are Pearson VUE, Red Hat, Kryterion, and PSI, through which we deliver global certifications for Oracle, Microsoft, IBM, Salesforce, Data Science, Hadoop, AWS, etc. We train each candidate to appear for global certification, which helps candidates differentiate their skills from others.

**DATA SCIENCE & MACHINE LEARNING WITH PYTHON**

**Course Content**

**Introduction to Data Science with Python**

What is analytics & Data Science?

Common Terms in Analytics

Analytics vs. Data warehousing, OLAP, MIS Reporting

Relevance in industry and need of the hour

Types of problems and business objectives in various industries

How leading companies are harnessing the power of analytics?

Critical success drivers

Overview of analytics tools & their popularity

Analytics Methodology & problem solving framework

List of steps in Analytics projects

Identify the most appropriate solution design for the given problem statement

Project plan for Analytics project & key milestones based on effort estimates

Build Resource plan for analytics project

**Python Essentials**

Why Python for data science?

Overview of Python- Starting with Python

Introduction to installation of Python

Introduction to Python Editors & IDEs (Canopy, PyCharm, Jupyter, Rodeo, IPython, etc.)

Understand Jupyter notebook & Customize Settings

Concept of Packages/Libraries – Important packages(NumPy, SciPy, scikit-learn, Pandas, Matplotlib, etc)

Installing & loading Packages & Name Spaces

Data Types & Data objects/structures (strings, Tuples, Lists, Dictionaries)

List and Dictionary Comprehensions

Variable & Value Labels – Date & Time Values

Basic Operations – Mathematical – string – date

Reading and writing data

Simple plotting

Control flow & conditional statements

Debugging & Code profiling

How to create classes and modules and how to call them

Scientific Distributions Used In Python For Data Science

NumPy, pandas, scikit-learn, statsmodels, NLTK

**Accessing/Importing And Exporting Data Using Python Modules**

Importing Data from various sources (Csv, txt, excel, access etc)

Database Input (Connecting to database)

Viewing Data objects – subsetting Data, methods

Exporting Data to various formats

Important Python modules: Pandas, Beautiful Soup

**Data Manipulation – Cleansing – Munging using python modules**

Cleansing Data with Python

Data Manipulation steps(Sorting, filtering, duplicates, merging, appending, subsetting, derived variables, sampling, Data type conversions, renaming, formatting etc)

Data manipulation tools(Operators, Functions, Packages, control structures, Loops, arrays etc)

Python Built-in Functions (Text, numeric, date, utility functions)

Python User Defined Functions

Stripping out extraneous information

Normalizing data

Formatting data

Important Python modules for data manipulation (Pandas, Numpy, re, math, string, datetime etc)

**Data Analysis – Visualization Using Python**

Introduction to exploratory data analysis

Descriptive statistics, Frequency Tables and summarization

Univariate Analysis (Distribution of data & Graphical Analysis)

Bivariate Analysis(Cross Tabs, Distributions & Relationships, Graphical Analysis)

Creating Graphs (bar, pie, line chart, histogram, boxplot, scatter, density, etc.)

Important Packages for Exploratory Analysis (NumPy Arrays, Matplotlib, seaborn, Pandas, scipy.stats, etc.)

**Introduction to Statistics**

Basic Statistics – Measures of Central Tendencies and Variance

Building blocks – Probability Distributions – Normal distribution – Central Limit Theorem

Inferential Statistics – Sampling – Concept of Hypothesis Testing

Statistical Methods – Z/t-tests (One sample, independent, paired), Analysis of variance, Correlations and Chi-square

Important modules for statistical methods: NumPy, SciPy, Pandas

**Introduction to Predictive Modelling**

Concept of model in analytics and how it is used?

Common terminology used in analytics & Modelling process

Popular modelling algorithms

Types of Business problems – Mapping of Techniques

Different Phases of Predictive Modelling

**Data Exploration For Modelling**

Need for structured exploratory data

EDA framework for exploring the data and identifying any problems with the data (Data Audit Report)

Identify missing data

Identify outliers

Visualize the data trends and patterns

**Data Preparation**

Need of Data preparation

Consolidation/Aggregation – Outlier treatment – Flat Liners – Missing values- Dummy creation – Variable Reduction

Variable Reduction Techniques – Factor & PCA Analysis

**Segmentation: Solving Segmentation Problems**

Introduction to Segmentation

Types of Segmentation (Subjective Vs Objective, Heuristic Vs. Statistical)

Heuristic Segmentation Techniques (Value Based, RFM Segmentation and Life Stage Segmentation)

Behavioural Segmentation Techniques (K-Means Cluster Analysis)

Cluster evaluation and profiling – Identify cluster characteristics

Interpretation of results – Implementation on new data

**Linear Regression: Solving Regression Problems**

Introduction – Applications

Assumptions of Linear Regression

Building Linear Regression Model

Understanding standard metrics (Variable significance, R-square/Adjusted R-square, Global hypothesis ,etc)

Assess the overall effectiveness of the model

Validation of Models (Re running Vs. Scoring)

Standard Business Outputs (Decile Analysis, Error distribution (histogram), Model equation, drivers etc.)

Interpretation of Results – Business Validation – Implementation on new data

**Logistic Regression : Solving Classification Problems**

Introduction – Applications

Linear Regression Vs. Logistic Regression Vs. Generalized Linear Models

Building Logistic Regression Model (Binary Logistic Model)

Understanding standard model metrics (Concordance, Variable significance, Hosmer-Lemeshow Test, Gini, KS, Misclassification, ROC Curve, etc.)

Validation of Logistic Regression Models (Re running Vs. Scoring)

Standard Business Outputs (Decile Analysis, ROC Curve, Probability Cut-offs, Lift charts, Model equation, Drivers or variable importance, etc)

Interpretation of Results – Business Validation – Implementation on new data

**Time Series Forecasting : Solving Forecasting Problems**

Introduction – Applications

Time Series Components( Trend, Seasonality, Cyclicity and Level) and Decomposition

Classification of Techniques(Pattern based – Pattern less)

Basic Techniques – Averages, Smoothing, etc.

Advanced Techniques – AR Models, ARIMA, etc

Understanding Forecasting Accuracy – MAPE, MAD, MSE, etc

**Machine Learning : Predictive Modelling**

Introduction to Machine Learning & Predictive Modelling

Types of Business problems – Mapping of Techniques – Regression vs. classification vs. segmentation vs. Forecasting

Major Classes of Learning Algorithms -Supervised vs Unsupervised Learning

Different Phases of Predictive Modelling (Data Pre-processing, Sampling, Model Building, Validation)

Overfitting (Bias-Variance Trade off) & Performance Metrics

Feature engineering & dimension reduction

Concept of optimization & cost function

Overview of gradient descent algorithm

Overview of Cross validation(Bootstrapping, K-Fold validation etc)

Model performance metrics (R-square, Adjusted R-square, RMSE, MAPE, AUC, ROC curve, recall, precision, sensitivity, specificity, confusion matrix)

**Unsupervised Learning : Segmentation**

What is segmentation & Role of ML in Segmentation?

Concept of Distance and related math background

K-Means Clustering

Expectation Maximization

Hierarchical Clustering

Spectral Clustering and Density-Based Clustering (DBSCAN)

Principal Component Analysis (PCA)

**Supervised Learning :- Decision Trees**

Decision Trees – Introduction – Applications

Types of Decision Tree Algorithms

Construction of Decision Trees through Simplified Examples; Choosing the “Best” attribute at each Non-Leaf node; Entropy; Information Gain, Gini Index, Chi Square, Regression Trees

Generalizing Decision Trees; Information Content and Gain Ratio; Dealing with Numerical Variables; other Measures of Randomness

Pruning a Decision Tree; Cost as a consideration; Unwrapping Trees as Rules

Decision Trees – Validation

Overfitting – Best Practices to avoid

**Supervised Learning :- Ensemble Learning**

Concept of Ensembling

Manual Ensembling Vs. Automated Ensembling

Methods of Ensembling (Stacking, Mixture of Experts)

Bagging (Logic, Practical Applications)

Random forest (Logic, Practical Applications)

Boosting (Logic, Practical Applications)

Ada Boost

Gradient Boosting Machines (GBM)

XGBoost

**Supervised Learning :- Artificial Neural Network – ANN**

Motivation for Neural Networks and Its Applications

Perceptron and Single Layer Neural Network, and Hand Calculations

Learning in a Multi-Layered Neural Net: Back Propagation and Conjugate Gradient Techniques

Neural Networks for Regression

Neural Networks for Classification

Interpretation of outputs and fine-tuning the models with hyperparameters

Validating ANN models

**Supervised Learning :- Support Vector Machines**

Motivation for Support Vector Machine & Applications

Support Vector Regression

Support vector classifier (Linear & Non-Linear)

Mathematical Intuition (Kernel Methods Revisited, Quadratic Optimization and Soft Constraints)

Interpretation of outputs and fine-tuning the models with hyperparameters

Validating SVM models

**Supervised Learning :-KNN**

What is KNN & Applications?

KNN for missing value treatment

KNN For solving regression problems

KNN for solving classification problems

Validating KNN model

Model fine tuning with hyper parameters

**Supervised Learning :- Naive Bayes**

Concept of Conditional Probability

Bayes Theorem and Its Applications

Naïve Bayes for classification

Applications of Naïve Bayes in Classifications

**Text Mining And Analytics**

Taming big text, Unstructured vs. Semi-structured Data; Fundamentals of information retrieval, Properties of words; Creating Term-Document (TxD) Matrices; Similarity measures, Low-level processes (Sentence Splitting; Tokenization; Part-of-Speech Tagging; Stemming; Chunking)

Finding patterns in text: text mining, text as a graph

Natural Language processing (NLP)

Text Analytics – Sentiment Analysis using Python

Text Analytics – Word cloud analysis using Python

Text Analytics – Segmentation using K-Means/Hierarchical Clustering

Text Analytics – Classification (Spam/Not spam)

Applications of Social Media Analytics

Metrics(Measures Actions) in social media analytics

Examples & Actionable Insights using Social Media Analytics

Important Python modules for Machine Learning (scikit-learn, statsmodels, SciPy, NLTK, etc.)

Fine-tuning the models using hyperparameters, grid search, pipelines, etc.

**DATA SCIENCE WITH R**

What is analytics & Data Science?

Common Terms in Analytics

Analytics vs. Data warehousing, OLAP, MIS Reporting

Relevance in industry and need of the hour

Types of problems and business objectives in various industries

How leading companies are harnessing the power of analytics?

Critical success drivers

Overview of analytics tools & their popularity

Analytics Methodology & problem solving framework

List of steps in Analytics projects

Identify the most appropriate solution design for the given problem statement

Project plan for Analytics project & key milestones based on effort estimates

Build Resource plan for analytics project

Why R for data science?

**Data Importing / Exporting**

Introduction to R/RStudio – GUI

Concept of Packages – Useful Packages (Base & Other packages)

Data Structure & Data Types (Vectors, Matrices, factors, Data frames, and Lists)

Importing Data from various sources (txt, dlm, excel, sas7bdat, db, etc.)

Database Input (Connecting to database)

Exporting Data to various formats

Viewing Data (Viewing partial data and full data)

Variable & Value Labels – Date Values

**Data Manipulation**

Data Manipulation steps

Creating New Variables (calculations & Binning)

Dummy variable creation

Applying transformations

Handling duplicates

Handling missing values

Sorting and Filtering

Subsetting (Rows/Columns)

Appending (Row appending/column appending)

Merging/Joining (Left, right, inner, full, outer etc)

Data type conversions

Renaming

Formatting

Reshaping data

Sampling

Data manipulation tools

Operators

Functions

Packages

Control Structures (if, if else)

Loops (Conditional, iterative loops, apply functions)

Arrays

R Built-in Functions (Text, Numeric, Date, utility)

Numerical Functions

Text Functions

Date Functions

Utilities Functions

R User Defined Functions

R Packages for data manipulation (base, dplyr, plyr, data.table, reshape, car, sqldf, etc)

**Data Analysis – Visualization**

Introduction to exploratory data analysis

Descriptive statistics, Frequency Tables and summarization

Univariate Analysis (Distribution of data & Graphical Analysis)

Bivariate Analysis(Cross Tabs, Distributions & Relationships, Graphical Analysis)

Creating Graphs (bar, pie, line chart, histogram, boxplot, scatter, density, etc.)

R Packages for Exploratory Data Analysis (dplyr, plyr, gmodels, car, vcd, Hmisc, psych, doBy, etc.)

R Packages for Graphical Analysis (base, ggplot2, lattice, etc.)

**Introduction To Statistics**

Basic Statistics – Measures of Central Tendencies and Variance

Building blocks – Probability Distributions – Normal distribution – Central Limit Theorem

Inferential Statistics -Sampling – Concept of Hypothesis Testing

Statistical Methods – Z/t-tests( One sample, independent, paired), Anova, Correlations and Chi-square

**Predictive Modelling**

Concept of model in analytics and how it is used?

Common terminology used in analytics & modelling process

Popular modelling algorithms

Types of Business problems – Mapping of Techniques

Different Phases of Predictive Modelling

**Data Exploration For Modeling**

**Data Preparation**

Need of Data preparation

Consolidation/Aggregation – Outlier treatment – Flat Liners – Missing values- Dummy creation – Variable Reduction

Variable Reduction Techniques – Factor & PCA Analysis

**Segmentation: Solving Segmentation Problems**

Introduction to Segmentation

Types of Segmentation (Subjective Vs Objective, Heuristic Vs. Statistical)

Heuristic Segmentation Techniques (Value Based, RFM Segmentation and Life Stage Segmentation)

Behavioral Segmentation Techniques (K-Means Cluster Analysis)

Cluster evaluation and profiling – Identify cluster characteristics

Interpretation of results – Implementation on new data

**Linear Regression: Solving Regression Problems**

Introduction – Applications

Assumptions of Linear Regression

Building Linear Regression Model

Understanding standard metrics (Variable significance, R-square/Adjusted R-square, Global hypothesis ,etc)

Assess the overall effectiveness of the model

Validation of Models (Re running Vs. Scoring)

Standard Business Outputs (Decile Analysis, Error distribution (histogram), Model equation, drivers etc.)

Interpretation of Results – Business Validation – Implementation on new data

**Logistic Regression: Solving Classification Problems**

Introduction – Applications

Linear Regression Vs. Logistic Regression Vs. Generalized Linear Models

Building Logistic Regression Model (Binary Logistic Model)

Understanding standard model metrics (Concordance, Variable significance, Hosmer-Lemeshow Test, Gini, KS, Misclassification, ROC Curve, etc.)

Validation of Logistic Regression Models (Re running Vs. Scoring)

Standard Business Outputs (Decile Analysis, ROC Curve, Probability Cut-offs, Lift charts, Model equation, Drivers or variable importance, etc)

Interpretation of Results – Business Validation – Implementation on new data

**Time Series Forecasting: Solving Forecasting Problems**

Introduction – Applications

Time Series Components( Trend, Seasonality, Cyclicity and Level) and Decomposition

Classification of Techniques(Pattern based – Pattern less)

Basic Techniques – Averages, Smoothing, etc.

Advanced Techniques – AR Models, ARIMA, etc

Understanding Forecasting Accuracy – MAPE, MAD, MSE, etc

**Machine Learning – Predictive Modeling – Basics**

Introduction to Machine Learning & Predictive Modeling

Types of Business problems – Mapping of Techniques – Regression vs. classification vs. segmentation vs. Forecasting

Major Classes of Learning Algorithms -Supervised vs Unsupervised Learning

Different Phases of Predictive Modeling (Data Pre-processing, Sampling, Model Building, Validation)

Overfitting (Bias-Variance Trade off) & Performance Metrics

Feature engineering & dimension reduction

Concept of optimization & cost function

Overview of gradient descent algorithm

Overview of Cross validation(Bootstrapping, K-Fold validation etc)

Model performance metrics (R-square, Adjusted R-square, RMSE, MAPE, AUC, ROC curve, recall, precision, sensitivity, specificity, confusion matrix)

**Unsupervised Learning: Segmentation**

What is segmentation & Role of ML in Segmentation?

Concept of Distance and related math background

K-Means Clustering

Expectation Maximization

Hierarchical Clustering

Spectral Clustering and Density-Based Clustering (DBSCAN)

Principal Component Analysis (PCA)

**Supervised Learning: Decision Trees**

Decision Trees – Introduction – Applications

Types of Decision Tree Algorithms

Construction of Decision Trees through Simplified Examples; Choosing the “Best” attribute at each Non-Leaf node; Entropy; Information Gain, Gini Index, Chi Square, Regression Trees

Generalizing Decision Trees; Information Content and Gain Ratio; Dealing with Numerical Variables; other Measures of Randomness

Pruning a Decision Tree; Cost as a consideration; Unwrapping Trees as Rules

Decision Trees – Validation

Overfitting – Best Practices to avoid

**Supervised Learning: Ensemble Learning**

Concept of Ensembling

Manual Ensembling Vs. Automated Ensembling

Methods of Ensembling (Stacking, Mixture of Experts)

Bagging (Logic, Practical Applications)

Random forest (Logic, Practical Applications)

Boosting (Logic, Practical Applications)

Ada Boost

Gradient Boosting Machines (GBM)

XGBoost

**Supervised Learning: Artificial Neural Networks (ANN)**

Motivation for Neural Networks and Its Applications

Perceptron and Single Layer Neural Network, and Hand Calculations

Learning in a Multi-Layered Neural Net: Back Propagation and Conjugate Gradient Techniques

Neural Networks for Regression

Neural Networks for Classification

Interpretation of outputs and fine-tuning the models with hyperparameters

Validating ANN models

**Supervised Learning: Support Vector Machines**

Motivation for Support Vector Machine & Applications

Support Vector Regression

Support vector classifier (Linear & Non-Linear)

Mathematical Intuition (Kernel Methods Revisited, Quadratic Optimization and Soft Constraints)

Interpretation of outputs and fine-tuning the models with hyperparameters

Validating SVM models

**Supervised Learning: KNN**

What is KNN & Applications?

KNN for missing value treatment

KNN For solving regression problems

KNN for solving classification problems

Validating KNN model

Model fine tuning with hyper parameters

**Supervised Learning: Naïve Bayes**

Concept of Conditional Probability

Bayes Theorem and Its Applications

Naïve Bayes for classification

Applications of Naïve Bayes in Classifications

**Text Mining & Analytics**

Taming big text, Unstructured vs. Semi-structured Data; Fundamentals of information retrieval, Properties of words; Creating Term-Document (TxD) Matrices; Similarity measures, Low-level processes (Sentence Splitting; Tokenization; Part-of-Speech Tagging; Stemming; Chunking)

Finding patterns in text: text mining, text as a graph

Natural Language processing (NLP)

Text Analytics – Sentiment Analysis using R

Text Analytics – Word cloud analysis using R

Text Analytics – Segmentation using K-Means/Hierarchical Clustering

Text Analytics – Classification (Spam/Not spam)

Applications of Social Media Analytics

Metrics(Measures Actions) in social media analytics

Examples & Actionable Insights using Social Media Analytics

Important R packages for Machine Learning (caret, H2O, randomForest, nnet, tm, etc.)

Fine-tuning the models using hyperparameters, grid search, piping, etc.

**Project**

Case Studies

**ADVANCED BIG DATA SCIENCE**

**Introduction To Data Science**

What is Data Science?

Why Python for data science?

Relevance in industry and need of the hour

How leading companies are harnessing the power of Data Science with Python?

Different phases of a typical Analytics/Data Science project and the role of Python

Anaconda vs. Python

**Python Essentials (Core)**

Overview of Python- Starting with Python

Introduction to installation of Python

Introduction to Python Editors & IDEs (Canopy, PyCharm, Jupyter, Rodeo, IPython, etc.)

Understand Jupyter notebook & Customize Settings

Concept of Packages/Libraries – Important packages(NumPy, SciPy, scikit-learn, Pandas, Matplotlib, etc)

Installing & loading Packages & Name Spaces

Data Types & Data objects/structures (strings, Tuples, Lists, Dictionaries)

List and Dictionary Comprehensions

Variable & Value Labels – Date & Time Values

Basic Operations – Mathematical – string – date

Reading and writing data

Simple plotting

Control flow & conditional statements

Debugging & Code profiling

How to create classes and modules and how to call them

Scientific distributions used in Python for Data Science – NumPy, SciPy, pandas, scikit-learn, statsmodels, NLTK, etc.

**Accessing/Importing And Exporting Data Using Python Modules**

Importing Data from various sources (Csv, txt, excel, access etc)

Database Input (Connecting to database)

Viewing Data objects – subsetting, methods

Exporting Data to various formats

Important Python modules: Pandas, Beautiful Soup

**Data Manipulation – Cleansing – Munging Using Python Modules**

Cleansing Data with Python

Data Manipulation steps(Sorting, filtering, duplicates, merging, appending, subsetting, derived variables, sampling, Data type conversions, renaming, formatting etc)

Data manipulation tools(Operators, Functions, Packages, control structures, Loops, arrays etc)

Python Built-in Functions (Text, numeric, date, utility functions)

Python User Defined Functions

Stripping out extraneous information

Normalizing data

Formatting data

Important Python modules for data manipulation (Pandas, Numpy, re, math, string, datetime etc)

**Data Analysis – Visualization Using Python**

Introduction to exploratory data analysis

Descriptive statistics, Frequency Tables and summarization

Univariate Analysis (Distribution of data & Graphical Analysis)

Bivariate Analysis(Cross Tabs, Distributions & Relationships, Graphical Analysis)

Creating Graphs (bar, pie, line chart, histogram, boxplot, scatter, density, etc.)

Important Packages for Exploratory Analysis(NumPy Arrays, Matplotlib, seaborn, Pandas and scipy.stats etc)

**Basic Statistics & Implementation Of Stats Methods In Python**

Basic Statistics – Measures of Central Tendencies and Variance

Building blocks – Probability Distributions – Normal distribution – Central Limit Theorem

Inferential Statistics -Sampling – Concept of Hypothesis Testing

Statistical Methods – Z/t-tests (One sample, independent, paired), Anova, Correlation and Chi-square

Important modules for statistical methods: Numpy, Scipy, Pandas

**Python: Machine Learning – Predictive Modeling – Basics**

Introduction to Machine Learning & Predictive Modeling

Types of Business problems – Mapping of Techniques – Regression vs. classification vs. segmentation vs. Forecasting

Major Classes of Learning Algorithms -Supervised vs Unsupervised Learning

Different Phases of Predictive Modeling (Data Pre-processing, Sampling, Model Building, Validation)

Overfitting (Bias-Variance Trade off) & Performance Metrics

Feature engineering & dimension reduction

Concept of optimization & cost function

Concept of gradient descent algorithm

Concept of Cross validation(Bootstrapping, K-Fold validation etc)

Model performance metrics (R-square, RMSE, MAPE, AUC, ROC curve, recall, precision, sensitivity, specificity, confusion matrix)

**Machine Learning Algorithms & Applications – Implementation In Python**

Linear & Logistic Regression

Segmentation – Cluster Analysis (K-Means)

Decision Trees (CART/C5.0)

Ensemble Learning (Random Forest, Bagging & boosting)

Artificial Neural Networks(ANN)

Support Vector Machines(SVM)

Other Techniques (KNN, Naïve Bayes, PCA)

Introduction to Text Mining using NLTK

Introduction to Time Series Forecasting (Decomposition & ARIMA)

Important Python modules for Machine Learning (scikit-learn, statsmodels, SciPy, NLTK, etc.)

Fine-tuning the models using hyperparameters, grid search, pipelines, etc.

Project – Consolidate Learnings

Applying different algorithms to solve business problems and benchmarking the results

**Introduction To Big Data**

Introduction and Relevance

Uses of Big Data analytics in various industries like Telecom, E- commerce, Finance and Insurance etc.

Problems with Traditional Large-Scale Systems

**Hadoop (Big Data) Eco-System**

Motivation for Hadoop

Different types of projects by Apache

Role of projects in the Hadoop Ecosystem

Key technology foundations required for Big Data

Limitations and Solutions of existing Data Analytics Architecture

Comparison of traditional data management systems with Big Data management systems

Evaluate key framework requirements for Big Data analytics

Hadoop Ecosystem & Hadoop 2.x core components

Explain the relevance of real-time data

Explain how to use Big Data and real-time data as a Business planning tool

**Hadoop Cluster-Architecture-Configuration Files**

Hadoop Master-Slave Architecture

The Hadoop Distributed File System – Concept of data storage

Explain different types of cluster setups (fully distributed, pseudo-distributed, etc.)

Hadoop cluster set up – Installation

Hadoop 2.x Cluster Architecture

A Typical enterprise cluster – Hadoop Cluster Modes

Understanding cluster management tools like Cloudera Manager/Apache Ambari

**Hadoop-HDFS & MapReduce (YARN)**

HDFS Overview & Data storage in HDFS

Getting data into Hadoop from a local machine (Data Loading Techniques) and vice versa

Map Reduce Overview (Traditional way Vs. MapReduce way)

Concept of Mapper & Reducer

Understanding MapReduce program Framework

Develop MapReduce Program using Java (Basic)

Develop MapReduce program with the Streaming API (Basic)
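To make the mapper and reducer concept in this module concrete, here is a minimal single-machine sketch of a word count written in the MapReduce style. It is plain Python, not Hadoop code: the `shuffle` function stands in for the grouping step that the framework would perform between the map and reduce phases, and all function names and the sample input are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: collapse all counts for one word into a total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(pairs)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big ideas", "data beats ideas"]
    print(word_count(sample))
```

On a real cluster the mapper and reducer would run in parallel on different nodes, each seeing only its own slice of the data. That is why they communicate only through key/value pairs.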

**Data Integration Using Sqoop & Flume**

Integrating Hadoop into an Existing Enterprise

Loading Data from an RDBMS into HDFS by Using Sqoop

Managing Real-Time Data Using Flume

Accessing HDFS from Legacy Systems

**Data Analysis Using Pig**

Introduction to Data Analysis Tools

Apache PIG – MapReduce Vs Pig, Pig Use Cases

PIG’s Data Model

PIG Streaming

Pig Latin Program & Execution

Pig Latin : Relational Operators, File Loaders, Group Operator, COGROUP Operator, Joins and COGROUP, Union, Diagnostic Operators, Pig UDF

Writing Java UDFs

Embedded PIG in Java

PIG Macros

Parameter Substitution

Use Pig to automate the design and implementation of MapReduce applications

Use Pig to apply structure to unstructured Big Data

**Data Analysis Using Hive**

Apache Hive – Hive Vs. PIG – Hive Use Cases

Discuss the Hive data storage principle

Explain the File formats and Records formats supported by the Hive environment

Perform operations with data in Hive

Hive QL: Joining Tables, Dynamic Partitioning, Custom Map/Reduce Scripts

Hive Script, Hive UDF

Hive Persistence formats

Loading data in Hive – Methods

Serialization & Deserialization

Handling Text data using Hive

Integrating external BI tools with Hadoop Hive

**Data Analysis Using Impala**

Impala & Architecture

How Impala executes Queries and its importance

Hive vs. PIG vs. Impala

Extending Impala with User Defined functions

**Introduction To Other Ecosystem Tools**

NoSQL database – HBase

Introduction to Oozie

**Spark: Introduction**

Introduction to Apache Spark

Streaming Data vs. In-Memory Data

MapReduce vs. Spark

Modes of Spark

Spark Installation Demo

Overview of Spark on a cluster

Spark Standalone Cluster

**Spark: Spark In Practice**

Invoking Spark Shell

Creating the Spark Context

Loading a File in Shell

Performing Some Basic Operations on Files in Spark Shell

Caching Overview

Distributed Persistence

Spark Streaming Overview (Example: Streaming Word Count)

**Spark: Spark Meets Hive**

Analyze Hive and Spark SQL Architecture

Analyze Spark SQL

Context in Spark SQL

Implement a sample example for Spark SQL

Integrating Hive and Spark SQL

Support for JSON and Parquet File Formats

Implement Data Visualization in Spark

Loading of Data

Hive Queries through Spark

Performance Tuning Tips in Spark

Shared Variables: Broadcast Variables & Accumulators

**Spark Streaming**

Extract and analyze data from Twitter using Spark Streaming

Comparison of Spark and Storm – Overview

**Spark GraphX**

Overview of the GraphX module in Spark

Creating graphs with GraphX

**Introduction To Machine Learning Using Spark**

Understand Machine learning framework

Implement some of the ML algorithms using Spark MLlib

**Project**

Consolidate all the learnings

Working on Big Data Project by integrating various key components

**TABLEAU**

**Course Topics:**

**Introduction and Getting Started**

Why Tableau? Why Visualization?

The Tableau Product Line

Level Setting – Terminology

Getting Started – creating some powerful visualizations quickly

Review of some Key Fundamental Concepts

**Filtering, Sorting & Grouping–** Filtering, sorting and grouping are fundamental concepts when working with and analyzing data. We will briefly review these topics as they apply to Tableau.

Advanced options for filtering and hiding

Understanding your many options for ordering and grouping your data: Sort, Groups, Bins, Sets

Understanding how all of these options inter-relate

**Working with Data–** In the Advanced class, we will understand the difference between joining and blending data, and when we should do each. We will also consider the implications of working with large data sets, and consider options for when and how to work with extracts and the data engine. We will also investigate best practices in “sharing” data sources for Tableau Server users.

Data Types and Roles

Dimensions versus Measures

Data Types

Discrete versus Continuous

The meaning of pill colors

Database Joins

Data Blending

Working with the Data Engine / Extracts and scheduling extract updates

Working with Custom SQL

Adding to Context

Switching to Direct Connection

**Working with Calculated Data and Statistics–** In the Fundamentals class, we were introduced to some basic calculations: string and arithmetic calculations, ratios, and quick table calculations. In the Advanced class, we will extend those concepts to understand the intricacies of manipulating data within Tableau.

**A Quick Review of Basic Calculations**

o Arithmetic Calculations

o String Manipulation

o Date Calculations

o Quick Table Calculations

o Custom Aggregations

o Custom Calculated Fields

o Logic and Conditional Calculations

o Conditional Filters

**Advanced Table Calculations**

o Understanding Scope and Direction

o Calculate on Results of Table Calculations

o Complex Calculations

o Difference From Average

o Discrete Aggregations

o Index to Ratios

**Working with Parameters–** In the Fundamentals class, we were introduced to parameters: how to create a parameter and use it in a calculation. In the Advanced class, we will go into more detail on how we can use parameters to modify titles, create What-If analyses, etc.

**Parameter Basics**

o Data types of parameters

o Using parameters in calculated fields

o Inputting parameter values and parameter control options

**Advanced Usage of Parameters**

o Using parameters for titles, field selections, logic statements, Top X

**Building Advanced Chart Types and Visualizations / Tips & Tricks–** This topic covers how to create some of the chart types and visualizations that may be less obvious in Tableau. It also covers some of the more common tips & tricks / techniques that we use to assist customers in solving some of their more complex problems.

Bar in Bar

Box Plot

Bullet Chart

Custom Shapes

Gantt Chart

Heat Map

Pareto Chart

Sparkline

KPI Chart

**Best Practices in Formatting and Visualizing**

Formatting Tips

o Drag to Legend

o Edit Legend

o Highlighting

o Labeling

o Legends

o Working with Nulls

o Table Options

o Annotations and Display Options

Introduction to Visualization Best Practices