Data Science Training & Certification in Pune

Data Science Online Training with Certification Preparation

Learn

Data Science
Deep Learning
Machine Learning with Python/R/SAS
Live Machine Learning Projects
Deep Learning Projects

Duration of Data Science Training: 80 hrs

Batch type: Weekdays / Weekends

Mode of Data Science Training: Classroom / Online / Corporate Training

Why Radical Technologies

100% Placement Guarantee for the Right Candidate

Trainers with 10+ Years of Real-Time Experience

Learn from Industry Experts, Hands-on labs

Flexible Options: online, instructor-led, self-paced

14+ Years of Industry Recognitions

1 Lakh+ Students Trained

50,000+ Students Placed

Guaranteed 5+ Interview Calls

Top MNCs - Associated with 800+ Recruiters

Free Internship Project & Certification

Monthly Job Fair - Virtual as well as Physical

5000+ Reviews & Ratings

Real-time projects, assignments and scenarios are part of this Data Science course.

Preparing you to become a Certified Data Scientist, with complete placement support to help you get a job.

Data sets, installations, interview preparation and the option to repeat sessions for up to 6 months are all highlights of this course.

Trainer: Experienced Data Science Consultant

Want to Be a Future Data Scientist?

Data Science Training Introduction: This course does not require a prior quantitative or mathematics background. It starts by introducing basic concepts such as the mean, median and mode, and eventually covers all aspects of an analytics or data science career, from analyzing and preparing raw data to visualizing your findings. Whether you're a programmer or a fresh graduate looking to switch into an exciting new career track, or a data analyst looking to make the transition into the tech industry, this course will teach you the basic to advanced techniques used by real-world industry data scientists.

Data Science, Statistics with Python / R / SAS: This course is an introduction to Data Science and Statistics using the R programming language, Python or SAS. It covers both the theoretical aspects of statistical concepts and their practical implementation using R / Python / SAS. If you're new to Python, don't worry: the course starts with a crash course. Whether you have programmed before or are new to programming, you should pick it up quickly. This course shows you how to get set up on Microsoft Windows-based PCs; the sample code will also run on macOS or Linux desktop systems.

Data Science Analytics: Using Spark and Scala, you can analyze and explore your data in an interactive environment with fast feedback. The course will show how to leverage the power of RDDs and DataFrames to manipulate data with ease.

Machine Learning and Data Science: Spark's core functionality and built-in libraries make it easy to implement complex algorithms such as recommendations with very few lines of code. We'll cover a variety of datasets and algorithms, including PageRank, MapReduce and graph datasets.

Data Science Real-Life Examples: Every concept is explained with the help of examples, case studies and source code in R wherever necessary. The examples cover a wide array of topics, ranging from A/B testing in an Internet company context to the Capital Asset Pricing Model in a quant finance context.

Data Science: Who Is the Target Audience?

Engineering/Management graduates, post-graduates or freshers who want to build a career in the Data Science industry or become future Data Scientists.
Engineers who want to use a distributed computing engine for batch or stream processing or both
Analysts who want to leverage Spark for analyzing interesting datasets
Data Scientists who want a single engine for analyzing and modelling data as well as productionizing it.
MBA Graduates or business professionals who are looking to move to a heavily quantitative role.
Engineering graduates/professionals who want to understand basic statistics and lay a foundation for a career in Data Science
Working professionals or fresh graduates who have mostly worked in descriptive analytics, or have not yet worked, and want to make the shift to being data scientists
Professionals who’ve worked mostly with tools like Excel and want to learn how to use R for statistical analysis.

DATA SCIENCE & MACHINE LEARNING WITH PYTHON

Data Science Course Content

Introduction to Data Science with Python

What is analytics & Data Science?
Common Terms in Analytics
Analytics vs. Data warehousing, OLAP, MIS Reporting
Relevance in industry and need of the hour
Types of problems and business objectives in various industries
How leading companies are harnessing the power of analytics?
Critical success drivers
Overview of analytics tools & their popularity
Analytics Methodology & problem solving framework
List of steps in Analytics projects
Identify the most appropriate solution design for the given problem statement
Project plan for Analytics project & key milestones based on effort estimates
Build Resource plan for analytics project

Python Essentials

Why Python for data science?
Overview of Python- Starting with Python
Introduction to installation of Python
Introduction to Python Editors & IDEs (Canopy, PyCharm, Jupyter, Rodeo, IPython, etc.)
Understand Jupyter notebook & Customize Settings
Concept of Packages/Libraries – Important packages (NumPy, SciPy, scikit-learn, Pandas, Matplotlib, etc.)
Installing & loading Packages & Namespaces
Data Types & Data objects/structures (strings, Tuples, Lists, Dictionaries)
List and Dictionary Comprehensions
Variable & Value Labels – Date & Time Values
Basic Operations – Mathematical – string – date
Reading and writing data
Simple plotting
Control flow & conditional statements
Debugging & Code profiling
How to create classes and modules and how to call them

Scientific Distributions Used In Python For Data Science

NumPy, pandas, scikit-learn, statsmodels, NLTK

Accessing/Importing And Exporting Data Using Python Modules

Importing Data from various sources (CSV, TXT, Excel, Access, etc.)
Database Input (Connecting to database)
Viewing Data objects – subsetting Data, methods
Exporting Data to various formats
Important Python modules: Pandas, Beautiful Soup

Data Manipulation – Cleansing – Munging using python modules

Cleansing Data with Python
Data Manipulation steps (Sorting, filtering, duplicates, merging, appending, subsetting, derived variables, sampling, Data type conversions, renaming, formatting, etc.)
Data manipulation tools (Operators, Functions, Packages, control structures, Loops, arrays, etc.)
Python Built-in Functions (Text, numeric, date, utility functions)
Python User Defined Functions
Stripping out extraneous information
Normalizing data
Formatting data
Important Python modules for data manipulation (Pandas, Numpy, re, math, string, datetime etc)

Data Analysis – Visualization Using Python

Introduction to exploratory data analysis
Descriptive statistics, Frequency Tables and summarization
Univariate Analysis (Distribution of data & Graphical Analysis)
Bivariate Analysis (Cross Tabs, Distributions & Relationships, Graphical Analysis)
Creating Graphs (Bar/pie/line chart/histogram/boxplot/scatter/density, etc.)
Important Packages for Exploratory Analysis (NumPy Arrays, Matplotlib, seaborn, Pandas and scipy.stats, etc.)

Introduction to Statistics

Basic Statistics – Measures of Central Tendencies and Variance
Building blocks – Probability Distributions – Normal distribution – Central Limit Theorem
Inferential Statistics – Sampling – Concept of Hypothesis Testing
Statistical Methods – Z/t-tests (One sample, independent, paired), Analysis of variance, Correlations and Chi-square
Important modules for statistical methods: NumPy, SciPy, Pandas
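
As a minimal sketch of the one-sample t-test listed above, the Python below computes the t statistic by hand with the standard library only. The delivery-time data and the claimed mean of 30 are invented for illustration. In practice the p-value would come from the t distribution with n - 1 degrees of freedom, for example via SciPy's scipy.stats.ttest_1samp.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation, n - 1 denominator
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1


# Hypothetical delivery times (minutes) tested against a claimed mean of 30
times = [31, 29, 34, 32, 30, 33]
t, df = one_sample_t(times, 30)
print(round(t, 3), df)  # 1.964 5
```

With 5 degrees of freedom, a t of about 1.96 falls short of the two-sided 5% critical value (about 2.571), so this toy sample would not reject the claimed mean.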

Introduction to Predictive Modelling

Concept of model in analytics and how it is used?
Common terminology used in analytics & Modelling process
Popular modelling algorithms
Types of Business problems – Mapping of Techniques
Different Phases of Predictive Modelling

Data Exploration For Modelling

Need for structured exploratory data analysis
EDA framework for exploring the data and identifying any problems with the data (Data Audit Report)
Identify missing data
Identify outliers
Visualize the data trends and patterns

Data Preparation

Need of Data preparation
Consolidation/Aggregation – Outlier treatment – Flat Liners – Missing values – Dummy creation – Variable Reduction
Variable Reduction Techniques – Factor Analysis & PCA

Segmentation: Solving Segmentation Problems

Introduction to Segmentation
Types of Segmentation (Subjective Vs Objective, Heuristic Vs. Statistical)
Heuristic Segmentation Techniques (Value Based, RFM Segmentation and Life Stage Segmentation)
Behavioural Segmentation Techniques (K-Means Cluster Analysis)
Cluster evaluation and profiling – Identify cluster characteristics
Interpretation of results – Implementation on new data

Linear Regression: Solving Regression Problems

Introduction – Applications
Assumptions of Linear Regression
Building Linear Regression Model
Understanding standard metrics (Variable significance, R-square/Adjusted R-square, Global hypothesis, etc.)
Assess the overall effectiveness of the model
Validation of Models (Re-running Vs. Scoring)
Standard Business Outputs (Decile Analysis, Error distribution (histogram), Model equation, drivers etc.)
Interpretation of Results – Business Validation – Implementation on new data
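
To make the R-square and Adjusted R-square metrics above concrete, here is a minimal sketch of simple (one-predictor) linear regression fitted by ordinary least squares in plain Python. The advertising/sales numbers are hypothetical, and a real project would typically use a library such as statsmodels or scikit-learn, which also report variable significance.

```python
def fit_simple_ols(xs, ys):
    """Fit y = b0 + b1 * x by ordinary least squares.

    Returns (b0, b1, r_squared, adjusted_r_squared) for a single predictor.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # R-square = 1 - SSE / SST (share of variance explained by the model)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    r2 = 1 - sse / sst
    # Adjusted R-square penalises extra predictors; here there is p = 1 predictor
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return b0, b1, r2, adj_r2


# Hypothetical advertising spend (x) vs. sales (y)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, r2, adj_r2 = fit_simple_ols(xs, ys)
print(f"y = {b0:.2f} + {b1:.2f}x, R2 = {r2:.4f}, adj R2 = {adj_r2:.4f}")
```

The fitted line is roughly y = 0.05 + 1.99x; Adjusted R-square comes out slightly below R-square because it charges for the predictor used.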

Logistic Regression : Solving Classification Problems

Introduction – Applications
Linear Regression Vs. Logistic Regression Vs. Generalized Linear Models
Building Logistic Regression Model (Binary Logistic Model)
Understanding standard model metrics (Concordance, Variable significance, Hosmer-Lemeshow Test, Gini, KS, Misclassification, ROC Curve, etc.)
Validation of Logistic Regression Models (Re-running Vs. Scoring)
Standard Business Outputs (Decile Analysis, ROC Curve, Probability Cut-offs, Lift charts, Model equation, Drivers or variable importance, etc)
Interpretation of Results – Business Validation – Implementation on new data
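
Concordance, ROC AUC and Gini, listed above, are closely linked. A minimal sketch, assuming scored probabilities and 0/1 outcomes (both hypothetical here): the c-statistic is the share of (event, non-event) pairs in which the event has the higher score, ties count half, and this equals the area under the ROC curve; Gini is then 2 * AUC - 1.

```python
def concordance_auc(scores, labels):
    """Share of (positive, negative) pairs in which the positive case scores higher; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    concordant = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]  # hypothetical predicted probabilities
labels = [1, 1, 0, 1, 0, 0]              # hypothetical actual outcomes (1 = event)
auc = concordance_auc(scores, labels)
gini = 2 * auc - 1
print(round(auc, 4), round(gini, 4))
```

The double loop compares every pair, which is fine for teaching; rank-based formulas give the same value much faster on large scoring datasets.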

Time Series Forecasting : Solving Forecasting Problems

Introduction – Applications
Time Series Components (Trend, Seasonality, Cyclicity and Level) and Decomposition
Classification of Techniques (Pattern based – Pattern less)
Basic Techniques – Averages, Smoothing, etc.
Advanced Techniques – AR Models, ARIMA, etc
Understanding Forecasting Accuracy – MAPE, MAD, MSE, etc
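
As a small sketch tying a basic averaging technique to the accuracy measures above, the Python below produces one-step-ahead moving-average forecasts and scores them with MAD, MSE and MAPE. The sales series and the window of 2 are hypothetical choices; MAD is computed here as the mean absolute error of the forecasts.

```python
def moving_average_forecast(series, window):
    """One-step-ahead forecasts: each point is predicted by the mean of the previous `window` values."""
    return [sum(series[i - window:i]) / window for i in range(window, len(series))]


def forecast_accuracy(actual, forecast):
    """Return MAD, MSE and MAPE (%) for paired actual and forecast values."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # MAPE divides by the actual values, so they must be non-zero
    mape = 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return {"MAD": mad, "MSE": mse, "MAPE": mape}


sales = [100, 110, 120, 130, 140]              # hypothetical monthly sales with a steady trend
forecasts = moving_average_forecast(sales, 2)  # forecasts for months 3 to 5
print(forecasts)
print(forecast_accuracy(sales[2:], forecasts))
```

Every forecast here is 15 units too low because a plain moving average lags a trending series, which is one reason the trend component is modelled explicitly in the advanced techniques.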

Machine Learning : Predictive Modelling

Introduction to Machine Learning & Predictive Modelling
Types of Business problems – Mapping of Techniques – Regression vs. classification vs. segmentation vs. Forecasting
Major Classes of Learning Algorithms – Supervised vs Unsupervised Learning
Different Phases of Predictive Modelling (Data Pre-processing, Sampling, Model Building, Validation)
Overfitting (Bias-Variance Trade off) & Performance Metrics
Feature engineering & dimension reduction
Concept of optimization & cost function
Overview of gradient descent algorithm
Overview of Cross-Validation (Bootstrapping, K-Fold validation, etc.)
Model performance metrics (R-square, Adjusted R-square, RMSE, MAPE, AUC, ROC curve, recall, precision, sensitivity, specificity, confusion matrix)
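
K-fold validation, listed above, splits the rows into k folds and uses each fold once as the test set while the others train the model. A minimal sketch of the index bookkeeping only, assuming contiguous folds; on ordered data the rows should be shuffled first (scikit-learn's sklearn.model_selection.KFold offers a shuffle option for this).

```python
def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) for k contiguous folds over n_rows rows.

    The first n_rows % k folds receive one extra row, so every row is tested exactly once.
    """
    base, extra = divmod(n_rows, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, test_idx
        start += size


for train_idx, test_idx in k_fold_indices(10, 3):
    print(len(train_idx), test_idx)
```

The model's performance metric is averaged over the k test folds, which gives a less optimistic estimate than scoring on the training data.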

Data Science Unsupervised Learning : Segmentation

What is segmentation & Role of ML in Segmentation?
Concept of Distance and related math background
K-Means Clustering
Expectation Maximization
Hierarchical Clustering
Spectral Clustering
Density-Based Clustering (DBSCAN)
Principal Component Analysis (PCA)
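
K-Means clustering, listed above, alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its members. A minimal sketch of this (Lloyd's) algorithm in plain Python, assuming hand-picked starting centroids and hypothetical 2-D customer data; library versions such as scikit-learn's KMeans use smarter seeding (k-means++) and several restarts.

```python
import math


def assign_clusters(points, centroids):
    """Group each point with its nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        clusters = assign_clusters(points, centroids)
        # New centroid = mean of its members; an empty cluster keeps its old centroid
        updated = [tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
                   for members, old in zip(clusters, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign_clusters(points, centroids)


# Hypothetical 2-D customer data (e.g. spend, visits) with two obvious groups
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = k_means(points, initial_centroids=[(1, 1), (8, 8)])
print(centroids)
```

Profiling the resulting clusters (for example, comparing their centroid values) is what turns the output into the segment descriptions discussed under cluster evaluation.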

Data Science Supervised Learning: Decision Trees

Decision Trees – Introduction – Applications
Types of Decision Tree Algorithms
Construction of Decision Trees through Simplified Examples; Choosing the “Best” attribute at each Non-Leaf node; Entropy; Information Gain, Gini Index, Chi Square, Regression Trees
Generalizing Decision Trees; Information Content and Gain Ratio; Dealing with Numerical Variables; other Measures of Randomness
Pruning a Decision Tree; Cost as a consideration; Unwrapping Trees as Rules
Decision Trees – Validation
Overfitting – Best Practices to avoid
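
Choosing the "best" attribute at a node, as described above, means comparing impurity before and after a split. A minimal sketch of entropy, information gain and the Gini index in plain Python; the 10-case node and its split are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini_index(labels):
    """Gini impurity: 1 - sum(p^2) over the class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical node with 5 "yes" and 5 "no" cases, split into two children
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4
print(entropy(parent), gini_index(parent))
print(round(information_gain(parent, [left, right]), 4))
```

A tree-growing algorithm would evaluate this gain for every candidate split and keep the largest; the 50/50 parent is maximally impure (entropy 1 bit, Gini 0.5).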

Supervised Learning: Ensemble Learning

Concept of Ensembling
Manual Ensembling Vs. Automated Ensembling
Methods of Ensembling (Stacking, Mixture of Experts)
Bagging (Logic, Practical Applications)
Random forest (Logic, Practical Applications)
Boosting (Logic, Practical Applications)
AdaBoost
Gradient Boosting Machines (GBM)
XGBoost

Supervised Learning: Artificial Neural Networks (ANN)

Motivation for Neural Networks and Their Applications
Perceptron and Single Layer Neural Network, and Hand Calculations
Learning in a Multi-Layered Neural Net: Back Propagation and Conjugate Gradient Techniques
Neural Networks for Regression
Neural Networks for Classification
Interpretation of Outputs and Fine-tuning the Models with Hyperparameters
Validating ANN models
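
The perceptron hand calculations mentioned above follow one update rule: w <- w + lr * (target - prediction) * x, with a step activation. A minimal sketch in plain Python, assuming zero starting weights and a learning rate of 1 so every update can be checked by hand; the logical AND data is a standard toy example.

```python
def step(value):
    """Step activation: fire (1) only when the weighted sum is strictly positive."""
    return 1 if value > 0 else 0


def predict(weights, bias, x):
    return step(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_perceptron(samples, lr=1.0, epochs=20):
    """Perceptron learning rule: w <- w + lr * (target - prediction) * x."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Logical AND: linearly separable, so the rule is guaranteed to converge
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND)
print(weights, bias)
```

A single perceptron cannot learn XOR, which is not linearly separable; that limitation is what motivates multi-layer networks trained with back propagation.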

Supervised Learning: Support Vector Machines

Motivation for Support Vector Machine & Applications
Support Vector Regression
Support vector classifier (Linear & Non-Linear)
Mathematical Intuition (Kernel Methods Revisited, Quadratic Optimization and Soft Constraints)
Interpretation of Outputs and Fine-tuning the Models with Hyperparameters
Validating SVM models

Supervised Learning: KNN

What is KNN & Applications?
KNN for missing value treatment
KNN for solving regression problems
KNN for solving classification problems
Validating KNN model
Model fine-tuning with hyperparameters
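
KNN, as outlined above, predicts from the k closest training points: a majority vote for classification, an average for regression. A minimal sketch with Euclidean distance and small hypothetical datasets; k = 3 is an illustrative choice, and in practice k is tuned as a hyperparameter and features are usually scaled first.

```python
import math
from collections import Counter


def nearest_neighbours(train, query, k):
    """Return the k (features, target) pairs closest to `query` by Euclidean distance."""
    return sorted(train, key=lambda item: math.dist(item[0], query))[:k]


def knn_classify(train, query, k=3):
    """Majority vote among the k nearest labels."""
    votes = Counter(label for _, label in nearest_neighbours(train, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Average of the k nearest numeric targets."""
    neighbours = nearest_neighbours(train, query, k)
    return sum(target for _, target in neighbours) / len(neighbours)


# Hypothetical two-feature points with class labels
labelled = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
            ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B")]
print(knn_classify(labelled, (1.1, 1.0)))

# Hypothetical one-feature points with numeric targets
numeric = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]
print(knn_regress(numeric, (2.1,), k=3))
```

The same neighbour search underlies KNN missing-value treatment: a missing field is filled from the values of the most similar complete rows.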

Supervised Learning: Naïve Bayes

Concept of Conditional Probability
Bayes Theorem and Its Applications
Naïve Bayes for classification
Applications of Naïve Bayes in Classifications
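
Naïve Bayes applies Bayes' theorem under the "naive" assumption that words are independent given the class, so P(class | words) is proportional to P(class) times the product of P(word | class). A minimal sketch of a multinomial variant with add-one (Laplace) smoothing, working in log space to avoid underflow; the four training messages are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs: list of (tokens, label). Returns class priors, per-class token counts and the vocabulary."""
    class_counts = Counter(label for _, label in docs)
    token_counts = defaultdict(Counter)
    for tokens, label in docs:
        token_counts[label].update(tokens)
    vocab = {t for tokens, _ in docs for t in tokens}
    priors = {label: n / len(docs) for label, n in class_counts.items()}
    return priors, token_counts, vocab


def predict_naive_bayes(model, tokens):
    """Pick the label maximising log P(label) + sum(log P(token | label))."""
    priors, token_counts, vocab = model
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        total = sum(token_counts[label].values())
        score = math.log(prior)
        for t in tokens:
            if t in vocab:  # tokens never seen in training are ignored
                # Add-one smoothing keeps an unseen label/token pair from zeroing the product
                score += math.log((token_counts[label][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [("win cash now".split(), "spam"), ("cheap cash offer".split(), "spam"),
            ("meeting at noon".split(), "ham"), ("lunch meeting today".split(), "ham")]
model = train_naive_bayes(training)
print(predict_naive_bayes(model, "cash offer now".split()))
print(predict_naive_bayes(model, "meeting today".split()))
```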

Text Mining And Analytics

Taming big text, Unstructured vs. Semi-structured Data; Fundamentals of information retrieval, Properties of words; Creating Term-Document (TxD) Matrices; Similarity measures; Low-level processes (Sentence Splitting; Tokenization; Part-of-Speech Tagging; Stemming; Chunking)
Finding patterns in text: text mining, text as a graph
Natural Language processing (NLP)
Text Analytics – Sentiment Analysis using Python
Text Analytics – Word cloud analysis using Python
Text Analytics – Segmentation using K-Means/Hierarchical Clustering
Text Analytics – Classification (Spam/Not spam)
Applications of Social Media Analytics
Metrics (Measures, Actions) in social media analytics
Examples & Actionable Insights using Social Media Analytics
Important Python modules for Machine Learning (scikit-learn, statsmodels, SciPy, NLTK, etc.)
Fine-tuning the models using hyperparameters, grid search, pipelines, etc.
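
The term-document matrix and similarity measures listed above can be sketched in a few lines of plain Python: count each term per document, then compare document columns with cosine similarity. The tokenizer here is deliberately simple (lower-case, alphabetic runs only) and the three documents are hypothetical; NLTK or scikit-learn would normally handle tokenization, stemming and weighting such as TF-IDF.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term i in document j."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = ["Data science uses data", "Science of data", "Cricket match today"]
vocab, tdm = term_document_matrix(docs)
doc_vectors = list(zip(*tdm))  # transpose: one column per document
print(cosine_similarity(doc_vectors[0], doc_vectors[1]))
print(cosine_similarity(doc_vectors[0], doc_vectors[2]))
```

The first two documents share "data" and "science" and score about 0.71, while the cricket document shares no terms with the first and scores 0.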

DATA SCIENCE WITH R COURSE CONTENT

What is analytics & Data Science?
Common Terms in Analytics
Analytics vs. Data warehousing, OLAP, MIS Reporting
Relevance in industry and need of the hour
Types of problems and business objectives in various industries
How leading companies are harnessing the power of analytics?
Critical success drivers
Overview of analytics tools & their popularity
Analytics Methodology & problem solving framework
List of steps in Analytics projects
Identify the most appropriate solution design for the given problem statement
Project plan for Analytics project & key milestones based on effort estimates
Build Resource plan for analytics project
Why R for data science?

Data Importing / Exporting

Introduction R/R-Studio – GUI
Concept of Packages – Useful Packages (Base & Other packages)
Data Structure & Data Types (Vectors, Matrices, factors, Data frames, and Lists)
Importing Data from various sources (txt, dlm, excel, sas7bdat, db, etc.)
Database Input (Connecting to database)
Exporting Data to various formats
Viewing Data (Viewing partial data and full data)
Variable & Value Labels – Date Values

Data Manipulation

Data Manipulation steps
Creating New Variables (calculations & Binning)
Dummy variable creation
Applying transformations
Handling duplicates
Handling missing values
Sorting and Filtering
Subsetting (Rows/Columns)
Appending (Row appending/column appending)
Merging/Joining (Left, right, inner, full, outer etc)
Data type conversions
Renaming
Formatting
Reshaping data
Sampling
Data manipulation tools
Operators
Functions
Packages
Control Structures (if, if else)
Loops (Conditional, iterative loops, apply functions)
Arrays
R Built-in Functions (Text, Numeric, Date, utility)
Numerical Functions
Text Functions
Date Functions
Utilities Functions
R User Defined Functions
R Packages for data manipulation (base, dplyr, plyr, data.table, reshape, car, sqldf, etc)

Data Analysis – Visualization

Introduction to exploratory data analysis
Descriptive statistics, Frequency Tables and summarization
Univariate Analysis (Distribution of data & Graphical Analysis)
Bivariate Analysis (Cross Tabs, Distributions & Relationships, Graphical Analysis)
Creating Graphs (Bar/pie/line chart/histogram/boxplot/scatter/density, etc.)
R Packages for Exploratory Data Analysis (dplyr, plyr, gmodels, car, vcd, Hmisc, psych, doBy, etc.)
R Packages for Graphical Analysis (base, ggplot2, lattice, etc.)

Introduction To Statistics

Basic Statistics – Measures of Central Tendencies and Variance
Building blocks – Probability Distributions – Normal distribution – Central Limit Theorem
Inferential Statistics – Sampling – Concept of Hypothesis Testing
Statistical Methods – Z/t-tests (One sample, independent, paired), ANOVA, Correlations and Chi-square

Predictive Modelling

Concept of model in analytics and how it is used?
Common terminology used in analytics & modelling process
Popular modelling algorithms
Types of Business problems – Mapping of Techniques
Different Phases of Predictive Modelling

Data Exploration For Modeling

Data Preparation

Need of Data preparation
Consolidation/Aggregation – Outlier treatment – Flat Liners – Missing values – Dummy creation – Variable Reduction
Variable Reduction Techniques – Factor Analysis & PCA

Segmentation: Solving Segmentation Problems

Introduction to Segmentation
Types of Segmentation (Subjective Vs Objective, Heuristic Vs. Statistical)
Heuristic Segmentation Techniques (Value Based, RFM Segmentation and Life Stage Segmentation)
Behavioral Segmentation Techniques (K-Means Cluster Analysis)
Cluster evaluation and profiling – Identify cluster characteristics
Interpretation of results – Implementation on new data

Linear Regression: Solving Regression Problems

Introduction – Applications
Assumptions of Linear Regression
Building Linear Regression Model
Understanding standard metrics (Variable significance, R-square/Adjusted R-square, Global hypothesis, etc.)
Assess the overall effectiveness of the model
Validation of Models (Re-running Vs. Scoring)
Standard Business Outputs (Decile Analysis, Error distribution (histogram), Model equation, drivers etc.)
Interpretation of Results – Business Validation – Implementation on new data

Logistic Regression: Solving Classification Problems

Introduction – Applications
Linear Regression Vs. Logistic Regression Vs. Generalized Linear Models
Building Logistic Regression Model (Binary Logistic Model)
Understanding standard model metrics (Concordance, Variable significance, Hosmer-Lemeshow Test, Gini, KS, Misclassification, ROC Curve, etc.)
Validation of Logistic Regression Models (Re-running Vs. Scoring)
Standard Business Outputs (Decile Analysis, ROC Curve, Probability Cut-offs, Lift charts, Model equation, Drivers or variable importance, etc)
Interpretation of Results – Business Validation – Implementation on new data

Time Series Forecasting: Solving Forecasting Problems

Introduction – Applications
Time Series Components (Trend, Seasonality, Cyclicity and Level) and Decomposition
Classification of Techniques (Pattern based – Pattern less)
Basic Techniques – Averages, Smoothing, etc.
Advanced Techniques – AR Models, ARIMA, etc
Understanding Forecasting Accuracy – MAPE, MAD, MSE, etc

Machine Learning – Predictive Modeling – Basics

Introduction to Machine Learning & Predictive Modeling
Types of Business problems – Mapping of Techniques – Regression vs. classification vs. segmentation vs. Forecasting
Major Classes of Learning Algorithms – Supervised vs Unsupervised Learning
Different Phases of Predictive Modeling (Data Pre-processing, Sampling, Model Building, Validation)
Overfitting (Bias-Variance Trade off) & Performance Metrics
Feature engineering & dimension reduction
Concept of optimization & cost function
Overview of gradient descent algorithm
Overview of Cross-Validation (Bootstrapping, K-Fold validation, etc.)
Model performance metrics (R-square, Adjusted R-square, RMSE, MAPE, AUC, ROC curve, recall, precision, sensitivity, specificity, confusion matrix)

Unsupervised Learning: Segmentation

What is segmentation & Role of ML in Segmentation?
Concept of Distance and related math background
K-Means Clustering
Expectation Maximization
Hierarchical Clustering
Spectral Clustering
Density-Based Clustering (DBSCAN)
Principal Component Analysis (PCA)

Supervised Learning: Decision Trees

Decision Trees – Introduction – Applications
Types of Decision Tree Algorithms
Construction of Decision Trees through Simplified Examples; Choosing the “Best” attribute at each Non-Leaf node; Entropy; Information Gain, Gini Index, Chi Square, Regression Trees
Generalizing Decision Trees; Information Content and Gain Ratio; Dealing with Numerical Variables; other Measures of Randomness
Pruning a Decision Tree; Cost as a consideration; Unwrapping Trees as Rules
Decision Trees – Validation
Overfitting – Best Practices to avoid

Supervised Learning: Ensemble Learning

Concept of Ensembling
Manual Ensembling Vs. Automated Ensembling
Methods of Ensembling (Stacking, Mixture of Experts)
Bagging (Logic, Practical Applications)
Random forest (Logic, Practical Applications)
Boosting (Logic, Practical Applications)
AdaBoost
Gradient Boosting Machines (GBM)
XGBoost

Supervised Learning: Artificial Neural Networks (ANN)

Motivation for Neural Networks and Their Applications
Perceptron and Single Layer Neural Network, and Hand Calculations
Learning in a Multi-Layered Neural Net: Back Propagation and Conjugate Gradient Techniques
Neural Networks for Regression
Neural Networks for Classification
Interpretation of Outputs and Fine-tuning the Models with Hyperparameters
Validating ANN models

Supervised Learning: Support Vector Machines

Motivation for Support Vector Machine & Applications
Support Vector Regression
Support vector classifier (Linear & Non-Linear)
Mathematical Intuition (Kernel Methods Revisited, Quadratic Optimization and Soft Constraints)
Interpretation of Outputs and Fine-tuning the Models with Hyperparameters
Validating SVM models

Supervised Learning: KNN

What is KNN & Applications?
KNN for missing value treatment
KNN for solving regression problems
KNN for solving classification problems
Validating KNN model
Model fine-tuning with hyperparameters

Supervised Learning: Naïve Bayes

Concept of Conditional Probability
Bayes Theorem and Its Applications
Naïve Bayes for classification
Applications of Naïve Bayes in Classifications

Text Mining & Analytics

Taming big text, Unstructured vs. Semi-structured Data; Fundamentals of information retrieval, Properties of words; Creating Term-Document (TxD) Matrices; Similarity measures; Low-level processes (Sentence Splitting; Tokenization; Part-of-Speech Tagging; Stemming; Chunking)
Finding patterns in text: text mining, text as a graph
Natural Language processing (NLP)
Text Analytics – Sentiment Analysis using R
Text Analytics – Word cloud analysis using R
Text Analytics – Segmentation using K-Means/Hierarchical Clustering
Text Analytics – Classification (Spam/Not spam)
Applications of Social Media Analytics
Metrics (Measures, Actions) in social media analytics
Examples & Actionable Insights using Social Media Analytics
Important R packages for Machine Learning (caret, H2O, randomForest, nnet, tm, etc.)
Fine-tuning the models using hyperparameters, grid search, piping, etc.

Project

Case Studies

DATA SCIENCE TRAINING WITH SAS COURSE CONTENT

Introduction To Analytics

Analytics World
- Introduction to Analytics
- Concept of ETL
- SAS in advanced analytics
Global Certification: Induction and Walkthrough
- Getting Started
- Software installation
- Introduction to GUI
- Different components of the language
- All programming windows
- Concept of Libraries and Creating Libraries
- Variable Attributes – (Name, Type, Length, Format, Informat, Label)
- Importing Data and Entering data manually
Understanding Datasets
- Descriptor Portion of a Dataset (Proc Contents)
- Data Portion of a Dataset
- Variable Names and Values
- Data Libraries

Base SAS – Accessing the Data

Understanding Data Step Processing
- Data Step and Proc Step
- Data step execution
- Compilation and execution phase
- Input buffer and concept of PDV
Importing Raw Data Files
- Column Input, List Input and Formatted Input methods
- Delimiters, reading missing and nonstandard values
- Reading one to many and many to one records
- Reading Hierarchical files
- Creating raw data files and put statement
- Formats / Informat
Importing and Exporting Data (Fixed Format / Delimited)
Proc Import / Delimited text files
Proc Export / Exporting Data
Datalines / Cards;
Atypical importing cases (mixing different styles of input)
- Reading Multiple Records per Observation
- Reading “Mixed Record Types”
- Sub-setting from a Raw Data File
- Multiple Observations per Record
- Reading Hierarchical Files
Concept of SAS library and SAS Catalog
Variable Types in SAS
Reading Data stored external to SAS
Importing Data by using Proc Import
Data Step SAS statements
SAS Functions
Appending and Merging using SAS
SAS procedures such as PROC MEANS, PROC UNIVARIATE, PROC APPEND, PROC FREQ and PROC EXPORT
SAS SQL
SAS Macros

Hypothesis Testing and ANOVA

One-Sample t-Test for Comparing Means
Two-Sample t-Test for Comparing Means
One Way ANOVA
Assumptions of ANOVA Modeling
n-way ANOVA
ANOVA Post Hoc Studies

Measure Model Performance

Apply the principles of honest assessment to model performance measurement
Assess classifier performance using the confusion matrix
Model selection and validation using training and validation data
Create and interpret graphs (ROC, lift, and gains charts) for model comparison and selection
Establish effective decision cut-off values for scoring
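
The idea behind choosing a decision cut-off can be sketched independently of SAS; the example below uses Python. It tabulates the confusion matrix at a given probability cut-off and reports sensitivity and specificity, so two cut-offs can be compared side by side. The scored probabilities, outcomes and the cut-offs 0.5 and 0.75 are hypothetical.

```python
def confusion_at_cutoff(probs, labels, cutoff):
    """Classify as positive when probability >= cutoff and tabulate the confusion matrix."""
    tp = fp = fn = tn = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate (recall)
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn,
            "sensitivity": sensitivity, "specificity": specificity}


probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]  # hypothetical scored probabilities
labels = [1, 1, 0, 1, 0, 0]             # hypothetical actual outcomes
for cutoff in (0.5, 0.75):
    print(cutoff, confusion_at_cutoff(probs, labels, cutoff))
```

Raising the cut-off from 0.5 to 0.75 trades sensitivity (3/3 down to 2/3) for specificity (2/3 up to 3/3); the "right" cut-off depends on the relative business cost of false positives and false negatives.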

Data Understanding, Managing And Manipulation

Understanding and Exploring Data
- Introduction to basic Procedures – PROC CONTENTS, PROC PRINT
- Operators and Operands
- Conditional Statements (Where, If, If then Else, If then Do and select when)
- Difference between WHERE and IF statements and limitation of WHERE statements
- Labels, Commenting
- System Options (OBS, FIRSTOBS, NOOBS, etc.)
Data Manipulation
- Proc Sort – with options / De-Duping
- Accumulator variable and By-Group processing
- Explicit Output Statements
- Nested DO Loops
- Do While and Do Until Statement
- Array elements and Range
Combining Datasets (Appending and Merging)
- Concatenation
- Interleaving
- Proc Append
- One To One Merging
- Match Merging
- IN= Option: Controlling Merges with Indicator Variables

Data Mining With Proc SQL

Introduction to Databases
Introduction to Proc SQL
Basics of General SQL language
Creating table and Inserting Values
Retrieve & Summarize data
Group, Sort & Filter
Using Joins (Full, Inner, Left, Right and Outer)
Reporting and summary analysis
Concept of Indexes and creating Indexes (simple and composite)
Connecting SAS to external Databases
Implicit and Explicit Pass-Through methods

Macros For Automation

Macro Parameters and Variables
Different types of Macro Creation
Defining and calling a macro
Using CALL SYMPUT and SYMGET
Macro options (MPRINT, SYMBOLGEN, MLOGIC, MERROR, SERROR)

Fundamentals of Statistics

Basic Statistics – Measures of Central Tendencies and Variance
Building blocks – Probability Distributions – Normal distribution – Central Limit Theorem
Inferential Statistics – Sampling – Concept of Hypothesis Testing
Statistical Methods – Z/t-tests (One sample, independent, paired), ANOVA, Correlations and Chi-square
Levels of Measurement and Variable types
Descriptive Statistics and Picturing Distributions
Confidence Interval for the Mean

Introduction To Predictive Modelling

Introduction to Predictive Modeling
Types of Business problems – Mapping of Techniques
Different Phases of Predictive Modeling

Data Preparation

Need of Data preparation
Data Audit Report and Its importance
Consolidation/Aggregation – Outlier treatment – Flat Liners – Missing values- Dummy creation – Variable Reduction
Variable Reduction Techniques – Factor Analysis & PCA

Segmentation

Introduction to Segmentation
Types of Segmentation (Subjective Vs Objective, Heuristic Vs. Statistical)
Heuristic Segmentation Techniques (Value Based, RFM Segmentation and Life Stage Segmentation)
Behavioural Segmentation Techniques (K-Means Cluster Analysis)
Cluster evaluation and profiling
Interpretation of results – Implementation on new data

Linear Regression

Introduction – Applications
Assumptions of Linear Regression
Building Linear Regression Model
Understanding standard metrics (Variable significance, R-square/Adjusted R-square, Global hypothesis, etc.)
Validation of Models (Re-running Vs. Scoring)
Standard Business Outputs (Decile Analysis, Error distribution (histogram), Model equation, drivers etc.)
Interpretation of Results – Business Validation – Implementation on new data

Logistic Regression

Introduction – Applications
Linear Regression Vs. Logistic Regression Vs. Generalized Linear Models
Building Logistic Regression Model
Understanding standard model metrics (Concordance, Variable significance, Hosmer-Lemeshow Test, Gini, KS, Misclassification, etc.)
Validation of Logistic Regression Models (Re-running Vs. Scoring)
Standard Business Outputs (Decile Analysis, ROC Curve, Probability Cut-offs, Lift charts, Model equation, Drivers, etc.)
Interpretation of Results – Business Validation – Implementation on new data

Time Series Forecasting

Introduction – Applications
Time Series Components (Trend, Seasonality, Cyclicity and Level) and Decomposition
Classification of Techniques (Pattern based – Pattern less)
Basic Techniques – Averages, Smoothing, etc.
Advanced Techniques – AR Models, ARIMA, etc
Understanding Forecasting Accuracy – MAPE, MAD, MSE, etc

Introduction To Machine Learning

Statistical learning vs. Machine learning
Major Classes of Learning Algorithms – Supervised vs Unsupervised Learning
Concept of Overfitting and Underfitting (Bias-Variance Trade-off) & Performance Metrics
Types of Cross-Validation (Train & Test, Bootstrapping, K-Fold validation, etc.)

Regression & Classification Model Building

Recursive Partitioning(Decision Trees)
Ensemble Models(Random Forest, Bagging & Boosting)
K-Nearest neighbours

ADVANCED BIG DATA SCIENCE COURSE CONTENT

Introduction To Data Science

What is Data Science?
Why Python for data science?
Relevance in industry and need of the hour
How leading companies are harnessing the power of Data Science with Python?
Different phases of a typical Analytics/Data Science project and the role of Python
Anaconda vs. Python

Python Essentials (Core)

Overview of Python- Starting with Python
Introduction to installation of Python
Introduction to Python Editors & IDEs (Canopy, PyCharm, Jupyter, Rodeo, IPython, etc.)
Understand Jupyter notebook & Customize Settings
Concept of Packages/Libraries – Important packages (NumPy, SciPy, scikit-learn, Pandas, Matplotlib, etc.)
Installing & loading Packages & Namespaces
Data Types & Data objects/structures (strings, Tuples, Lists, Dictionaries)
List and Dictionary Comprehensions
Variable & Value Labels – Date & Time Values
Basic Operations – Mathematical – string – date
Reading and writing data
Simple plotting
Control flow & conditional statements
Debugging & Code profiling
How to create classes and modules and how to call them
Scientific distributions used in Python for Data Science – NumPy, SciPy, Pandas, scikit-learn, statsmodels, NLTK, etc.

Accessing/Importing And Exporting Data Using Python Modules

Importing Data from various sources (CSV, TXT, Excel, Access, etc.)
Database Input (Connecting to database)
Viewing Data objects – subsetting, methods
Exporting Data to various formats
Important Python modules: pandas, Beautiful Soup

Data Manipulation – Cleansing – Munging Using Python Modules

Cleansing Data with Python
Data Manipulation Steps (Sorting, Filtering, Duplicates, Merging, Appending, Subsetting, Derived Variables, Sampling, Data Type Conversions, Renaming, Formatting, etc.)
Data Manipulation Tools (Operators, Functions, Packages, Control Structures, Loops, Arrays, etc.)
Python Built-in Functions (Text, numeric, date, utility functions)
Python User Defined Functions
Stripping out extraneous information
Normalizing data
Formatting data
Important Python modules for data manipulation (pandas, NumPy, re, math, string, datetime, etc.)
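"Normalizing data" can mean several things; two common choices are min-max scaling to [0, 1] and z-score standardisation. The sketch below implements both with the standard-library `statistics` module only; the helper names and the income figures are hypothetical, and pandas or scikit-learn offer equivalent, more convenient tools.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardise values to mean 0 and population standard deviation 1."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: no spread to divide by
    return [(v - mu) / sigma for v in values]


incomes = [20_000, 35_000, 50_000, 80_000]
print(min_max_scale(incomes))
print(z_score(incomes))
```

Min-max scaling preserves the shape of the distribution inside a fixed range, while z-scores are more robust when new values may fall outside the observed minimum and maximum.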

Data Analysis – Visualization Using Python

Introduction to Exploratory Data Analysis
Descriptive statistics, Frequency Tables and summarization
Univariate Analysis (Distribution of data & Graphical Analysis)
Bivariate Analysis (Cross Tabs, Distributions & Relationships, Graphical Analysis)
Creating Graphs (Bar, Pie, Line Chart, Histogram, Boxplot, Scatter, Density, etc.)
Important Packages for Exploratory Analysis (NumPy Arrays, Matplotlib, seaborn, pandas and scipy.stats, etc.)

Basic Statistics & Implementation Of Stats Methods In Python

Basic Statistics – Measures of Central Tendencies and Variance
Building blocks – Probability Distributions – Normal distribution – Central Limit Theorem
Inferential Statistics – Sampling – Concept of Hypothesis Testing
Statistical Methods – Z/t-tests (One-Sample, Independent, Paired), ANOVA, Correlation and Chi-Square
Important modules for statistical methods: NumPy, SciPy, pandas
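As an example of the one-sample t-test, the sketch below computes the statistic t = (sample mean - mu0) / (s / sqrt(n)) and its n - 1 degrees of freedom using only the standard library. The function `one_sample_t` and the package-weight data are hypothetical; to obtain a p-value as well, `scipy.stats.ttest_1samp(sample, popmean)` is the usual SciPy route.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical data: do these package weights differ from the 500 g target?
weights = [498, 502, 495, 497, 499, 496, 501, 494]
t, df = one_sample_t(weights, 500)
print(f"t = {t:.3f}, df = {df}")
```

The statistic is then compared with the t distribution on df degrees of freedom to decide whether to reject the null hypothesis that the true mean equals mu0.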

Python: Machine Learning – Predictive Modeling – Basics

Introduction to Machine Learning & Predictive Modeling
Types of Business problems – Mapping of Techniques – Regression vs. classification vs. segmentation vs. Forecasting
Major Classes of Learning Algorithms – Supervised vs. Unsupervised Learning
Different Phases of Predictive Modeling (Data Pre-processing, Sampling, Model Building, Validation)
Overfitting (Bias-Variance Trade-off) & Performance Metrics
Feature engineering & dimension reduction
Concept of optimization & cost function
Concept of gradient descent algorithm
Concept of Cross-Validation (Bootstrapping, K-Fold Validation, etc.)
Model performance metrics (R-squared, RMSE, MAPE, AUC, ROC curve, recall, precision, sensitivity, specificity, confusion matrix)
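The ideas of a cost function and gradient descent fit together in a few lines: the sketch below fits a straight line y = w*x + b by repeatedly stepping the parameters against the gradient of the mean squared error. It is a minimal illustration, not a production optimiser; the function name, learning rate and epoch count are hypothetical choices that happen to converge for this small data set.

```python
def gradient_descent(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by minimising mean squared error with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors for the current parameters
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost MSE = (1/n) * sum(error^2)
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient, scaled by the learning rate
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```

If the learning rate is too large the updates overshoot and diverge; too small and convergence takes many more epochs, which is why it is treated as a tunable hyperparameter.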

Machine Learning Algorithms & Applications – Implementation In Python

Linear & Logistic Regression
Segmentation – Cluster Analysis (K-Means)
Decision Trees (CART/C5.0)
Ensemble Learning (Random Forest, Bagging & boosting)
Artificial Neural Networks (ANN)
Support Vector Machines (SVM)
Other Techniques (KNN, Naïve Bayes, PCA)
Introduction to Text Mining using NLTK
Introduction to Time Series Forecasting (Decomposition & ARIMA)
Important Python modules for Machine Learning (scikit-learn, statsmodels, SciPy, NLTK, etc.)
Fine-tuning models using hyperparameters, grid search, pipelines, etc.
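For the segmentation topic (K-Means), here is a minimal sketch of Lloyd's algorithm on 2-D points: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its points, until nothing changes. The function name, the seeded random initialisation and the toy points are assumptions for illustration; scikit-learn's `KMeans` is the usual library choice.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Cluster 2-D points into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k randomly chosen points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = k_means(points, k=2)
print(sorted(centroids))
```

K-Means only finds a local optimum, so in practice it is run from several initialisations and the solution with the lowest within-cluster sum of squares is kept.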

Project – Consolidate Learnings

Applying different algorithms to solve business problems and benchmarking the results

Introduction To Big Data

Introduction and Relevance
Uses of Big Data analytics in various industries like Telecom, E-commerce, Finance and Insurance, etc.
Problems with Traditional Large-Scale Systems

Hadoop(Big Data) Eco-System

Motivation for Hadoop
Different types of projects by Apache
Role of projects in the Hadoop Ecosystem
Key technology foundations required for Big Data
Limitations and Solutions of existing Data Analytics Architecture
Comparison of traditional data management systems with Big Data management systems
Evaluate key framework requirements for Big Data analytics
Hadoop Ecosystem & Hadoop 2.x core components
Explain the relevance of real-time data
Explain how to use Big Data and real-time data as a Business planning tool

Hadoop Cluster-Architecture-Configuration Files

Hadoop Master-Slave Architecture
The Hadoop Distributed File System – Concept of data storage
Explain different types of cluster setups (Fully Distributed/Pseudo-Distributed, etc.)
Hadoop cluster set up – Installation
Hadoop 2.x Cluster Architecture
A Typical enterprise cluster – Hadoop Cluster Modes
Understanding cluster management tools like Cloudera Manager/Apache Ambari

Hadoop-HDFS & MapReduce (YARN)

HDFS Overview & Data storage in HDFS
Getting data into Hadoop from the local machine and vice versa (Data Loading Techniques)
Map Reduce Overview (Traditional way Vs. MapReduce way)
Concept of Mapper & Reducer
Understanding MapReduce program Framework
Develop MapReduce Program using Java (Basic)
Develop MapReduce Program with the Streaming API (Basic)
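The mapper/reducer pattern behind a streaming MapReduce job can be sketched as word count in Python. With Hadoop Streaming, the mapper and reducer are separate scripts that read lines from standard input and write tab-separated key/value lines to standard output, and the framework sorts the mapper output by key before the reducer sees it. The sketch below only simulates that map, sort and reduce flow locally; the function names and sample text are hypothetical, and no Hadoop command-line options are shown.

```python
from itertools import groupby


def mapper(lines):
    """Map step: emit a 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Reduce step: sum the counts of consecutive records that share a key."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Local simulation of map -> shuffle/sort -> reduce; under Hadoop Streaming the
# mapper and reducer would instead read sys.stdin and print to sys.stdout.
text = ["the quick brown fox", "the lazy dog", "The fox"]
for out in reducer(sorted(mapper(text))):
    print(out)
```

The reducer can use a simple `groupby` only because its input arrives sorted by key, which is exactly the guarantee the shuffle-and-sort phase provides.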

Data Integration Using Sqoop & Flume

Integrating Hadoop into an Existing Enterprise
Loading Data from an RDBMS into HDFS by Using Sqoop
Managing Real-Time Data Using Flume
Accessing HDFS from Legacy Systems

Data Analysis Using Pig

Introduction to Data Analysis Tools
Apache Pig – MapReduce vs. Pig, Pig Use Cases
Pig’s Data Model
Pig Streaming
Pig Latin Program & Execution
Pig Latin: Relational Operators, File Loaders, Group Operator, COGROUP Operator, Joins and COGROUP, Union, Diagnostic Operators, Pig UDF
Writing Java UDFs
Embedding Pig in Java
Pig Macros
Parameter Substitution
Use Pig to automate the design and implementation of MapReduce applications
Use Pig to apply structure to unstructured Big Data

Data Analysis Using Hive

Apache Hive – Hive vs. Pig – Hive Use Cases
Discuss the Hive data storage principle
Explain the File formats and Records formats supported by the Hive environment
Perform operations with data in Hive
Hive QL: Joining Tables, Dynamic Partitioning, Custom Map/Reduce Scripts
Hive Script, Hive UDF
Hive Persistence formats
Loading data in Hive – Methods
Serialization & Deserialization
Handling Text data using Hive
Integrating external BI tools with Hadoop Hive

Data Analysis Using Impala

Impala & Architecture
How Impala executes Queries and its importance
Hive vs. Pig vs. Impala
Extending Impala with User Defined functions

Introduction To Other Ecosystem Tools

NoSQL Database – HBase
Introduction to Oozie

Spark: Introduction

Introduction to Apache Spark
Streaming Data vs. In-Memory Data
MapReduce vs. Spark
Modes of Spark
Spark Installation Demo
Overview of Spark on a cluster
Spark Standalone Cluster

Spark: Spark In Practice

Invoking Spark Shell
Creating the Spark Context
Loading a File in Shell
Performing Some Basic Operations on Files in Spark Shell
Caching Overview
Distributed Persistence
Spark Streaming Overview (Example: Streaming Word Count)

Spark: Spark Meets Hive

Analyze Hive and Spark SQL Architecture
Analyze Spark SQL
Context in Spark SQL
Implement a sample example for Spark SQL
Integrating Hive and Spark SQL
Support for JSON and Parquet File Formats
Implement Data Visualization in Spark
Loading of Data
Hive Queries through Spark
Performance Tuning Tips in Spark
Shared Variables: Broadcast Variables & Accumulators

Spark Streaming

Extract and analyze data from Twitter using Spark Streaming
Comparison of Spark and Storm – Overview

Spark GraphX

Overview of the GraphX module in Spark
Creating graphs with GraphX

Introduction To Machine Learning Using Spark

Understand Machine learning framework
Implement some of the ML algorithms using Spark MLlib

Project

Consolidate all the learnings
Working on Big Data Project by integrating various key components

Projects :-

Python Projects

Random password generator	Mini
CLI based scientific calculator	Mini
Instagram bot	Mini
Expense Tracker	Mini
Site connectivity checker	Mini
Lawn Tennis Match Highlight (Can be extended to any sport)	Major
NLP library	Major

Deep Learning Projects

Churn Modelling using ANN	Mini
Image Classification	Mini
Image classification using Transfer learning	Major
Sentence Classification using RNN, LSTM, GRU	Mini
Sentence Classification using word embeddings	Major
Object Detection using YOLO	Major

Machine Learning Projects

EDA on movies database	Mini
House price prediction using Regression	Mini
Predict survival on the Titanic using Classification	Mini
Image Clustering	Mini
Document Clustering	Mini
Twitter US Airline Sentiment	Major
Restaurant revenue prediction	Major
Disease Prediction	Major

Note: The projects above may vary depending on the trainer.


Learn Data Science and Data Analytics – Course in Pune with Training, Certification & Guaranteed Job Placement Assistance!


DataQubez University creates meaningful Big Data & Data Science certifications that are recognized in the industry as a reliable measure of qualified, capable big data experts. How do we accomplish that mission? DataQubez certifications are exclusively hands-on, performance-based exams that require you to complete a set of tasks, so you demonstrate your expertise with the most sought-after technical skills. Big data success requires professionals who can prove their mastery of the tools and techniques of the Hadoop stack, yet experts predict a major shortage of advanced analytics skills over the next few years. At DataQubez, we’re drawing on our industry leadership and early corpus of real-world experience to address the Big Data & Data Science talent gap.

How to Become a Certified Data Science Professional Engineer

Certification Code – DQCP – 501

Certification Description – DataQubez Certified Professional Data Science Engineer

Exam Objectives

Configuration :-

Define and deploy a rack topology script, Change the configuration of a service using Apache Hadoop, Configure the Capacity Scheduler, Create a home directory for a user and configure permissions, Configure the include and exclude DataNode files

Troubleshooting :-

Restart a cluster service, View an application’s log file, Configure and manage alerts, Troubleshoot a failed job

High Availability :-

Configure NameNode, Configure ResourceManager, Copy data between two clusters, Create a snapshot of an HDFS directory, Recover a snapshot, Configure HiveServer2

Data Ingestion – with Sqoop & Flume :-

Import data from a table in a relational database into HDFS, Import the results of a query from a relational database into HDFS, Import a table from a relational database into a new or existing Hive table, Insert or update data from HDFS into a table in a relational database, Given a Flume configuration file, start a Flume agent, Given a configured sink and source, configure a Flume memory channel with a specified capacity

Data Transformation Using Pig :-

Write and execute a Pig script, Load data into a Pig relation without a schema, Load data into a Pig relation with a schema, Load data from a Hive table into a Pig relation, Use Pig to transform data into a specified format, Transform data to match a given Hive schema, Group the data of one or more Pig relations, Use Pig to remove records with null values from a relation, Store the data from a Pig relation into a folder in HDFS, Store the data from a Pig relation into a Hive table, Sort the output of a Pig relation, Remove the duplicate tuples of a Pig relation, Specify the number of reduce tasks for a Pig MapReduce job, Join two datasets using Pig, Perform a replicated join using Pig

Data Analysis Using Hive :-

Write and execute a Hive query, Define a Hive-managed table, Define a Hive external table, Define a partitioned Hive table, Define a bucketed Hive table, Define a Hive table from a select query, Define a Hive table that uses the ORCFile format, Create a new ORCFile table from the data in an existing non-ORCFile Hive table, Specify the storage format of a Hive table, Specify the delimiter of a Hive table, Load data into a Hive table from a local directory, Load data into a Hive table from an HDFS directory, Load data into a Hive table as the result of a query, Load a compressed data file into a Hive table, Update a row in a Hive table, Delete a row from a Hive table, Insert a new row into a Hive table, Join two Hive tables, Set a Hadoop or Hive configuration property from within a Hive query.

Data Processing through Spark & Spark SQL & Python :-

Frame big data analysis problems as Apache Spark scripts, Optimize Spark jobs through partitioning, caching, and other techniques, Develop distributed code using the Scala programming language, Build, deploy, and run Spark scripts on Hadoop clusters, Transform structured data using SparkSQL and DataFrames

Recommendation Engine using Spark MLlib & Python :-

Using MLlib to Produce a Recommendation Engine, Run the PageRank Algorithm, Using DataFrames with MLlib, Machine Learning with Spark

Stream Data Processing using Spark Streaming & Python :-

Process Stream Data using Spark Streaming.

Regression with Spark & Python :-

Introduction to Linear Regression, Introduction to Regression Section, Linear Regression Documentation, Alternate Linear Regression Data CSV File, Linear Regression Walkthrough, Linear Regression Project

Classification with Spark & Python :-

Classification, Classification Documentation, Spark Classification – Logistic Regression, Logistic Regression Amendments, Classification Project

Clustering with Spark & Python :-

Clustering with Spark & Python, KMeans, Example of KMeans with Spark & Python, Clustering Project

Model Evaluation & Python :-

Model Evaluation, Spark Model Evaluation, Spark – Model Evaluation – Regression

R Programming :-

Program in R, Create Data Visualizations, Use R to manipulate data easily, Use R for Data Science, Use R for Data Analysis, Use R to handle CSV, Excel, SQL files or web scraping, Use R for Machine Learning Algorithms, Machine Learning with R – Linear Regression, Machine Learning with R – Logistic Regression


The trainer for the Big Data & Data Science course has 11 years of experience in these technologies and is an industry expert. He is Cloudera certified as well as AWS (Solution Architecture) and GCP (Google Cloud Platform) certified, and is also a certified data scientist from the University of Chicago.

Training By 11+ Years experienced Real Time Trainer
A pool of 200+ real time Practical Sessions on Data Science and Analytics
Scenarios and Assignments to make sure you compete with current Industry standards
World class training methods
Training until the candidate gets placed
Certification and Placement Support until you get certified and placed
All training at a reasonable cost
10000+ Satisfied candidates
5000+ Placement Records
Corporate and Online Training at a reasonable cost
Complete End-to-End Project with Each Course
World-class lab facility with i3/i5/i7 servers and Cisco UCS servers

Covers topics beyond the textbooks that are required in the IT industry
Resume And Interview preparation with 100% Hands-on Practical sessions
Doubt clearing sessions any time after the course
Happy to help you any time after the course


At Radical Technologies, we believe that the best way to learn job skills is from industry professionals. So we are building an alternative higher-education system where you can learn job skills from industry experts and get certified by companies. We deliver the course in a classroom format with 85% practical scenarios and complete hands-on work on each and every point of the course, and if a student faces any issue in the future, he/she can also join the next batch. These courses are delivered through a live interactive classroom platform.

In class, we solve real-time problems, including Kaggle and other real-world data problems, and push students to create at least a demo model and push their code to Git.

Big Data with Cloud Computing (AWS) – Amazon Web Services

Big Data with Cloud Computing (GCP) – Google Cloud Platform

Big Data & Data Science with Cloud Computing (AWS) – Amazon Web Services

Big Data & Data Science with Cloud Computing (GCP) – Google Cloud Platform

Data Science with R & Spark with Python & Scala

Machine Learning with Google Cloud Platform with TensorFlow

Data Science and Data Analytics

