**Data Science and Data Analytics Using Spark | Scala | R | Python**

Duration: 3 months, weekends only, 3 hours each on Saturdays and Sundays

Real-time projects, assignments, and scenarios are part of this course

Data sets, installations, interview preparation, and the option to repeat sessions for up to 6 months are all included in this course

Trainer: Experienced Data Science Consultant

One-time classroom registration fee: 1000/-
## Event batch schedules:

Location | Day/Duration | Start Date | Price (₹) |
---|---|---|---|
Pune | 2 days | 31/03/2018 | 12000.00 |

## Online training batch schedules:

Mode | Day/Duration | Start Date | End Date | Price (₹) |
---|---|---|---|---|
Online | 8 Weeks, 4 Days | 30/07/2018 | 27/09/2018 | 20000.00 |

**Want to Be a Future Data Scientist?**

**Introduction:** This course does not require a prior quantitative or mathematics background. It starts by introducing basic concepts such as the mean, median, and mode, and eventually covers all aspects of an analytics or data science career, from analyzing and preparing raw data to visualizing your findings. If you’re a programmer or a fresh graduate looking to switch into an exciting new career track, or a data analyst looking to make the transition into the tech industry, this course will teach you the basic to advanced techniques used by real-world industry data scientists.

**Data Science, Statistics with R & Python:** This course is an introduction to Data Science and Statistics using the R programming language and Python. It covers both the theory behind statistical concepts and their practical implementation in R and Python. If you’re new to Python, don’t worry: the course starts with a crash course. Whether you’ve done some programming before or are new to programming, you should pick it up quickly. This course shows you how to get set up on Microsoft Windows-based PCs; the sample code will also run on macOS or Linux desktop systems.

**What’s Spark?** If you are an analyst or a data scientist, you’re used to having multiple systems for working with data: SQL, Python, R, Java, and so on. With Spark, you have a single engine where you can explore and play with large amounts of data, run machine learning algorithms, and then use the same system to productionize your code.

**Scala:** Scala is a general-purpose programming language, like Java or C++. Its functional programming nature and the availability of a REPL environment make it particularly suited for a distributed computing framework like Spark.

**Analytics:** Using Spark and Scala you can analyze and explore your data in an interactive environment with fast feedback. The course will show how to leverage the power of RDDs and DataFrames to manipulate data with ease.

**Machine Learning and Data Science:** Spark’s core functionality and built-in libraries make it easy to implement complex algorithms like Recommendations with very few lines of code. We’ll cover a variety of datasets and algorithms including PageRank, MapReduce and Graph datasets.

**Real life examples:** Every concept is explained with the help of examples, case studies and source code in R wherever necessary. The examples cover a wide array of topics and range from A/B testing in an Internet company context to the Capital Asset Pricing Model in a quant finance context.

**What am I going to get from this course?**

- Harness R and R packages to read, process and visualize data
- Understand linear regression and use it confidently to build models
- Understand the intricacies of all the different data structures in R
- Use linear regression in R to overcome the difficulties of LINEST() in Excel
- Draw inferences from data and support them using tests of significance
- Use descriptive statistics to perform a quick study of some data and present results
- Use Spark for a variety of analytics and Machine Learning tasks
- Understand functional programming constructs in Scala
- Implement complex algorithms like PageRank or Music Recommendations
- Work with a variety of datasets from Airline delays to Twitter, Web graphs, Social networks and Product Ratings
- Use all the different features and libraries of Spark: RDDs, DataFrames, Spark SQL, MLlib, Spark Streaming and GraphX
- Write code in Scala REPL environments and build Scala applications with an IDE
- Course Completion Certificate.

**Target audience?**

- Engineering or management graduates, post-graduates, and freshers who want to build a career in the data science industry or become future data scientists.
- Engineers who want to use a distributed computing engine for batch or stream processing or both
- Analysts who want to leverage Spark for analyzing interesting datasets
- Data Scientists who want a single engine for analyzing and modelling data as well as productionizing it.
- MBA Graduates or business professionals who are looking to move to a heavily quantitative role.
- Engineering graduates and professionals who want to understand basic statistics and lay a foundation for a career in Data Science
- Working professionals who have mostly worked in descriptive analytics, or fresh graduates with no work experience, who want to make the shift to being modelers or data scientists
- Professionals who’ve worked mostly with tools like Excel and want to learn how to use R for statistical analysis.

**Course Curriculum**

**Introduction to Spark**

What does Donald Rumsfeld have to do with data analysis?

Why is Spark so cool?

An introduction to RDDs – Resilient Distributed Datasets

Built-in libraries for Spark

Installing Spark

The PySpark Shell

Transformations and Actions

See it in Action: Munging Airlines Data with PySpark – I

[For Linux/Mac OS Shell Newbies] Path and other Environment Variables

**Resilient Distributed Datasets**

RDD Characteristics: Partitions and Immutability

RDD Characteristics: Lineage, RDDs know where they came from

What can you do with RDDs?

Create your first RDD from a file

Average distance travelled by a flight using map() and reduce() operations

Get delayed flights using filter(), cache data using persist()

Average flight delay in one-step using aggregate()

Frequency histogram of delays using countByValue()

Project – Analyzing Airlines Data with PySpark
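As a rough illustration of the operations listed in this module, the sketch below uses plain Python built-ins on a small in-memory list instead of a Spark RDD. The flight records are made up for the example, and the field layout (origin, distance, delay) is an assumption, not the layout of the course's airline dataset. In PySpark, the same steps would be written with the RDD methods `map()`, `reduce()`, `filter()` and `countByValue()`.

```python
from collections import Counter
from functools import reduce

# Hypothetical sample records: (origin airport, distance in miles, delay in minutes).
flights = [
    ("PNQ", 400.0, 12),
    ("BOM", 600.0, 0),
    ("DEL", 500.0, 45),
    ("BOM", 700.0, 12),
]

# map(): keep only the distance of each flight.
distances = list(map(lambda flight: flight[1], flights))

# reduce(): add the distances pairwise, then divide by the number of flights.
total_distance = reduce(lambda a, b: a + b, distances)
average_distance = total_distance / len(distances)

# filter(): keep only the flights that were delayed.
delayed = list(filter(lambda flight: flight[2] > 0, flights))

# Counter stands in for countByValue(): delay value -> number of flights.
delay_histogram = Counter(flight[2] for flight in flights)

print(average_distance)
print(len(delayed))
print(dict(delay_histogram))
```

The difference in Spark is that each of these steps runs in parallel across partitions of the data, and transformations such as `map()` and `filter()` are only evaluated when an action such as `reduce()` asks for a result.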

**Advanced RDDs: Pair Resilient Distributed Datasets**

Special Transformations and Actions

Average delay per airport, use reduceByKey(), mapValues() and join()

Average delay per airport in one step using combineByKey()

Get the top airports by delay using sortBy()

Lookup airport descriptions using lookup(), collectAsMap(), broadcast()

Project : Analyzing Airlines Data with PySpark – III
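The "average delay per airport in one step" idea can be sketched in plain Python as follows. This is not Spark code: a dictionary plays the role of the pair RDD, and the (airport, delay) pairs are invented for the example. The key point it tries to show is the one behind `combineByKey()`: carry a (sum, count) pair per key, and divide only at the end.

```python
from collections import defaultdict

# Hypothetical (airport, delay in minutes) pairs, standing in for a pair RDD.
delays = [("BOM", 10.0), ("DEL", 30.0), ("BOM", 20.0), ("PNQ", 5.0), ("DEL", 50.0)]

# Accumulate a running (sum, count) pair for every airport.
totals = defaultdict(lambda: (0.0, 0))
for airport, delay in delays:
    running_sum, running_count = totals[airport]
    totals[airport] = (running_sum + delay, running_count + 1)

# Similar in spirit to mapValues(): turn each (sum, count) into an average.
average_delay = {airport: s / c for airport, (s, c) in totals.items()}

# Similar in spirit to sortBy(): airports ordered from worst to best average delay.
top_airports = sorted(average_delay.items(), key=lambda item: item[1], reverse=True)

print(top_airports)
```

Keeping the sum and the count together matters because averages cannot be merged directly: two partial averages only combine correctly if you know how many values went into each.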

**Advanced Spark: Accumulators, Spark Submit, MapReduce, Behind The Scenes**

Get information from individual processing nodes using accumulators

See it in Action : Using an Accumulator variable

Long running programs using spark-submit

See it in Action : Running a Python script with Spark-Submit

Behind the scenes: What happens when a Spark script runs?

Running MapReduce operations

Activity : MapReduce with Spark

**Java and Spark**

The Java API and Function objects

Pair RDDs in Java

Running Java code

Installing Maven

Practical Activity : Running a Spark Job with Java

**PageRank: Ranking Search Results**

What is PageRank?

The PageRank algorithm

Implement PageRank in Spark

Join optimization in PageRank using Custom Partitioning

See it in Action : The PageRank algorithm using Spark
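For readers who want a feel for the algorithm before the Spark version, here is a minimal single-machine sketch of iterative PageRank in plain Python. The three-page link graph, the damping factor of 0.85 and the iteration count are assumptions chosen for illustration, and the function name `pagerank` is hypothetical. It is not the course's Spark implementation and does not include the custom partitioning optimization.

```python
def pagerank(links, damping=0.85, iterations=20):
    """Iterative PageRank; links maps each page to the pages it links to.

    Assumes every link target also appears as a key in links.
    """
    pages = list(links)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if not outgoing:
                continue  # In this sketch, rank held by dangling pages is dropped.
            share = damping * ranks[page] / len(outgoing)
            for target in outgoing:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
for page, rank in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(page, round(rank, 3))
```

In a distributed setting, the inner loop corresponds to joining the link table with the current ranks on every iteration, which is why the way the data is partitioned has such an effect on performance.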

**Spark SQL**

Dataframes: RDDs + Tables

Dataframes and Spark SQL

**MLlib in Spark: Build a recommendations engine**

Collaborative filtering algorithms

Latent Factor Analysis with the Alternating Least Squares method

Music recommendations using the Audioscrobbler dataset

Implement code in Spark using MLlib

**Spark Streaming**

Introduction to streaming

Implement stream processing in Spark using DStreams

Stateful transformations using sliding windows

See it in Action : Spark Streaming

**Graph Libraries**

The Marvel social network using Graphs

**Descriptive Statistics**

Descriptive Statistics : Mean, Median, Mode

Our first foray into R : Frequency Distributions

Draw your first plot : A Histogram

Computing Mean, Median, Mode in R

What is IQR (Inter-quartile Range)?

Box and Whisker Plots

The Standard Deviation

Computing IQR and Standard Deviation in R

**Inferential Statistics**

Drawing inferences from data

Random Variables are ubiquitous

The Normal Probability Distribution

Sampling is like fishing

Sample Statistics and Sampling Distributions

**Case studies in Inferential Statistics**

**Case Study 1 :** Football Players (Estimating Population Mean from a Sample)

**Case Study 2 :** Election Polling (Estimating Population Proportion from a Sample)

**Case Study 3 :** A Medical Study (Hypothesis Test for the Population Mean)

**Case Study 4 :** Employee Behavior (Hypothesis Test for the Population Proportion)

**Case Study 5:** A/B Testing (Comparing the means of two populations)

**Case Study 6:** Customer Analysis (Comparing the proportions of 2 populations)
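To make the first case study concrete, the sketch below estimates a population mean from a sample and attaches a 95% confidence interval. It is written in Python rather than R, uses a normal approximation (for small samples a t-distribution would usually be preferred), and the player weights are invented for the example. The function name `mean_confidence_interval` is hypothetical; `statistics.NormalDist` requires Python 3.8 or later.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    sample_mean = mean(sample)
    # Standard error: sample standard deviation divided by sqrt(n).
    standard_error = stdev(sample) / sqrt(n)
    # Two-sided critical value, roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical sample of player weights in kilograms.
weights = [82.0, 90.5, 77.0, 85.5, 88.0, 79.5, 91.0, 84.0]
low, high = mean_confidence_interval(weights)
print(round(low, 2), round(high, 2))
```

The interval narrows as the sample grows, because the standard error shrinks with the square root of the sample size.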

**Diving into R**

Harnessing the power of R

Assigning Variables

Printing an output

Numbers are of type numeric

Characters and Dates

Logicals

**Vectors**

Data Structures are the building blocks of R

Creating a Vector, The Mode of a Vector

Vectors are Atomic

Doing something with each element of a Vector

Aggregating Vectors

Operations between vectors of the same length

Operations between vectors of different length

Generating Sequences

Using conditions with Vectors

Find the lengths of multiple strings using Vectors

Generate a complex sequence (using recycling)

Vector Indexing (using numbers)

Vector Indexing (using conditions)

Vector Indexing (using names)

**Arrays**

Creating an Array

Indexing an Array

Operations between 2 Arrays

Operations between an Array and a Vector

Outer Products

**Matrices**

A Matrix is a 2-Dimensional Array

Creating a Matrix

Matrix Multiplication

Merging Matrices

Solving a set of linear equations

**Factors**

What is a factor?

Find the distinct values in a dataset (using factors)

Replace the levels of a factor

Aggregate factors with table()

Aggregate factors with tapply()

**Lists and Data Frames**

Introducing Lists

Introducing Data Frames

Reading Data from files

Indexing a Data Frame

Aggregating and Sorting a Data Frame

Merging Data Frames

**Regression quantifies relationships between variables**

Introducing Regression

What is Linear Regression?

A Regression Case Study : The Capital Asset Pricing Model (CAPM)

**Linear Regression in Excel**

Linear Regression in Excel : Preparing the data

Linear Regression in Excel : Using LINEST()

**Linear Regression in R**

Linear Regression in R : Preparing the data

Linear Regression in R : lm() and summary()

Multiple Linear Regression

Adding Categorical Variables to a linear model

Robust Regression in R : rlm()

Parsing Regression Diagnostic Plots

**Data Visualization in R**

Data Visualization

The plot() function in R

Control color palettes with RColorBrewer

Drawing barplots

Drawing a heatmap

Drawing a Scatterplot Matrix

Plot a line chart with ggplot2

**Introducing Scala**

Introducing Scala: Java’s Cool Cousin

Installing Scala

Hello world

Mutable and Immutable ‘variables’

Type Inference

String Operations

A Unified Type System

Emptiness in Scala

Type Operations

**Expressions or Statements?**

Module Outline: Loops and Conditionals

Statements v Expressions

Defining Values and Variables via Expressions

Nested Scopes in Expression Blocks

If/Else expression blocks

match expressions

match expressions: Pattern guards & ORed expressions

match expressions: catch-all to match-all

match expressions: downcasting with Pattern Variables

for loops can be expressions OR statements

for loops: types of iterators

for loops with if conditions: Pattern Guards

while/do-while Loops: Pure Statements

**First Class Functions**

Module Outline: Functions

First Class Functions

Functions v Methods

Functions are named, reusable expressions

Assigning Methods to Values

Invoking Functions with Tuples as Parameters

Named Function Parameters

Parameter Default Values

Type Parameters: Parametric Polymorphism

Vararg Parameters

Procedures are named, reusable statements

Functions with No Inputs

Nested Functions

Higher Order Functions

Anonymous Functions (aka Function Literals)

Placeholder Syntax

Partially Applied Functions

Currying

By-Name Parameters

Closures

**Collections**

Module Outline: Collections, Tuples

Creating Lists, Simple List Operations, Higher Order Functions Introduced

Scan, ScanFold, ScanReduce

Fold, FoldLeft, FoldRight

Reduce, ReduceLeft, ReduceRight

Other, Simpler Reduce Operations

Sets and Maps

Mutable Collections, and Arrays

Option Collections

Error handling with util.Try

**Classes and Objects**

Module Outline: Classes, Primary v Auxiliary Constructors

Inheritance from Classes

Abstract Classes

Anonymous Classes

Type Parameters

Lazy Values

Default Methods with apply

Operators

Access Modifiers

Singleton Objects

Companion Objects

Traits, Case Classes, Self Types

**Getting Started With Python**

Introduction

[Activity] Getting What You Need

[Activity] Installing Enthought Canopy

Python Basics, Part 1

[Activity] Python Basics, Part 2

Running Python Scripts

**Statistics and Probability Refresher, and Python Practice**

Types of Data

Mean, Median, Mode

[Activity] Using mean, median, and mode in Python

[Activity] Variation and Standard Deviation

Probability Density Function; Probability Mass Function

Common Data Distributions

[Activity] Percentiles and Moments

[Activity] A Crash Course in matplotlib

[Activity] Covariance and Correlation

[Exercise] Conditional Probability

Exercise Solution: Conditional Probability of Purchase by Age

Bayes’ Theorem
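A small taste of what this module covers, using only Python's standard `statistics` module. The income figures and the purchase probabilities are made up for illustration, and the helper name `bayes` is hypothetical. The first part shows how one large value pulls the mean away from the median; the second applies Bayes' theorem, P(A|B) = P(B|A) · P(A) / P(B), to a conditional probability of purchase by age group.

```python
from statistics import mean, median, mode, stdev

# Hypothetical yearly incomes in thousands; one outlier skews the mean.
incomes = [25.0, 30.0, 30.0, 40.0, 125.0]

print("mean:  ", mean(incomes))
print("median:", median(incomes))
print("mode:  ", mode(incomes))
print("stdev: ", round(stdev(incomes), 2))


def bayes(prior, likelihood, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# Assumed figures: 20% of visitors buy, 50% of buyers are in their 30s,
# and 25% of all visitors are in their 30s.
p_purchase_given_30s = bayes(prior=0.2, likelihood=0.5, evidence=0.25)
print("P(purchase | age 30s):", round(p_purchase_given_30s, 2))
```

With these assumed numbers, knowing that a visitor is in their 30s doubles the estimated chance of a purchase compared with the overall rate of 20%.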

**Predictive Models**

[Activity] Linear Regression

[Activity] Polynomial Regression

[Activity] Multivariate Regression, and Predicting Car Prices

Multi-Level Models
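The core of simple linear regression fits in a few lines. The sketch below computes the ordinary least squares slope and intercept by hand in plain Python, so the arithmetic is visible; in the course activities a library would normally do this work. The function name `fit_line` and the data points are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)

# Use the fitted line to predict a new value.
print(slope * 5.0 + intercept)
```

Polynomial and multivariate regression extend the same idea by fitting more coefficients, which is also where overfitting becomes a concern.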

**Machine Learning with Python**

Supervised vs. Unsupervised Learning, and Train/Test

[Activity] Using Train/Test to Prevent Overfitting a Polynomial Regression

Bayesian Methods: Concepts

[Activity] Implementing a Spam Classifier with Naive Bayes

K-Means Clustering

[Activity] Clustering people based on income and age

Measuring Entropy

[Activity] Install GraphViz

Decision Trees: Concepts

[Activity] Decision Trees: Predicting Hiring Decisions

Ensemble Learning

Support Vector Machines (SVM) Overview

[Activity] Using SVM to cluster people using scikit-learn

Summary

Reviewer: Kavitha Jadav

Review: I have just completed the Data Science training at Radical institute. The trainer has in-depth knowledge and excellent teaching skills. Sir can identify the learning capacity of each student and trains from the grass roots. He starts from the very basics and gets all the prerequisites ready so no one is in trouble. He is very attentive. He always kept the sessions interesting and interactive, explaining concepts with real-time industry scenarios. I just want to say that sir, you are the best because you brought out the best in us. I highly recommend his Python training as well. I also thank Radical Technologies for retaining such good and experienced faculty, through whom we are getting high-quality training.

**CCA Data Analyst**

**Cloudera Certified Professional (CCP)**

CCP Data Engineer

The trainer for the Big Data and Data Science courses has 11 years of experience in these technologies and is an industry expert. He is Cloudera certified, holds AWS (Solution Architecture) and GCP (Google Cloud Platform) certifications, and is a certified data scientist from The University of Chicago.

- Training by a real-time trainer with 11+ years of experience
- A pool of 200+ real-time practical sessions on Data Science and Analytics
- Scenarios and assignments to make sure you can compete with current industry standards
- World-class training methods
- Training until the candidate gets placed
- Certification and placement support until you get certified and placed
- All training at a reasonable cost
- 10000+ satisfied candidates
- 5000+ placement records
- Corporate and online training at a reasonable cost
- Complete end-to-end project with each course
- World-class lab facility with i3/i5/i7 servers and Cisco UCS servers
- Covers topics beyond the books that are required in the IT industry
- Resume and interview preparation with 100% hands-on practical sessions
- Doubt-clearing sessions any time after the course
- Happy to help you any time after the course

In the classroom we solve real-time problems, including Kaggle and data.world problems, and we push every student to build at least a demo model and push his/her code to Git.

At Radical Technologies, we believe that the best way to learn job skills is from industry professionals. So we are building an alternate higher education system where you can learn job skills from industry experts and get certified by companies. We deliver each course classroom-style, with 85% practical scenarios and complete hands-on coverage of every point in the course. If a student faces any issue in the future, he/she can also join the next batch. These courses are delivered through a live interactive classroom platform.

Big Data with Cloud Computing (AWS) – Amazon Web Services

Big Data with Cloud Computing (GCP) – Google Cloud Platform

Big Data & Data Science with Cloud Computing (AWS) – Amazon Web Services

Big Data & Data Science with Cloud Computing (GCP) – Google Cloud Platform

Data Science with R & Spark with Python & Scala

Machine Learning with Google Cloud Platform with TensorFlow
