Overview
R is a popular open-source environment for statistical computing, data analytics, and graphics. This course introduces the R programming language. It covers language fundamentals, libraries, and advanced concepts, then applies them to data analytics and graphing with real-world data.
Audience
Developers / data analysts
Duration
3 days
Format
Lectures and hands-on labs
Requirements
- Basic programming background is preferred
Setup
- A modern laptop
- Latest versions of R and RStudio installed
Course Outline
Day One: Language Basics
- Course Introduction
- About Data Science
- Data Science Definition
- Process of Doing Data Science
- Introducing R Language
- Variables and Types
- Control Structures (Loops / Conditionals)
- R Scalars, Vectors, and Matrices
- Defining R Vectors
- Matrices
- String and Text Manipulation
- Character data type
- File I/O
- Lists
- Functions
- Introducing Functions
- Closures
- lapply/sapply functions
- DataFrames
- Labs for all sections
Day Two: Intermediate R Programming
- DataFrames and File I/O
- Reading data from files
- Data Preparation
- Built-in Datasets
- Visualization
- Graphics Package
- plot() / barplot() / hist() / boxplot() / scatter plot
- Heat Map
- ggplot2 package (qplot(), ggplot())
- Exploration With dplyr
- Labs for all sections
Day Three: Advanced Programming With R
- Statistical Modeling With R
- Statistical Functions
- Dealing With NA
- Distributions (Binomial, Poisson, Normal)
- Regression
- Introducing Linear Regression
- Recommendations
- Text Processing (tm package / Wordclouds)
- Clustering
- Introduction to Clustering
- K-Means
- Classification
- Introduction to Classification
- Naive Bayes
- Decision Trees
- Training using caret package
- Evaluating Algorithms
- R and Big Data
- Connecting R to databases
- Big Data Ecosystem
- Labs for all sections