Overview
R is a free, open-source programming language for statistical computing, data analysis, and graphics. It is used by a growing number of managers and data analysts in industry and academia, and it offers a wide variety of packages for data mining.
Requirements
This course is part of the Data Scientist skill set (Domain: Analytical Techniques and Methods)
Course Outline
Introduction to Data Mining and Machine Learning
- Statistical learning vs. machine learning
- Iteration and evaluation
- Bias-variance trade-off
Regression
- Linear regression
- Generalizations and nonlinearity
- Exercises
Classification
- Bayesian refresher
- Naive Bayes
- Discriminant analysis
- Logistic regression
- K-nearest neighbors
- Support Vector Machines
- Neural networks
- Decision trees
- Exercises
Cross-validation and Resampling
- Cross-validation approaches
- Bootstrap
- Exercises
Unsupervised Learning
- K-means clustering
- Examples
- Challenges of unsupervised learning and beyond K-means
Advanced Topics
- Ensemble models
- Mixed models
- Boosting
- Examples
Dimensionality Reduction
- Factor Analysis
- Principal Component Analysis
- Examples