Overview
Objective:
Delegates will be able to analyse large data sets, extract patterns, and identify the variables that affect the results, so that they can build a new model with predictive power.
Course Outline
-
Data preprocessing
- Data cleaning
- Data integration and transformation
- Data reduction
- Discretization and concept hierarchy generation
-
Statistical inference
- Probability distributions, random variables, central limit theorem
- Sampling
- Confidence intervals
- Statistical Inference
- Hypothesis testing
-
Multivariate linear regression
- Specification
- Subset selection
- Estimation
- Validation
- Prediction
-
Classification methods
- Logistic regression
- Linear discriminant analysis
- K-nearest neighbours
- Naive Bayes
- Comparison of classification methods
-
Neural Networks
- Fitting neural networks
- Issues in training neural networks
-
Decision trees
- Regression trees
- Classification trees
- Trees versus linear models
-
Bagging, Random Forests, Boosting
- Bagging
- Random Forests
- Boosting
-
Support Vector Machines and Flexible Discriminants
- Maximal Margin classifier
- Support vector classifiers
- Support vector machines
- SVMs for two or more classes
- Relationship to logistic regression
-
Principal Components Analysis
-
Clustering
- K-means clustering
- K-medoids clustering
- Hierarchical clustering
- Density-based clustering
-
Model Assessment and Selection
- Bias, variance and model complexity
- In-sample prediction error
- The Bayesian approach
- Cross-validation
- Bootstrap methods