Overview
Data cleaning, also called data cleansing, is the process of detecting and fixing issues in a data set before analyzing it.
This instructor-led, live training (online or onsite) is aimed at data scientists, data analysts, and business analysts who wish to clean and process data effectively.
By the end of this training, participants will be able to:
- Develop an effective data cleaning strategy.
- Use common data cleaning tools effectively.
- Obtain clean, analysis-ready results more efficiently.
- Apply data cleaning best practices.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request customized training for this course, please contact us.
Requirements
- An understanding of data analytics concepts.
Audience
- Data Scientists
- Data Analysts
- Business Analysts
Course Outline
Introduction
Overview of Data Cleaning
- Why is Data Cleaning Important?
Case Study: When Big Data Is Dirty
Developing a Thorough Data Cleaning Strategy
Common Data Cleaning Tools
- Drake
- OpenRefine
- pandas (for Python)
- dplyr (for R)
Achieving High Data Integrity
- Complete
- Correct
- Accurate
- Relevant
- Consistent
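As a rough illustration of two of the integrity dimensions above (complete and consistent), the following minimal sketch uses only the Python standard library. The sample records, field names, and helper functions are hypothetical and are not taken from the course material; in practice a tool from the list above, such as pandas or dplyr, would typically handle these steps.

```python
# Hypothetical sample records; the field names are illustrative only.
records = [
    {"id": 1, "email": "ana@example.com", "country": "US"},
    {"id": 2, "email": None, "country": "us"},
    {"id": 2, "email": None, "country": "us"},
    {"id": 3, "email": "li@example.com", "country": "DE "},
]

def normalize_country(rows):
    """Return copies with country codes trimmed and upper-cased (consistency)."""
    return [{**r, "country": r["country"].strip().upper()} for r in rows]

def drop_duplicates(rows):
    """Keep only the first occurrence of each identical row."""
    seen = set()
    unique = []
    for r in rows:
        key = tuple(sorted(r.items()))  # keys are unique, so values are never compared
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

def find_incomplete(rows, required):
    """Return rows missing a value in any required field (completeness)."""
    return [r for r in rows if any(r.get(f) in (None, "") for f in required)]

cleaned = drop_duplicates(normalize_country(records))
incomplete = find_incomplete(cleaned, ["email"])
```

Here `cleaned` holds the deduplicated rows with normalized country codes, and `incomplete` lists the rows that still need a value for `email`. Whether such rows are dropped, repaired, or flagged depends on the cleaning strategy chosen for the data set.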
Automating the Data Cleaning Process
Monitoring Your Data Cleaning System
Summary and Conclusion