Getting and Cleaning Data

About This Course

Before you can work with data, you have to get some. This course covers the basic ways to obtain data: from the web, from APIs, from databases, and from colleagues in various formats. It also covers the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speed up downstream data analysis tasks. The course also covers the components of a complete data set: raw data, processing instructions, codebooks, and processed data. Together, these are the basics needed for collecting, cleaning, and sharing data.

What You’ll Learn

Understand common data storage systems

Apply data cleaning basics to make data “tidy”

Obtain usable data from the web, APIs, and databases

Skills You’ll Gain

Data Manipulation

Regular Expression (REGEX)


R Programming

Data Cleansing

What you will learn from this course

Module 1 – Obtaining data motivation

Raw and Processed Data

Components of Tidy Data

Downloading Files

Reading Local Files

Reading Excel Files

Reading XML

Reading JSON

The data.table Package

Module 2 – Data storage systems

Reading from MySQL

Reading from HDF5

Reading from the Web

Reading from APIs

Reading from Other Sources

Module 3 – Organizing, merging, and managing data

Summarizing Data

Creating New Variables

Reshaping Data

Managing Data Frames with dplyr – Introduction

Managing Data Frames with dplyr – Basic Tools

Merging Data

Module 4 – Text and date manipulation in R

Editing Text Variables

Regular Expressions

Working with Dates

Data Resources