What is Exploratory Data Analysis?
Bard College | Introduction to Data Analytics
Get to know your data
- What is the structure of the data?
- Patterns and relationships within and between variables
- Identify errors and outliers
- Identify what matters for further exploration and modeling
Univariate analysis
- Numerical variables
- Categorical variables
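In univariate analysis, a numerical variable is usually summarized by its center and spread, and a categorical variable by a frequency table. The sketch below shows both in plain Python, using only the standard library so it runs without extra packages. The course itself uses R, where `summary()`, `table()`, or dplyr's `count()` serve the same purpose. The data and function names are hypothetical and only illustrate the idea.

```python
import statistics
from collections import Counter

# Hypothetical toy data for illustration only.
incomes = [32.0, 45.5, 51.0, 38.2, 120.0, 47.3, 41.8]                    # numerical (thousands)
regions = ["North", "South", "North", "East", "South", "North", "West"]  # categorical

def summarize_numeric(values):
    """Center and spread of one numerical variable."""
    # method="inclusive" interpolates linearly, intended to match R's default quantile() (type 7).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }

def frequency_table(labels):
    """(level, count, proportion) for one categorical variable, most common first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(level, n, n / total) for level, n in counts.most_common()]

print(summarize_numeric(incomes))
for level, n, share in frequency_table(regions):
    print(f"{level:<6}{n:>3}{share:>6.2f}")
```

In this toy data the single large income pulls the mean (about 53.7) above the median (45.5). This is one reason to report both.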
Time and space
- Time series line plots
- Spatial mapping
Multivariate analysis
- Correlation matrix
- Principal components analysis (PCA)
- Spatial interaction and dependence tests
- Time series models (e.g., ARIMA)
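A correlation matrix is the table of pairwise correlation coefficients among numerical variables. Its diagonal is always 1 and it is symmetric. Here is a minimal sketch in plain Python (standard library only) that computes Pearson's r for every pair. In R, `cor()` applied to a data frame of numeric columns returns the same kind of matrix. The variables and values are hypothetical toy data.

```python
import math

# Hypothetical toy data: three numerical variables measured on the same 6 observations.
data = {
    "hours_studied": [2, 4, 5, 7, 8, 10],
    "exam_score": [55, 62, 70, 74, 81, 90],
    "hours_sleep": [8, 7, 7, 6, 6, 5],
}

def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Pairwise Pearson r for every pair of variables (a symmetric matrix)."""
    names = list(columns)
    return names, [[pearson(columns[a], columns[b]) for b in names] for a in names]

names, matrix = correlation_matrix(data)
print(" " * 15 + "".join(f"{n:>15}" for n in names))
for name, row in zip(names, matrix):
    print(f"{name:<15}" + "".join(f"{r:>15.2f}" for r in row))
```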
Core EDA steps
- Understand the data
  - Data source/production: Who produced the data, and why?
  - Data structure and variables included (numerical, categorical, text)
  - Are any limitations already apparent at this stage?
- Import and evaluate
  - Load data into R
  - Observations and variables: what does each row represent, and what is measured?
  - Missing values?
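The import-and-evaluate checks can be sketched in plain Python with the standard `csv` module. In R the course would do this with `read.csv()` (or readr's `read_csv()`), `dim()`, and `colSums(is.na(df))`. The sketch counts observations and variables and tallies missing values per variable. The inline CSV is hypothetical, and the set of missing-value codes is an assumption. Real files code missingness in many ways (blank cells, `NA`, sentinel values such as -999), so check the data's documentation.

```python
import csv
import io

# Hypothetical toy CSV standing in for a real file; for a file on disk,
# use open("your_file.csv", newline="") instead of io.StringIO.
raw = io.StringIO(
    "county,year,population,median_income\n"
    "A,2020,12000,51000\n"
    "B,2020,8500,\n"
    "C,2020,,47000\n"
    "D,2020,20300,NA\n"
)

MISSING = {"", "NA"}  # assumption: how missingness is coded in this file

rows = list(csv.DictReader(raw))
columns = list(rows[0].keys())

print(f"{len(rows)} observations (rows) x {len(columns)} variables (columns)")
print("Variables:", columns)

# Count missing values per variable.
missing = {col: sum(row[col].strip() in MISSING for row in rows) for col in columns}
print("Missing values per variable:", missing)
```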
Core EDA steps
- Identify missing data
  - Why is the data missing?
  - Remove or fill in missing values?
  - How to fill in missing values (imputation methods such as k-nearest neighbors)
  - How might missing data affect the analysis?
- Summarize variables
  - Distribution plots
  - Central tendency (mean, median)
  - Variation: spread (standard deviation, interquartile range, box plots)
  - Identify outliers and potential errors (plots and filtering)
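k-nearest-neighbors (kNN) imputation fills a missing value with the average of that variable over the k most similar observations where it is observed. Below is a minimal sketch in plain Python (standard library only). It assumes hypothetical toy data, where heights are used to impute a missing weight, and Euclidean distance on the predictor variables. With several predictors you would usually standardize them first, so that no single variable dominates the distance.

```python
import math

def knn_impute(features, target, k=2):
    """Return a copy of `target` with None values replaced by the mean target of the
    k rows whose feature vectors are nearest (Euclidean distance) and whose target
    is observed."""
    observed = [i for i, t in enumerate(target) if t is not None]
    filled = list(target)
    for i, t in enumerate(target):
        if t is None:
            nearest = sorted(observed, key=lambda j: math.dist(features[i], features[j]))[:k]
            filled[i] = sum(target[j] for j in nearest) / len(nearest)
    return filled

# Hypothetical toy data: one predictor (height, cm) per row; the last weight (kg) is missing.
heights = [(150.0,), (155.0,), (160.0,), (170.0,), (175.0,), (162.0,)]
weights = [50.0, 54.0, 58.0, 66.0, 71.0, None]
print(knn_impute(heights, weights, k=2))
```

With k = 2, the two rows nearest to 162 cm (160 and 155 cm) are averaged, which gives 56.0 kg. Like any imputation, this assumes a missing value resembles its neighbors. If values are missing for a systematic reason, imputing them can bias later results.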
Core EDA steps
- Wrangle/transform the dataset
  - Set factor levels
  - Scale and normalize
    - Standardize variables
  - Transform using a log scale as needed
  - Create new (derived) variables
  - Group or aggregate to create new data
- Explore relationships
  - Numerical
    - Scatter plots and side-by-side box plots (or violin plots)
    - Correlation coefficients (e.g., Pearson, Spearman)
    - Correlation matrix
  - Categorical: bar plots and frequency tables (count(), etc.)
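Several of the wrangling and relationship steps fit in a short plain-Python sketch (standard library only; the course does these in R). It covers standardizing to z-scores, a log transform, a derived variable, a group-and-aggregate, and Pearson versus Spearman correlation. Spearman's coefficient is computed here as Pearson's r on ranks, with tied values given their average rank. The household data and variable names are hypothetical.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical toy data: one row per household.
households = [
    {"region": "North", "income": 42000, "size": 3},
    {"region": "North", "income": 58000, "size": 4},
    {"region": "South", "income": 31000, "size": 2},
    {"region": "South", "income": 250000, "size": 5},
    {"region": "East", "income": 39000, "size": 1},
]

# Standardize, log-transform, and derive new variables.
incomes = [h["income"] for h in households]
mu, sd = statistics.mean(incomes), statistics.stdev(incomes)
for h in households:
    h["income_z"] = (h["income"] - mu) / sd           # z-score: mean 0, sd 1
    h["log_income"] = math.log10(h["income"])         # log scale compresses the long right tail
    h["income_per_person"] = h["income"] / h["size"]  # derived variable

# Group and aggregate: mean income per person by region.
groups = defaultdict(list)
for h in households:
    groups[h["region"]].append(h["income_per_person"])
by_region = {region: statistics.mean(vals) for region, vals in groups.items()}

def ranks(values):
    """Rank values from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            result[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient r."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

sizes = [h["size"] for h in households]
print("Pearson r:", round(pearson(sizes, incomes), 3))
print("Spearman rho:", round(pearson(ranks(sizes), ranks(incomes)), 3))  # Pearson on ranks
print("Mean income per person by region:", by_region)
```

Pearson's r measures linear association and is sensitive to extreme values, while Spearman's rho depends only on the ordering of the values. The two can differ, as they do on this toy data.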
Core EDA steps
- Deal with outliers
  - Use your knowledge of the topic (or research it) to decide which values are plausible for each variable in your dataset
  - Explore the interquartile range (the box in a box plot) and z-scores
  - Remove outliers? Interpolate?
- Communicate!
  - Provide context. Be critical! Ask why?
  - What are your key findings?
  - What evidence do you need to support your findings?
  - What are the limitations of your analysis?
  - What should we explore next? What questions should we be asking of the data now?
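Two common rules for flagging potential outliers can be sketched in plain Python (standard library only). The first is the box-plot rule, which flags values more than 1.5 × IQR beyond the quartiles. The second is a z-score rule. The temperature data is hypothetical, and the thresholds (1.5 and 2.5) are illustrative conventions, not fixed standards.

```python
import statistics

# Hypothetical toy data: daily temperatures (°C) with one suspicious reading.
temps = [18.2, 19.5, 17.9, 20.1, 18.8, 19.0, 45.0, 18.5, 19.7, 20.4]

def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot whisker rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def z_outliers(values, threshold=2.5):
    """Values more than `threshold` sample standard deviations from the mean."""
    mu, sd = statistics.mean(values), statistics.stdev(values)
    return [v for v in values if abs(v - mu) / sd > threshold]

print("IQR rule:", iqr_outliers(temps))
print("z-score rule:", z_outliers(temps))
```

Both rules flag 45.0 here. An extreme value inflates the mean and standard deviation it is measured against, so z-scores can miss outliers in small samples. With 10 observations no z-score can exceed about 2.85, so a cutoff of 3 would flag nothing. Flagged values are candidates to check against domain knowledge, not automatic deletions.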