Doing Data Analysis

Intro to Data Analytics

Data Analysis

  • Key steps:
    • Define/refine the question
    • Explore the data
    • Develop (formal/statistical) models
    • Interpret results
    • Communicate results

Six types of research questions

  • Descriptive: summarize a characteristic of a set of data

  • Exploratory: analyze to see if there are patterns, trends, or relationships between variables (hypothesis generating)

  • Inferential: analyze patterns, trends, or relationships in representative data from a population

  • Predictive: make predictions for individuals or groups of individuals

  • Causal: whether changing one factor will change another factor, on average, in a population

  • Mechanistic: explore “how” one factor changes another, as opposed to whether it does

Six types of research questions

  • Descriptive - summarize characteristics
  • Exploratory - analyze patterns
  • Inferential - infer properties (w/statistical tests)
  • Predictive - make predictions
  • Causal - does changing A change B?
  • Mechanistic - how does changing A change B?

Ex: COVID-19 and Vitamin D

  • Descriptive: frequency of hospitalizations due to COVID-19 in a set of data collected from a group of individuals
  • Exploratory: examine relationships between a range of dietary factors and COVID-19 hospitalizations
  • Inferential: examine whether any relationship between taking Vitamin D supplements and COVID-19 hospitalizations found in the sample holds for the population at large
  • Predictive: what types of people will take Vitamin D supplements during the next year
  • Causal: whether people with COVID-19 who were randomly assigned to take Vitamin D supplements are hospitalized at a different rate than those who were not
  • Mechanistic: how increased Vitamin D intake leads to a reduction in the number of viral illnesses
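
A minimal sketch of how the descriptive and exploratory questions above differ in code. The records, field order, and values are hypothetical and made up for illustration; they are not real COVID-19 data. A gap between groups here is only a pattern in this sample. It says nothing yet about the population (inferential) or about cause (causal).

```python
# Hypothetical toy records: (took_vitamin_d, hospitalized).
# Values are made up for illustration only.
records = [
    (True, False), (True, False), (True, True), (True, False),
    (False, True), (False, False), (False, True), (False, False),
]


def hospitalization_rate(rows):
    """Fraction of rows where the hospitalized flag is True."""
    rows = list(rows)
    return sum(1 for _, hospitalized in rows if hospitalized) / len(rows)


# Descriptive: summarize one characteristic of this data set.
overall_rate = hospitalization_rate(records)

# Exploratory: compare the rate across groups to look for a pattern.
rate_by_group = {
    took: hospitalization_rate(r for r in records if r[0] == took)
    for took in (True, False)
}

print(f"overall: {overall_rate:.3f}")
for took, rate in rate_by_group.items():
    print(f"took Vitamin D = {took}: {rate:.2f}")
```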

Starting a data analysis

  • Do you have the data to answer the research questions?
  • Are there confounding variables?
  • Was there any bias in the data collection?
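
One way to see why the confounding question matters is to compare a pooled rate with rates computed within levels of a suspected confounder. The sketch below assumes age group is the confounder. All counts are hypothetical and chosen only to make the effect visible.

```python
# Hypothetical counts, made up for illustration:
# (age_group, took_supplement) -> (hospitalized, total)
counts = {
    ("young", True): (1, 10),
    ("young", False): (4, 40),
    ("old", True): (16, 40),
    ("old", False): (4, 10),
}


def rate(cells):
    """Pooled hospitalization rate over a list of (hospitalized, total) cells."""
    hospitalized = sum(h for h, _ in cells)
    total = sum(t for _, t in cells)
    return hospitalized / total


# Ignoring age: pool both age groups for each supplement group.
pooled = {
    took: rate([cell for (_, flag), cell in counts.items() if flag == took])
    for took in (True, False)
}

# Accounting for age: compute the rate within each age group separately.
stratified = {key: rate([cell]) for key, cell in counts.items()}

print("pooled:", pooled)
print("stratified:", stratified)
```

In these toy counts the pooled rates differ between supplement groups, while the rates within each age group are identical. Here the apparent relationship comes entirely from older people being both more likely to take supplements and more likely to be hospitalized.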

Exploratory data analysis steps

  • Generate question
  • Read in data
  • Check data dimensions
  • Look at head/tail of data
  • Plot (a bunch)
  • Validate with outside data
  • Iterate
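
The steps above can be sketched with the Python standard library alone. The inline CSV, its column names, and the 0 to 120 age range are all assumptions made for illustration. In practice the data would come from a file, plots would use a plotting library, and validation would compare against a real outside source.

```python
import csv
import io
from collections import Counter

# Hypothetical inline CSV standing in for a real file; values are made up.
RAW = """age,hospitalized
34,0
51,1
47,0
68,1
29,0
73,1
"""

# Read in data
rows = list(csv.DictReader(io.StringIO(RAW)))

# Check data dimensions (rows, columns)
n_rows, n_cols = len(rows), len(rows[0])
print(f"{n_rows} rows x {n_cols} columns")

# Look at head/tail of data
head, tail = rows[:2], rows[-2:]
print("head:", head)
print("tail:", tail)

# Plot: a crude text histogram of age by decade
decades = Counter(int(r["age"]) // 10 * 10 for r in rows)
for decade in sorted(decades):
    print(f"{decade}s | {'#' * decades[decade]}")

# Validate: a plausible age range stands in for a check against outside data
# (the 0-120 bound is an assumed sanity check, not a real external source)
ages = [int(r["age"]) for r in rows]
all_ages_plausible = all(0 <= a <= 120 for a in ages)
print("ages plausible:", all_ages_plausible)
```

Each pass usually raises a new question, which sends the analysis back to the first step.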

Communicating for your audience

  • Avoid:
    • Jargon, uninterpreted results, lengthy output
  • Pay attention to:
    • Organization, presentation, flow
  • Don’t forget about:
    • Code style, coding best practices, meaningful commits
  • Be open to:
    • Suggestions, feedback, taking (calculated) risks