Course Introduction

Intro to Data Analytics

Jordan Ayala

Introduction to Data Analytics & R Programming

Welcome!

Data Analysis in Action

Eviction KC Data Project

A full-scale data science workflow

  1. Ask an interesting question

  2. Get the data

  3. Explore the data

  4. Model the data

  5. Communicate and visualize the results

Data

  • Administrative data set

    • Web-scraping
    • Data cleaning
    • Data wrangling
  • Put into conversation with a standard reference data set (U.S. Census American Community Survey estimates) and with existing research.

The Scale of Evictions

From 2015 to 2019, an average of 35 renter households in the Kansas City metropolitan area (MO) received an eviction filing each day.

What can we say about evictions in the study area based on this chart?

In our study area, eviction filings peaked in 2018, increasing by 5.4% from 2015. The largest increase during this period was in the City of Kansas City, MO, where eviction filings rose by 6.2%. In the state of Missouri, eviction filings increased by 4.5%.
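As a minimal sketch of how a percent change like the ones above can be computed in R: the counts below are hypothetical placeholders rather than the project's actual filing totals, and `pct_change` is an illustrative helper name, not a function from the project.

```r
# Hypothetical helper: percent change from a starting value to an ending value
pct_change <- function(start, end) {
  (end - start) / start * 100
}

# Placeholder counts for illustration only (NOT the Eviction KC figures)
filings_2015 <- 1000
filings_2018 <- 1054

# Percent change from 2015 to 2018, rounded to one decimal place
round(pct_change(filings_2015, filings_2018), 1)
```

The change is divided by the starting value, so the result reads as "growth relative to 2015."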

Combining data sets for a better understanding

Kansas City metro (MO) renter households were listed as defendants on 28% of all Missouri eviction filings during this time period, even though only 1 in 5 Missouri renters were located in the KC metro area (MO).

What can we say about evictions in the study area based on this chart?

The concentration of evictions in the core of the metropolitan area (Kansas City, MO proper) is evident when comparing filing rates as a percentage of renter occupied housing units.

  • On average, 1 in 15 renter households (7%) living on the Missouri side of the Kansas City metropolitan area faced an eviction filing each year from 2015 to 2019.

  • In comparison, 1 in 12 renter households (8.1%) in the city of KCMO faced an eviction filing each year during the same time period.
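A filing rate of this kind is filings divided by renter-occupied housing units. The sketch below assumes a small table with one row per area; the area names, counts, and the `filing_rate` helper are hypothetical and chosen only to illustrate the calculation.

```r
# Hypothetical helper: eviction filings per 100 renter-occupied housing units
filing_rate <- function(filings, renter_units) {
  filings / renter_units * 100
}

# Placeholder counts for illustration only (NOT the study's data)
areas <- data.frame(
  area = c("Area A", "Area B"),
  filings = c(700, 810),
  renter_units = c(10000, 10000),
  stringsAsFactors = FALSE
)

# Add the rate as a new column so the areas can be compared directly
areas$rate_pct <- filing_rate(areas$filings, areas$renter_units)
areas
```

Dividing by renter-occupied units, rather than comparing raw counts, is what makes areas of different sizes comparable.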

Combining data sets for a better understanding

The annual rate of eviction filings per renter household in the Kansas City metropolitan area (MO) is higher than the estimated eviction filing rates in Chicago, IL (3.7%) or Boston, MA (2.9%) and lower than the estimated eviction filing rates in Washington, D.C. (11%) or Richmond, VA (11%).

Spatial distribution of evictions

The geographic concentration of formal eviction filings.

Analysis in context

Evictions are concentrated to the east of the racial dividing line of Troost Avenue and in eastern inner suburbs, with additional hot spots in northeastern Jackson County, northern Cass County, and the northernmost area of Kansas City, MO in Clay County. Troost Avenue marks a dividing line between majority-Black and majority-white neighborhoods in the city, and this divide coincides with lower average life expectancy, income inequality, and unhealthy housing conditions.

Temporal distribution

Temporal distribution is the pattern or trend of a phenomenon over time.

What does this chart tell us about evictions throughout the year?

Analysis in context: The data alone do not explain the underlying reason for this summer peak.

  • June, July, and August are known to be the busiest months for moving across the United States due to the school year schedule, longer daylight hours, flexibility of work during peak vacation months, more college students moving for the school year, seasonal summer workers relocating, and peak home buying from May through August.
  • On average, 36% of each year's evictions in the study area from 2015 to 2019 occurred between May and August. This is the time of year when an annual lease is most likely to end and when a property owner or management company is most likely to find interested renters. Seasonal tenants who might relocate for the winter may also play a role.

Some key findings

  • Overall, out of 63,252 cases heard in the study area from 2015 to 2019, there were 43,354 court-ordered evictions, and 76,808 residents were evicted.

  • More than two-thirds (71%) of court-ordered evictions are the result of default judgments, where a tenant is not present at the hearing.

  • While 86% of landlords were represented by attorneys, only 2.3% of defendants had attorney representation.
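The counts in the first finding can be turned into shares with a few lines of R. This is a quick sketch using only the three totals quoted above; the variable names are illustrative, and the derived figures are simple ratios rather than results reported by the project.

```r
# Totals quoted in the key findings (study area, 2015 to 2019)
cases_heard <- 63252
court_ordered_evictions <- 43354
residents_evicted <- 76808

# Share of heard cases that ended in a court-ordered eviction, in percent
share_evicted <- court_ordered_evictions / cases_heard * 100

# Average number of residents evicted per court-ordered eviction
residents_per_eviction <- residents_evicted / court_ordered_evictions

round(share_evicted, 1)
round(residents_per_eviction, 2)
```

So a little over two-thirds of the cases heard ended in a court-ordered eviction, and each court-ordered eviction displaced between one and two residents on average.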

Is the burden of eviction spread evenly across social groups in a city?

  • Why/why not?

Who is being evicted?

  • The burden of eviction is unevenly distributed
  • Eviction rates are systematically lower in areas where a higher share of the population is white.

Black or African American defendants were least likely to have their case dismissed and most likely to have to make a payment as a part of the court-ordered eviction compared to white and Latino/a/e defendants.

Exploratory regression analysis

An increased vacancy rate is associated with a seven-percentage-point increase in the eviction filing rate. Vacancy rate = vacant homes in an area as a percentage of total housing units.

  • This association suggests a possible relationship between neighborhood conditions in high vacancy areas and eviction rates.

  • Areas of Kansas City, MO with high vacancy rates are predominantly neighborhoods with poor housing and infrastructure conditions, a higher share of the non-white and immigrant population, and a history of disinvestment after white flight from the city in the mid-twentieth century.

In exploratory regression analysis, we find that areas with higher rents tend to have a higher eviction filing rate.

  • This suggests a possible relationship where, as neighborhood rents increase, more residents are displaced.

What is our role as data analysts?

  • To tell the stories that the data support, while making sure that we bring out all the stories and all the relevant pieces, and make all the connections.

  • The most important aspect of working with data is the communication we do about it – which requires accuracy and clarity.

  • As we’ll see, what we eventually want to do is write with data, to construct a story that data helps flesh out.

  • The data alone are useless; we have to think about them and probe them in different ways to see what they can tell us.

All that good first day stuff

“There is nothing in the realm of work — no matter how interesting or exciting or desired — that does not entail, at some point, the experience of frustration, self-doubt, loneliness, and anxiety. Experiences that most of us (realistically, all of us) flee from, especially when we’re by ourselves. Our goal shouldn’t be to eliminate this discomfort. We need to teach students that it’s part of the process, and develop strategies for coping with it. But for students to really get that — to believe it, to feel it — they have to do the work.” (“The End of the Take-Home Essay?” in The Chronicle of Higher Education; edited for brevity.)

Course FAQ

Q - What data science background does this course assume?
A - None.

Q - Is this an intro stat course?
A - While statistics ≠ data science, they are very closely related and have a tremendous amount of overlap. Hence, this course is a great way to get started with statistics. However, this course is not your typical high school statistics course.

Q - Will we be doing computing?
A - Yes.

Q - Is this an intro CS course?
A - No, but many themes are shared.

Q - What computing language will we learn?
A - R.

Q - Why not language X?
A - We can talk about that some time if you want.

Software

Data science life cycle

# A tibble: 5 × 2
  date             season
  <chr>            <chr> 
1 23 January 2017  winter
2 4 March 2017     spring
3 14 June 2017     summer
4 1 September 2017 fall  
5 ...              ...   
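One way a table like this could be produced is sketched below. It is not necessarily how the slide's table was built: it uses a base R data frame in place of a tibble to avoid extra packages, and it assumes meteorological seasons (December to February is winter, and so on), which happens to match the four rows shown. The `season_lookup` vector is a hypothetical helper.

```r
# Dates copied from the table above, stored as plain text
dates <- c("23 January 2017", "4 March 2017", "14 June 2017", "1 September 2017")

# Pull out the month name (the second word) and convert it to a month number.
# month.name is a built-in vector of English month names.
month_names <- vapply(strsplit(dates, " "), function(parts) parts[2], character(1))
month_number <- match(month_names, month.name)

# Hypothetical lookup: one season per month, January through December
# (assumes meteorological seasons; this rule is not stated in the slides)
season_lookup <- c(
  "winter", "winter", "spring", "spring", "spring", "summer",
  "summer", "summer", "fall", "fall", "fall", "winter"
)

seasons <- data.frame(
  date = dates,
  season = season_lookup[month_number],
  stringsAsFactors = FALSE
)
seasons
```

Matching against `month.name` rather than parsing with a date format keeps the example independent of the computer's language settings.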

Let’s dive in!

minecr.shinyapps.io/unvotes