Data Loading & Final Processing

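#load packages before they are used
library(readxl)     #read_excel() lives here; library(tidyverse) does not attach it
library(tidyverse)

owners_renters = read.csv("owners_renters.csv", header = TRUE)
home_sale_prices = read_excel("sales_prices.xlsx")

#last bit of pre-processing: give the unnamed second column a usable name
home_sale_prices = home_sale_prices %>%
                    rename(
                      price = ...2
                    )
#merge datasets on county (inner join: only counties in both tables are kept)
sales_tenancy = merge(home_sale_prices, owners_renters, by = "county")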
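
Because merge() performs an inner join by default, a county that is missing or spelled differently in one of the two files is dropped without any warning. The sketch below shows one way to check for this. It uses small made-up data frames (the county names and numbers are hypothetical and are not taken from the real files), so treat it as an illustration rather than part of the analysis.

```r
# Toy stand-ins for the two tables; all values are invented for illustration
prices_demo  = data.frame(county = c("Adams", "Baker", "Clark"),
                          price  = c(250000, 310000, 190000))
tenancy_demo = data.frame(county = c("Adams", "Baker", "Dane"),
                          owner  = c(800, 1200, 500),
                          tenant = c(200, 600, 300))

# Inner join (the merge() default): only counties found in both tables survive
merged_demo = merge(prices_demo, tenancy_demo, by = "county")

# Counties that appear in one table but not the other
dropped_from_prices  = setdiff(prices_demo$county, tenancy_demo$county)
dropped_from_tenancy = setdiff(tenancy_demo$county, prices_demo$county)

print(nrow(merged_demo))
print(dropped_from_prices)
print(dropped_from_tenancy)
```

Running the same two setdiff() calls on home_sale_prices$county and owners_renters$county would list any counties lost in the real merge.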

Creating a new variable with the dplyr package:

#create new variables: share of households that own and share that rent
#(values are proportions between 0 and 1; multiply by 100 for percentages)
sales_tenancy = sales_tenancy %>%
                  mutate(percent_owners  = owner / total.households,
                         percent_renting = tenant / total.households)
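
A quick sanity check on the new columns can catch data-entry problems early. The sketch below uses hypothetical rows and base R only, and it assumes that every household is either an owner or a tenant, which may not hold for the real total.households column.

```r
# Hypothetical rows; the real values come from owners_renters.csv
demo = data.frame(county = c("Adams", "Baker"),
                  owner  = c(800, 1200),
                  tenant = c(200, 600))

# Assumption: every household is either an owner or a tenant
demo$total.households = demo$owner + demo$tenant

demo$percent_owners  = demo$owner  / demo$total.households
demo$percent_renting = demo$tenant / demo$total.households

# Each share should lie in [0, 1], and the two should sum to 1 per county
shares_ok = all(demo$percent_owners >= 0 & demo$percent_owners <= 1) &&
            isTRUE(all.equal(demo$percent_owners + demo$percent_renting,
                             rep(1, nrow(demo))))
print(shares_ok)

# Multiply by 100 when a true percentage is needed for display
print(round(100 * demo$percent_renting, 1))
```

If the shares in the real data do not sum to 1, that may simply mean total.households counts categories other than owners and tenants.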


Bird’s Eye

image.png

As we can see, there is a fairly large right skew.

Correlation Analysis

This section analyses the relationship between home ownership rates and housing prices to gain insight into where and why home ownership rates vary widely between counties.

library(ggpubr)
library(scales)
library(ggplot2)

The packages ggpubr, scales, and ggplot2 were used to investigate a possible relationship between high home prices and home ownership. A correlation on its own cannot establish that one causes the other.

Visualising Total Renters and Average Home Price

ggscatter(sales_tenancy, x = "tenant", y = "price", 
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.method = "kendall",
          xlab = "Home Renters", ylab = "Average Sales Price of Home",
          xlim = c(0,20000))

As we can see, there is not a strong correlation between the number of renters and home prices.

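
The coefficient printed on the plot can also be computed directly with cor.test() from base R, which reports Kendall's tau together with a p-value. The following is a minimal sketch on made-up numbers: the demo data frame is hypothetical, and on the real data the call would use sales_tenancy$tenant and sales_tenancy$price instead.

```r
# Invented values for illustration only; not taken from the real data
demo = data.frame(
  tenant = c(1200, 300, 4500, 800, 15000, 2300, 600, 9800),
  price  = c(210000, 180000, 260000, 305000, 240000, 199000, 275000, 220000)
)

# Kendall rank correlation, matching cor.method = "kendall" in ggscatter()
kendall_result = cor.test(demo$tenant, demo$price, method = "kendall")

tau     = unname(kendall_result$estimate)  # Kendall's tau, between -1 and 1
p_value = kendall_result$p.value           # p-value for the null hypothesis tau = 0

print(tau)
print(p_value)
```

Since this section is concerned with ownership rates rather than raw counts, the same call could be repeated with percent_renting in place of tenant, which avoids large counties dominating the comparison.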