1 Preface

This project was written for R and Data Engineering, a special-topics elective open to upper-division students, taught by Dr. Carsten Lange at Cal Poly Pomona during the 2021 Winter intersession. The project was of personal interest to the author; it is not meant to showcase research methodology, technical writing, or statistical inference. Many areas need improvement, and the author intends to extend the project using better programming conventions. Key improvements include fixing the map shading, adding more observations for future forecasts, and caching local copies of the Census Bureau tables. The author would also like to process the migration data using regular expressions instead of hard-coded column indices, with the intent of creating a comparison tool for user-selected years and states. Because this class was taken before he completed his econometrics course, the author would also like to use the processed data in future econometric models that could potentially explain the population movements, and to add simple forecasts to compare with future census data when available. It is the author’s hope that the project can provide a properly normalized pre-pandemic data set alongside properly catalogued post-pandemic figures. What follows is the written submission of the project as of 02/10/2022, generated using R Markdown, with slight formatting revisions for clarity. Please keep in mind that the purpose of the project was to showcase the use of R; the maps and mapdata libraries were not covered in the course. By his own admission, the author is absolutely obsessed with maps and geographic data.

2 Introduction

A curious empirical “fact” often repeated among California locals is that the number of out-of-state transplants keeps increasing. Over the years this claim has been repeated with wildly varying estimates of the out-of-state population. I have always been curious to know what these figures actually are and how many people really are from out of state. Could it be true that more people are choosing to settle in California than ever before? The focus of this research is the estimated migration into California by previous state of residence, as reported for the year before the data was gathered. The population estimates analyzed cover only a small subset of total interstate migration, and they do not account for estimated legal or extralegal international immigration.

3 Data Sources

Below are the data sources used to create the migration tables along with overall population estimates from the U.S. Census Bureau. The estimated state migration tables were imported directly and processed using R:

Interstate migration data gathered from main index page here.

Years: 2015, 2019

Please note that the margin of error provided for the interstate migration data was not used in any of the calculations; only the state estimates column was used.

The following data was also downloaded from the U.S. Census Bureau using the table search tool, as a simple data set containing total population estimates ordered by state and by year. Population estimates are based on the 2010 census.

The recommended pivot table for total state data was not downloaded; instead, the pivot table was recreated in Excel under the Prep_Data sheet. This sheet was lightly processed by adding the region column and selecting the years 2015 through 2019 for future calculations, and the result was placed in a ready-to-import sheet named Clean_Data.

4 Code Review and Data Processing

library(readxl)
library(tidyverse)
library(rio)
library(dplyr)
library(ggplot2)
library(maps)
library(mapdata)
library(plotly)

ColumnOrder <- c("region","year","migration")

US_Pop_Total = import("C:/Users/Migue/Documents/GDrive_Sync/R//PEPPOP2019.PEPANNRES_data_with_overlays_2021-11-06T111859.xls", sheet = "Clean_Data") %>% as_tibble()
CensusImport2015 <- import("https://www2.census.gov/programs-surveys/demo/tables/geographic-mobility/2015/state-to-state-migration/State_to_State_Migrations_Table_2015.xls", col_names = FALSE)
CensusImport2019 <- import("https://www2.census.gov/programs-surveys/demo/tables/geographic-mobility/2019/state-to-state-migration/State_to_State_Migrations_Table_2019.xls", col_names = FALSE)

The libraries above are needed to import and process the data. Line #10 creates ColumnOrder, a character vector that fixes the column order of the processed table so that the columns have tidier, easier-to-reference names. The total U.S. population table is imported from the Clean_Data sheet of an Excel workbook stored locally, while CensusImport2015 and CensusImport2019 are the working data frames imported directly by URL, so no local file is needed.

Note that the code itself is clumsy and inefficient, as it is repeated in full for 2019. The author could not work out how to update variable names programmatically inside a loop while properly scoping references to global variables.
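One possible way around the repetition is to wrap the cleaning steps in a function and keep the results in a named list, rather than generating variable names inside a loop. The sketch below is not the code used in this project: `clean_year()` and the toy `raw_tables` are hypothetical stand-ins for the real cleaning steps and the Census imports, with values borrowed from the tables in the Key Findings section.

```r
# Hypothetical helper: takes one raw table and a year, returns a tidy frame.
# The body is a simplified stand-in for the cleaning steps in this section.
clean_year <- function(raw, year) {
  data.frame(
    region    = tolower(names(raw)),
    year      = year,
    migration = as.integer(unlist(raw[1, ], use.names = FALSE)),
    stringsAsFactors = FALSE
  )
}

# Toy inputs standing in for CensusImport2015 and CensusImport2019.
raw_tables <- list(
  "2015" = data.frame(Texas = "41713", Nevada = "25332",
                      stringsAsFactors = FALSE),
  "2019" = data.frame(Texas = "37063", Nevada = "26433",
                      stringsAsFactors = FALSE)
)

# Map() calls clean_year() once per table. The results stay in a named list,
# so no variable names have to be created programmatically.
exports <- Map(clean_year, raw_tables, as.integer(names(raw_tables)))

exports[["2019"]]
```

Each year is then available as `exports[["2015"]]` or `exports[["2019"]]`, and adding another year only means adding another entry to the input list.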

Line #1 first removes the first six rows, which hold the merged cells containing descriptor information for the census tables. The next line then selects the two header rows and the row for California, along with the migration estimate columns starting with Alabama and running up to the last estimate column, Wyoming.

CleanedImport<- tail(CensusImport2015, -6) 
CleanedImport <- CleanedImport[c(1,2,10),(10:121)]

The majority of the data processing involved working around the behavior of na.omit() when it is run immediately on the previously selected columns. This took a lot of trial and error, as calling na.omit() before assigning the column headers would delete the state names and leave behind only the estimate data (row 16 in the sample output below). Because the header rows contain NA cells, an if statement was originally used to “skip” over them and force the column names to be taken from the state names contained in row 1. Assigning the column names first is the crucial step, as shown in the next command.

#if/else NA check removed: if() needs a single TRUE/FALSE, but is.na() on a data frame returns a logical matrix
#if(is.na(CleanedImport)){print("skip rename")}else{
  names(CleanedImport) <-CleanedImport[1,]#}
view(CleanedImport)
TempView1 <-CleanedImport[c(1:3),(1:5)]
head(TempView1)
##     Alabama   NA       NA.1   Alaska NA.2
## 7   Alabama <NA>       <NA>   Alaska <NA>
## 8  Estimate  MOE       <NA> Estimate  MOE
## 16     5273 1963 California     6022 2203

Then we can run the na.omit() command to remove incomplete rows. It does not remove blank columns, and it also removes rows 1 and 2 entirely, because na.omit() drops every row that contains at least one NA and both header rows have NA cells. This was the cause of many failures, as the command seemed to remove rows 1 and 2 on some runs and not on others, hence the command above that assigns column names BEFORE the NA removal. There are better ways to complete this, but with my limited experience with R, this workaround was consistent in its intended function. As of 2023 the issue has been resolved, and the if/else check is therefore commented out in line 2 above.

CleanedImport <- na.omit(CleanedImport)
TempView2 <-CleanedImport[c(1:3),(1:5)]
head(TempView2)
##      Alabama   NA       NA.1 Alaska NA.2
## 16      5273 1963 California   6022 2203
## NA      <NA> <NA>       <NA>   <NA> <NA>
## NA.1    <NA> <NA>       <NA>   <NA> <NA>
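The row-wise behavior of na.omit() can be reproduced on a small example. This is a minimal sketch on a made-up three-column frame (the `toy` object is hypothetical, shaped loosely like the sample output above), not the Census table itself.

```r
# Toy frame mimicking the header layout: two header rows with NA cells
# (left behind by merged spreadsheet cells) and one fully populated data row.
toy <- data.frame(
  a = c("Alabama", "Estimate", "5273"),
  b = c(NA, "MOE", "1963"),
  c = c(NA, NA, "California"),
  stringsAsFactors = FALSE
)

# na.omit() works row by row: any row with at least one NA is removed,
# while columns are never dropped.
cleaned <- na.omit(toy)

nrow(cleaned)  # 1: only the complete row survives
ncol(cleaned)  # 3: every column is kept
```

This is why the header rows disappear while the blank columns stay, and why the column names have to be assigned before na.omit() is called.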

To deal with the issues described above, line #7 selects columns by state name, which removes all of the blank columns; na.omit() did not do this because it only drops rows. The line was originally written by trial and error and worked more by luck than by design. Before that, we create a vector of the state names extracted from the import data, which is useful for merging the longitude and latitude data provided by the maps library. Do note that on line #11, in order to merge the mapping data provided by the maps library, we create lower-case copies of the state names, which are later appended as a row. An interesting issue came up on line #27: the migration estimates were imported as character, and because transposing turns a data frame into a matrix, which can hold only one data type, converting them to numeric beforehand would have been undone, with every value coming back as a string. Converting to numeric after the data frame has been transposed is the final key step in making the data usable for calculations.

StateNames <- colnames(CleanedImport)

#The list has Null values that need to be removed
StateNames <- na.omit(StateNames)


CleanedImport <- CleanedImport[c(1),(StateNames)]



RegionNames <- tolower(StateNames)

#Removes California from region names
RegionNames <- RegionNames[c(1:4,6:51)]

#inserts the year of the data set 
YearData <- rep(c(2015),each = 50)

#removing California from final set
CleanedImport <- CleanedImport[,!grepl("^California",names(CleanedImport))]

CleanedImport <- rbind(CleanedImport,RegionNames,YearData)

#sets row names for transposing data as the new column names
rownames(CleanedImport) <- c("migration", "region", "year")

TransposedFrame <- as_tibble(t(CleanedImport)) #stringsAsFactors is not an as_tibble() argument
TransposedFrame[] <- lapply(TransposedFrame, type.convert, as.is = TRUE)
row.names(TransposedFrame) <- NULL
FinalExport <- TransposedFrame[,ColumnOrder]

#Main data frame used in subsequent calculations and assignment.
Export2015<-FinalExport
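The type issue around the transpose can be seen on a small example. The following is a minimal sketch using a made-up two-state frame (`toy` and `flipped` are hypothetical names, and base `as.data.frame()` is used here in place of `as_tibble()`); it shows that everything is character after `t()` and that `type.convert()` restores the numeric column afterwards.

```r
# Toy frame laid out like CleanedImport just before transposing:
# one row of counts and one row of region names, one column per state.
toy <- data.frame(
  alabama = c("5273", "alabama"),
  alaska  = c("6022", "alaska"),
  row.names = c("migration", "region"),
  stringsAsFactors = FALSE
)

# t() returns a matrix, and a matrix holds a single type, so every
# value is character after the transpose.
flipped <- as.data.frame(t(toy), stringsAsFactors = FALSE)
sapply(flipped, class)

# type.convert() guesses each column's type again; as.is = TRUE keeps
# text columns as character instead of turning them into factors.
flipped[] <- lapply(flipped, type.convert, as.is = TRUE)
sapply(flipped, class)
```

After the conversion the migration column is integer and the region column is still character, which matches the column types shown in the tibble output later in this report.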

Finally, to free up memory and keep things running smoothly, the objects below are removed:

#clean up objects from memory
rm(CleanedImport)
rm(CensusImport2015)
rm(CensusImport2019)
rm(FinalExport)
rm(TransposedFrame)
rm(YearData)
rm(ColumnOrder)
rm(TempView1)
rm(TempView2)

5 Key Plots

The following bar graphs show the estimated count of migrants for the given year. This is meant as a quick overview of which states had the highest or lowest number of migrants, and it will be discussed in further detail under the Key Findings section. Please note that the charts are interactive: thanks to plotly, you can view detailed information on hover and selectively compare bars using the box and lasso select tools.
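The plotting code is not included in this write-up. As a rough idea of how such an interactive bar graph can be built, here is a minimal sketch assuming ggplot2 and plotly are installed; the `bars` frame is a hypothetical three-state subset using 2015 figures from the Key Findings tables, and the axis labels are my own.

```r
library(ggplot2)
library(plotly)

# Hypothetical subset of the 2015 data (values from the Key Findings tables).
bars <- data.frame(
  region    = c("texas", "new york", "arizona"),
  migration = c(41713, 36896, 34204)
)

# Static bar graph, with states ordered from most to fewest movers.
p <- ggplot(bars, aes(x = reorder(region, -migration), y = migration)) +
  geom_col() +
  labs(x = "State of previous residence",
       y = "Estimated movers to California",
       title = "2015")

# ggplotly() converts the ggplot object into an interactive plotly widget
# with hover tooltips and the box and lasso select tools.
interactive_plot <- ggplotly(p)
```

Printing `interactive_plot` in an R Markdown document embeds the widget in the rendered HTML.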

Viewing these yearly bar graphs side by side reveals slight variations in the totals. What stands out most is the marked increase in the number of people who moved from Massachusetts and Maryland, with a slight increase for Georgia and Hawaii as well. The clearest increase, which will be discussed in further detail, is Virginia: 15,009 people moved from Virginia in 2015, rising to 24,506 in 2019. The increase of 9,497 was the largest recorded for the period.

It is difficult to see, but Minnesota also had a large increase in the number of people moving to California, from 4,887 in 2015 to 8,951 in 2019.

In 2015 the total interstate migration from Massachusetts was 12,628, which rose to 16,158 in 2019.

Georgia had a migration of 11,630 in 2015, up to 14,496 in 2019.

Maryland had an interstate migration of 9,981 in 2015, increasing to 11,775 in 2019.

Hawaii had 10,475 migrants in 2015, increasing to 11,985 in 2019.

The states above are listed in order of the six largest increases in migration.

6 Maps Overview

Interactive maps for 2015 and 2019 are shown with the total migrants for the year appearing in the tooltip. The maps provide a quick glance at the regions where migration was highest relative to their distance from California.

Seemingly not much changed between 2015 and 2019, but a closer look reveals some interesting patterns. Interestingly enough, there seems to be a deficit of migrants from the upper Midwest and Southeastern regions. What could be the cause of the lower figures from these areas?
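The mapping code is also not shown in this write-up. One common way to shade states by a value is to join the migration frame to the polygon data from the maps library, roughly as sketched below. This is an illustration under assumptions rather than the project's actual code: `migration` is a hypothetical three-state subset, the colors are arbitrary, and `map_data("state")` needs the maps package installed and covers only the contiguous states and the District of Columbia.

```r
library(ggplot2)
library(dplyr)

# Hypothetical subset of the 2015 data; region names are lower case so
# they match the region column returned by map_data().
migration <- tibble(
  region    = c("texas", "new york", "arizona"),
  migration = c(41713, 36896, 34204)
)

# Polygon outlines for the contiguous states (requires the maps package).
states <- map_data("state")

# left_join() keeps every polygon point; states without data get NA.
map_df <- left_join(states, migration, by = "region")

map_plot <- ggplot(map_df, aes(long, lat, group = group, fill = migration)) +
  geom_polygon(colour = "white") +
  scale_fill_gradient(low = "lightblue", high = "darkblue",
                      na.value = "grey90") +
  coord_quickmap()
```

States with no matching row are drawn in grey through `na.value`, which makes gaps in the data visible instead of blending them into the shading.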

7 Key Findings

What can actually be concluded after reviewing the data? A few interesting points stand out, especially the evidence against the anecdotal claim that increasing numbers of out-of-state transplants are “ruining” California, as it is colloquially stated. What makes this project interesting is that all of the figures predate the 2020 Covid pandemic, meaning that these calculations can provide a baseline, once properly normalized, for measuring the pandemic's impact on population changes. Indeed, it will be interesting to see how the pandemic affected these figures.

Total migration into California for 2015:

## [1] 514477

Total migration into California for 2019:

## [1] 480204

Difference:

## [1] -34273

Out-of-state arrivals decreased from 514,477 in 2015 to 480,204 in 2019, a difference of 34,273.
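The arithmetic behind these totals can be checked directly. The short sketch below uses the two totals printed above; the relative change is an extra figure derived here and is not part of the original output.

```r
# Totals printed above.
total_2015 <- 514477
total_2019 <- 480204

# Absolute change in movers to California.
change <- total_2019 - total_2015
change                 # -34273

# Change relative to the 2015 total, in percent.
pct_change <- change / total_2015 * 100
round(pct_change, 2)   # about -6.66
```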

So what percentage of California's 2015 population had moved in from out of state?

##   pop2015
## 1    1.32

And how about for 2019?

##   pop2019
## 1    1.22

What was the change in that percentage from 2015 to 2019?

##   pop2015
## 1     0.1

Simply stated, out-of-state migrants made up 1.32% of California's population in 2015 and 1.22% in 2019, a decrease of 0.10 percentage points. Although the figure is small, both the plots above and the overall totals show that there has been a decrease!

So which states had the largest number of movers? The table below shows the top 10 states sorted by total migrants for 2015:

## # A tibble: 50 × 3
##    region      year migration
##    <chr>      <int>     <int>
##  1 texas       2015     41713
##  2 new york    2015     36896
##  3 arizona     2015     34204
##  4 washington  2015     33131
##  5 illinois    2015     26189
##  6 florida     2015     25638
##  7 nevada      2015     25332
##  8 colorado    2015     21157
##  9 oregon      2015     20838
## 10 virginia    2015     15009
## # … with 40 more rows

Unsurprisingly, high-population states had the largest number of movers, along with California's nearest neighbors.

Which states had the lowest migration in 2015?

## # A tibble: 6 × 3
##   region         year migration
##   <chr>         <int>     <int>
## 1 iowa           2015      1394
## 2 delaware       2015      1118
## 3 north dakota   2015      1035
## 4 vermont        2015       766
## 5 south dakota   2015       610
## 6 west virginia  2015       413

The top movers to California in 2019:

## # A tibble: 50 × 3
##    region         year migration
##    <chr>         <int>     <int>
##  1 new york       2019     37567
##  2 texas          2019     37063
##  3 washington     2019     31882
##  4 arizona        2019     28226
##  5 nevada         2019     26433
##  6 virginia       2019     24506
##  7 illinois       2019     24085
##  8 florida        2019     22692
##  9 oregon         2019     17265
## 10 massachusetts  2019     16158
## # … with 40 more rows

Taking a closer look at what changed between these tables shows that Virginia had a significant increase in migrants, enough to move it from the 10th spot in 2015 to the 6th spot in 2019.

The states with the lowest migration in 2019:

## # A tibble: 6 × 3
##   region         year migration
##   <chr>         <int>     <int>
## 1 south dakota   2019      1345
## 2 wyoming        2019      1159
## 3 vermont        2019       784
## 4 north dakota   2019       710
## 5 new hampshire  2019       709
## 6 west virginia  2019       303

Unsurprisingly, lower-population states had the fewest migrants, and West Virginia consistently had the lowest figures. How odd, given that its neighbor Virginia ended up doing the opposite. It would be interesting to understand the reasons for this curious outlier, but that is outside the scope of this project.

Looking more closely, we want to see which states had the largest increases and decreases over the measured period. The largest increases in movers to California from 2015 to 2019:

##           region delta_migration
## 46      virginia            9497
## 23     minnesota            4064
## 21 massachusetts            3530
## 10       georgia            2866
## 20      maryland            1794
## 11        hawaii            1510

And we can also see which states had the largest decreases in movers from 2015 to 2019:

##      region delta_migration
## 14  indiana           -3727
## 35     ohio           -3738
## 36 oklahoma           -4603
## 43    texas           -4650
## 3   arizona           -5978
## 5  colorado           -6072
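The code that produced the delta tables is not shown here. One way to compute such a change per state is to join the two yearly frames on region and subtract, as in the sketch below. It assumes dplyr and uses a hypothetical three-state subset with counts taken from the top-10 tables above; `m2015`, `m2019` and the suffixes are my own names.

```r
library(dplyr)

# Hypothetical subsets of the 2015 and 2019 frames.
m2015 <- tibble(region    = c("virginia", "texas", "arizona"),
                migration = c(15009L, 41713L, 34204L))
m2019 <- tibble(region    = c("virginia", "texas", "arizona"),
                migration = c(24506L, 37063L, 28226L))

# Join on region, then subtract the 2015 count from the 2019 count.
delta <- inner_join(m2015, m2019, by = "region",
                    suffix = c("_2015", "_2019")) %>%
  mutate(delta_migration = migration_2019 - migration_2015) %>%
  arrange(desc(delta_migration))

delta
```

For these three states the result reproduces the deltas listed above: 9497 for Virginia, -4650 for Texas and -5978 for Arizona.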

What about the states that sent the largest share of their own population to California in 2015?

## # A tibble: 6 × 5
##   region                year migration pop2015 percentagepop
##   <chr>                <int>     <int>   <dbl>         <dbl>
## 1 nevada                2015     25332 2866939         0.884
## 2 alaska                2015      6022  737498         0.817
## 3 hawaii                2015     10475 1422052         0.737
## 4 district of columbia  2015      3908  675400         0.579
## 5 oregon                2015     20838 4015792         0.519
## 6 arizona               2015     34204 6829676         0.501
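The percentagepop column is consistent with the migration count divided by the state's population, multiplied by 100. The original calculation is not shown, so the sketch below is an inference from the printed figures, using the first three rows of the table above and assuming dplyr.

```r
library(dplyr)

# First three rows of the 2015 table above.
top2015 <- tibble(
  region    = c("nevada", "alaska", "hawaii"),
  migration = c(25332, 6022, 10475),
  pop2015   = c(2866939, 737498, 1422052)
)

# Share of each state's population that moved to California, in percent.
top2015 <- top2015 %>%
  mutate(percentagepop = migration / pop2015 * 100)

round(top2015$percentagepop, 3)  # 0.884 0.817 0.737
```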

The states that sent the smallest share of their population in 2015:

## # A tibble: 6 × 5
##   region         year migration pop2015 percentagepop
##   <chr>         <int>     <int>   <dbl>         <dbl>
## 1 michigan       2015      8852 9931715        0.0891
## 2 kentucky       2015      3781 4425976        0.0854
## 3 mississippi    2015      2539 2988471        0.0850
## 4 south dakota   2015       610  853988        0.0714
## 5 iowa           2015      1394 3120960        0.0447
## 6 west virginia  2015       413 1842050        0.0224

We can also see the states that sent the largest share of their population in 2019:

## # A tibble: 6 × 5
##   region                year migration pop2019 percentagepop
##   <chr>                <int>     <int>   <dbl>         <dbl>
## 1 nevada                2019     26433 3080156         0.858
## 2 hawaii                2019     11985 1415872         0.846
## 3 alaska                2019      5064  731545         0.692
## 4 district of columbia  2019      3075  705749         0.436
## 5 washington            2019     31882 7614893         0.419
## 6 oregon                2019     17265 4217737         0.409

And the states that sent the smallest share in 2019:

## # A tibble: 6 × 5
##   region         year migration pop2019 percentagepop
##   <chr>         <int>     <int>   <dbl>         <dbl>
## 1 alabama        2019      3310 4903185        0.0675
## 2 michigan       2019      6406 9986857        0.0641
## 3 iowa           2019      1956 3155070        0.0620
## 4 kentucky       2019      2606 4467673        0.0583
## 5 new hampshire  2019       709 1359711        0.0521
## 6 west virginia  2019       303 1792147        0.0169

So what do all these tables actually mean? Well, for starters, fewer people from Colorado, Arizona, Texas, Oklahoma and Ohio chose to move to California, whether because of its traditionally high cost of living or for other reasons.

Meanwhile, Virginia, Minnesota, Massachusetts, Georgia, Maryland and Hawaii had the largest increases in people choosing to move to California, as mentioned in the Key Plots section.

Adjusted for population in the respective years measured, Alaska, Nevada and Hawaii consistently sent the highest share of movers, staying within the top three spots in both years.

8 Summary

Simply stated, “West Coast transplants” have actually been decreasing, albeit slowly, over the years, contrary to anecdotal observation. It will be interesting to calculate the population estimates post-Covid, and this template will hopefully provide a baseline for comparison. Using regression, we could estimate how the decrease would have continued had these trends gone on unimpeded, and use that prediction as a basis for comparison against the Covid impacts. As of 2022, casual “observations” suggest that there has been a large exodus out of California due to the high cost of living, Covid unemployment and, potentially, out-of-state migrants being homesick. It remains to be seen, and calculated, what impact the pandemic has had on interstate migration.