This project was written for R and Data Engineering, a special-topics elective for upper-division students, taken under Dr. Carsten Lange at Cal Poly Pomona during the 2021 Winter intersession. The project was of personal interest to the author; it is not meant to showcase research methodology, technical writing, or statistical inference. Many areas need improvement, and the author intends to extend the project using better programming conventions. Key improvements include fixing the map shading, adding more observations for future forecasts, and caching local copies of the Census Bureau tables. The author would also like to process the migration data using regular expressions instead of hard-coded column indices, with the intent of creating a comparison tool for user-selected years and states. Because this class was taken before he completed his econometrics course, the author would also like to feed the processed data into future econometric models that could potentially explain the population movements, and to add simple forecasts to compare with future census data when available. It is the author’s hope that the project can provide a properly normalized pre-pandemic data set alongside properly catalogued post-pandemic figures. What follows is the written submission of the project as of 02/10/2022, generated using R Markdown, with slight formatting revisions for clarity. Please keep in mind that the purpose of the project was to showcase the use of R; the maps and mapdata libraries were not covered in the course. By his own admission, the author is absolutely obsessed with maps and geographic data.
A curious “fact” often repeated among California locals is that the number of out-of-state transplants keeps increasing. Over the years this claim has been repeated with wildly varying estimates of the out-of-state population. I have always been curious to know exactly what these figures are and how many people really came from out of state. Could it be true that more residents are choosing to settle in California than ever before? The focus of this research is the estimated migration into California from a previous state of residence, as measured for the year before the data was gathered. The analysis covers only a small subset of total interstate migration and does not account for estimated legal or extralegal international immigration.
Below are the data sources used to create the migration tables along with overall population estimates from the U.S. Census Bureau. The estimated state migration tables were imported directly and processed using R:
Interstate migration data gathered from main index page here.
Please note that the margin of error provided for the interstate migration data was not used in any of the calculations; only the state estimates column was used.
The following data was also downloaded from the U.S. Census Bureau using the table search tool for a simple data set containing total population estimates ordered by state and by year. Population estimates are based on the 2010 census.
The recommended pivot table for total state data was not downloaded; instead, the pivot table was recreated in Excel under the Prep_Data sheet. This sheet was lightly processed by adding the region column and selecting the years 2015 through 2019 for future calculations, and the result was placed in a ready-to-import sheet named Clean_Data.
library(readxl)
library(tidyverse)
library(rio)
library(dplyr)
library(ggplot2)
library(maps)
library(mapdata)
library(plotly)
ColumnOrder <- c("region","year","migration")
US_Pop_Total = import("C:/Users/Migue/Documents/GDrive_Sync/R//PEPPOP2019.PEPANNRES_data_with_overlays_2021-11-06T111859.xls", sheet = "Clean_Data") %>% as_tibble()
CensusImport2015 <- import("https://www2.census.gov/programs-surveys/demo/tables/geographic-mobility/2015/state-to-state-migration/State_to_State_Migrations_Table_2015.xls", col_names = FALSE)
CensusImport2019 <- import("https://www2.census.gov/programs-surveys/demo/tables/geographic-mobility/2019/state-to-state-migration/State_to_State_Migrations_Table_2019.xls", col_names = FALSE)
The libraries above are needed to import and process the data. Note that ColumnOrder is a character vector rather than a data frame: it lists the column names of the processed table in the order used for the final export, which gives them easier-to-use name references. The total U.S. population table is imported from the Clean_Data sheet of an Excel workbook stored at a local path, while CensusImport2015 and CensusImport2019 are the working data frames imported directly by URL, so no local file is needed for them.
Note that the code is repetitive and inefficient, as the same steps are repeated for the year 2019. The author could not figure out how to update variable names programmatically in a loop while properly scoping references to global variables.
Line #1 first removes the merged columns that contain descriptor information for the census tables. The next line then selects the row for California along with the migration estimate columns, starting with Alabama and ending with the last estimate column, Wyoming.
Most of the data processing consisted of working around the behaviour of na.omit() when run immediately on the previously selected columns. This took a lot of trial and error, as calling na.omit() before assigning the column headers would delete the state names and leave behind only the estimate data (row 16 in the sample output below). Since the column names could not simply be assigned while the rows contained NA values, the if statement was used to “skip” over the NA cells and force the column names to be taken from the state names in row 1. This step was crucial in properly assigning the column names, as shown in the next command.
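One way around the repetition, sketched here in base R under stated assumptions, is to wrap the per-year steps in a function and keep the results in a named list rather than in separately named global variables. The helper tidy_year and the toy one-row inputs are hypothetical; they stand in for an already-trimmed California row and do not reproduce the layout of the real Census file.

```r
# Hypothetical helper: turn one year's California row into region/year/migration.
# `raw` is assumed to be a one-row data frame whose column names are state names.
tidy_year <- function(raw, year) {
  data.frame(
    region    = tolower(names(raw)),
    year      = year,
    migration = as.integer(unlist(raw[1, ], use.names = FALSE)),
    stringsAsFactors = FALSE
  )
}

# Toy inputs built from figures quoted in this write-up.
raw_by_year <- list(
  "2015" = data.frame(Texas = 41713, Virginia = 15009, check.names = FALSE),
  "2019" = data.frame(Texas = 37063, Virginia = 24506, check.names = FALSE)
)

# One call per year; the list index replaces names like Export2015 and Export2019.
exports <- lapply(names(raw_by_year), function(y) {
  tidy_year(raw_by_year[[y]], as.integer(y))
})
names(exports) <- names(raw_by_year)

# Stack the years into one long table for plotting or comparison.
all_years <- do.call(rbind, exports)
rownames(all_years) <- NULL
print(all_years)
```

Because each year is an element of a list, adding another year only means adding another entry to raw_by_year, and nothing inside the function refers to a global variable.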
#removing if else na check for some reason failing here
#if(is.na(CleanedImport)){print("skip rename")}else{
names(CleanedImport) <-CleanedImport[1,]#}
view(CleanedImport)
TempView1 <-CleanedImport[c(1:3),(1:5)]
head(TempView1)
## Alabama NA NA.1 Alaska NA.2
## 7 Alabama <NA> <NA> Alaska <NA>
## 8 Estimate MOE <NA> Estimate MOE
## 16 5273 1963 California 6022 2203
We can then run the na.omit() command to remove incomplete rows. It does not remove empty columns, and it also removes rows 1 and 2 entirely, because na.omit() drops every row that contains at least one NA. This was the cause of many failures, as the command would sometimes leave rows 1 and 2 in place and sometimes delete them, hence the command above that assigns column names BEFORE the NA removal. Most failures happened at this step. There are better ways to do this, but given my limited experience with R, this workaround behaved consistently. As of 2023 this bug has been fixed, and the check is therefore commented out in line 2 above.
## Alabama NA NA.1 Alaska NA.2
## 16 5273 1963 California 6022 2203
## NA <NA> <NA> <NA> <NA> <NA>
## NA.1 <NA> <NA> <NA> <NA> <NA>
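The row-wise behaviour of na.omit() can be reproduced on a small frame. The sketch below uses a made-up three-row frame shaped like the sample output (two header rows and the California row); the object names toy, complete_rows and trimmed are illustrative only. It also shows that removing all-NA columns is a separate step from removing incomplete rows.

```r
# Toy frame mimicking the two header rows plus the California row.
toy <- data.frame(
  a = c("Alabama", "Estimate", "5273"),
  b = c(NA, "MOE", "1963"),
  c = c(NA, NA, "California"),
  stringsAsFactors = FALSE
)

# na.omit() on a data frame works row by row: any row holding an NA is dropped,
# so both header rows disappear and only the complete row is kept.
complete_rows <- na.omit(toy)
print(complete_rows)

# Dropping columns that are entirely NA has to be done separately.
toy$d <- NA
all_na_col <- colSums(is.na(toy)) == nrow(toy)
trimmed <- toy[, !all_na_col, drop = FALSE]
print(names(trimmed))
```

This is consistent with assigning the column names before calling na.omit(): once the state names live in the header, losing the header rows no longer loses information.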
In dealing with the issues mentioned and shown above, line #7 assigns the state names to the column names of the import data and removes all blank columns, which the na.omit() function did not do. It is not well understood why this worked; the line was written by trial and error and was basically a lucky fluke. We then create a new vector with the state names extracted from the import data, which will be useful for merging the longitude and latitude data provided by the maps library. Do note that on line #11, in order to merge the mapping data provided by the maps library, we need to append the state names in lower case as an additional row. An interesting issue arose on line #27. Because of the way transposing works in R, the migration estimates were initially imported as character data; had they been converted to numeric beforehand, transposing would have turned them back into strings. Converting to numeric after the data frame has been transposed is the final key step in making the data usable for calculations.
StateNames <- colnames(CleanedImport)
#The vector has NA values that need to be removed
StateNames <- na.omit(StateNames)
CleanedImport <- CleanedImport[c(1),(StateNames)]
RegionNames <- tolower(StateNames)
#Removes California from region names
RegionNames <- RegionNames[c(1:4,6:51)]
#inserts the year of the data set
YearData <- rep(c(2015),each = 50)
#removing California from final set
CleanedImport <- CleanedImport[,!grepl("^California",names(CleanedImport))]
CleanedImport <- rbind(CleanedImport,RegionNames,YearData)
#sets row names for transposing data as the new column names
rownames(CleanedImport) <- c("migration", "region", "year")
TransposedFrame <- as_tibble(t(CleanedImport), stringsAsFactors = FALSE)
TransposedFrame[] <- lapply(TransposedFrame, type.convert, as.is = TRUE)
row.names(TransposedFrame) <- NULL
FinalExport <- TransposedFrame[,ColumnOrder]
#Main data frame used in subsequent calculations and assignment.
Export2015<-FinalExport
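The type issue around transposing can be seen in isolation. The following is a minimal base R sketch on a made-up two-state frame (the names wide, flipped and long are illustrative): t() returns a matrix, a matrix holds a single type, and so numbers mixed with text come back as character until they are converted column by column.

```r
# A frame mixing numbers and text, shaped like the migration/region/year rows.
wide <- data.frame(
  Alabama = c("5273", "alabama", "2015"),
  Alaska  = c("6022", "alaska", "2015"),
  row.names = c("migration", "region", "year"),
  stringsAsFactors = FALSE
)

# t() produces a matrix; every cell is forced to one common type.
flipped <- t(wide)
print(typeof(flipped))

# Convert after transposing; as.is = TRUE keeps text as character, not factor.
long <- as.data.frame(flipped, stringsAsFactors = FALSE)
long[] <- lapply(long, type.convert, as.is = TRUE)
print(sapply(long, class))
```

Converting before the transpose would be undone by the matrix step, which is why the conversion belongs at the end.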
Finally, in order to free up memory and keep things running smoothly, the objects below are removed:
#clean up objects from memory
rm(CleanedImport)
rm(CensusImport2015)
rm(CensusImport2019)
rm(FinalExport)
rm(TransposedFrame)
rm(YearData)
rm(ColumnOrder)
rm(TempView1)
rm(TempView2)
The following bar graphs show the estimated count of migrants for the given year. This is meant as a quick overview of which states had the highest or lowest number of migrants and will be discussed in further detail under the Key Findings section. Please note that the charts are interactive: you can view detailed information on hover and selectively compare bars using the box and lasso select tools, thanks to plotly.
Viewing these yearly bar graphs side by side, one can see slight variations in the totals. What stands out most is the marked increase in the number of people who moved from Massachusetts and Maryland, with smaller increases for Georgia and Hawaii. The clearest increase in the data, discussed in further detail below, is Virginia: it had 15,009 out-of-state movers in 2015, which rose to 24,506 in 2019. That increase of 9,497 was the largest recorded for the period.
It is harder to see, but Minnesota also had a large increase in movers over the measured period, from 4,887 in 2015 to 8,951 in 2019.
Interstate migration from Massachusetts was 12,620 in 2015 and 16,158 in 2019.
Georgia had 11,630 migrants in 2015, increasing to 14,496 in 2019.
Maryland had an interstate migration of 9,981 in 2015, increasing to 11,775 in 2019.
Hawaii had 10,475 migrants in 2015, increasing to 11,985 in 2019.
The states above are the top six by overall increase in migration, listed in order.
Interactive maps for 2015 and 2019 are shown with total migrants for the year appearing in the tooltip. This illustration is used to provide a quick glance at the regions where migration was highest compared to distance from California.
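One common cause of broken shading in maps of this kind, which may or may not be what affected this project, is that merging polygon vertices with another table changes their row order, so the outlines are drawn out of sequence. The sketch below is a base R illustration under that assumption. The shapes frame is a made-up stand-in with the same long, lat, group, order and region columns that ggplot2::map_data("state") returns; its rectangles are not real state boundaries.

```r
# Toy stand-in for state polygon data: one row per vertex, listed in drawing order.
shapes <- data.frame(
  long   = c(-120, -114, -114, -120, -114, -109, -109, -114),
  lat    = c(35, 35, 42, 42, 31, 31, 37, 37),
  group  = rep(c(1, 2), each = 4),
  order  = 1:8,
  region = rep(c("nevada", "arizona"), each = 4),
  stringsAsFactors = FALSE
)

# 2015 migration figures quoted in the tables of this write-up.
migration <- data.frame(
  region    = c("arizona", "nevada"),
  migration = c(34204L, 25332L),
  stringsAsFactors = FALSE
)

# merge() sorts the result on the key column, so the vertex order is disturbed.
joined <- merge(shapes, migration, by = "region")
print(joined$order)

# Restore the drawing order before passing the frame to a polygon layer.
joined <- joined[order(joined$order), ]
print(joined$order)
```

With the rows back in their original sequence, each vertex carries its state's migration value and the polygons can be filled by that column.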
Seemingly not much changed in the years between 2015 and 2019, but if we take a closer look some interesting patterns begin to emerge. Interestingly enough, there seems to be a deficit of migrants in the upper Midwest and Southeastern regions. What could be the cause for the lowered figures from these areas?
What can actually be concluded after reviewing the data? A few interesting points stand out, especially the evidence against the anecdotal claim that ever more out-of-state transplants are “ruining” California, as it is colloquially put. What makes this project interesting is that all of the figures predate Covid and 2020, meaning that these calculations can provide a baseline, properly normalized, for measuring the population changes caused by the pandemic. Indeed, it will be interesting to see how the pandemic affected these figures.
Total migration into California for 2015:
## [1] 514477
Total migration into California for 2019:
## [1] 480204
Difference:
## [1] -34273
From 2015 to 2019, the number of out-of-state migrants decreased from 514,477 to 480,204, a difference of 34,273.
So what percentage of California’s 2015 population had moved in from out of state?
## pop2015
## 1 1.32
And how about for 2019?
## pop2019
## 1 1.22
What was the change in percentage from 2015 to 2019?
## pop2015
## 1 0.1
Simply stated, out-of-state migrants made up 1.32% of California’s population in 2015 and 1.22% in 2019, a difference of 0.10 percentage points. Although the figure is small, both the plots above and the overall totals show a decrease.
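The change can be expressed in three different ways, and it helps to keep them apart. The short sketch below uses only the totals and shares reported above; the variable names are illustrative.

```r
# Totals reported above.
total_2015 <- 514477
total_2019 <- 480204

# Change in people, and the same change as a fraction of the 2015 total.
absolute_change <- total_2019 - total_2015
relative_change <- absolute_change / total_2015

# Shares of California's population reported above, in percent.
share_2015 <- 1.32
share_2019 <- 1.22

# Difference between two percentages, measured in percentage points.
point_change <- share_2019 - share_2015

print(absolute_change)
print(round(100 * relative_change, 2))
print(round(point_change, 2))
```

The relative change describes how much the migrant count itself shrank, while the percentage-point change describes how much the migrants' share of the state population shrank; the two are not interchangeable.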
So which states had the largest number of movers? The table below shows the top 10 states sorted by total migrants for 2015:
## # A tibble: 50 × 3
## region year migration
## <chr> <int> <int>
## 1 texas 2015 41713
## 2 new york 2015 36896
## 3 arizona 2015 34204
## 4 washington 2015 33131
## 5 illinois 2015 26189
## 6 florida 2015 25638
## 7 nevada 2015 25332
## 8 colorado 2015 21157
## 9 oregon 2015 20838
## 10 virginia 2015 15009
## # … with 40 more rows
Unsurprisingly, high-population states had the largest numbers of movers, along with California’s nearest neighbors.
Which states had the lowest migration in 2015?
## # A tibble: 6 × 3
## region year migration
## <chr> <int> <int>
## 1 iowa 2015 1394
## 2 delaware 2015 1118
## 3 north dakota 2015 1035
## 4 vermont 2015 766
## 5 south dakota 2015 610
## 6 west virginia 2015 413
The top movers to California in 2019:
## # A tibble: 50 × 3
## region year migration
## <chr> <int> <int>
## 1 new york 2019 37567
## 2 texas 2019 37063
## 3 washington 2019 31882
## 4 arizona 2019 28226
## 5 nevada 2019 26433
## 6 virginia 2019 24506
## 7 illinois 2019 24085
## 8 florida 2019 22692
## 9 oregon 2019 17265
## 10 massachusetts 2019 16158
## # … with 40 more rows
Taking a closer look at what changed between these tables shows that Virginia had a significant increase in migrants, enough to move it from the 10th spot in 2015 to the 6th spot in 2019.
The states with the lowest migration in 2019:
## # A tibble: 6 × 3
## region year migration
## <chr> <int> <int>
## 1 south dakota 2019 1345
## 2 wyoming 2019 1159
## 3 vermont 2019 784
## 4 north dakota 2019 710
## 5 new hampshire 2019 709
## 6 west virginia 2019 303
Unsurprisingly, lower-population states had the fewest migrants, and West Virginia had the lowest figures in both years. How odd, given that its neighbor Virginia did the opposite. It would be interesting to understand the reasons for this curious outlier, but that is outside the scope of this project.
Looking more closely, we want to see which states had the largest increases and decreases for the measured period. Top movers to California by delta from 2015 to 2019:
## region delta_migration
## 46 virginia 9497
## 23 minnesota 4064
## 21 massachusetts 3530
## 10 georgia 2866
## 20 maryland 1794
## 11 hawaii 1510
And we can also see which states had the largest decreases from 2015 to 2019:
## region delta_migration
## 14 indiana -3727
## 35 ohio -3738
## 36 oklahoma -4603
## 43 texas -4650
## 3 arizona -5978
## 5 colorado -6072
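A delta table of this kind can be built by joining the two years on region and subtracting. The sketch below is a base R illustration using five states whose 2015 and 2019 figures both appear in this write-up; the frame names m2015, m2019 and both are illustrative, and the original analysis may have computed the deltas differently.

```r
# Figures for five states taken from the 2015 and 2019 numbers in this write-up.
m2015 <- data.frame(
  region    = c("virginia", "minnesota", "hawaii", "texas", "arizona"),
  migration = c(15009L, 4887L, 10475L, 41713L, 34204L),
  stringsAsFactors = FALSE
)
m2019 <- data.frame(
  region    = c("texas", "virginia", "arizona", "hawaii", "minnesota"),
  migration = c(37063L, 24506L, 28226L, 11985L, 8951L),
  stringsAsFactors = FALSE
)

# Join on region so the subtraction never depends on row order.
both <- merge(m2015, m2019, by = "region", suffixes = c("_2015", "_2019"))
both$delta_migration <- both$migration_2019 - both$migration_2015

# Largest increase first, largest decrease last.
both <- both[order(-both$delta_migration), c("region", "delta_migration")]
print(both)
```

Joining by name rather than subtracting column from column guards against the two years listing states in a different order.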
What about the largest share of each state’s own population that moved in 2015?
## # A tibble: 6 × 5
## region year migration pop2015 percentagepop
## <chr> <int> <int> <dbl> <dbl>
## 1 nevada 2015 25332 2866939 0.884
## 2 alaska 2015 6022 737498 0.817
## 3 hawaii 2015 10475 1422052 0.737
## 4 district of columbia 2015 3908 675400 0.579
## 5 oregon 2015 20838 4015792 0.519
## 6 arizona 2015 34204 6829676 0.501
The smallest shares of state population that moved in 2015:
## # A tibble: 6 × 5
## region year migration pop2015 percentagepop
## <chr> <int> <int> <dbl> <dbl>
## 1 michigan 2015 8852 9931715 0.0891
## 2 kentucky 2015 3781 4425976 0.0854
## 3 mississippi 2015 2539 2988471 0.0850
## 4 south dakota 2015 610 853988 0.0714
## 5 iowa 2015 1394 3120960 0.0447
## 6 west virginia 2015 413 1842050 0.0224
We can also see the states with the largest share of their population that moved in 2019:
## # A tibble: 6 × 5
## region year migration pop2019 percentagepop
## <chr> <int> <int> <dbl> <dbl>
## 1 nevada 2019 26433 3080156 0.858
## 2 hawaii 2019 11985 1415872 0.846
## 3 alaska 2019 5064 731545 0.692
## 4 district of columbia 2019 3075 705749 0.436
## 5 washington 2019 31882 7614893 0.419
## 6 oregon 2019 17265 4217737 0.409
And the smallest shares that moved in 2019:
## # A tibble: 6 × 5
## region year migration pop2019 percentagepop
## <chr> <int> <int> <dbl> <dbl>
## 1 alabama 2019 3310 4903185 0.0675
## 2 michigan 2019 6406 9986857 0.0641
## 3 iowa 2019 1956 3155070 0.0620
## 4 kentucky 2019 2606 4467673 0.0583
## 5 new hampshire 2019 709 1359711 0.0521
## 6 west virginia 2019 303 1792147 0.0169
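The percentagepop column in these tables is the migrant count divided by the origin state's population, expressed in percent. A minimal base R sketch, using four rows copied from the 2015 tables above:

```r
# Rows copied from the 2015 tables: migrants to California and state population.
states <- data.frame(
  region    = c("west virginia", "hawaii", "nevada", "alaska"),
  migration = c(413, 10475, 25332, 6022),
  pop2015   = c(1842050, 1422052, 2866939, 737498),
  stringsAsFactors = FALSE
)

# Share of each origin state's population that moved to California, in percent.
states$percentagepop <- 100 * states$migration / states$pop2015

# Highest share first.
states <- states[order(states$percentagepop, decreasing = TRUE), ]
print(states)
```

Normalizing by population is what lets small states such as Alaska and Hawaii rank near the top even though their raw counts are far below those of Texas or New York.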
So what do all these tables actually mean? Well, for starters, fewer people from Colorado, Arizona, Texas, Oklahoma, Ohio and Indiana chose to move to California, whether because of its traditionally high cost of living or for other reasons.
Meanwhile, Virginia, Minnesota, Massachusetts, Georgia, Maryland and Hawaii had the largest increases in people choosing to move to California, as mentioned in the plotted graphs section.
Adjusted for each state’s population in the respective years, Nevada, Hawaii and Alaska consistently had the highest share of movers, staying within the top three spots.
Simply stated, “West Coast transplants” actually decreased, albeit slowly, over these years, contrary to anecdotal observations. It will be interesting to calculate the population estimates post-Covid, and this template will hopefully provide a baseline for comparison. Using regression, we could estimate the decreases had these trends continued unimpeded, and use that prediction as a reference point for measuring the Covid impacts. As of 2022, casual “observations” suggest a large exodus out of California due to the high cost of living, Covid unemployment and, potentially, out-of-state migrants being homesick. It remains to be seen, and calculated, what impact the pandemic has had on interstate migration.