# Checking if the packages you need are installed -- if not, this will install them for you
packages <- c("tidyverse", "usethis", "devtools",
              "remotes", "lubridate", "knitr",
              "janitor", "geofacet", "ggrepel",
              "png", "gifski", "sf", "tigris",
              "cowplot", "biscale", "ggtext")

if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())),
                   repos = "https://cran.us.r-project.org")
}

devtools::install_github('wpinvestigative/arcos')

usethis::use_course("https://github.com/r-journalism/nicar-2023-fancier-viz/archive/master.zip")
Visualizing relationships five different ways
One data set, five visualizations (technically two data sets)
Before you do anything at all…
Update your software
Make sure you have the latest versions of R and RStudio installed.
A lot of these new tools we’re using depend on the latest builds of R and RStudio to work correctly.
No, seriously, do it.
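If you’re not sure which versions you’re running, you can check from the R console. This is just a quick sketch; the second line assumes you’re working inside RStudio and have the rstudioapi package installed.

# Check which versions you're currently running
R.version.string          # your R version
rstudioapi::versionInfo() # your RStudio version (only works inside RStudio)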
Install packages
Copy and paste the lines of code at the top of this page into your RStudio console.
The last line of code you ran will walk you through installing the class materials to your desktop. Select yes at the right points.
When you’re done downloading the files, you can move on to geofacet.
Later on, you can follow the steps below to see how the data used in these walkthroughs was downloaded and transformed.
Data for the walkthroughs
The opioid transaction data from the DEA and opioid deaths data from the CDC needed for these walkthroughs have already been downloaded and transformed. The steps below exist for transparency.
Preparing state data
Get annual opioid sales data from the ARCOS API
The Washington Post published a significant portion of a database that tracks the path of every opioid pain pill, from manufacturer to pharmacy, in the United States between 2006 and 2014. The data below is pulled from an API that the Post created to make it easier to query the data.
We’ll pull opioid sales data by year.
library(tidyverse)
library(knitr)
library(janitor)

devtools::install_github('wpinvestigative/arcos')
library(arcos)

states <- combined_buyer_annual(key="WaPo") %>%
  clean_names()

states %>%
  head() %>%
  kable()
buyer_dea_no | buyer_bus_act | buyer_county | buyer_state | year | dosage_unit |
---|---|---|---|---|---|
A90777889 | RETAIL PHARMACY | KINGS | NY | 2006 | 110700 |
A90777889 | RETAIL PHARMACY | KINGS | NY | 2007 | 119100 |
A90777889 | RETAIL PHARMACY | KINGS | NY | 2008 | 100300 |
A90777889 | RETAIL PHARMACY | KINGS | NY | 2009 | 92630 |
A90777889 | RETAIL PHARMACY | KINGS | NY | 2010 | 65860 |
A90777889 | RETAIL PHARMACY | KINGS | NY | 2011 | 72030 |
Transform the data
Next, we’ll summarize the data by state and year.
state_names <- c(state.name, "District of Columbia")
state_abbr <- c(state.abb, "DC")

state_names <- data.frame(state=state_names, buyer_state=state_abbr)

annual_states <- states %>%
  group_by(buyer_state, year) %>%
  summarize(pills=sum(dosage_unit)) %>%
  left_join(state_names) %>%
  filter(!is.na(state))

annual_states %>%
  head() %>%
  kable()
buyer_state | year | pills | state |
---|---|---|---|
AK | 2006 | 15667010 | Alaska |
AK | 2007 | 17272433 | Alaska |
AK | 2008 | 18707811 | Alaska |
AK | 2009 | 20160949 | Alaska |
AK | 2010 | 21012703 | Alaska |
AK | 2011 | 22444775 | Alaska |
Get annual opioid overdose data from the CDC
NCHS data is compiled in the CDC WONDER online tool. To retrieve these data from the tool:
- Select the Multiple Cause of Death (Detailed Mortality) query system
- Select the table layout (for example, by year, state, county)
- Supply the appropriate underlying cause codes (X and Y codes) and contributing cause codes (T codes):
- X40 - X44
- X60 - X64
- X85
- Y10 - Y14
- T40.0 - T40.4, T40.6
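Those filters are applied inside the WONDER web tool, not in R. If you want to keep a record of the query alongside your code, one optional approach (not part of the original walkthrough) is to write the codes out as vectors:

# ICD-10 codes used in the CDC WONDER query, recorded here for documentation only
underlying_cause_codes <- c(paste0("X", 40:44),  # accidental poisoning
                            paste0("X", 60:64),  # intentional self-poisoning
                            "X85",               # assault by drugs
                            paste0("Y", 10:14))  # undetermined intent
contributing_codes <- paste0("T40.", c(0:4, 6))  # opioid-related contributing causes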
The data imported below was exported from the CDC WONDER data portal.
<- read_tsv("data/od_deaths_state.csv") %>%
state_cdc filter(is.na(Notes)) %>%
clean_names()
%>%
state_cdc head() %>%
kable()
notes | state | state_code | year | year_code | deaths | population | crude_rate |
---|---|---|---|---|---|---|---|
NA | Alabama | 01 | 2006 | 2006 | 124 | 4628981 | 2.7 |
NA | Alabama | 01 | 2007 | 2007 | 165 | 4672840 | 3.5 |
NA | Alabama | 01 | 2008 | 2008 | 185 | 4718206 | 3.9 |
NA | Alabama | 01 | 2009 | 2009 | 206 | 4757938 | 4.3 |
NA | Alabama | 01 | 2010 | 2010 | 187 | 4779736 | 3.9 |
NA | Alabama | 01 | 2011 | 2011 | 176 | 4802740 | 3.7 |
Join the state data
Bringing together the sales data with the deaths data.
state_joined <- annual_states %>%
  left_join(state_cdc) %>%
  select(-notes, -state_code, -year_code)
Joining with `by = join_by(year, state)`
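That message is dplyr reporting which columns it matched on. If you’d rather be explicit about the join keys (which also quiets the message), the same join could be written like this (a sketch, not a change to the walkthrough’s output):

# Same join, with the keys spelled out explicitly
state_joined <- annual_states %>%
  left_join(state_cdc, by = c("year", "state")) %>%
  select(-notes, -state_code, -year_code)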
state_joined %>%
  head() %>%
  kable()
buyer_state | year | pills | state | deaths | population | crude_rate |
---|---|---|---|---|---|---|
AK | 2006 | 15667010 | Alaska | 29 | 675302 | 4.3 |
AK | 2007 | 17272433 | Alaska | 15 | 680300 | Unreliable |
AK | 2008 | 18707811 | Alaska | 88 | 687455 | 12.8 |
AK | 2009 | 20160949 | Alaska | 90 | 698895 | 12.9 |
AK | 2010 | 21012703 | Alaska | 62 | 710231 | 8.7 |
AK | 2011 | 22444775 | Alaska | 66 | 722718 | 9.1 |
Wrangle the data
Let’s calculate rates.
Additionally, let’s add in state categories and divisions.
Then, we’ll save it.
state_joined <- state_joined %>%
  mutate(pills_per_person=round(pills/population,1)) %>%
  mutate(death_per_1m=round(deaths/population*1000000,1)) %>%
  # rename(death_rate=crude_rate) %>%
  pivot_longer(cols=c("death_per_1m", "pills_per_person"),
               names_to="type",
               values_to="rate")
regions <- read_csv("https://github.com/cphalpert/census-regions/raw/master/us%20census%20bureau%20regions%20and%20divisions.csv") %>%
  clean_names()
Rows: 51 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): State, State Code, Region, Division
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
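As that message suggests, you can quiet the column-specification printout by passing show_col_types = FALSE (or by declaring the column types yourself). For example:

# Same read as above, without the column specification message
regions <- read_csv("https://github.com/cphalpert/census-regions/raw/master/us%20census%20bureau%20regions%20and%20divisions.csv",
                    show_col_types = FALSE) %>%
  clean_names()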
state_joined <- left_join(state_joined, regions)
Joining with `by = join_by(state)`
write_csv(state_joined, "data/opioids_states.csv", na="")
County data annual
Let’s do the same thing, but with county-level data.
Get annual opioid sales data from the ARCOS API
Once again, we’ll pull from the Post’s ARCOS API.
pharm_c <- summarized_county_annual(key="WaPo") %>%
  clean_names()

pharm_c %>%
  head() %>%
  kable()
buyer_county | buyer_state | year | count | dosage_unit | countyfips |
---|---|---|---|---|---|
ABBEVILLE | SC | 2006 | 877 | 363620 | 45001 |
ABBEVILLE | SC | 2007 | 908 | 402940 | 45001 |
ABBEVILLE | SC | 2008 | 871 | 424590 | 45001 |
ABBEVILLE | SC | 2009 | 930 | 467230 | 45001 |
ABBEVILLE | SC | 2010 | 1197 | 539280 | 45001 |
ABBEVILLE | SC | 2011 | 1327 | 566560 | 45001 |
Transform the data
Get annual opioid overdose data from the CDC
We followed the same steps as we did for the state data, but for counties.
<- read_tsv("data/od_deaths_county.csv") %>%
county_cdc filter(is.na(Notes)) %>%
clean_names() %>%
rename(countyfips=county_code)
%>%
county_cdc head() %>%
kable()
notes | state | state_code | county | countyfips | year | year_code | deaths | population | crude_rate | age_adjusted_rate |
---|---|---|---|---|---|---|---|---|---|---|
NA | Alabama | 01 | Baldwin County, AL | 01003 | 2006 | 2006 | 12 | 168121 | Unreliable | Unreliable |
NA | Alabama | 01 | Baldwin County, AL | 01003 | 2007 | 2007 | 14 | 172404 | Unreliable | Unreliable |
NA | Alabama | 01 | Baldwin County, AL | 01003 | 2008 | 2008 | 14 | 175827 | Unreliable | Unreliable |
NA | Alabama | 01 | Baldwin County, AL | 01003 | 2009 | 2009 | 10 | 179406 | Unreliable | Unreliable |
NA | Alabama | 01 | Baldwin County, AL | 01003 | 2010 | 2010 | 10 | 182265 | Unreliable | Unreliable |
NA | Alabama | 01 | Baldwin County, AL | 01003 | 2011 | 2011 | 11 | 186717 | Unreliable | Unreliable |
Join the county data
county_joined <- pharm_c %>%
  full_join(county_cdc) %>%
  select(-notes, -state_code, -year_code, -age_adjusted_rate)
Joining with `by = join_by(year, countyfips)`
county_joined %>%
  head() %>%
  kable()
buyer_county | buyer_state | year | count | dosage_unit | countyfips | state | county | deaths | population | crude_rate |
---|---|---|---|---|---|---|---|---|---|---|
ABBEVILLE | SC | 2006 | 877 | 363620 | 45001 | NA | NA | NA | NA | NA |
ABBEVILLE | SC | 2007 | 908 | 402940 | 45001 | NA | NA | NA | NA | NA |
ABBEVILLE | SC | 2008 | 871 | 424590 | 45001 | NA | NA | NA | NA | NA |
ABBEVILLE | SC | 2009 | 930 | 467230 | 45001 | NA | NA | NA | NA | NA |
ABBEVILLE | SC | 2010 | 1197 | 539280 | 45001 | NA | NA | NA | NA | NA |
ABBEVILLE | SC | 2011 | 1327 | 566560 | 45001 | NA | NA | NA | NA | NA |
Wrangle the data
Calculating rates and saving the data.
# first have to bring in accurate population counts for counties annually
county_pops <- county_population(key = "WaPo") %>%
  select(countyfips, year, population)

county_joined <- county_joined %>%
  select(-population) %>%
  left_join(county_pops) %>%
  mutate(pills_per_person=round(dosage_unit/population,1)) %>%
  mutate(death_per_100k=round(deaths/population*100000,1)) %>%
  # rename(death_rate=crude_rate) %>%
  pivot_longer(cols=c("death_per_100k", "pills_per_person"),
               names_to="type",
               values_to="rate")
Joining with `by = join_by(year, countyfips)`
write_csv(county_joined, "data/opioids_counties.csv", na="")
County data combined
Let’s do it one more time but for all deaths between 2006 and 2014 combined.
Get annual opioid sales data from the ARCOS API
This time we’ll summarize it.
pharm_c <- summarized_county_annual(key="WaPo") %>%
  clean_names()

pharm_c_combined <- pharm_c %>%
  group_by(buyer_county, buyer_state, countyfips) %>%
  summarize(count=sum(count, na.rm=T),
            dosage_unit=sum(dosage_unit, na.rm=T))
`summarise()` has grouped output by 'buyer_county', 'buyer_state'. You can
override using the `.groups` argument.
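That message is just dplyr explaining how the result stays grouped. If you’d rather return an ungrouped result (and silence the note), you can say so explicitly with the .groups argument; a sketch of the same summary:

# Same summary, with the grouping dropped explicitly
pharm_c_combined <- pharm_c %>%
  group_by(buyer_county, buyer_state, countyfips) %>%
  summarize(count=sum(count, na.rm=T),
            dosage_unit=sum(dosage_unit, na.rm=T),
            .groups="drop")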
pharm_c_combined %>%
  head() %>%
  kable()
buyer_county | buyer_state | countyfips | count | dosage_unit |
---|---|---|---|---|
ABBEVILLE | SC | 45001 | 10749 | 4591000 |
ACADIA | LA | 22001 | 55952 | 22197708 |
ACCOMACK | VA | 51001 | 19536 | 6660470 |
ADA | ID | 16001 | 290484 | 136806828 |
ADAIR | IA | 19001 | 5752 | 1783165 |
ADAIR | KY | 21001 | 21985 | 8783100 |
Get annual opioid overdose data from the CDC
<- read_tsv("data/od_deaths_counties_combined.csv") %>%
county_cdc filter(is.na(Notes)) %>%
clean_names() %>%
rename(countyfips=county_code)
%>%
county_cdc head() %>%
kable()
notes | state | state_code | county | countyfips | deaths | population | crude_rate |
---|---|---|---|---|---|---|---|
NA | Alabama | 01 | Baldwin County, AL | 01003 | 105 | 1651181 | 6.4 |
NA | Alabama | 01 | Bibb County, AL | 01007 | 11 | 203479 | Unreliable |
NA | Alabama | 01 | Blount County, AL | 01009 | 48 | 514537 | 9.3 |
NA | Alabama | 01 | Calhoun County, AL | 01015 | 10 | 1053553 | Unreliable |
NA | Alabama | 01 | Chilton County, AL | 01021 | 10 | 391161 | Unreliable |
NA | Alabama | 01 | Cullman County, AL | 01043 | 13 | 722726 | Unreliable |
Join the county data
county_joined <- pharm_c_combined %>%
  full_join(county_cdc) %>%
  select(-notes, -state_code, -crude_rate)
Joining with `by = join_by(countyfips)`
county_joined %>%
  head() %>%
  kable()
buyer_county | buyer_state | countyfips | count | dosage_unit | state | county | deaths | population |
---|---|---|---|---|---|---|---|---|
ABBEVILLE | SC | 45001 | 10749 | 4591000 | NA | NA | NA | NA |
ACADIA | LA | 22001 | 55952 | 22197708 | Louisiana | Acadia Parish, LA | 39 | 554207 |
ACCOMACK | VA | 51001 | 19536 | 6660470 | Virginia | Accomack County, VA | 22 | 303140 |
ADA | ID | 16001 | 290484 | 136806828 | Idaho | Ada County, ID | 269 | 3555029 |
ADAIR | IA | 19001 | 5752 | 1783165 | NA | NA | NA | NA |
ADAIR | KY | 21001 | 21985 | 8783100 | Kentucky | Adair County, KY | 14 | 168051 |
Wrangle the data
Calculating rates.
# use the most recent year of county population data (2014) as the denominator
county_pops <- county_pops %>%
  filter(year==2014) %>%
  select(countyfips, year, population)

county_joined <- county_joined %>%
  select(-population) %>%
  left_join(county_pops) %>%
  mutate(pills_per_person=round(dosage_unit/population,1)) %>%
  mutate(death_per_100k=round(deaths/population*100000,1)) %>%
  # rename(death_rate=crude_rate) %>%
  pivot_longer(cols=c("death_per_100k", "pills_per_person"),
               names_to="type",
               values_to="rate")
Joining with `by = join_by(countyfips)`
write_csv(county_joined, "data/opioids_counties_combined.csv", na="")
Next, we’ll move on to the first steps of visualizing these data sets with geofacet.
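As a preview of where this is headed (a rough sketch rather than the next section’s actual code), a first geofacet chart of the state file saved above might look something like this; the column names match data/opioids_states.csv as built in this walkthrough.

# A sketch of a geofacet small-multiples chart using the saved state data
library(tidyverse)
library(geofacet)

opioids_states <- read_csv("data/opioids_states.csv")

opioids_states %>%
  filter(type == "pills_per_person") %>%
  ggplot(aes(x = year, y = rate)) +
  geom_line() +
  facet_geo(~ state, grid = "us_state_grid2") +
  labs(title = "Opioid pills per person, 2006-2014",
       x = "", y = "Pills per person")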