Tidy Tuesday: US Populated Places

R
TidyTuesday
R-code
data visualization
openxlsx
stringr
fuzzyjoin
mapview
sf
TidyTuesday: Historic Neighborhoods of Arlington Virginia
Author
Published

June 27, 2023

Modified

November 3, 2023

Today’s TidyTuesday is about place names as recorded by the US Board on Geographic Names. The dataset has been cleaned to include only populated places.

This week will involve more libraries than normal, since I am going to play with mapping.

library(tidyverse) # who doesn't want to be tidy?
library(ggthemes) # more themes for ggplot
library(gt) # For nice tables
library(ggrepel) # to help position labels in ggplot graphs
library(openxlsx) # importing excel files from a URL
library(fuzzyjoin) # for joining on inexact matches
library(sf) # for handling geo data
library(mapview) # quick interactive mapping
library(leaflet) # more mapping

Load dataset in the usual way.

tuesdata <- tidytuesdayR::tt_load(2023, week = 26) 
us_place_names <- tuesdata$`us_place_names` 
us_place_history <- tuesdata$`us_place_history`

I’d like to look at the places local to me. The dataset contains two dataframes: one with geographic details about each location, and the other with commentary such as description and history.

va <- us_place_names %>% filter(state_name == "Virginia")
va <- va %>% filter(county_name == "Arlington")
va_joined <- va %>% left_join(us_place_history, by = join_by(feature_id))
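As a minimal sketch of the filter-then-join step, here is the same pattern on two tiny made-up tibbles (the rows and the names `toy_names` and `toy_history` are invented for illustration and are not taken from the TidyTuesday data). It shows that a left join keeps every place and fills the history column with NA when there is no match.

```r
library(dplyr)

# Hypothetical stand-ins for us_place_names and us_place_history
toy_names <- tibble(
  feature_id   = c(1, 2, 3),
  feature_name = c("Rosslyn", "Ballston", "Richmond"),
  state_name   = "Virginia",
  county_name  = c("Arlington", "Arlington", "Richmond (city)")
)
toy_history <- tibble(
  feature_id = c(1, 3),
  history    = c("Made-up note A", "Made-up note B")
)

# Both conditions can go in a single filter() call
toy_joined <- toy_names %>%
  filter(state_name == "Virginia", county_name == "Arlington") %>%
  left_join(toy_history, by = join_by(feature_id))

toy_joined
```

Ballston has no row in `toy_history`, so its `history` value comes back as NA, which is the same thing that happens to every Arlington feature in the real data.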

I don’t need the state name, county name, or county number since I am dealing with a single city/county. So I am removing them from the dataset and then viewing what I have.

va_joined %>% select(-state_name,-county_name,-county_numeric) %>%
  gt()
feature_id feature_name date_created date_edited prim_lat_dec prim_long_dec description history
1471986 Overlee Knolls 1979-09-28 2022-06-07 38.88956 -77.14776 NA NA
1492448 Addison Heights 1979-09-28 2022-06-07 38.85567 -77.06026 NA NA
1492455 Alcova Heights 1979-09-28 2022-06-07 38.86456 -77.09720 NA NA
1492483 Arlington Forest 1979-09-28 2022-06-07 38.86872 -77.11303 NA NA
1492484 Arlington Heights 1979-09-28 2022-06-07 38.86956 -77.09220 NA NA
1492485 Arlington Village 1979-09-28 2022-06-07 38.86178 -77.08526 NA NA
1492487 Arna Valley 1979-09-28 2022-06-07 38.84428 -77.07637 NA NA
1492496 Aurora Hills 1979-09-28 2022-06-07 38.85150 -77.06414 NA NA
1492512 Barcroft 1979-09-28 2022-06-07 38.85595 -77.10387 NA NA
1492597 Bluemont Junction 1979-09-28 2022-06-07 38.87483 -77.13331 NA NA
1492606 Bon Air 1979-09-28 2022-06-07 38.87317 -77.12665 NA NA
1492659 Buckingham 1979-09-28 2022-06-07 38.87345 -77.10665 NA NA
1492771 Claremont 1979-09-28 2022-06-07 38.84317 -77.10470 NA NA
1492797 Columbia Forest 1979-09-28 2022-06-07 38.85400 -77.11026 NA NA
1492798 Columbia Heights 1979-09-28 2022-06-07 38.85761 -77.12109 NA NA
1492877 Douglass Park 1979-09-28 2022-06-07 38.84983 -77.09303 NA NA
1492958 Fort Barnard Heights 1979-09-28 2022-06-07 38.84650 -77.08942 NA NA
1493006 Glencarlyn 1979-09-28 2011-05-11 38.86178 -77.12915 NA NA
1493353 North Fairlington 1979-09-28 2022-06-07 38.83650 -77.09720 NA NA
1493397 Parkglen 1979-09-28 2022-06-07 38.85595 -77.11637 NA NA
1493586 Shirlington 1979-09-28 2022-06-07 38.84178 -77.08831 NA NA
1493630 South Fairlington 1979-09-28 2022-06-07 38.83261 -77.08970 NA NA
1493744 Virginia Heights 1979-09-28 2022-06-07 38.85095 -77.11637 NA NA
1493745 Virginia Highlands 1979-09-28 2022-06-07 38.85845 -77.06470 NA NA
1493784 Westmont 1979-09-28 2022-06-07 38.86261 -77.09192 NA NA
1495188 Allencrest 1979-09-28 2022-06-07 38.89344 -77.15026 NA NA
1495260 Berkshire 1979-09-28 2022-06-07 38.89789 -77.15137 NA NA
1495429 Country Club Hills 1979-09-28 2022-06-07 38.91400 -77.13081 NA NA
1495430 Country Club Manor 1979-09-28 2022-06-07 38.91372 -77.13776 NA NA
1495438 Crescent Hills 1979-09-28 2022-06-07 38.90483 -77.14581 NA NA
1495472 Dominion Hills 1979-09-28 2022-06-07 38.87595 -77.14109 NA NA
1495490 East Falls Church 1979-09-28 2022-06-07 38.88733 -77.15442 NA NA
1495579 Garden City 1979-09-28 2022-06-07 38.90011 -77.13526 NA NA
1495641 Halls Hill 1979-09-28 2022-06-07 38.89761 -77.12859 NA NA
1495692 Highview Park 1979-09-28 2022-06-07 38.89372 -77.12748 NA NA
1495804 Lacey Forest 1979-09-28 2022-06-07 38.88289 -77.12915 NA NA
1495821 Larchmont 1979-09-28 2022-06-07 38.88650 -77.12776 NA NA
1495887 Madison Manor 1979-09-28 2022-06-07 38.88039 -77.14720 NA NA
1496037 Oakwood 1979-09-28 2022-06-07 38.89733 -77.16248 NA NA
1496271 Stratford Hills 1979-09-28 2022-06-07 38.90872 -77.14053 NA NA
1496293 Tara 1979-09-28 2022-06-07 38.89039 -77.13498 NA NA
1496368 Walker Chapel 1979-09-28 2022-06-07 38.92150 -77.12942 NA NA
1496386 West Arlington 1979-09-28 2022-06-07 38.89400 -77.16831 NA NA
1496394 Westover 1979-09-28 2022-06-07 38.88706 -77.13942 NA NA
1496421 Williamsburg Village 1979-09-28 2022-06-07 38.90511 -77.15498 NA NA
1496434 Woodland Acres 1979-09-28 2022-06-07 38.91261 -77.14526 NA NA
1499060 Arlingwood 1979-09-28 2022-06-07 38.92761 -77.12192 NA NA
1499086 Ballston 1979-09-28 2011-05-11 38.88011 -77.11387 NA NA
1499108 Beechwood Hills 1979-09-28 2022-06-07 38.90900 -77.10998 NA NA
1499116 Bellevue Forest 1979-09-28 2022-06-07 38.91428 -77.11359 NA NA
1499157 Brandon Village 1979-09-28 2022-06-07 38.87567 -77.11581 NA NA
1499172 Broyhill Forest 1979-09-28 2022-06-07 38.91539 -77.12248 NA NA
1499245 Cherrydale 1979-09-28 2022-06-07 38.89706 -77.10831 NA NA
1499266 Clarendon 1979-09-28 2022-06-07 38.88595 -77.09692 NA NA
1499290 Colonial Village 1979-09-28 2022-06-07 38.89317 -77.08609 NA NA
1499313 Crystal Spring Knolls 1979-09-28 2022-06-07 38.90344 -77.10498 NA NA
1499349 Dominion Heights 1979-09-28 2022-06-07 38.89289 -77.10776 NA NA
1499354 Dover 1979-09-28 2022-06-07 38.90678 -77.10581 NA NA
1499439 Fort Myer Heights 1979-09-28 2022-06-07 38.89206 -77.07942 NA NA
1499560 Highlands 1979-09-28 2022-06-07 38.89817 -77.08303 NA NA
1499652 Lee Heights 1979-09-28 2022-06-07 38.90206 -77.11720 NA NA
1499696 Lyon Park 1979-09-28 2022-06-07 38.88067 -77.09026 NA NA
1499697 Lyon Village 1979-09-28 2022-06-07 38.89483 -77.09498 NA NA
1499930 Radnor Heights 1979-09-28 2022-06-07 38.88900 -77.07303 NA NA
1499964 Rivercrest 1979-09-28 2022-06-07 38.92206 -77.11915 NA NA
1499969 Riverwood 1979-09-28 2022-06-07 38.90539 -77.10248 NA NA
1499990 Rosslyn 1979-09-28 2022-06-07 38.89678 -77.07248 NA NA
1500349 Woodmont 1979-09-28 2022-06-07 38.90067 -77.09498 NA NA
1779110 Brockwood 1998-02-05 2022-06-07 38.87761 -77.12887 NA NA
1779112 Country Club Grove 1998-02-05 2022-06-07 38.91956 -77.12942 NA NA
1779118 East Arlington (historical) 1998-02-05 2022-06-07 38.87345 -77.06220 NA NA
1779119 Green Valley 1998-02-05 2022-06-07 38.85511 -77.08859 NA NA
1779147 Millburn Terrace 1998-02-05 2022-06-07 38.90067 -77.13831 NA NA
1783506 Arlington 1998-03-02 2022-06-07 38.89039 -77.08414 NA NA
2646878 Crystal City 2010-08-26 2018-11-14 38.85535 -77.05090 NA NA

There is no historical or descriptive data for any of the features in Arlington. Many of these are historical sites or are otherwise of interest. I’d like to augment this data with some context. Arlington has 23 neighborhoods that are on the National Register of Historic Places. The National Register has scanned applications available for post-2012 applications, but most of the historic neighborhoods were designated before that. The National Register also has a spreadsheet with links to the National Archives, which contains the pre-2012 applications.

I normally like to use tidyverse packages, but read_excel won’t work with URLs. There are workarounds, but it is easier just to use the openxlsx package. The read.xlsx function works as you’d expect but you do need to specify the sheet to read in.

national_historic <-
  read.xlsx(
    'https://www.nps.gov/subjects/nationalregister/upload/national-register-listed-20230119.xlsx' ,
    sheet = 1
  )

Taking only my local historic sites. This dataset is annoying because some entries are in all caps (like State), but others are in title case (like City and County). Some, like building category, are in both. To use the entire dataset, some string cleaning and formatting might be necessary, but for this case, I don’t need to do this.

arlington_historic <- national_historic %>%
  filter(State == "VIRGINIA" & County == "Arlington")
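If the capitalization ever did need cleaning, one possible approach is to normalize the case before filtering. This is only a sketch on made-up rows (`toy_register` is hypothetical, and the real spreadsheet may have inconsistencies this does not cover); `str_to_upper()` and `str_to_title()` are stringr functions.

```r
library(dplyr)
library(stringr)

# Hypothetical rows mimicking mixed capitalization
toy_register <- tibble(
  Property.Name = c("Maywood Historic District",
                    "Some Other Place",
                    "Penrose Historic District"),
  State  = c("VIRGINIA", "VIRGINIA", "Virginia"),
  County = c("Arlington", "Fairfax", "ARLINGTON")
)

# Force each column to one convention, then filter as before
toy_arlington <- toy_register %>%
  mutate(State = str_to_upper(State), County = str_to_title(County)) %>%
  filter(State == "VIRGINIA", County == "Arlington")

toy_arlington
```

Without the `mutate()` step, the third row would be missed because neither its State nor its County matches the expected capitalization.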

Looking at the data, neighborhoods seem to be encoded as districts.

arlington_historic_districts <- arlington_historic %>%
  filter(Category.of.Property == "DISTRICT")

Arlington County has a website listing historic neighborhoods, and I know there should be 23. The National Register has 29 local entries. I should also note that only 17 of the Arlington neighborhoods appeared in our place names dataset.

On to figure out what the extra historic places are. Apparently forts are also districts, and there are separate entries for boundary increases. To remove them, I am going to use the stringr function str_detect() to find “Boundary Increase” and “Fort”, with the negate = TRUE flag to return everything that doesn’t match.

arlington_historic_districts2 <- arlington_historic_districts %>%
  filter(str_detect(Property.Name, "Boundary Increase", negate = TRUE)) %>%
  filter(str_detect(Property.Name, "Fort", negate = TRUE))  
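To make the effect of negate = TRUE concrete, here is a small sketch on a made-up character vector (the names in `property_names` are illustrative, not rows from the register).

```r
library(stringr)

# Made-up property names covering the three cases discussed above
property_names <- c(
  "Lyon Park Historic District",
  "Fort Example Historic District",
  "Lyon Park Historic District (Boundary Increase)"
)

# negate = TRUE flips the result: TRUE means "does NOT contain the pattern"
keep_not_fort     <- str_detect(property_names, "Fort", negate = TRUE)
keep_not_increase <- str_detect(property_names, "Boundary Increase", negate = TRUE)

# Only names passing both checks survive, mirroring the two chained filter() calls
kept <- property_names[keep_not_fort & keep_not_increase]
kept
```

One thing to keep in mind with this approach is that the pattern "Fort" matches anywhere in the name, so it would also drop a neighborhood district that happened to contain the word.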


arlington_historic_districts2 %>% gt()
Reference.number Property.Name Status Request.Type Restricted.Address Category.of.Property State County City Street.&.Number External.Link Federal.Agencies Level.of.Significance.-.International Level.of.Significance.-.Local Level.of.Significance.-.National Level.of.Significance.-.Not.Indicated Level.of.Significance.-.State Listed.Date Name.of.Multiple.Property.Listing NHL.Designated.Date Other.Names Park.Name Status.Date Area.of.Significance
_05001344 Arlington Forest Historic District Listed Multiple FALSE DISTRICT VIRGINIA Arlington Arlington Bounded by Carlin Springs Rd., George Mason Dr., Henderson Rd., Aberdeen St., Columbus St., Granada, Galveston and 2nd https://catalog.archives.gov/id/77834749 NA FALSE TRUE FALSE FALSE FALSE 38688 NA NA VDHR File No.000-7808 NA 38688 ARCHITECTURE; COMMUNICATIONS
_08000063 Arlington Heights Historic District Listed Multiple FALSE DISTRICT VIRGINIA Arlington Arlington Bounded by Arlington Blvd., S. Fillmore St., S. Walter Reed Dr., columbia Pk., & S. Glebe Rd. https://catalog.archives.gov/id/41678540 NA FALSE TRUE FALSE FALSE FALSE 39499 Garden Apartments, Apartment Houses and Apartment Complexes in Arlington County, Virginia MPS NA 000-3383 NA 39499 ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT
_14000146 Arlington National Cemetery Historic District Listed Single FALSE DISTRICT VIRGINIA Arlington Arlington 1 Memorial Ave. NA DEPARTMENT OF THE ARMY FALSE FALSE TRUE FALSE FALSE 41740 NA NA Arlington National Cemetery; DHR #000-0042 NA 41740 MILITARY; LANDSCAPE ARCHITECTURE; POLITICS/GOVERNMENT; ARCHITECTURE
_03000215 Arlington Village Historic District Listed Single FALSE DISTRICT VIRGINIA Arlington Arlington S 13th St., S 13 Rd., S 16th St., S Barton S., S. Cleveland St. and Edgewood St. https://catalog.archives.gov/id/41679618 NA FALSE TRUE FALSE FALSE FALSE 37722 NA NA 000-0024 NA 37722 COMMUNITY PLANNING AND DEVELOPMENT
_03000561 Ashton Heights Historic District Listed Single FALSE DISTRICT VIRGINIA Arlington Arlington Roughly bounded by Wilson Bvd., N. Irving St., Arlington Bvd., N. Oxford St., N. Piedmont & N. Oakland Sts. https://catalog.archives.gov/id/41679598 NA FALSE TRUE FALSE FALSE FALSE 37795 NA NA 000-7819 NA 37795 ARCHITECTURE; COMMERCE
_08001018 Aurora Highlands Historic District Listed Multiple FALSE DISTRICT VIRGINIA Arlington Arlington Bounded by 16th St. S., S. Eads St., 26th St. S., and S. Joyce St. https://catalog.archives.gov/id/77834759 NA FALSE TRUE FALSE FALSE FALSE 39743 NA NA 000-9706 NA 39743 ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT
_98001649 Buckingham Historic District Listed Single FALSE DISTRICT VIRGINIA Arlington Arlington Roughly bounded by N. 5th, N. Oxford, and N. 2nd Sts., and N. Glebe Rd. https://catalog.archives.gov/id/41679602 NA FALSE TRUE FALSE FALSE FALSE 36181 NA NA DHR File # 00-0025 NA 36181 ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT; LANDSCAPE ARCHITECTURE
_03000461 Cherrydale Historic District Listed Single FALSE DISTRICT VIRGINIA Arlington Arlington Roughly bounded by Lorcom Ln., N. Utah and N. Taylor Sts., and I-66 https://catalog.archives.gov/id/41679592 NA FALSE TRUE FALSE FALSE FALSE 37763 NA NA VDHR File Number 000-7821 NA 37763 ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT
_06000751 Claremont Historic District Listed Multiple FALSE DISTRICT VIRGINIA Arlington Arlington Bounded by S. Dinwiddie St., S. Chesterfield Rd., S. Buchanan St., 25th St. S, 24th St. S, 23rd St. S and 22nd St. S https://catalog.archives.gov/id/77834757 NA FALSE TRUE FALSE FALSE FALSE 38960 NA NA 000-9700 NA 38960 ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT
_04000047 Columbia Forest Historic District Listed Single FALSE DISTRICT VIRGINIA Arlington Arlington Bounded by 11th, S. Edison, S. Dinwiddie, S. Columbus, S. George Mason, and S. Frederick St. https://catalog.archives.gov/id/41679620 NA FALSE TRUE FALSE FALSE FALSE 38028 NA NA VDHR # 000-9416 NA 38028 COMMUNITY PLANNING AND DEVELOPMENT; ARCHITECTURE
_12000239 Dominion Hills Historic District Listed Multiple FALSE DISTRICT VIRGINIA Arlington Arlington Roughly bounded by N. Four Mile Run Dr., N. McKinley Rd., N. Larrimore, N. Madison, N. Montana Sts., & 9th St. N. https://catalog.archives.gov/id/77834753 NA FALSE TRUE FALSE FALSE FALSE 41023 Historic Residential Suburbs in the United States, 1830-1960 MPS NA VDHR FILE NUMBER: 000-4212 NA 41023 ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT
_99000368 Fairlington Historic District Listed Single FALSE DISTRICT VIRGINIA Arlington Arlington Roughly bounded by Quaker Lane, King St., I-395, S. Walter Reed Dr., and S. Abingdon St. https://catalog.archives.gov/id/41679636 NA FALSE FALSE TRUE FALSE FALSE 36248 NA NA DHR File No. 000-5772 NA 36248 MILITARY; COMMUNITY PLANNING AND DEVELOPMENT; ARCHITECTURE
_04000049 Glebewood Village Historic District Listed Single FALSE DISTRICT VIRGINIA Arlington Arlington N. Brandywine St. Bet. Lee Hwy and 10th Place N, 21St Rd. bet. N. Brandywine St. and N. Glebe Rd. https://catalog.archives.gov/id/41679622 NA FALSE TRUE FALSE FALSE FALSE 38028 NA NA 000-9414 NA 38028 COMMUNITY PLANNING AND DEVELOPMENT; ARCHITECTURE
_08000910 Glencarlyn Historic District Listed Multiple FALSE DISTRICT VIRGINIA Arlington Arlington Bounded by S. Carlin Springs Rd., Arlington Blvd., 5th Rd. S., Glencarlyn Park https://catalog.archives.gov/id/77834761 NA FALSE TRUE FALSE FALSE FALSE 39709 NA NA 000-9704 NA 39709 ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT
_11000548 Highland Park-Overlee Knolls Listed Multiple FALSE DISTRICT VIRGINIA Arlington Arlington Roughly bounded by 22nd St. N., N. Lexington St., 16th St. N., N. Longfellow St., McKinley Rd., I-66 & N. Quantico St. https://catalog.archives.gov/id/77834763 NA FALSE TRUE FALSE FALSE FALSE 40773 Historic Residential Suburbs in the United States, 1830-1960 MPS NA Fostoria/VDHR File Number OOO-9703 NA 40773 ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT
_04000109 Lee Gardens North Historic District Listed Multiple FALSE DISTRICT VIRGINIA Arlington Arlington 2300-2341 N. 11th St. https://catalog.archives.gov/id/41678536 NA FALSE TRUE FALSE FALSE FALSE 38043 Garden Apartments, Apartment Houses and Apartment Complexes in Arlington County, Virginia MPS NA 000-9411; Woodbury Park Apartments NA 38043 COMMUNITY PLANNING AND DEVELOPMENT; ARCHITECTURE
_03000437 Lyon Park Historic District Listed Single FALSE DISTRICT VIRGINIA Arlington Arlington Roughly bounded by 10th St. N, Arlington Blvd., and N. Irving St. https://catalog.archives.gov/id/41679594 NA FALSE TRUE FALSE FALSE FALSE 37937 NA NA 000-7820 NA 37937 ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT
_02000512 Lyon Village Historic District Listed Single FALSE DISTRICT VIRGINIA Arlington Arlington Roughly bounded by Lee Hwy, N. Veitch St., N. Franklin Rd., N. Highland St., N. Fillmore St., and N. Kirkwood Rd. https://catalog.archives.gov/id/41679590 NA FALSE TRUE FALSE FALSE FALSE 37386 NA NA VDHR File No. 000-7822 NA 37386 ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT
_03000460 Maywood Historic District Listed Single FALSE DISTRICT VIRGINIA Arlington Arlington Roughly bounded by Lorcom Ln., Spout Run Parkway, I-66, Lee Highway, N. Oakland St., N. Nelson St., and N. Lincoln St. https://catalog.archives.gov/id/41679596 NA FALSE TRUE FALSE FALSE FALSE 37763 NA NA VDHR File Number 000-5056 NA 37763 ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT
_08000064 Monroe Courts Historic District Listed Multiple FALSE DISTRICT VIRGINIA Arlington Arlington 1041-1067 N. Nelson and 1036-1062 & 1033-1055 N. Monroe Sts. https://catalog.archives.gov/id/77834751 NA FALSE TRUE FALSE FALSE FALSE 39499 NA NA 000-4105 NA 39499 ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT
_04000112 Penrose Historic District Listed Single FALSE DISTRICT VIRGINIA Arlington Arlington Roughly bounded by Arlington Blvd., S. Courthouse Rd., S. Fillmore St., S. Barton St. S, and Columbia Pike https://catalog.archives.gov/id/41679600 NA FALSE TRUE FALSE FALSE FALSE 38306 NA NA VDHR File Number 000-8823 NA 38306 ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT; BLACK
_08000065 Virginia Heights Historic District Listed Multiple FALSE DISTRICT VIRGINIA Arlington Arlington Bounded by 10th Pl. S., S. Frederick St. & S. George Mason Dr. https://catalog.archives.gov/id/77834755 NA FALSE TRUE FALSE FALSE FALSE 39499 NA NA 000-9701 NA 39499 ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT
_03000451 Walter Reed Gardens Historic District Listed Multiple FALSE DISTRICT VIRGINIA Arlington Arlington 2900-2906 13th St. S, 2900-2914 13th Rd S, 1301-1319 S. Walter Reed Dr. https://catalog.archives.gov/id/41678548 NA FALSE TRUE FALSE FALSE FALSE 37763 Garden Apartments, Apartment Houses and Apartment Complexes in Arlington County, Virginia MPS NA Commons of Arlington; 000-8824 NA 37763 COMMUNITY PLANNING AND DEVELOPMENT
_04000111 Waverly Hills Historic District Listed Single FALSE DISTRICT VIRGINIA Arlington Arlington Roughly bounded by 20th Rd. N, N. Utah St, I-66, N. Glebe Rd. and N. Vermont St. https://catalog.archives.gov/id/41679624 NA FALSE TRUE FALSE FALSE FALSE 38043 NA NA VDHR File Number 000-9413 NA 38043 COMMUNITY PLANNING AND DEVELOPMENT; ARCHITECTURE
_06000345 Westover Historic District Listed Multiple FALSE DISTRICT VIRGINIA Arlington Arlington Bounded by McKinley Rd., N. Washington Blvd., N. 16th St., N. Jefferson St., N. 11th St. and N. Fairfax Dr. https://catalog.archives.gov/id/41678538 NA FALSE TRUE FALSE FALSE FALSE 38839 Garden Apartments, Apartment Houses and Apartment Complexes in Arlington County, Virginia MPS NA 000-0032 NA 38839 ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT

I still have too many entries. It turns out that Arlington National Cemetery is also encoded as a DISTRICT. There is also an entry for Walter Reed Gardens Historic District, which Arlington County lists as a building on its site (other entries like Calvert Manor are noted as buildings in the National Register).

I could remove these two items manually, but they will be removed when I join it to the place names dataset, since neither one appears in the populated place names.

Joining the two datasets will require some sort of string manipulation since the place names are not the same. The place names dataset contains just the place names (“Addison Heights”), while the historic sites data has the phrase “Historic District” appended to the end. In addition, some place names don’t exactly match the historic district names (“Overlee Knolls” and “Highland Park-Overlee Knolls”).

So I want to do some fuzzy matching and luckily (of course!) there is an R package for that.

However, the populated place names data contains “Arlington”, which will match a ton of different neighborhoods (Arlington Forest, Arlington Heights, etc.). I’m going to change Arlington to Arlington County.

va_joined2 <- va_joined %>%
  mutate(feature_name = ifelse(feature_name == "Arlington", "Arlington County", feature_name))
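A quick sketch of why the rename matters, using three district names from the table above: the bare county name is a substring of several district names, while the renamed value is not.

```r
library(stringr)

district_names <- c(
  "Arlington Forest Historic District",
  "Arlington Heights Historic District",
  "Lyon Park Historic District"
)

# How many district names contain the bare county name?
n_before <- sum(str_detect(district_names, "Arlington"))

# How many contain the renamed value?
n_after <- sum(str_detect(district_names, "Arlington County"))

c(before = n_before, after = n_after)
```

Here the first pattern matches two of the three names and the second matches none, so the county-level feature no longer produces spurious matches in a substring-based join.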

I also know that North and South Fairlington, while separate places in the populated place names, are a single historic district called Fairlington. I’m going to add both a North Fairlington and a South Fairlington entry to the historic sites dataframe. I’m not removing the original Fairlington entry because I know I’m going to filter it out with my joins later. But this is the kind of thing that could lead to errors or extraneous entries later on, so if you do something like this, just make sure you clean it up later.

south_fairlington <- arlington_historic_districts2 %>% 
  filter(Property.Name == "Fairlington Historic District") %>%
  mutate(Property.Name = "South Fairlington")

north_fairlington <- arlington_historic_districts2 %>%
  filter(Property.Name == "Fairlington Historic District") %>%
  mutate(Property.Name = "North Fairlington")

arlington_historic_districts3 <- arlington_historic_districts2 %>%
  rbind(south_fairlington) %>%
  rbind(north_fairlington)
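The same duplication could be written a little more compactly by filtering once and using dplyr’s `bind_rows()`. This is a sketch on a toy stand-in (`toy_districts` and its link values are hypothetical), not a replacement for the code above.

```r
library(dplyr)

# Toy stand-in for arlington_historic_districts2 (values are illustrative)
toy_districts <- tibble(
  Property.Name = c("Fairlington Historic District", "Maywood Historic District"),
  External.Link = c("link-a", "link-b")
)

# Pull the Fairlington row once
fairlington <- toy_districts %>%
  filter(Property.Name == "Fairlington Historic District")

# Append two renamed copies; the original row is kept, as in the post
toy_districts3 <- bind_rows(
  toy_districts,
  fairlington %>% mutate(Property.Name = "South Fairlington"),
  fairlington %>% mutate(Property.Name = "North Fairlington")
)

toy_districts3
```

Both copies carry the same `External.Link` as the original Fairlington row, which is what lets the two populated places share one register application later on.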

Okay, on to fuzzyjoining. The name from the populated place names dataset should be a substring of the name from the historic district dataset. I’m going to illustrate this in a very simple way using str_detect(). “Overlee Knolls” is the first entry in the populated places dataset. I’m going to use this as the pattern to search for in the historic places dataset. The expected returned neighborhood is “Highland Park-Overlee Knolls”.

va_joined2$feature_name[1]
[1] "Overlee Knolls"
arlington_historic_districts %>%
  filter(str_detect(Property.Name, va_joined2$feature_name[1])) %>%
  gt()
Reference.number Property.Name Status Request.Type Restricted.Address Category.of.Property State County City Street.&.Number External.Link Federal.Agencies Level.of.Significance.-.International Level.of.Significance.-.Local Level.of.Significance.-.National Level.of.Significance.-.Not.Indicated Level.of.Significance.-.State Listed.Date Name.of.Multiple.Property.Listing NHL.Designated.Date Other.Names Park.Name Status.Date Area.of.Significance
_11000548 Highland Park-Overlee Knolls Listed Multiple FALSE DISTRICT VIRGINIA Arlington Arlington Roughly bounded by 22nd St. N., N. Lexington St., 16th St. N., N. Longfellow St., McKinley Rd., I-66 & N. Quantico St. https://catalog.archives.gov/id/77834763 NA FALSE TRUE FALSE FALSE FALSE 40773 Historic Residential Suburbs in the United States, 1830-1960 MPS NA Fostoria/VDHR File Number OOO-9703 NA 40773 ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT

I’ve decided I only want to look at the historic areas in the populated place names. I’m choosing an inner join so I will only get entries that exist in BOTH the populated places and the historic register. This is 17 items (from manually comparing the populated places to the Arlington County website). I’m going to map these places on top of current Arlington County neighborhood groups/civic associations. I’m interested in how current neighborhoods compare to the historic districts. (Note that I could have done this without the populated places dataset at all, but this is the TidyTuesday dataset and it is what led me to my question.)

There are a few different ways to use fuzzyjoins. I found this discussion on stackoverflow to be a good starting point. I chose to use the match_fun version, since I had already prototyped with str_detect. The only thing that wasn’t clear to me is which dataframe would be sent to str_detect as the pattern and which was the string. That is, for

fuzzy_inner_join(x, y, by = c("name1" = "name2"), match_fun = str_detect) would I get

str_detect(string = x$name1, pattern = y$name2)

or

str_detect(string = y$name2, pattern = x$name1)

?

Maybe it is clear to others from the stackoverflow example or the fuzzyjoin manual, but it wasn’t clear to me, so I ended up trying it both ways. It turns out that the dataframes are passed to str_detect in the order they are listed, which makes sense (and is probably the convention, but I had never seen it explicitly stated). [To be absolutely clear, what happens is the first case (str_detect(string = x$name1, pattern = y$name2))]

historic_pop_places <-
  arlington_historic_districts3 %>% fuzzy_inner_join(va_joined2,
                                                     by = c("Property.Name" = "feature_name"),
                                                     match_fun = str_detect)
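The argument order can be checked on a toy example. The sketch below uses two small made-up tibbles (`districts` and `places` are hypothetical names; the strings are taken from the tables above) and assumes the fuzzyjoin, stringr, and dplyr packages are installed.

```r
library(dplyr)
library(stringr)
library(fuzzyjoin)

districts <- tibble(district = c("Lyon Park Historic District",
                                 "Highland Park-Overlee Knolls",
                                 "Maywood Historic District"))
places <- tibble(place = c("Lyon Park", "Overlee Knolls", "Rosslyn"))

# x = districts supplies the string, y = places supplies the pattern
matched <- fuzzy_inner_join(
  districts, places,
  by = c("district" = "place"),
  match_fun = str_detect
)
matched

# The same check written out by hand for a single pair
long_as_string  <- str_detect(string = "Highland Park-Overlee Knolls",
                              pattern = "Overlee Knolls")
short_as_string <- str_detect(string = "Overlee Knolls",
                              pattern = "Highland Park-Overlee Knolls")
c(long_as_string, short_as_string)
```

Only the orientation with the longer district name as the string finds a match, which is why the historic districts dataframe has to come first in the join. Note that str_detect() treats the pattern as a regular expression, so place names containing characters such as parentheses may need `fixed()` or escaping.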

For what I plan to do, I need the place name and the location. I want the reason the place is important and a link to the historic registry application. I started this project wanting to know why these places were important! I’m leaving in both sets of place names, just so I can visually check that my dataset is correct.

historic_pop_places <- historic_pop_places %>%
  select(
    Property.Name,
    feature_name,
    Area.of.Significance,
    prim_lat_dec,
    prim_long_dec,
    External.Link
  )
gt(historic_pop_places)
Property.Name feature_name Area.of.Significance prim_lat_dec prim_long_dec External.Link
Arlington Forest Historic District Arlington Forest ARCHITECTURE; COMMUNICATIONS 38.86872 -77.11303 https://catalog.archives.gov/id/77834749
Arlington Heights Historic District Arlington Heights ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT 38.86956 -77.09220 https://catalog.archives.gov/id/41678540
Arlington Village Historic District Arlington Village COMMUNITY PLANNING AND DEVELOPMENT 38.86178 -77.08526 https://catalog.archives.gov/id/41679618
Aurora Highlands Historic District Highlands ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT 38.89817 -77.08303 https://catalog.archives.gov/id/77834759
Buckingham Historic District Buckingham ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT; LANDSCAPE ARCHITECTURE 38.87345 -77.10665 https://catalog.archives.gov/id/41679602
Cherrydale Historic District Cherrydale ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT 38.89706 -77.10831 https://catalog.archives.gov/id/41679592
Claremont Historic District Claremont ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT 38.84317 -77.10470 https://catalog.archives.gov/id/77834757
Columbia Forest Historic District Columbia Forest COMMUNITY PLANNING AND DEVELOPMENT; ARCHITECTURE 38.85400 -77.11026 https://catalog.archives.gov/id/41679620
Dominion Hills Historic District Dominion Hills ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT 38.87595 -77.14109 https://catalog.archives.gov/id/77834753
Glencarlyn Historic District Glencarlyn ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT 38.86178 -77.12915 https://catalog.archives.gov/id/77834761
Highland Park-Overlee Knolls Overlee Knolls ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT 38.88956 -77.14776 https://catalog.archives.gov/id/77834763
Lyon Park Historic District Lyon Park ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT 38.88067 -77.09026 https://catalog.archives.gov/id/41679594
Lyon Village Historic District Lyon Village ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT 38.89483 -77.09498 https://catalog.archives.gov/id/41679590
Virginia Heights Historic District Virginia Heights ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT 38.85095 -77.11637 https://catalog.archives.gov/id/77834755
Westover Historic District Westover ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT 38.88706 -77.13942 https://catalog.archives.gov/id/41678538
South Fairlington South Fairlington MILITARY; COMMUNITY PLANNING AND DEVELOPMENT; ARCHITECTURE 38.83261 -77.08970 https://catalog.archives.gov/id/41679636
North Fairlington North Fairlington MILITARY; COMMUNITY PLANNING AND DEVELOPMENT; ARCHITECTURE 38.83650 -77.09720 https://catalog.archives.gov/id/41679636

Aurora Highlands and Highlands are the same place: the description of Aurora Highlands from Wikipedia matches the description in the application to be entered on the National Register of Historic Places.

Now, I found a map of all the civic associations in Arlington on the county’s open data page. Data can be downloaded in a variety of formats, including shapefiles or GeoJSON. I chose to download the shapefile and extracted the zip to my project directory (not shown).

The R Graph Gallery (which is a great resource and source of inspiration) has a great section on mapping, but unfortunately one of the needed packages is being retired. The code below still works, but you will get a very long message telling you to migrate away from rgdal.

# library(sp)
# library(rgdal)
# my_spdf <- readOGR( 
#  dsn = "Civic_poly.shp" , 
#  verbose=FALSE
#)

So, here is another way to read in the shape file using the sf package. This contains the polygons that define the boundaries of modern neighborhoods in Arlington. There are a lot of neighborhoods!

arlington_polygons <- st_read(dsn = "Civic_poly.shp")

Mapping points (which is what we have in our TidyTuesday dataset: the lat/long of the “official feature location”) and polygons from the Arlington County dataset involved a few steps. Shapefiles can be encoded using different coordinate reference systems (CRS), and care needs to be taken that all the map layers are using the same CRS. I found the mapview package invaluable during this process, as it is simple to create an interactive map. This made troubleshooting incredibly easy.

Generally, the first step for handling shapefiles in R is to convert them to simple features objects. Here, I’m using the sf package. With a shapefile, you generally don’t need to pass the coordinates or CRS, since that data is encoded in the shapefile in a way that is easily detectable by the function. (Since st_read() already returns an sf object, the st_as_sf() call below leaves it unchanged, but it does no harm.)

arlington_polygons_sf <- st_as_sf(arlington_polygons)
mapview(arlington_polygons_sf)

The generated map looks perfect. Arlington is in the right place in the world and the civic association map looks as it should.

For the point data, the conversion does require additional parameters (description of parameters here). Specifically, the coordinates for point data need to be specified. The order for this is longitude, latitude, which I did not do properly the first time, since in spoken English you usually say latitude/longitude. The mapview map made that very easy to troubleshoot when I saw my points were all in Antarctica. The pop-up made it clear that latitude and longitude were flipped. I also need the CRS for this dataset if I’m going to map it with the polygon data. (You also need a CRS for mapview to place your points on a map. Without a CRS you get the pattern, but not the geolocation.)

The CRS is not specified in the data dictionary for TidyTuesday. There are two likely choices, 4326 and 4269. In this application, there isn’t actually a significant difference. In the mapview map you can select and deselect the layers and see that both sets of points are in the same place.

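# coords takes x (longitude) first, then y (latitude)
historic_4269 <- st_as_sf(historic_pop_places, coords = c("prim_long_dec", "prim_lat_dec"), crs = 4269)
historic_4326 <- st_as_sf(historic_pop_places, coords = c("prim_long_dec", "prim_lat_dec"), crs = 4326)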

mapview(historic_4269) + mapview(historic_4326) + mapview(arlington_polygons_sf)
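Here is a minimal sketch of the point conversion on two rows copied from the table above, with a cheap guard against the swapped-coordinate mistake. The guard assumes every point is in the continental US (positive latitude, negative longitude); `toy_places` is a hypothetical name.

```r
library(sf)

# Two places copied from the table above
toy_places <- data.frame(
  feature_name  = c("Westover", "Lyon Park"),
  prim_lat_dec  = c(38.88706, 38.88067),
  prim_long_dec = c(-77.13942, -77.09026)
)

# Guard against swapped columns (assumes continental US points)
stopifnot(all(toy_places$prim_lat_dec > 0), all(toy_places$prim_long_dec < 0))

# coords is given as x (longitude) first, then y (latitude)
toy_sf <- st_as_sf(
  toy_places,
  coords = c("prim_long_dec", "prim_lat_dec"),
  crs = 4269
)

st_coordinates(toy_sf)  # X column holds longitude, Y column holds latitude
```

A simple range check on latitude alone would not catch the swap here, because a longitude of about -77 is still a valid latitude, which is exactly how points can end up in Antarctica.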

Going back to the original data source, it notes that “Datum is NAD83”. This means that the CRS is 4269, as found in the EPSG registry.

The Arlington polygons dataset is also NAD83/ 4269, so you can go directly to plotting. If they were different CRSs then you would need to transform them to the same projection, such as with:

points_transformed <- sf::st_transform(points_wrong, crs = sf::st_crs(arlington_polygons_sf))
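The transform step can be sketched end to end on toy geometry. Everything below is hypothetical: `pt_wgs84` is a single point deliberately stored as EPSG:4326, and `poly_nad83` is a made-up square standing in for a civic association polygon in EPSG:4269.

```r
library(sf)

# Hypothetical point stored as WGS84 (EPSG:4326)
pt_wgs84 <- st_sf(
  name = "Westover",
  geometry = st_sfc(st_point(c(-77.13942, 38.88706)), crs = 4326)
)

# Hypothetical square standing in for a polygon layer in NAD83 (EPSG:4269)
ring <- rbind(c(-77.15, 38.88), c(-77.13, 38.88), c(-77.13, 38.90),
              c(-77.15, 38.90), c(-77.15, 38.88))
poly_nad83 <- st_sf(
  name = "toy polygon",
  geometry = st_sfc(st_polygon(list(ring)), crs = 4269)
)

# Reproject only when the two layers disagree
if (!(st_crs(pt_wgs84) == st_crs(poly_nad83))) {
  pt_aligned <- st_transform(pt_wgs84, crs = st_crs(poly_nad83))
} else {
  pt_aligned <- pt_wgs84
}

st_crs(pt_aligned) == st_crs(poly_nad83)
```

Comparing `st_crs()` of each layer before plotting is a cheap way to confirm that the layers really do share a CRS rather than assuming it.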

mapview(arlington_polygons_sf ,
  col.regions = "purple") + mapview(historic_4269, col.regions = "blue")

Many of these historic neighborhoods don’t appear to correspond strongly to modern-day neighborhoods. Several of them appear on the borders of multiple neighborhood groups. For example, the location of the Westover feature from the populated place names is actually in the Tara-Leeway Heights neighborhood according to the county’s description of civic association boundaries.

Now I’m going to make a static ggplot map. Mapview is great for exploratory data analysis, but it isn’t as customizable as other graphing packages. I’m displaying the populated place names/historic districts using geom_sf_text(). This needs both the data and the label, and because I pass label outside of aes(), it needs the full variable name (historic_4269$feature_name, not feature_name). The units for nudging the text seem to depend on which CRS the plot uses. I just played around until I got the label to move, and 850 was not the kind of number I was expecting. (I was thinking 0 to 1, like for hjust.)

ggplot() +
  geom_sf(data = arlington_polygons_sf) +
  geom_sf(data = historic_4269, alpha = 0.5) +
  theme_void() +
  geom_sf_text(
    data = historic_4269,
    label = historic_4269$feature_name,
    size = 2.0,
    nudge_y = 850
  ) +
  labs(title = "Historic Districts in Arlington compared to modern neighborhoods") +
  labs(caption = "Data from: US Board of Geographic Names,  \nArlington County, VA - Official GIS Open Data Portal,  \nand the National Register of Historic Places") +
  theme(plot.title = element_text(size = 10),
        plot.caption = element_text(size = 6, hjust = 0))
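As a possible alternative for the label, the column can be mapped inside aes(), in which case ggplot2 looks it up in the layer’s data. This is a sketch on two toy points (`toy_points` is hypothetical), not the map above. Because the only layers here are in longitude/latitude, the nudge is in degrees, which illustrates how the nudge units follow the coordinate system of the plot.

```r
library(sf)
library(ggplot2)

toy_points <- st_as_sf(
  data.frame(
    feature_name = c("Westover", "Lyon Park"),
    lon = c(-77.13942, -77.09026),
    lat = c(38.88706, 38.88067)
  ),
  coords = c("lon", "lat"),
  crs = 4269
)

p <- ggplot() +
  geom_sf(data = toy_points, alpha = 0.5) +
  geom_sf_text(
    data = toy_points,
    aes(label = feature_name),  # column looked up in `data`
    size = 2.0,
    nudge_y = 0.002             # degrees here, since the plot is in lon/lat
  ) +
  theme_void()

p
```

With a polygon layer in a projected CRS drawn first, the same nudge would presumably need to be expressed in that CRS’s linear units instead, which would explain why a value in the hundreds was needed above.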

As a further project, I’d like to make an interactive map with a pop-up giving a clickable link to the National Archives page on the historic district application. The mapview version does have the pop-up, but the link isn’t live.

I found a tutorial here to make the pop-up URL using leaflet, but I can’t figure out how to add my polygons. I can add the points just fine. It also causes the Quarto document to fail during render, though it works just fine as regular code.
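One way the polygons and clickable pop-ups might be combined is sketched below on toy data. This is an untested-against-the-real-data sketch under stated assumptions: `toy_points` and `toy_polygons` are hypothetical stand-ins, both layers are reprojected to EPSG:4326 because leaflet expects WGS84 longitude/latitude, and it does not address the Quarto render failure.

```r
library(sf)
library(leaflet)

# Toy point with a link copied from the table above
toy_points <- st_as_sf(
  data.frame(
    feature_name  = "Westover",
    External.Link = "https://catalog.archives.gov/id/41678538",
    lon = -77.13942, lat = 38.88706
  ),
  coords = c("lon", "lat"), crs = 4269
)

# Hypothetical square standing in for the civic association polygons
ring <- rbind(c(-77.15, 38.88), c(-77.13, 38.88), c(-77.13, 38.90),
              c(-77.15, 38.90), c(-77.15, 38.88))
toy_polygons <- st_sf(
  name = "toy polygon",
  geometry = st_sfc(st_polygon(list(ring)), crs = 4269)
)

# leaflet expects WGS84 longitude/latitude, so reproject both layers first
toy_points_4326   <- st_transform(toy_points, 4326)
toy_polygons_4326 <- st_transform(toy_polygons, 4326)

# Build the pop-up HTML as an ordinary column
toy_points_4326$popup_html <- paste0(
  '<a href="', toy_points_4326$External.Link, '" target="_blank">',
  toy_points_4326$feature_name, '</a>'
)

m <- leaflet() %>%
  addTiles() %>%
  addPolygons(data = toy_polygons_4326, weight = 1, fillOpacity = 0.2) %>%
  addCircleMarkers(data = toy_points_4326, radius = 4, popup = ~popup_html)

m
```

Adding the polygons before the markers keeps the points on top, so they stay clickable.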

Citation

BibTeX citation:
@online{sinks2023,
  author = {Sinks, Louise E.},
  title = {Tidy {Tuesday:} {US} {Populated} {Places}},
  date = {2023-06-27},
  url = {https://lsinks.github.io/posts/2023-06-27-tidytuesday-US-populated-places/arlington-neighborhoods.html},
  langid = {en}
}
For attribution, please cite this work as:
Sinks, Louise E. 2023. “Tidy Tuesday: US Populated Places.” June 27, 2023. https://lsinks.github.io/posts/2023-06-27-tidytuesday-US-populated-places/arlington-neighborhoods.html.