library(tidyverse) # who doesn't want to be tidy?
library(gt) # For nice tables
library(openxlsx) # importing excel files from a URL
library(fuzzyjoin) # for joining on inexact matches
library(sf) # for handling geo data
library(leaflet) # mapping
library(here) # reproducible file paths
library(magick) # makes panel pictures
Data Cleaning for the Tombstone Project
Project Overview
I’m working on a project for my father that will culminate in a website for his genealogy research. There are a couple of different parts that I’m working on independently. This part involves linking photos of family gravestones to an Excel sheet that records the GPS location of the tombstones. This combined dataset is used to generate a leaflet map. This portion focused on data cleaning and the photo matching. I do generate a leaflet map at the end, but it is not the final map. I’ll do the styling of the map in a separate post.
This post is intended both to document what I did for my father so he understands any changes to the data and what results were obtained, but also as a tutorial on how to approach a messy problem. I’ve been solving problems using code for a long time. There are a ton of tutorials that focus on how to solve a specific problem, but fewer that show how to approach an undefined problem. And even fewer tutorials show mistakes and false starts. But these things happen when you are solving real world problems. Constantly checking your results against what you expect to get is critical and then figuring out how you messed up and fixed it is also important. The hard errors to find and fix are the logic errors. Everything runs fine. You get an output that may look right. But you still might not be getting the correct result. You have to approach every output critically and check your work carefully.
I generally write my posts in a “code-along” style. I include almost everything I do, including dead ends. I could present more polished posts, where I write everything after I achieved the end result. This style of post would only include the steps that directly lead to the end result. I don’t do that because I don’t think the mechanics of getting to an end result is necessarily the hard part. Thinking your way through and self-checking the work is the hard part. If you know what you are trying to do, you can always find some code snippets to achieve that result. If you don’t know what you are trying to do, then all the code snippets in the world won’t help.
Some sections I do omit mistakes and go to the final product, just so this tutorial doesn’t end up being 5 million pages long. Generally, the first time I do something, I will go into more detail than following times. For the data cleaning portion, the Cleaning Up Cemetery Names section shows the entire process, including mistakes. For the matching part, everything before Round 2 is in detail, including mistakes, and then the other rounds are much less detailed.
I did also code a simplified version of this project all the way through using only one round of matching and 30 photos, just to make sure the basic elements were working. That isn’t shown here.
If for some reason you want to run this yourself, you can get a zipped copy of all the photos from here. I don’t upload the photos in this repo because the files size is too large.
Setting Up
Loading Libraries
I’ll include more info and reference information about the packages at the code blocks where I use them.
File Folder Names and Loading Data
Here set-up some variables that I use for the file/ folder structure and I read in the spreadsheet.
# folder names
<- "posts/2023-08-04-data-cleaning-tombstone"
blog_folder <- "Photos"
photo_folder <- "Archived Photos"
archive_folder <- "Unmatched Photos"
unmatched_folder <- "Matched_Round1"
match1 <- "Matched_Round2"
match2 <- "Matched_Round3"
match3 <- "Matched_Round4"
match4
#data_file <- "Tombstone_Data_small.xlsx"
<- "Tombstone Data.xlsx"
data_file # read in excel sheet
<-
tombstones_raw read.xlsx(here(blog_folder, data_file),
sheet = 1
)
The here
Package for Reproducible File Structures
I have folder structure that reflected the sequential nature of the matching, so photos get moved into different folders depending on what round they were matched in. I am use here
to generate the paths. Quarto documents start the file path in the folder where the document resides, while r files start in the project folder. here always starts in the project folder, so it allows for easy recycling of code between r files and Quarto files and generally prevents you from getting lost in your file structure. It also allows me to easily move between an independent project and the project that is my website without having to recode all the folder names in the code. All I need to do is setup the sub-folder structure and names (as I did above) and then use them to generate file paths relative to here
. You can see that usage in the loading of the excel sheet.
Reformat and Clean the Data
Cleaning the data is an iterative process. A quick scan of the data reveals a bunch of really obvious issues, but as the analysis proceeds, other errors pop up that can be traced back to improperly cleaned data. Continually checking the results against expected results is critical to find the mistakes. This is part of the reason I have temporary variables (tombstone_1, tombstone_2, etc.). If I’m not sure about something, I’ll store the results in the temporary variable, so I don’t have to rerun everything from the start to get a clean copy to work with. I can just go back one or two code blocks and regenerate from a working partially cleaned version.
Deciding on ground rules for what you will and will not correct is important. For this project, I decided I would not change any photo file names. I’m working with a copy of his photo archive; he has his own filing and naming scheme, and he also corresponds with other genealogists and shares information. Changing photo names on my copy would lead to a set of photos that no longer matched those out there in various places. This decision will lead to missed matches since some photos do appear to have typos in the names such as Octava instead of Octavia. Other photos seem to not follow his normal naming convention of last name first name middle name. Some use first name last name. This again is something that could be corrected programatically, but I won’t because of my ground rules. For another project, a different decision might make more sense. (I’d definitely correct file names if it were my own data!)
I also decided that any inferred data in the spreadsheet (usually denoted in [] here) would not be used. Everything going into the map is data directly from the photos.
The tidyverse packages stringr
and tidyr
both have very powerful tools for data cleaning and tidying. For most tasks, there are multiple ways to accomplish the goal. I’ll illustrate several different ways to perform tasks; there is likely one that is best suited for your application so it is good to know the various methods.
Fixing the GPS data
The GPS data is stored as a string representing degrees, minutes, and seconds of latitude and longitude. I’m going to want this as a decimal lat/long (numerical) as I know that is accepted by many mapping programs. Dealing with this data has two parts: cleaning up the typos/ formatting and then converting to the decimal number.
Viewing the GPS Data (strings)
When you view the GPS data you can see a couple of issues.
%>%
tombstones_raw select(Surname, N, W) %>%
gt() %>%
tab_options(container.height = px(300), container.padding.y = px(24))
Surname | N | W |
---|---|---|
Anderson | 36o56.472 | 86 86.961 |
Anderson | 36 56.472 | 86 86.961 |
Anderson | 37 53.396 | 88 41.321 |
Anderson | 37 52.856 | 88 39.163 |
Anderson | 37 52.856 | 88 39.163 |
Anderson | 37 52.855 | 88 39.163 |
Anderson | 37 52.853 | 88 39.164 |
Anderson | 37 52.853 | 88 39.167 |
Anderson | 37 52.852 | 88 39.165 |
Appleton | 36 29.552 | 86 46.793 |
Baldwin | 38 33.025 | 87 06.328 |
Baldwin | 38 33.025 | 87 06.328 |
Baggett | 36 29.553 | 86 46.793 |
Beasley | 36 35.891 | 86 43.204 |
Beasley | 36 36.755 | 86 43.145 |
Beasley | 36 36.755 | 86 43.145 |
Bell | 36 15.064 | 86 11.669 |
Bell | 36 15.064 | 86 11.669 |
Brazelton | 35 09.411 | 86 03.624 |
Brazelton | 35 09.410 | 86 03.624 |
Brown | 40O 40.760’ | 75O 31.705' |
Brown | 40O 40.760’ | 75O 31.705' |
Bundy | 37 45.623 | |
Bundy | 37 53.380 | 88 44.474 |
Bundy | 37 53.380 | 88 44.474 |
Bundy | 37 53.380 | 88 44.474 |
Bundy | 37 53.379 | 88 44.474 |
Bundy | 37 52.875 | 88 39.118 |
Bundy | 37 52.875 | 88 39.118 |
Bundy | 37 52.873 | 88 39.188 |
Bundy | 37 52.873 | 88 39.188 |
Burgess | 37 49.224 | 88 54.527 |
Burgess | 37 49.224 | 88 54.527 |
Clayton | 37 50.788 | 88 50.968 |
Clayton | 37 50.788 | 88 50.968 |
Clayton | 37 50.795 | 88 50.977 |
Clayton | 37 50.795 | 88 50.977 |
Chapman | 37 29.894 | 88 54.045 |
Chapman | 37 29.894 | 88 54.046 |
Chapman | 37 25.692 | 88 53.951 |
Chapman | 37 25.691 | 88 53.949 |
Chapman | 37 25.691 | 88 53.949 |
Chapman | 37 25.692 | 88 53.951 |
Chapman | 37 25.692 | 88 53.951 |
Chapman | 37 25.694 | 88 53.951 |
Chapman | 38 33.026 | 87 06.327 |
Crockett | 36 22.801 | 86 45.985 |
Crockett | 36 22.804 | 86 45.984 |
Davis | 37 44.682 | 88 55.994 |
Davis | 37 44.683 | 88 55.993 |
Davis | 36 14.260 | 86 43.129 |
Dolch | 38 44.563 | 82 58.988 |
Dolch | 38 44.584 | 82 58.987 |
Dolch | 38 44.564 | 82 58.987 |
Doley | 38 44.615 | 82 58.882 |
Doley | 38 44.615 | 82 58.882 |
Doley | 38 44.615 | 82 58.882 |
Doley | 38 44.615 | 82 58.882 |
NA | 38 44.618 | 82 58.884 |
NA | 38 44.618 | 82 58.884 |
NA | 38 44.618 | 82 58.885 |
NA | 38 44.618 | 82 58.885 |
NA | 38 44.618 | 82 58.886 |
Doley | 38 44.615 | 82 58.882 |
Doley | 38 44.610 | 82 58.923 |
Doley | 38 44.611 | 82 58.922 |
Doley | 38 44.611 | 82 59.012 |
Doley | 38 44.611 | 82 59.013 |
Doley | 37 49.907 | 88 35.306 |
Doley | 37 49.907 | 88 35.306 |
Doley | 37 49.907 | 88 35.306 |
Doley | 37 58.810 | 88 55.084 |
Doley | 37 58.810 | 88 55.084 |
Doley | 37 58.751 | 88 55.161 |
Doley | 37 58.751 | 88 55.161 |
Dorris | 36o28.798’ | 86o46.011’ |
Dorris | 36 28.811 | 86 46.008 |
Dorris | 36 28.812 | 86 46.008 |
Dorris | 36 28.812 | 86 46.008 |
Dorris | 36 28.813 | 86 46.007 |
Dorris | NA | NA |
Dorris | NA | NA |
Dorris | 36 26.485 | 86 48.329 |
Dorris | 36 26.484 | 86 48.329 |
Dorris | 38 07.067 | 88 51.870 |
Dorris | 38 07.067 | 88 51.870 |
Dorris | 38 07.067 | 88 51.870 |
Dorris | 38 07.081 | 88 51.903 |
Dorris | 38 07.081 | 88 51.903 |
Dorris | 37 54.310 | 88 58.084 |
Dorris | 37 54.309 | 88 58.083 |
Dorris | 37 54.309 | 88 58.083 |
Dorris | 37 54.310 | 88 58.084 |
Dorris | 37 54.310 | 88 58.084 |
Dorris | 37 58.746 | 88 55.204 |
Dorris | 37 58.749 | 88 55.205 |
Dorris | 37 47.990 | 88 53.488 |
Dorris | 37 47.988 | 88 53.489 |
Dorris | 37 51.571 | 88 54.939 |
Dorris | 37 51.571 | 88 54.939 |
Dorris | 37 50.787 | 88 50.972 |
Dorris | 37 50.788 | 88 50.971 |
Dorris | 37 50.794 | 88 50.975 |
Dorris | 37 50.794 | 88 50.975 |
Dorris | 37 50.786 | 88 50.974 |
Dorris | 37 50.771 | 88 50.986 |
Dorris | 37 50.783 | 88 50.983 |
Dorris | 37 50.775 | 88 50.986 |
Dorris | 37 50.775 | 88 50.986 |
Dorris | 37 50.775 | 88 50.980 |
Dorris | 37 50.775 | 88 50.980 |
Dorris | 524 | 783 |
Dorris | 528 | 783 |
Drake | 36 35.870 | 86 43.184 |
Dreisbach | 40o 44.177' | 75 29.596' |
Dreisbach | 40o 44.177' | 75 29.593' |
Everett | 38 33.026 | 87 06.327 |
Farris | 37 24.687 | 88 50.538 |
Farris | 37 24.678 | 88 50.538 |
Finch | 44 34.662' | 37 27.129' |
Follis | 37 51.764' | 88 56.897 |
Follis | 37 51.758' | 88 56.894 |
Follis | 37 51.761' | 88 56.893 |
Follis | 37 51.761' | 88 56.896 |
Follis | 37 51.759' | 88 56.895 |
Follis | 37 51. | 88 56 |
Follis | 37 51.758' | 88 56.896 |
Follis | 37 51.758' | 88 56.901' |
Follis | 37 51.758' | 88 56.901' |
Follis | 37 51.758' | 88 56.904 |
Ford | 37 52.851 | 88 39.161 |
Fox | 37 48.023 | 88 53.449 |
Frost | 37 17.909 | 87 28.852 |
NA | 37 17.910 | 87 28.852 |
Frost | 37 17.909 | 87 28.854 |
Fuqua | 36 38.189 | 86 51.516 |
Gregory | 38 44.609 | 82 58.922' |
Gregory | 38 44.611' | 82 58.922' |
Hart | 37 51.757' | 88 56.900 |
Hess | 37 25.687 | 88 53.947 |
Hess | 37 25.687 | 88 53.949 |
Hess | 37 25.688 | 88 53.952 |
Hess | 37 25.688 | 88 53.952 |
Hess | 37 25.688 | 88 53.952 |
Hess | 37 25.687 | 88 53.947 |
Hess | 37 25.688 | 88 53.952 |
Hess | 37 25.687 | 88 53.948 |
Hess | 37 25.689 | 88 53.952 |
Hess | 37 25.689 | 88 53.952 |
Hess | 37 25.693 | 88 53.949 |
Hess | 37 25.693 | 88 53.947 |
Hess | 37 25.693 | 88 53.949 |
Hess | 37 25.690 | 88 53.951 |
Holt | NA | NA |
Holt | NA | NA |
Horlacher | 40o 30.928' | 75o 25.072' |
Horlacher | 40o30.930’ | 75o 25.070’ |
Horrall | 37 54.090 | 88 54.218 |
Horrall | 38 33.026 | 87 06.326 |
Horrall | 38 36.963 | 87 11.369 |
Hurt | 36 28.804 | 86 46.007 |
Jacobs | 38 21.315' | 85 41.307' |
Jacobs | 38 21.317' | 85 41.306' |
Johnson | 37 52.872 | 88 39.183 |
Johnson | 37 52.872 | 88 39.183 |
Jones | 37 47.994 | 88 53.504 |
Jones | 37 47.994 | 88 53.504 |
Jones | 37 47.997 | 88 53.483 |
Jones | 37 47.995 | 88 53.483 |
Jones | 37 47.995 | 88 53.483 |
Jones | 37 48.024 | 88 53.451 |
Jones | 37 48.024 | 88 53.451 |
Jones | 37 48.020 | 88 53.465 |
Jones | 37 48.020 | 88 53.465 |
Jones | 37 51.747 | 88 52.933 |
Karnes | 37 58.749 | 88 55.161 |
Karnes | 37 58.749 | 88 55.161 |
Keith | NA | NA |
Keth | 35 09.410 | 86 03.624 |
Kleppinger | 40o 44.178' | 75 29.601' |
Lipsey | 38 33.917 | 89 07.571 |
Lockwood | NA | NA |
Lockwood | NA | NA |
Loomis | 37 36.925 | 89 12.220 |
Mensch | 40o 39.557' | 75 25.586' |
Merrell | 35 43.945 | 80 18.669 |
Merrell | 35 43.942 | 80 18.671 |
Meredith | 39O 41.114’ | 76O 35.858' |
Meredith | 39O 41.115’ | 76O 35.855' |
Meredith | 39O 41.116’ | 76O 35.855' |
Meredith | 39O 41.116’ | 76O 35.854' |
Meredith | 39O 41.117’ | 76O 35.853' |
Bell | 39O 41.117’ | 76O 35.853' |
John | 39O 41.117’ | 76O 35.853' |
Meredith | 39O 41.118’ | 76O 35.852' |
Meredith | 39O 41.112’ | 76O 35.857' |
Meredith | 39O 41.112’ | 76O 35.857' |
Meredith | 39O 41.112’ | 76O 35.856' |
Meredith | 39O 41.112’ | 76O 35.856' |
Meredith | 39O 41.112’ | 76O 35.855' |
Meredith | 39O 41.112’ | 76O 35.854' |
Meredith | 39O 41.113’ | 76O 35.855' |
Meredith | 39O 41.113’ | 76O 35.855' |
Tipton | 39O 41.114’ | 76O 35.855' |
Meredith | 39O 41.114’ | 76O 35.854' |
Meredith | 39O 41.114’ | 76O 35.854' |
Mildenberger | 40o 44.194’ | 75O 29.608 |
Mildenberger | 40o 44.179 | 75 29.574 |
Miller | 37 48.023 | 88 53.449 |
Minnich | 40o 40.757’ | 75O 31.679' |
Minnich | 40o 40.759’ | 75O 31.679' |
Mory | 40o 33.585' | 75 23.776' |
Mory | 40o 33.586' | 75 23.776' |
Mory | 40o 33.586' | 75 23.774' |
Mory | 40o 33.585' | 75 23.776' |
Nagel | 40o 33.585' | 75 23.745' |
Nagel | 40o 44.191' | 75 29.603' |
Nagel | 41 13.033' | 75 57.329' |
Nagel | 40o39.575' | 75o25.555' |
Nagel | 40o39.577' | 75o25.549' |
Nagel | 40o 44.197' | 75O 29.605’ |
Nagel | 41 13.031' | 75 57.333' |
Nagel | 40o 39.575' | 75 25.555' |
Nagle | 38 44.582' | 82 58.978' |
Nagle | 38 44.582' | 82 58.978' |
Nagel | 38 44.582' | 82 58.978' |
Nagel | 38 44.582' | 82 58.978' |
Nagel | 38 44.582' | 82 58.978' |
Nagel | 38 44.582' | 82 58.978' |
NA | NA | NA |
Nutty | 37 25.674 | 88 54.020 |
Nutty | 37 25.682 | 88 54.020 |
Nutty | 37 25.678 | 88 54.020 |
Ritter | 37 52.861 | 88 39/178 |
Ritter | 37 52.861 | 88 39/178 |
Odom | 37 58.794 | 88 55.324 |
Odom | 37 58.795 | 88 55.326 |
Odom | 37 47.993 | 88 53.510 |
Odom | 37 47.994 | 88 53.510 |
NA | 37 47.992 | 88 53.506 |
Odum | NA | NA |
Odum | 37 47.187 | 88 50.175 |
Odum | 37 47.187 | 88 50.175 |
Peters | 37 47.244 | 88 55.354 |
Peters | 37 47.244 | 88 55.354 |
Pickard | 38 04.918 | 88 52.028 |
Pickard | 38 04.919 | 88 52.028 |
Pickard | 38 04.917 | 88 52.028 |
Pletz | 37 44.684 | 88 55.998 |
Russell | 37 44.683 | 88 55.998 |
Pickard | 38 04.918 | 88 54.028 |
Pickard | 38 04.919 | 88 54.028 |
Pickard | 38 04.917 | 88 54.028 |
Pulliam | 37 25.697 | 88 53.922 |
Pulliam | 37 25.697 | 88 53.922 |
Rex | 37 45.776 | 88 55.111 |
Rex | 37 45.776 | 88 55.111 |
Rex | 37 45.777 | 88 55.110 |
Rex | 37 45.777 | 88 55.115 |
Rex | 37 45.776 | 88 55.112 |
Rex | 37 45.774 | 88 55.109 |
Rex | 37 45.776 | 88 55.108 |
Rex | 37 45.776 | 88 55.108 |
Rex | 37 45.776 | 88 55.108 |
Rex | 32 22 549 | 90 52.100 |
Rex | 37 44.784 | 88 55.855 |
Rex | 37 44.785 | 88 55.855 |
Richardson | 37 44.766 | 88 55.776 |
Richardson | 37 44.787 | 88 55.775 |
Riegel | 37 49.828 | 88 35.346 |
Riegel | 37 49.828 | 88 35.346 |
Ritter | 37 52.853 | 88 39.174 |
Ritter | 37 52.853 | 88 39.174 |
Rockel | 40o 39.556' | 75 25.585' |
Rockel | 40o 39.555' | 75 25.585' |
Rockel | 40o 39.560' | 75 25.560' |
Rockel | 40o 39.560' | 75 25.559' |
Ross | 37 58.752 | 88 58.162 |
Ross | 37 58.752 | 88 58.162 |
Ruckel | NA | NA |
Ruckel | NA | NA |
Russell | 37 44.681 | 88 55.998 |
Russell | 37 44.682 | 88 55.998 |
NA | 37 44.682 | 88 55.994 |
NA | 37 44.682 | 88 55.994 |
NA | 37 44.682 | 88 55.994 |
Siliven | 37 28.189 | 88 48.007 |
Sinks | 36 14.451' | 86 43.526' |
Sinks | 37 54.081' | 88 54.293' |
Sinks | 37 54.081' | 88 54.293' |
Sinks | 37 54.089' | 88 54.207' |
Sinks | 37 52.619 | 88 55.430 |
Sinks | 37 52.619 | 88 55.430 |
Sinks | 37 52.619 | 88 55.430 |
Sinks | 37 47.989 | 88 53.489 |
Sinks | 37 47.989 | 88 53.489 |
Sinks | 37 47.986 | 88 53.489 |
Sinks | 37 47.985 | 88 53.491 |
Sinks | 37 47.984 | 88 53.490 |
Sinks | 37 47.984 | 88 53.490 |
Sinks | 37 47.984 | 88 53.488 |
Sinks | 37 47.982 | 88 53.491 |
Sinks | 37 48.024 | 88 53.464 |
Sinks | 37 48.024 | 88 53.463 |
Sinks | 37 48.020 | 88 53.463 |
Sinks | 37 44.702 | 88 55.998 |
Sweet | 37 44.704 | 88 55.995 |
Sinks | 37 44.702 | 88 55.997 |
Sinks | 37 44.704 | 88 55.998 |
Sinks | 37 44.704 | 88 55.998 |
Sinks | 38 33.836 | 89 07.580 |
Sinks | 38 33.837 | 89 07.579 |
Sinks | 38 33.917 | 89 07.572 |
Sinks | 38 02.272 | 88 50.161 |
Sinks | 37 44.770 | 88 55.779 |
Sinks | 37 44.770 | 88 55.779 |
Solt | 40o 48.686' | 75 37.120 |
Solt | 40o 48.693' | 75 37.119 |
Solt | 40o 48.690' | 75 37.113 |
Sfafford | 37 52.608 | 88 55.434 |
Sfafford | 37 52.608 | 88 55.434 |
Steen | 38 33.025 | 87 06.328 |
VanCleve | 37 25.694 | 88 53.921 |
VanCleve | 37 25.694 | 88 53.921 |
VanCleve | 37 33.397 | 88 46.363 |
VanCleve | 37 33.397 | 88 46.363 |
VanCleve | 37 33.397 | 88 46.363 |
VanCleave | 38 04.924 | 88 52.030 |
VanCleave | 38 04.924 | 88 52.030 |
Veach | 37 29.916 | 86 54.044 |
Veach | 37 29.916 | 86 54.044 |
Veach | 37 29.895 | 86 54.023 |
Veach | 37 29.895 | 86 54.022 |
Veach | 37 29.895 | 86 54.020 |
Veach | 37 25.692 | 88 53.942 |
Veach | 37 26.692 | 88 53.942 |
Veatch | 37 26.692 | 88 50.527 |
Veatch | 37 26.692 | 88 50.527 |
Veach | 37 26.693 | 88 50.540 |
Veach | 37 26.692 | 88 50.531 |
Veach | 37 26.692 | 88 50.531 |
Veach | 37 29.895' | 88 54.022 |
Veach | 37 29.897' | 88 54.022 |
Veatch | 37 28.187' | 88 49.000' |
Veatch | 37 28.187' | 88 48.999' |
Veatch | 37 28.187' | 88 49.001' |
Veatch | 37 28.186 | 88 48.005 |
Veach-Nutty | 37 25.682 | 88 54.017 |
Veach | 37 25.681 | 88 54.017 |
Veach | 37 25.679 | 88 54.017 |
Veach | 37 25.682 | 88 54.017 |
Ware | 37 52.856 | 88 39.186 |
Ware | 37 52.856 | 88 39.186 |
Ware | 37 52.867 | 88 39.176 |
Ware | 37 52.867 | 88 39.176 |
Webber | 37 49.829 | 88 35.336 |
Webber | 37 49.826 | 88 35.336 |
Weir | 36 15.064 | 86 11.669 |
Weir | 36 15.064 | 86 11.669 |
Wier | 37 49.208 | 88 46.787 |
Whiteside | 37 26.743 | 88 50.534 |
Whiteside | 37 26.743 | 88 50.534 |
Willis | 36 35.889 | 86 43.203 |
Wilson | 36 26.350 | 86 47.072 |
Wilson | 36 26.361 | 86 47.070 |
Wilson | 36 29.553 | 86 46.791 |
Wilson | 36 28.812 | 86 46.023 |
Wilson | 36 28.803 | 86 46.007 |
Wilson | NA | NA |
Wilson | 37 48.034 | 88 53.443 |
Wilson | 37 48.034 | 88 53.443 |
Wilson | 36 26.351 | 86 47.070 |
Wilson | 36 26.351 | 86 47.070 |
Wilson | 36 26.351 | 86 47.070 |
Wilson | 36 26.350 | 86 47.073 |
Wilson | 36 26.350 | 86 47.073 |
Wilson | 36 26.351 | 86 47.071 |
Wilson | NA | NA |
Wilson | NA | NA |
Wilson | NA | NA |
Wilson | NA | NA |
Wise | 37 50.352 | 88 31.612 |
Wollard | 37 54.076' | 88 54.322' |
Woolard | 37 54.075' | 88 54.322' |
Woolard | 37 58.721 | 88 55.211 |
Woolard | 37 58.721 | 88 55.211 |
Woolard | 37 58.721 | 88 55.211 |
Woolard | 37 58.723 | 88 55.212 |
Woolard | 37 58.721 | 88 55.213 |
Woolard | 37 58.720 | 88 55.213 |
Woolard | 37 58.720 | 88 55.213 |
Woolard | 37 51.394 | 88 41.745 |
Woolard | 37 51.395 | 88 41.746 |
Woolard | 37 51.396 | 88 41.747 |
Woolard | 37 51.391 | 88 41.742 |
Woolard | 37 51.397 | 88 41.741 |
Woolard | 37 52.853 | 88 39.160 |
Woolard | 37 52.853 | 88 39.161 |
Woolard | 37 52.853 | 88 39.160 |
Woolard | 37 52.853 | 88 39.159 |
Woolard | 37 52.854 | 88 39.160 |
Woolard | 37 51.742 | 88 52.935 |
Woolard | 37 51.742 | 88 52.935 |
Latitude and longitude data contains some stray degree and minute symbols. The degree symbol appears both as a straight and curved apostrophe and the degree symbols appear both as o and O. This cleaning needs to be done on both N and W columns. The str_replace_all()
function from stringr looks at a string, finds a pattern, and replaces it with a replacement. Here, the pattern is each of those symbols and the replacement is a space.
Styling Tables with gt
I’m using the gt package to format my tables. Here I’m not doing much styling, but it is super easy to make really nice tables with just a few lines of code.
I write and code in RStudio using Quarto. This allows you to alternate text and code chunks. You can run all the code chunks normally in RStudio or you can “render” the quarto document, which runs all the code chunks and produces the html page that becomes the page I publish on my website. When just running the code chunks, I get a table with scroll bars, but when rendering the webpage, I get a multi-page table that displays everything. This is fixed by specifying the size of the container for the table. With the container, the table is truncated to a few rows and a scroll bar appears. The container.padding
option just makes sure the data isn’t truncated in the middle of a row.
Cleaning up Typos in the GPS Data (strings)
I put all my cleaned data in a new dataframe. If something unexpected happens, I can check against the original data without having to reload it. I tend to use separate mutates for operation. I know it could be all in one mutate, but even when being careful about indents, I end up missing commas and parentheses as I add and remove steps. Individual mutates makes visually checking for syntax errors much easier for me.
<- tombstones_raw %>%
tombstones mutate(N = str_replace_all(N, pattern = "’", " ")) %>%
mutate(N = str_replace_all(N, pattern = "O", " ")) %>%
mutate(N = str_replace_all(N, pattern = "o", " ")) %>%
mutate(N = str_replace_all(N, pattern = "'", " ")) %>%
mutate(W = str_replace_all(W, pattern = "’", " ")) %>%
mutate(W = str_replace_all(W, pattern = "O", " ")) %>%
mutate(W = str_replace_all(W, pattern = "o", " ")) %>%
mutate(W = str_replace_all(W, pattern = "'", " "))
Look at the cleaned data.
%>%
tombstones select(Surname, First.Name, N, W) %>%
gt() %>%
tab_options(container.height = px(300), container.padding.y = px(24))
Surname | First.Name | N | W |
---|---|---|---|
Anderson | Abraham | 36 56.472 | 86 86.961 |
Anderson | Elizabeth | 36 56.472 | 86 86.961 |
Anderson | Zady | 37 53.396 | 88 41.321 |
Anderson | Albert | 37 52.856 | 88 39.163 |
Anderson | Adesia | 37 52.856 | 88 39.163 |
Anderson | May | 37 52.855 | 88 39.163 |
Anderson | E | 37 52.853 | 88 39.164 |
Anderson | William | 37 52.853 | 88 39.167 |
Anderson | Nancy | 37 52.852 | 88 39.165 |
Appleton | Richard | 36 29.552 | 86 46.793 |
Baldwin | John | 38 33.025 | 87 06.328 |
Baldwin | William | 38 33.025 | 87 06.328 |
Baggett | Mahalia | 36 29.553 | 86 46.793 |
Beasley | E | 36 35.891 | 86 43.204 |
Beasley | Josephine | 36 36.755 | 86 43.145 |
Beasley | Fanning | 36 36.755 | 86 43.145 |
Bell | John | 36 15.064 | 86 11.669 |
Bell | Mary | 36 15.064 | 86 11.669 |
Brazelton | Wm | 35 09.411 | 86 03.624 |
Brazelton | Esther | 35 09.410 | 86 03.624 |
Brown | Elizabeth | 40 40.760 | 75 31.705 |
Brown | Joel | 40 40.760 | 75 31.705 |
Bundy | Hope | 37 45.623 | |
Bundy | Clem | 37 53.380 | 88 44.474 |
Bundy | Nancy | 37 53.380 | 88 44.474 |
Bundy | W | 37 53.380 | 88 44.474 |
Bundy | Charles | 37 53.379 | 88 44.474 |
Bundy | Thomas | 37 52.875 | 88 39.118 |
Bundy | Octavia | 37 52.875 | 88 39.118 |
Bundy | George | 37 52.873 | 88 39.188 |
Bundy | Lora | 37 52.873 | 88 39.188 |
Burgess | W | 37 49.224 | 88 54.527 |
Burgess | Alzada | 37 49.224 | 88 54.527 |
Clayton | G | 37 50.788 | 88 50.968 |
Clayton | Ellen | 37 50.788 | 88 50.968 |
Clayton | L | 37 50.795 | 88 50.977 |
Clayton | Mary | 37 50.795 | 88 50.977 |
Chapman | Daniel | 37 29.894 | 88 54.045 |
Chapman | Elizabeth | 37 29.894 | 88 54.046 |
Chapman | Caroline | 37 25.692 | 88 53.951 |
Chapman | Daniel | 37 25.691 | 88 53.949 |
Chapman | Lucretia | 37 25.691 | 88 53.949 |
Chapman | Samuel | 37 25.692 | 88 53.951 |
Chapman | Elizabeth | 37 25.692 | 88 53.951 |
Chapman | Laura | 37 25.694 | 88 53.951 |
Chapman | Polly | 38 33.026 | 87 06.327 |
Crockett | Mandy | 36 22.801 | 86 45.985 |
Crockett | John | 36 22.804 | 86 45.984 |
Davis | Ezra | 37 44.682 | 88 55.994 |
Davis | Lizzie | 37 44.683 | 88 55.993 |
Davis | Fred | 36 14.260 | 86 43.129 |
Dolch | Catherine | 38 44.563 | 82 58.988 |
Dolch | Christian | 38 44.584 | 82 58.987 |
Dolch | Peter | 38 44.564 | 82 58.987 |
Doley | George | 38 44.615 | 82 58.882 |
Doley | Katie | 38 44.615 | 82 58.882 |
Doley | Mary E | 38 44.615 | 82 58.882 |
Doley | Henriettie | 38 44.615 | 82 58.882 |
NA | MED | 38 44.618 | 82 58.884 |
NA | HD | 38 44.618 | 82 58.884 |
NA | GD | 38 44.618 | 82 58.885 |
NA | Mother | 38 44.618 | 82 58.885 |
NA | Father | 38 44.618 | 82 58.886 |
Doley | George | 38 44.615 | 82 58.882 |
Doley | James | 38 44.610 | 82 58.923 |
Doley | May | 38 44.611 | 82 58.922 |
Doley | John | 38 44.611 | 82 59.012 |
Doley | Maggie | 38 44.611 | 82 59.013 |
Doley | William | 37 49.907 | 88 35.306 |
Doley | Dora | 37 49.907 | 88 35.306 |
Doley | L[eaman] | 37 49.907 | 88 35.306 |
Doley | G[uilford] | 37 58.810 | 88 55.084 |
Doley | D[ora] | 37 58.810 | 88 55.084 |
Doley | Eugene | 37 58.751 | 88 55.161 |
Doley | Lou | 37 58.751 | 88 55.161 |
Dorris | J[oseph] | 36 28.798 | 86 46.011 |
Dorris | Joseph | 36 28.811 | 86 46.008 |
Dorris | Sarah | 36 28.812 | 86 46.008 |
Dorris | W | 36 28.812 | 86 46.008 |
Dorris | A | 36 28.813 | 86 46.007 |
Dorris | J | NA | NA |
Dorris | Elizabeth | NA | NA |
Dorris | Robert | 36 26.485 | 86 48.329 |
Dorris | Rebecca | 36 26.484 | 86 48.329 |
Dorris | Monroe | 38 07.067 | 88 51.870 |
Dorris | Della | 38 07.067 | 88 51.870 |
Dorris | Mary M | 38 07.067 | 88 51.870 |
Dorris | Harve | 38 07.081 | 88 51.903 |
Dorris | Carrie | 38 07.081 | 88 51.903 |
Dorris | Smith | 37 54.310 | 88 58.084 |
Dorris | Ada | 37 54.309 | 88 58.083 |
Dorris | William | 37 54.309 | 88 58.083 |
Dorris | Harvey | 37 54.310 | 88 58.084 |
Dorris | Cora | 37 54.310 | 88 58.084 |
Dorris | John | 37 58.746 | 88 55.204 |
Dorris | W | 37 58.749 | 88 55.205 |
Dorris | Gustavus | 37 47.990 | 88 53.488 |
Dorris | Sarah | 37 47.988 | 88 53.489 |
Dorris | Joseph | 37 51.571 | 88 54.939 |
Dorris | Della | 37 51.571 | 88 54.939 |
Dorris | William | 37 50.787 | 88 50.972 |
Dorris | Harriet | 37 50.788 | 88 50.971 |
Dorris | William | 37 50.794 | 88 50.975 |
Dorris | Mary | 37 50.794 | 88 50.975 |
Dorris | James | 37 50.786 | 88 50.974 |
Dorris | Sarah | 37 50.771 | 88 50.986 |
Dorris | W[illiam] | 37 50.783 | 88 50.983 |
Dorris | E[lisha] | 37 50.775 | 88 50.986 |
Dorris | Sarah | 37 50.775 | 88 50.986 |
Dorris | James | 37 50.775 | 88 50.980 |
Dorris | Georgia | 37 50.775 | 88 50.980 |
Dorris | William | 524 | 783 |
Dorris | Malinda | 528 | 783 |
Drake | Mary | 36 35.870 | 86 43.184 |
Dreisbach | Catherina | 40 44.177 | 75 29.596 |
Dreisbach | Johannes | 40 44.177 | 75 29.593 |
Everett | Semantha | 38 33.026 | 87 06.327 |
Farris | Elizabeth | 37 24.687 | 88 50.538 |
Farris | Elizabeth | 37 24.678 | 88 50.538 |
Finch | Isaac | 44 34.662 | 37 27.129 |
Follis | Fawn | 37 51.764 | 88 56.897 |
Follis | Ralph | 37 51.758 | 88 56.894 |
Follis | A | 37 51.761 | 88 56.893 |
Follis | Christian | 37 51.761 | 88 56.896 |
Follis | G | 37 51.759 | 88 56.895 |
Follis | Ralph | 37 51. | 88 56 |
Follis | E | 37 51.758 | 88 56.896 |
Follis | William | 37 51.758 | 88 56.901 |
Follis | Martha | 37 51.758 | 88 56.901 |
Follis | Jeff | 37 51.758 | 88 56.904 |
Ford | Florence | 37 52.851 | 88 39.161 |
Fox | Frances | 37 48.023 | 88 53.449 |
Frost | Ebenezer | 37 17.909 | 87 28.852 |
NA | NA | 37 17.910 | 87 28.852 |
Frost | NA | 37 17.909 | 87 28.854 |
Fuqua | William | 36 38.189 | 86 51.516 |
Gregory | Leonard | 38 44.609 | 82 58.922 |
Gregory | Lucille | 38 44.611 | 82 58.922 |
Hart | Parmelia | 37 51.757 | 88 56.900 |
Hess | Amalphus | 37 25.687 | 88 53.947 |
Hess | Adolphus | 37 25.687 | 88 53.949 |
Hess | Samuel | 37 25.688 | 88 53.952 |
Hess | Augusta | 37 25.688 | 88 53.952 |
Hess | Ulysses | 37 25.688 | 88 53.952 |
Hess | Ulysses | 37 25.687 | 88 53.947 |
Hess | William | 37 25.688 | 88 53.952 |
Hess | William | 37 25.687 | 88 53.948 |
Hess | Jerome | 37 25.689 | 88 53.952 |
Hess | Franklin | 37 25.689 | 88 53.952 |
Hess | Samuel | 37 25.693 | 88 53.949 |
Hess | Bernice | 37 25.693 | 88 53.947 |
Hess | Catherine | 37 25.693 | 88 53.949 |
Hess | George | 37 25.690 | 88 53.951 |
Holt | Lucinda | NA | NA |
Holt | William | NA | NA |
Horlacher | Daniel | 40 30.928 | 75 25.072 |
Horlacher | Margaretha | 40 30.930 | 75 25.070 |
Horrall | Polly | 37 54.090 | 88 54.218 |
Horrall | James | 38 33.026 | 87 06.326 |
Horrall | William | 38 36.963 | 87 11.369 |
Hurt | Elizabeth | 36 28.804 | 86 46.007 |
Jacobs | Jeremiah | 38 21.315 | 85 41.307 |
Jacobs | Rebecca | 38 21.317 | 85 41.306 |
Johnson | James | 37 52.872 | 88 39.183 |
Johnson | Mary | 37 52.872 | 88 39.183 |
Jones | Levi | 37 47.994 | 88 53.504 |
Jones | Hester | 37 47.994 | 88 53.504 |
Jones | Ridley | 37 47.997 | 88 53.483 |
Jones | James | 37 47.995 | 88 53.483 |
Jones | Tina | 37 47.995 | 88 53.483 |
Jones | Ezra | 37 48.024 | 88 53.451 |
Jones | Nannie | 37 48.024 | 88 53.451 |
Jones | Samuel | 37 48.020 | 88 53.465 |
Jones | Melverda | 37 48.020 | 88 53.465 |
Jones | John | 37 51.747 | 88 52.933 |
Karnes | Willard | 37 58.749 | 88 55.161 |
Karnes | Ruth | 37 58.749 | 88 55.161 |
Keith | James | NA | NA |
Keth | Nancy | 35 09.410 | 86 03.624 |
Kleppinger | Anna | 40 44.178 | 75 29.601 |
Lipsey | Joe | 38 33.917 | 89 07.571 |
Lockwood | Eugenia | NA | NA |
Lockwood | Leland | NA | NA |
Loomis | Jon | 37 36.925 | 89 12.220 |
Mensch | Abraham | 40 39.557 | 75 25.586 |
Merrell | Azariah | 35 43.945 | 80 18.669 |
Merrell | Abigail | 35 43.942 | 80 18.671 |
Meredith | Eleandra | 39 41.114 | 76 35.858 |
Meredith | Micajah | 39 41.115 | 76 35.855 |
Meredith | Samuel | 39 41.116 | 76 35.855 |
Meredith | Elizabeth | 39 41.116 | 76 35.854 |
Meredith | Ruth | 39 41.117 | 76 35.853 |
Bell | Sarah | 39 41.117 | 76 35.853 |
John | Bell | 39 41.117 | 76 35.853 |
Meredith | Mary | 39 41.118 | 76 35.852 |
Meredith | Clarence | 39 41.112 | 76 35.857 |
Meredith | Cora | 39 41.112 | 76 35.857 |
Meredith | W | 39 41.112 | 76 35.856 |
Meredith | Susan | 39 41.112 | 76 35.856 |
Meredith | Hannah | 39 41.112 | 76 35.855 |
Meredith | Mary | 39 41.112 | 76 35.854 |
Meredith | Samuel | 39 41.113 | 76 35.855 |
Meredith | Belinda | 39 41.113 | 76 35.855 |
Tipton | Susannah | 39 41.114 | 76 35.855 |
Meredith | Thomas | 39 41.114 | 76 35.854 |
Meredith | Sarah | 39 41.114 | 76 35.854 |
Mildenberger | Anna | 40 44.194 | 75 29.608 |
Mildenberger | Nicolaus | 40 44.179 | 75 29.574 |
Miller | Myrtie | 37 48.023 | 88 53.449 |
Minnich | Elizabeth | 40 40.757 | 75 31.679 |
Minnich | John | 40 40.759 | 75 31.679 |
Mory | Catherina | 40 33.585 | 75 23.776 |
Mory | Gotthard | 40 33.586 | 75 23.776 |
Mory | Magdelena | 40 33.586 | 75 23.774 |
Mory | Peter | 40 33.585 | 75 23.776 |
Nagel | Anna | 40 33.585 | 75 23.745 |
Nagel | Anna | 40 44.191 | 75 29.603 |
Nagel | Caty | 41 13.033 | 75 57.329 |
Nagel | Daniel | 40 39.575 | 75 25.555 |
Nagel | Frederick | 40 39.577 | 75 25.549 |
Nagel | Friedrich | 40 44.197 | 75 29.605 |
Nagel | Johann | 41 13.031 | 75 57.333 |
Nagel | Maria | 40 39.575 | 75 25.555 |
Nagle | John | 38 44.582 | 82 58.978 |
Nagle | Mary | 38 44.582 | 82 58.978 |
Nagel | Henry | 38 44.582 | 82 58.978 |
Nagel | Mary | 38 44.582 | 82 58.978 |
Nagel | Will | 38 44.582 | 82 58.978 |
Nagel | Adeline | 38 44.582 | 82 58.978 |
NA | NA | NA | NA |
Nutty | John | 37 25.674 | 88 54.020 |
Nutty | Beatrice | 37 25.682 | 88 54.020 |
Nutty | John | 37 25.678 | 88 54.020 |
Ritter | NA | 37 52.861 | 88 39/178 |
Ritter | NA | 37 52.861 | 88 39/178 |
Odom | Archibald | 37 58.794 | 88 55.324 |
Odom | Cynthia | 37 58.795 | 88 55.326 |
Odom | G | 37 47.993 | 88 53.510 |
Odom | Sarah | 37 47.994 | 88 53.510 |
NA | Thomas | 37 47.992 | 88 53.506 |
Odum | Britton | NA | NA |
Odum | Wiley | 37 47.187 | 88 50.175 |
Odum | Sallie A | 37 47.187 | 88 50.175 |
Peters | Daniel | 37 47.244 | 88 55.354 |
Peters | Charlotte | 37 47.244 | 88 55.354 |
Pickard | William | 38 04.918 | 88 52.028 |
Pickard | Harriet | 38 04.919 | 88 52.028 |
Pickard | Louise | 38 04.917 | 88 52.028 |
Pletz | Karl | 37 44.684 | 88 55.998 |
Russell | Caroline | 37 44.683 | 88 55.998 |
Pickard | William | 38 04.918 | 88 54.028 |
Pickard | Harriet | 38 04.919 | 88 54.028 |
Pickard | Louise | 38 04.917 | 88 54.028 |
Pulliam | Frieda | 37 25.697 | 88 53.922 |
Pulliam | Amos | 37 25.697 | 88 53.922 |
Rex | William | 37 45.776 | 88 55.111 |
Rex | Elmina | 37 45.776 | 88 55.111 |
Rex | Mamie | 37 45.777 | 88 55.110 |
Rex | George | 37 45.777 | 88 55.115 |
Rex | Bertie | 37 45.776 | 88 55.112 |
Rex | Lulie | 37 45.774 | 88 55.109 |
Rex | Lily | 37 45.776 | 88 55.108 |
Rex | Arthur | 37 45.776 | 88 55.108 |
Rex | George | 37 45.776 | 88 55.108 |
Rex | Jno | 32 22 549 | 90 52.100 |
Rex | Guy | 37 44.784 | 88 55.855 |
Rex | Harlie | 37 44.785 | 88 55.855 |
Richardson | Annabelle | 37 44.766 | 88 55.776 |
Richardson | Alfred | 37 44.787 | 88 55.775 |
Riegel | Solomon | 37 49.828 | 88 35.346 |
Riegel | Catherine | 37 49.828 | 88 35.346 |
Ritter | J | 37 52.853 | 88 39.174 |
Ritter | Mary | 37 52.853 | 88 39.174 |
Rockel | Balzer | 40 39.556 | 75 25.585 |
Rockel | Elisabetha | 40 39.555 | 75 25.585 |
Rockel | Johannes | 40 39.560 | 75 25.560 |
Rockel | Elizabeth | 40 39.560 | 75 25.559 |
Ross | George | 37 58.752 | 88 58.162 |
Ross | Euna | 37 58.752 | 88 58.162 |
Ruckel | Mary | NA | NA |
Ruckel | Melchir | NA | NA |
Russell | James | 37 44.681 | 88 55.998 |
Russell | Ana | 37 44.682 | 88 55.998 |
NA | NA | 37 44.682 | 88 55.994 |
NA | NA | 37 44.682 | 88 55.994 |
NA | NA | 37 44.682 | 88 55.994 |
Siliven | Jenniel | 37 28.189 | 88 48.007 |
Sinks | A | 36 14.451 | 86 43.526 |
Sinks | Francis | 37 54.081 | 88 54.293 |
Sinks | Delphia | 37 54.081 | 88 54.293 |
Sinks | Salem | 37 54.089 | 88 54.207 |
Sinks | Daniel | 37 52.619 | 88 55.430 |
Sinks | Martha | 37 52.619 | 88 55.430 |
Sinks | Roy | 37 52.619 | 88 55.430 |
Sinks | Elizabeth | 37 47.989 | 88 53.489 |
Sinks | infant son | 37 47.989 | 88 53.489 |
Sinks | John | 37 47.986 | 88 53.489 |
Sinks | Mary | 37 47.985 | 88 53.491 |
Sinks | William | 37 47.984 | 88 53.490 |
Sinks | Charlotte | 37 47.984 | 88 53.490 |
Sinks | Anna | 37 47.984 | 88 53.488 |
Sinks | Leonard | 37 47.982 | 88 53.491 |
Sinks | Etta Faye | 37 48.024 | 88 53.464 |
Sinks | John | 37 48.024 | 88 53.463 |
Sinks | Sena | 37 48.020 | 88 53.463 |
Sinks | William | 37 44.702 | 88 55.998 |
Sweet | Jewell | 37 44.704 | 88 55.995 |
Sinks | Francis | 37 44.702 | 88 55.997 |
Sinks | Arlie | 37 44.704 | 88 55.998 |
Sinks | Viola | 37 44.704 | 88 55.998 |
Sinks | Leonard | 38 33.836 | 89 07.580 |
Sinks | Mae | 38 33.837 | 89 07.579 |
Sinks | Bessie | 38 33.917 | 89 07.572 |
Sinks | Caroline | 38 02.272 | 88 50.161 |
Sinks | Arlie | 37 44.770 | 88 55.779 |
Sinks | Eva | 37 44.770 | 88 55.779 |
Solt | Conrad | 40 48.686 | 75 37.120 |
Solt | Conrad | 40 48.693 | 75 37.119 |
Solt | Maria | 40 48.690 | 75 37.113 |
Sfafford | Trice | 37 52.608 | 88 55.434 |
Sfafford | Phebe | 37 52.608 | 88 55.434 |
Steen | Richard | 38 33.025 | 87 06.328 |
VanCleve | Martin | 37 25.694 | 88 53.921 |
VanCleve | Florence | 37 25.694 | 88 53.921 |
VanCleve | W | 37 33.397 | 88 46.363 |
VanCleve | Nancy | 37 33.397 | 88 46.363 |
VanCleve | J | 37 33.397 | 88 46.363 |
VanCleave | W | 38 04.924 | 88 52.030 |
VanCleave | Elizabeth | 38 04.924 | 88 52.030 |
Veach | Pleasant | 37 29.916 | 86 54.044 |
Veach | Victoria | 37 29.916 | 86 54.044 |
Veach | Ward | 37 29.895 | 86 54.023 |
Veach | Cynthia | 37 29.895 | 86 54.022 |
Veach | James | 37 29.895 | 86 54.020 |
Veach | James | 37 25.692 | 88 53.942 |
Veach | Nannie | 37 26.692 | 88 53.942 |
Veatch | John | 37 26.692 | 88 50.527 |
Veatch | Eleanor | 37 26.692 | 88 50.527 |
Veach | William | 37 26.693 | 88 50.540 |
Veach | James | 37 26.692 | 88 50.531 |
Veach | Rachel | 37 26.692 | 88 50.531 |
Veach | Pleasant | 37 29.895 | 88 54.022 |
Veach | Mary | 37 29.897 | 88 54.022 |
Veatch | Parmelia | 37 28.187 | 88 49.000 |
Veatch | Mary | 37 28.187 | 88 48.999 |
Veatch | Elnor | 37 28.187 | 88 49.001 |
Veatch | Frelin | 37 28.186 | 88 48.005 |
Veach-Nutty | NA | 37 25.682 | 88 54.017 |
Veach | John | 37 25.681 | 88 54.017 |
Veach | Rose | 37 25.679 | 88 54.017 |
Veach | Ruth | 37 25.682 | 88 54.017 |
Ware | Turner | 37 52.856 | 88 39.186 |
Ware | Martha | 37 52.856 | 88 39.186 |
Ware | Joseph | 37 52.867 | 88 39.176 |
Ware | Caroline | 37 52.867 | 88 39.176 |
Webber | Dick | 37 49.829 | 88 35.336 |
Webber | Pearl | 37 49.826 | 88 35.336 |
Weir | James | 36 15.064 | 86 11.669 |
Weir | Mary | 36 15.064 | 86 11.669 |
Wier | Leticia | 37 49.208 | 88 46.787 |
Whiteside | Lucinda | 37 26.743 | 88 50.534 |
Whiteside | John | 37 26.743 | 88 50.534 |
Willis | Matha | 36 35.889 | 86 43.203 |
Wilson | Jessie | 36 26.350 | 86 47.072 |
Wilson | Mary | 36 26.361 | 86 47.070 |
Wilson | Joseph | 36 29.553 | 86 46.791 |
Wilson | Elisha | 36 28.812 | 86 46.023 |
Wilson | Sallie | 36 28.803 | 86 46.007 |
Wilson | Lutetita | NA | NA |
Wilson | Thomas | 37 48.034 | 88 53.443 |
Wilson | Sarah | 37 48.034 | 88 53.443 |
Wilson | Elisha | 36 26.351 | 86 47.070 |
Wilson | Martha | 36 26.351 | 86 47.070 |
Wilson | Charles | 36 26.351 | 86 47.070 |
Wilson | Zack | 36 26.350 | 86 47.073 |
Wilson | Juritha | 36 26.350 | 86 47.073 |
Wilson | Elisha | 36 26.351 | 86 47.071 |
Wilson | Drury | NA | NA |
Wilson | Mary | NA | NA |
Wilson | Sandifer | NA | NA |
Wilson | Nancy | NA | NA |
Wise | Luvena | 37 50.352 | 88 31.612 |
Wollard | John | 37 54.076 | 88 54.322 |
Woolard | Nettie | 37 54.075 | 88 54.322 |
Woolard | Millie | 37 58.721 | 88 55.211 |
Woolard | Lawrence | 37 58.721 | 88 55.211 |
Woolard | Etta | 37 58.721 | 88 55.211 |
Woolard | John | 37 58.723 | 88 55.212 |
Woolard | James | 37 58.721 | 88 55.213 |
Woolard | C | 37 58.720 | 88 55.213 |
Woolard | Blanche | 37 58.720 | 88 55.213 |
Woolard | L | 37 51.394 | 88 41.745 |
Woolard | Ama | 37 51.395 | 88 41.746 |
Woolard | Robert | 37 51.396 | 88 41.747 |
Woolard | James | 37 51.391 | 88 41.742 |
Woolard | Romey | 37 51.397 | 88 41.741 |
Woolard | Anna | 37 52.853 | 88 39.160 |
Woolard | James | 37 52.853 | 88 39.161 |
Woolard | Francis | 37 52.853 | 88 39.160 |
Woolard | Turner | 37 52.853 | 88 39.159 |
Woolard | William | 37 52.854 | 88 39.160 |
Woolard | George | 37 51.742 | 88 52.935 |
Woolard | Nancy | 37 51.742 | 88 52.935 |
Much better. There is some missing data, encoded both as blanks and as NAs. There are also some coordinates that don’t make sense, like 524 (for the entry Dorris William). This will need to be dealt with.
Converting to Decimal Coordinates (Numeric)
Next, I’m converting the N and W data to decimal latitude and longitude. S/W should be “-” and N/E should be “+”. I split the degree/minute/second data into parts and then do the conversion. I delete the intermediate components when done. I used str_split_fixed()
here, which stores the parts in a matrix in your dataframe, hence the indexing to access the parts. The related function str_split()
returns a list. Both functions take the string, a pattern. str_split_fixed()
also requires the number of parts (n
) to split into. If it doesn’t find that many parts it will store a blank (““) rather than fail. More info about the str_split
family can be found here. (A function like separate()
would be more straightforward for this application. I originally included another example here where I use separate, so both methods were illustrated, but I have moved that to a module of this project that isn’t posted yet.)
I want to break a coordinate into 3 parts. So 37 25.687 becomes 37 25 and 687. First I break the coordinate into two parts, using the space as the separator. So 37 and 25.687. I then coerce the first part (which is the degree part of the coordinate) into a numeric. I then split the second part ( 25.687) using the . as the separator and again coerce the results into numbers. The coercion does lead to warning about the generation of NAs during the process, but that is fine. I know not all the data is numeric- there were blanks and NAs to start with. Lastly, I convert my degree, minute, second coordinates to decimal coordinates using the formula degree + minute/60 + second/3600.
Escaping Characters in stringr
It is important to note that stringr defaults to considering that patterns are written in regular expressions (regex). This means some characters are special and require escaping in the pattern. The period is one such character and the correct pattern is “\\.” Otherwise, using “.” will match to every character. The stringr cheat sheet has a high level overview of regular expressions on the second page.
Using Selectors from dplyr
I named all the original output from the string splits such that they contained the word “part” and I can easily remove them using a helper from dplyr, in this case, contains
. I highly recommend using some sort of naming scheme for intermediate variables/ fields so they can be easily removed in one go without lots of typing. I retain the original and the numeric parts so I can double check the results.
<- tombstones %>%
tombstones mutate(part1N = str_split_fixed(N, pattern = " ", n = 2) ) %>%
mutate(N_degree = as.numeric(part1N[,1])) %>%
mutate(part2N = str_split_fixed(part1N[,2], pattern = '\\.', n = 2)) %>%
mutate(N_minute = as.numeric(part2N[,1])) %>%
mutate(N_second = as.numeric(part2N[,2])) %>%
mutate(lat = N_degree + N_minute/60 + N_second/3600)
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `N_minute = as.numeric(part2N[, 1])`.
Caused by warning:
! NAs introduced by coercion
#converting to decimal longitude
<- tombstones %>%
tombstones mutate(part1W = str_split_fixed(W, pattern = " ", n = 2) ) %>%
mutate(W_degree = as.numeric(part1W[,1])) %>%
mutate(part2W = str_split_fixed(part1W[,2], pattern = '\\.', n = 2)) %>%
mutate(W_minute = as.numeric(part2W[,1])) %>%
mutate(W_second = as.numeric(part2W[,2])) %>%
mutate(long = -(W_degree + W_minute/60 + W_second/3600))
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `W_minute = as.numeric(part2W[, 1])`.
Caused by warning:
! NAs introduced by coercion
<- tombstones %>%
tombstones select(-contains("part"))
Taking a quick look at the results
%>%
tombstones select(Surname, First.Name, N, N_degree, N_minute, N_second, lat) %>%
gt() %>%
tab_options(container.height = px(300), container.padding.y = px(24))
Surname | First.Name | N | N_degree | N_minute | N_second | lat |
---|---|---|---|---|---|---|
Anderson | Abraham | 36 56.472 | 36 | 56 | 472 | 37.06444 |
Anderson | Elizabeth | 36 56.472 | 36 | 56 | 472 | 37.06444 |
Anderson | Zady | 37 53.396 | 37 | 53 | 396 | 37.99333 |
Anderson | Albert | 37 52.856 | 37 | 52 | 856 | 38.10444 |
Anderson | Adesia | 37 52.856 | 37 | 52 | 856 | 38.10444 |
Anderson | May | 37 52.855 | 37 | 52 | 855 | 38.10417 |
Anderson | E | 37 52.853 | 37 | 52 | 853 | 38.10361 |
Anderson | William | 37 52.853 | 37 | 52 | 853 | 38.10361 |
Anderson | Nancy | 37 52.852 | 37 | 52 | 852 | 38.10333 |
Appleton | Richard | 36 29.552 | 36 | 29 | 552 | 36.63667 |
Baldwin | John | 38 33.025 | 38 | 33 | 25 | 38.55694 |
Baldwin | William | 38 33.025 | 38 | 33 | 25 | 38.55694 |
Baggett | Mahalia | 36 29.553 | 36 | 29 | 553 | 36.63694 |
Beasley | E | 36 35.891 | 36 | 35 | 891 | 36.83083 |
Beasley | Josephine | 36 36.755 | 36 | 36 | 755 | 36.80972 |
Beasley | Fanning | 36 36.755 | 36 | 36 | 755 | 36.80972 |
Bell | John | 36 15.064 | 36 | 15 | 64 | 36.26778 |
Bell | Mary | 36 15.064 | 36 | 15 | 64 | 36.26778 |
Brazelton | Wm | 35 09.411 | 35 | 9 | 411 | 35.26417 |
Brazelton | Esther | 35 09.410 | 35 | 9 | 410 | 35.26389 |
Brown | Elizabeth | 40 40.760 | 40 | 40 | 760 | 40.87778 |
Brown | Joel | 40 40.760 | 40 | 40 | 760 | 40.87778 |
Bundy | Hope | 37 45.623 | 37 | 45 | 623 | 37.92306 |
Bundy | Clem | 37 53.380 | 37 | 53 | 380 | 37.98889 |
Bundy | Nancy | 37 53.380 | 37 | 53 | 380 | 37.98889 |
Bundy | W | 37 53.380 | 37 | 53 | 380 | 37.98889 |
Bundy | Charles | 37 53.379 | 37 | 53 | 379 | 37.98861 |
Bundy | Thomas | 37 52.875 | 37 | 52 | 875 | 38.10972 |
Bundy | Octavia | 37 52.875 | 37 | 52 | 875 | 38.10972 |
Bundy | George | 37 52.873 | 37 | 52 | 873 | 38.10917 |
Bundy | Lora | 37 52.873 | 37 | 52 | 873 | 38.10917 |
Burgess | W | 37 49.224 | 37 | 49 | 224 | 37.87889 |
Burgess | Alzada | 37 49.224 | 37 | 49 | 224 | 37.87889 |
Clayton | G | 37 50.788 | 37 | 50 | 788 | 38.05222 |
Clayton | Ellen | 37 50.788 | 37 | 50 | 788 | 38.05222 |
Clayton | L | 37 50.795 | 37 | 50 | 795 | 38.05417 |
Clayton | Mary | 37 50.795 | 37 | 50 | 795 | 38.05417 |
Chapman | Daniel | 37 29.894 | 37 | 29 | 894 | 37.73167 |
Chapman | Elizabeth | 37 29.894 | 37 | 29 | 894 | 37.73167 |
Chapman | Caroline | 37 25.692 | 37 | 25 | 692 | 37.60889 |
Chapman | Daniel | 37 25.691 | 37 | 25 | 691 | 37.60861 |
Chapman | Lucretia | 37 25.691 | 37 | 25 | 691 | 37.60861 |
Chapman | Samuel | 37 25.692 | 37 | 25 | 692 | 37.60889 |
Chapman | Elizabeth | 37 25.692 | 37 | 25 | 692 | 37.60889 |
Chapman | Laura | 37 25.694 | 37 | 25 | 694 | 37.60944 |
Chapman | Polly | 38 33.026 | 38 | 33 | 26 | 38.55722 |
Crockett | Mandy | 36 22.801 | 36 | 22 | 801 | 36.58917 |
Crockett | John | 36 22.804 | 36 | 22 | 804 | 36.59000 |
Davis | Ezra | 37 44.682 | 37 | 44 | 682 | 37.92278 |
Davis | Lizzie | 37 44.683 | 37 | 44 | 683 | 37.92306 |
Davis | Fred | 36 14.260 | 36 | 14 | 260 | 36.30556 |
Dolch | Catherine | 38 44.563 | 38 | 44 | 563 | 38.88972 |
Dolch | Christian | 38 44.584 | 38 | 44 | 584 | 38.89556 |
Dolch | Peter | 38 44.564 | 38 | 44 | 564 | 38.89000 |
Doley | George | 38 44.615 | 38 | 44 | 615 | 38.90417 |
Doley | Katie | 38 44.615 | 38 | 44 | 615 | 38.90417 |
Doley | Mary E | 38 44.615 | 38 | 44 | 615 | 38.90417 |
Doley | Henriettie | 38 44.615 | 38 | 44 | 615 | 38.90417 |
NA | MED | 38 44.618 | 38 | 44 | 618 | 38.90500 |
NA | HD | 38 44.618 | 38 | 44 | 618 | 38.90500 |
NA | GD | 38 44.618 | 38 | 44 | 618 | 38.90500 |
NA | Mother | 38 44.618 | 38 | 44 | 618 | 38.90500 |
NA | Father | 38 44.618 | 38 | 44 | 618 | 38.90500 |
Doley | George | 38 44.615 | 38 | 44 | 615 | 38.90417 |
Doley | James | 38 44.610 | 38 | 44 | 610 | 38.90278 |
Doley | May | 38 44.611 | 38 | 44 | 611 | 38.90306 |
Doley | John | 38 44.611 | 38 | 44 | 611 | 38.90306 |
Doley | Maggie | 38 44.611 | 38 | 44 | 611 | 38.90306 |
Doley | William | 37 49.907 | 37 | 49 | 907 | 38.06861 |
Doley | Dora | 37 49.907 | 37 | 49 | 907 | 38.06861 |
Doley | L[eaman] | 37 49.907 | 37 | 49 | 907 | 38.06861 |
Doley | G[uilford] | 37 58.810 | 37 | 58 | 810 | 38.19167 |
Doley | D[ora] | 37 58.810 | 37 | 58 | 810 | 38.19167 |
Doley | Eugene | 37 58.751 | 37 | 58 | 751 | 38.17528 |
Doley | Lou | 37 58.751 | 37 | 58 | 751 | 38.17528 |
Dorris | J[oseph] | 36 28.798 | 36 | 28 | 798 | 36.68833 |
Dorris | Joseph | 36 28.811 | 36 | 28 | 811 | 36.69194 |
Dorris | Sarah | 36 28.812 | 36 | 28 | 812 | 36.69222 |
Dorris | W | 36 28.812 | 36 | 28 | 812 | 36.69222 |
Dorris | A | 36 28.813 | 36 | 28 | 813 | 36.69250 |
Dorris | J | NA | NA | NA | NA | NA |
Dorris | Elizabeth | NA | NA | NA | NA | NA |
Dorris | Robert | 36 26.485 | 36 | 26 | 485 | 36.56806 |
Dorris | Rebecca | 36 26.484 | 36 | 26 | 484 | 36.56778 |
Dorris | Monroe | 38 07.067 | 38 | 7 | 67 | 38.13528 |
Dorris | Della | 38 07.067 | 38 | 7 | 67 | 38.13528 |
Dorris | Mary M | 38 07.067 | 38 | 7 | 67 | 38.13528 |
Dorris | Harve | 38 07.081 | 38 | 7 | 81 | 38.13917 |
Dorris | Carrie | 38 07.081 | 38 | 7 | 81 | 38.13917 |
Dorris | Smith | 37 54.310 | 37 | 54 | 310 | 37.98611 |
Dorris | Ada | 37 54.309 | 37 | 54 | 309 | 37.98583 |
Dorris | William | 37 54.309 | 37 | 54 | 309 | 37.98583 |
Dorris | Harvey | 37 54.310 | 37 | 54 | 310 | 37.98611 |
Dorris | Cora | 37 54.310 | 37 | 54 | 310 | 37.98611 |
Dorris | John | 37 58.746 | 37 | 58 | 746 | 38.17389 |
Dorris | W | 37 58.749 | 37 | 58 | 749 | 38.17472 |
Dorris | Gustavus | 37 47.990 | 37 | 47 | 990 | 38.05833 |
Dorris | Sarah | 37 47.988 | 37 | 47 | 988 | 38.05778 |
Dorris | Joseph | 37 51.571 | 37 | 51 | 571 | 38.00861 |
Dorris | Della | 37 51.571 | 37 | 51 | 571 | 38.00861 |
Dorris | William | 37 50.787 | 37 | 50 | 787 | 38.05194 |
Dorris | Harriet | 37 50.788 | 37 | 50 | 788 | 38.05222 |
Dorris | William | 37 50.794 | 37 | 50 | 794 | 38.05389 |
Dorris | Mary | 37 50.794 | 37 | 50 | 794 | 38.05389 |
Dorris | James | 37 50.786 | 37 | 50 | 786 | 38.05167 |
Dorris | Sarah | 37 50.771 | 37 | 50 | 771 | 38.04750 |
Dorris | W[illiam] | 37 50.783 | 37 | 50 | 783 | 38.05083 |
Dorris | E[lisha] | 37 50.775 | 37 | 50 | 775 | 38.04861 |
Dorris | Sarah | 37 50.775 | 37 | 50 | 775 | 38.04861 |
Dorris | James | 37 50.775 | 37 | 50 | 775 | 38.04861 |
Dorris | Georgia | 37 50.775 | 37 | 50 | 775 | 38.04861 |
Dorris | William | 524 | 524 | NA | NA | NA |
Dorris | Malinda | 528 | 528 | NA | NA | NA |
Drake | Mary | 36 35.870 | 36 | 35 | 870 | 36.82500 |
Dreisbach | Catherina | 40 44.177 | 40 | 44 | 177 | 40.78250 |
Dreisbach | Johannes | 40 44.177 | 40 | 44 | 177 | 40.78250 |
Everett | Semantha | 38 33.026 | 38 | 33 | 26 | 38.55722 |
Farris | Elizabeth | 37 24.687 | 37 | 24 | 687 | 37.59083 |
Farris | Elizabeth | 37 24.678 | 37 | 24 | 678 | 37.58833 |
Finch | Isaac | 44 34.662 | 44 | 34 | 662 | 44.75056 |
Follis | Fawn | 37 51.764 | 37 | 51 | 764 | 38.06222 |
Follis | Ralph | 37 51.758 | 37 | 51 | 758 | 38.06056 |
Follis | A | 37 51.761 | 37 | 51 | 761 | 38.06139 |
Follis | Christian | 37 51.761 | 37 | 51 | 761 | 38.06139 |
Follis | G | 37 51.759 | 37 | 51 | 759 | 38.06083 |
Follis | Ralph | 37 51. | 37 | 51 | NA | NA |
Follis | E | 37 51.758 | 37 | 51 | 758 | 38.06056 |
Follis | William | 37 51.758 | 37 | 51 | 758 | 38.06056 |
Follis | Martha | 37 51.758 | 37 | 51 | 758 | 38.06056 |
Follis | Jeff | 37 51.758 | 37 | 51 | 758 | 38.06056 |
Ford | Florence | 37 52.851 | 37 | 52 | 851 | 38.10306 |
Fox | Frances | 37 48.023 | 37 | 48 | 23 | 37.80639 |
Frost | Ebenezer | 37 17.909 | 37 | 17 | 909 | 37.53583 |
NA | NA | 37 17.910 | 37 | 17 | 910 | 37.53611 |
Frost | NA | 37 17.909 | 37 | 17 | 909 | 37.53583 |
Fuqua | William | 36 38.189 | 36 | 38 | 189 | 36.68583 |
Gregory | Leonard | 38 44.609 | 38 | 44 | 609 | 38.90250 |
Gregory | Lucille | 38 44.611 | 38 | 44 | 611 | 38.90306 |
Hart | Parmelia | 37 51.757 | 37 | 51 | 757 | 38.06028 |
Hess | Amalphus | 37 25.687 | 37 | 25 | 687 | 37.60750 |
Hess | Adolphus | 37 25.687 | 37 | 25 | 687 | 37.60750 |
Hess | Samuel | 37 25.688 | 37 | 25 | 688 | 37.60778 |
Hess | Augusta | 37 25.688 | 37 | 25 | 688 | 37.60778 |
Hess | Ulysses | 37 25.688 | 37 | 25 | 688 | 37.60778 |
Hess | Ulysses | 37 25.687 | 37 | 25 | 687 | 37.60750 |
Hess | William | 37 25.688 | 37 | 25 | 688 | 37.60778 |
Hess | William | 37 25.687 | 37 | 25 | 687 | 37.60750 |
Hess | Jerome | 37 25.689 | 37 | 25 | 689 | 37.60806 |
Hess | Franklin | 37 25.689 | 37 | 25 | 689 | 37.60806 |
Hess | Samuel | 37 25.693 | 37 | 25 | 693 | 37.60917 |
Hess | Bernice | 37 25.693 | 37 | 25 | 693 | 37.60917 |
Hess | Catherine | 37 25.693 | 37 | 25 | 693 | 37.60917 |
Hess | George | 37 25.690 | 37 | 25 | 690 | 37.60833 |
Holt | Lucinda | NA | NA | NA | NA | NA |
Holt | William | NA | NA | NA | NA | NA |
Horlacher | Daniel | 40 30.928 | 40 | 30 | 928 | 40.75778 |
Horlacher | Margaretha | 40 30.930 | 40 | 30 | 930 | 40.75833 |
Horrall | Polly | 37 54.090 | 37 | 54 | 90 | 37.92500 |
Horrall | James | 38 33.026 | 38 | 33 | 26 | 38.55722 |
Horrall | William | 38 36.963 | 38 | 36 | 963 | 38.86750 |
Hurt | Elizabeth | 36 28.804 | 36 | 28 | 804 | 36.69000 |
Jacobs | Jeremiah | 38 21.315 | 38 | 21 | 315 | 38.43750 |
Jacobs | Rebecca | 38 21.317 | 38 | 21 | 317 | 38.43806 |
Johnson | James | 37 52.872 | 37 | 52 | 872 | 38.10889 |
Johnson | Mary | 37 52.872 | 37 | 52 | 872 | 38.10889 |
Jones | Levi | 37 47.994 | 37 | 47 | 994 | 38.05944 |
Jones | Hester | 37 47.994 | 37 | 47 | 994 | 38.05944 |
Jones | Ridley | 37 47.997 | 37 | 47 | 997 | 38.06028 |
Jones | James | 37 47.995 | 37 | 47 | 995 | 38.05972 |
Jones | Tina | 37 47.995 | 37 | 47 | 995 | 38.05972 |
Jones | Ezra | 37 48.024 | 37 | 48 | 24 | 37.80667 |
Jones | Nannie | 37 48.024 | 37 | 48 | 24 | 37.80667 |
Jones | Samuel | 37 48.020 | 37 | 48 | 20 | 37.80556 |
Jones | Melverda | 37 48.020 | 37 | 48 | 20 | 37.80556 |
Jones | John | 37 51.747 | 37 | 51 | 747 | 38.05750 |
Karnes | Willard | 37 58.749 | 37 | 58 | 749 | 38.17472 |
Karnes | Ruth | 37 58.749 | 37 | 58 | 749 | 38.17472 |
Keith | James | NA | NA | NA | NA | NA |
Keth | Nancy | 35 09.410 | 35 | 9 | 410 | 35.26389 |
Kleppinger | Anna | 40 44.178 | 40 | 44 | 178 | 40.78278 |
Lipsey | Joe | 38 33.917 | 38 | 33 | 917 | 38.80472 |
Lockwood | Eugenia | NA | NA | NA | NA | NA |
Lockwood | Leland | NA | NA | NA | NA | NA |
Loomis | Jon | 37 36.925 | 37 | 36 | 925 | 37.85694 |
Mensch | Abraham | 40 39.557 | 40 | 39 | 557 | 40.80472 |
Merrell | Azariah | 35 43.945 | 35 | 43 | 945 | 35.97917 |
Merrell | Abigail | 35 43.942 | 35 | 43 | 942 | 35.97833 |
Meredith | Eleandra | 39 41.114 | 39 | 41 | 114 | 39.71500 |
Meredith | Micajah | 39 41.115 | 39 | 41 | 115 | 39.71528 |
Meredith | Samuel | 39 41.116 | 39 | 41 | 116 | 39.71556 |
Meredith | Elizabeth | 39 41.116 | 39 | 41 | 116 | 39.71556 |
Meredith | Ruth | 39 41.117 | 39 | 41 | 117 | 39.71583 |
Bell | Sarah | 39 41.117 | 39 | 41 | 117 | 39.71583 |
John | Bell | 39 41.117 | 39 | 41 | 117 | 39.71583 |
Meredith | Mary | 39 41.118 | 39 | 41 | 118 | 39.71611 |
Meredith | Clarence | 39 41.112 | 39 | 41 | 112 | 39.71444 |
Meredith | Cora | 39 41.112 | 39 | 41 | 112 | 39.71444 |
Meredith | W | 39 41.112 | 39 | 41 | 112 | 39.71444 |
Meredith | Susan | 39 41.112 | 39 | 41 | 112 | 39.71444 |
Meredith | Hannah | 39 41.112 | 39 | 41 | 112 | 39.71444 |
Meredith | Mary | 39 41.112 | 39 | 41 | 112 | 39.71444 |
Meredith | Samuel | 39 41.113 | 39 | 41 | 113 | 39.71472 |
Meredith | Belinda | 39 41.113 | 39 | 41 | 113 | 39.71472 |
Tipton | Susannah | 39 41.114 | 39 | 41 | 114 | 39.71500 |
Meredith | Thomas | 39 41.114 | 39 | 41 | 114 | 39.71500 |
Meredith | Sarah | 39 41.114 | 39 | 41 | 114 | 39.71500 |
Mildenberger | Anna | 40 44.194 | 40 | 44 | 194 | 40.78722 |
Mildenberger | Nicolaus | 40 44.179 | 40 | 44 | 179 | 40.78306 |
Miller | Myrtie | 37 48.023 | 37 | 48 | 23 | 37.80639 |
Minnich | Elizabeth | 40 40.757 | 40 | 40 | 757 | 40.87694 |
Minnich | John | 40 40.759 | 40 | 40 | 759 | 40.87750 |
Mory | Catherina | 40 33.585 | 40 | 33 | 585 | 40.71250 |
Mory | Gotthard | 40 33.586 | 40 | 33 | 586 | 40.71278 |
Mory | Magdelena | 40 33.586 | 40 | 33 | 586 | 40.71278 |
Mory | Peter | 40 33.585 | 40 | 33 | 585 | 40.71250 |
Nagel | Anna | 40 33.585 | 40 | 33 | 585 | 40.71250 |
Nagel | Anna | 40 44.191 | 40 | 44 | 191 | 40.78639 |
Nagel | Caty | 41 13.033 | 41 | 13 | 33 | 41.22583 |
Nagel | Daniel | 40 39.575 | 40 | 39 | 575 | 40.80972 |
Nagel | Frederick | 40 39.577 | 40 | 39 | 577 | 40.81028 |
Nagel | Friedrich | 40 44.197 | 40 | 44 | 197 | 40.78806 |
Nagel | Johann | 41 13.031 | 41 | 13 | 31 | 41.22528 |
Nagel | Maria | 40 39.575 | 40 | 39 | 575 | 40.80972 |
Nagle | John | 38 44.582 | 38 | 44 | 582 | 38.89500 |
Nagle | Mary | 38 44.582 | 38 | 44 | 582 | 38.89500 |
Nagel | Henry | 38 44.582 | 38 | 44 | 582 | 38.89500 |
Nagel | Mary | 38 44.582 | 38 | 44 | 582 | 38.89500 |
Nagel | Will | 38 44.582 | 38 | 44 | 582 | 38.89500 |
Nagel | Adeline | 38 44.582 | 38 | 44 | 582 | 38.89500 |
NA | NA | NA | NA | NA | NA | NA |
Nutty | John | 37 25.674 | 37 | 25 | 674 | 37.60389 |
Nutty | Beatrice | 37 25.682 | 37 | 25 | 682 | 37.60611 |
Nutty | John | 37 25.678 | 37 | 25 | 678 | 37.60500 |
Ritter | NA | 37 52.861 | 37 | 52 | 861 | 38.10583 |
Ritter | NA | 37 52.861 | 37 | 52 | 861 | 38.10583 |
Odom | Archibald | 37 58.794 | 37 | 58 | 794 | 38.18722 |
Odom | Cynthia | 37 58.795 | 37 | 58 | 795 | 38.18750 |
Odom | G | 37 47.993 | 37 | 47 | 993 | 38.05917 |
Odom | Sarah | 37 47.994 | 37 | 47 | 994 | 38.05944 |
NA | Thomas | 37 47.992 | 37 | 47 | 992 | 38.05889 |
Odum | Britton | NA | NA | NA | NA | NA |
Odum | Wiley | 37 47.187 | 37 | 47 | 187 | 37.83528 |
Odum | Sallie A | 37 47.187 | 37 | 47 | 187 | 37.83528 |
Peters | Daniel | 37 47.244 | 37 | 47 | 244 | 37.85111 |
Peters | Charlotte | 37 47.244 | 37 | 47 | 244 | 37.85111 |
Pickard | William | 38 04.918 | 38 | 4 | 918 | 38.32167 |
Pickard | Harriet | 38 04.919 | 38 | 4 | 919 | 38.32194 |
Pickard | Louise | 38 04.917 | 38 | 4 | 917 | 38.32139 |
Pletz | Karl | 37 44.684 | 37 | 44 | 684 | 37.92333 |
Russell | Caroline | 37 44.683 | 37 | 44 | 683 | 37.92306 |
Pickard | William | 38 04.918 | 38 | 4 | 918 | 38.32167 |
Pickard | Harriet | 38 04.919 | 38 | 4 | 919 | 38.32194 |
Pickard | Louise | 38 04.917 | 38 | 4 | 917 | 38.32139 |
Pulliam | Frieda | 37 25.697 | 37 | 25 | 697 | 37.61028 |
Pulliam | Amos | 37 25.697 | 37 | 25 | 697 | 37.61028 |
Rex | William | 37 45.776 | 37 | 45 | 776 | 37.96556 |
Rex | Elmina | 37 45.776 | 37 | 45 | 776 | 37.96556 |
Rex | Mamie | 37 45.777 | 37 | 45 | 777 | 37.96583 |
Rex | George | 37 45.777 | 37 | 45 | 777 | 37.96583 |
Rex | Bertie | 37 45.776 | 37 | 45 | 776 | 37.96556 |
Rex | Lulie | 37 45.774 | 37 | 45 | 774 | 37.96500 |
Rex | Lily | 37 45.776 | 37 | 45 | 776 | 37.96556 |
Rex | Arthur | 37 45.776 | 37 | 45 | 776 | 37.96556 |
Rex | George | 37 45.776 | 37 | 45 | 776 | 37.96556 |
Rex | Jno | 32 22 549 | 32 | NA | NA | NA |
Rex | Guy | 37 44.784 | 37 | 44 | 784 | 37.95111 |
Rex | Harlie | 37 44.785 | 37 | 44 | 785 | 37.95139 |
Richardson | Annabelle | 37 44.766 | 37 | 44 | 766 | 37.94611 |
Richardson | Alfred | 37 44.787 | 37 | 44 | 787 | 37.95194 |
Riegel | Solomon | 37 49.828 | 37 | 49 | 828 | 38.04667 |
Riegel | Catherine | 37 49.828 | 37 | 49 | 828 | 38.04667 |
Ritter | J | 37 52.853 | 37 | 52 | 853 | 38.10361 |
Ritter | Mary | 37 52.853 | 37 | 52 | 853 | 38.10361 |
Rockel | Balzer | 40 39.556 | 40 | 39 | 556 | 40.80444 |
Rockel | Elisabetha | 40 39.555 | 40 | 39 | 555 | 40.80417 |
Rockel | Johannes | 40 39.560 | 40 | 39 | 560 | 40.80556 |
Rockel | Elizabeth | 40 39.560 | 40 | 39 | 560 | 40.80556 |
Ross | George | 37 58.752 | 37 | 58 | 752 | 38.17556 |
Ross | Euna | 37 58.752 | 37 | 58 | 752 | 38.17556 |
Ruckel | Mary | NA | NA | NA | NA | NA |
Ruckel | Melchir | NA | NA | NA | NA | NA |
Russell | James | 37 44.681 | 37 | 44 | 681 | 37.92250 |
Russell | Ana | 37 44.682 | 37 | 44 | 682 | 37.92278 |
NA | NA | 37 44.682 | 37 | 44 | 682 | 37.92278 |
NA | NA | 37 44.682 | 37 | 44 | 682 | 37.92278 |
NA | NA | 37 44.682 | 37 | 44 | 682 | 37.92278 |
Siliven | Jenniel | 37 28.189 | 37 | 28 | 189 | 37.51917 |
Sinks | A | 36 14.451 | 36 | 14 | 451 | 36.35861 |
Sinks | Francis | 37 54.081 | 37 | 54 | 81 | 37.92250 |
Sinks | Delphia | 37 54.081 | 37 | 54 | 81 | 37.92250 |
Sinks | Salem | 37 54.089 | 37 | 54 | 89 | 37.92472 |
Sinks | Daniel | 37 52.619 | 37 | 52 | 619 | 38.03861 |
Sinks | Martha | 37 52.619 | 37 | 52 | 619 | 38.03861 |
Sinks | Roy | 37 52.619 | 37 | 52 | 619 | 38.03861 |
Sinks | Elizabeth | 37 47.989 | 37 | 47 | 989 | 38.05806 |
Sinks | infant son | 37 47.989 | 37 | 47 | 989 | 38.05806 |
Sinks | John | 37 47.986 | 37 | 47 | 986 | 38.05722 |
Sinks | Mary | 37 47.985 | 37 | 47 | 985 | 38.05694 |
Sinks | William | 37 47.984 | 37 | 47 | 984 | 38.05667 |
Sinks | Charlotte | 37 47.984 | 37 | 47 | 984 | 38.05667 |
Sinks | Anna | 37 47.984 | 37 | 47 | 984 | 38.05667 |
Sinks | Leonard | 37 47.982 | 37 | 47 | 982 | 38.05611 |
Sinks | Etta Faye | 37 48.024 | 37 | 48 | 24 | 37.80667 |
Sinks | John | 37 48.024 | 37 | 48 | 24 | 37.80667 |
Sinks | Sena | 37 48.020 | 37 | 48 | 20 | 37.80556 |
Sinks | William | 37 44.702 | 37 | 44 | 702 | 37.92833 |
Sweet | Jewell | 37 44.704 | 37 | 44 | 704 | 37.92889 |
Sinks | Francis | 37 44.702 | 37 | 44 | 702 | 37.92833 |
Sinks | Arlie | 37 44.704 | 37 | 44 | 704 | 37.92889 |
Sinks | Viola | 37 44.704 | 37 | 44 | 704 | 37.92889 |
Sinks | Leonard | 38 33.836 | 38 | 33 | 836 | 38.78222 |
Sinks | Mae | 38 33.837 | 38 | 33 | 837 | 38.78250 |
Sinks | Bessie | 38 33.917 | 38 | 33 | 917 | 38.80472 |
Sinks | Caroline | 38 02.272 | 38 | 2 | 272 | 38.10889 |
Sinks | Arlie | 37 44.770 | 37 | 44 | 770 | 37.94722 |
Sinks | Eva | 37 44.770 | 37 | 44 | 770 | 37.94722 |
Solt | Conrad | 40 48.686 | 40 | 48 | 686 | 40.99056 |
Solt | Conrad | 40 48.693 | 40 | 48 | 693 | 40.99250 |
Solt | Maria | 40 48.690 | 40 | 48 | 690 | 40.99167 |
Sfafford | Trice | 37 52.608 | 37 | 52 | 608 | 38.03556 |
Sfafford | Phebe | 37 52.608 | 37 | 52 | 608 | 38.03556 |
Steen | Richard | 38 33.025 | 38 | 33 | 25 | 38.55694 |
VanCleve | Martin | 37 25.694 | 37 | 25 | 694 | 37.60944 |
VanCleve | Florence | 37 25.694 | 37 | 25 | 694 | 37.60944 |
VanCleve | W | 37 33.397 | 37 | 33 | 397 | 37.66028 |
VanCleve | Nancy | 37 33.397 | 37 | 33 | 397 | 37.66028 |
VanCleve | J | 37 33.397 | 37 | 33 | 397 | 37.66028 |
VanCleave | W | 38 04.924 | 38 | 4 | 924 | 38.32333 |
VanCleave | Elizabeth | 38 04.924 | 38 | 4 | 924 | 38.32333 |
Veach | Pleasant | 37 29.916 | 37 | 29 | 916 | 37.73778 |
Veach | Victoria | 37 29.916 | 37 | 29 | 916 | 37.73778 |
Veach | Ward | 37 29.895 | 37 | 29 | 895 | 37.73194 |
Veach | Cynthia | 37 29.895 | 37 | 29 | 895 | 37.73194 |
Veach | James | 37 29.895 | 37 | 29 | 895 | 37.73194 |
Veach | James | 37 25.692 | 37 | 25 | 692 | 37.60889 |
Veach | Nannie | 37 26.692 | 37 | 26 | 692 | 37.62556 |
Veatch | John | 37 26.692 | 37 | 26 | 692 | 37.62556 |
Veatch | Eleanor | 37 26.692 | 37 | 26 | 692 | 37.62556 |
Veach | William | 37 26.693 | 37 | 26 | 693 | 37.62583 |
Veach | James | 37 26.692 | 37 | 26 | 692 | 37.62556 |
Veach | Rachel | 37 26.692 | 37 | 26 | 692 | 37.62556 |
Veach | Pleasant | 37 29.895 | 37 | 29 | 895 | 37.73194 |
Veach | Mary | 37 29.897 | 37 | 29 | 897 | 37.73250 |
Veatch | Parmelia | 37 28.187 | 37 | 28 | 187 | 37.51861 |
Veatch | Mary | 37 28.187 | 37 | 28 | 187 | 37.51861 |
Veatch | Elnor | 37 28.187 | 37 | 28 | 187 | 37.51861 |
Veatch | Frelin | 37 28.186 | 37 | 28 | 186 | 37.51833 |
Veach-Nutty | NA | 37 25.682 | 37 | 25 | 682 | 37.60611 |
Veach | John | 37 25.681 | 37 | 25 | 681 | 37.60583 |
Veach | Rose | 37 25.679 | 37 | 25 | 679 | 37.60528 |
Veach | Ruth | 37 25.682 | 37 | 25 | 682 | 37.60611 |
Ware | Turner | 37 52.856 | 37 | 52 | 856 | 38.10444 |
Ware | Martha | 37 52.856 | 37 | 52 | 856 | 38.10444 |
Ware | Joseph | 37 52.867 | 37 | 52 | 867 | 38.10750 |
Ware | Caroline | 37 52.867 | 37 | 52 | 867 | 38.10750 |
Webber | Dick | 37 49.829 | 37 | 49 | 829 | 38.04694 |
Webber | Pearl | 37 49.826 | 37 | 49 | 826 | 38.04611 |
Weir | James | 36 15.064 | 36 | 15 | 64 | 36.26778 |
Weir | Mary | 36 15.064 | 36 | 15 | 64 | 36.26778 |
Wier | Leticia | 37 49.208 | 37 | 49 | 208 | 37.87444 |
Whiteside | Lucinda | 37 26.743 | 37 | 26 | 743 | 37.63972 |
Whiteside | John | 37 26.743 | 37 | 26 | 743 | 37.63972 |
Willis | Matha | 36 35.889 | 36 | 35 | 889 | 36.83028 |
Wilson | Jessie | 36 26.350 | 36 | 26 | 350 | 36.53056 |
Wilson | Mary | 36 26.361 | 36 | 26 | 361 | 36.53361 |
Wilson | Joseph | 36 29.553 | 36 | 29 | 553 | 36.63694 |
Wilson | Elisha | 36 28.812 | 36 | 28 | 812 | 36.69222 |
Wilson | Sallie | 36 28.803 | 36 | 28 | 803 | 36.68972 |
Wilson | Lutetita | NA | NA | NA | NA | NA |
Wilson | Thomas | 37 48.034 | 37 | 48 | 34 | 37.80944 |
Wilson | Sarah | 37 48.034 | 37 | 48 | 34 | 37.80944 |
Wilson | Elisha | 36 26.351 | 36 | 26 | 351 | 36.53083 |
Wilson | Martha | 36 26.351 | 36 | 26 | 351 | 36.53083 |
Wilson | Charles | 36 26.351 | 36 | 26 | 351 | 36.53083 |
Wilson | Zack | 36 26.350 | 36 | 26 | 350 | 36.53056 |
Wilson | Juritha | 36 26.350 | 36 | 26 | 350 | 36.53056 |
Wilson | Elisha | 36 26.351 | 36 | 26 | 351 | 36.53083 |
Wilson | Drury | NA | NA | NA | NA | NA |
Wilson | Mary | NA | NA | NA | NA | NA |
Wilson | Sandifer | NA | NA | NA | NA | NA |
Wilson | Nancy | NA | NA | NA | NA | NA |
Wise | Luvena | 37 50.352 | 37 | 50 | 352 | 37.93111 |
Wollard | John | 37 54.076 | 37 | 54 | 76 | 37.92111 |
Woolard | Nettie | 37 54.075 | 37 | 54 | 75 | 37.92083 |
Woolard | Millie | 37 58.721 | 37 | 58 | 721 | 38.16694 |
Woolard | Lawrence | 37 58.721 | 37 | 58 | 721 | 38.16694 |
Woolard | Etta | 37 58.721 | 37 | 58 | 721 | 38.16694 |
Woolard | John | 37 58.723 | 37 | 58 | 723 | 38.16750 |
Woolard | James | 37 58.721 | 37 | 58 | 721 | 38.16694 |
Woolard | C | 37 58.720 | 37 | 58 | 720 | 38.16667 |
Woolard | Blanche | 37 58.720 | 37 | 58 | 720 | 38.16667 |
Woolard | L | 37 51.394 | 37 | 51 | 394 | 37.95944 |
Woolard | Ama | 37 51.395 | 37 | 51 | 395 | 37.95972 |
Woolard | Robert | 37 51.396 | 37 | 51 | 396 | 37.96000 |
Woolard | James | 37 51.391 | 37 | 51 | 391 | 37.95861 |
Woolard | Romey | 37 51.397 | 37 | 51 | 397 | 37.96028 |
Woolard | Anna | 37 52.853 | 37 | 52 | 853 | 38.10361 |
Woolard | James | 37 52.853 | 37 | 52 | 853 | 38.10361 |
Woolard | Francis | 37 52.853 | 37 | 52 | 853 | 38.10361 |
Woolard | Turner | 37 52.853 | 37 | 52 | 853 | 38.10361 |
Woolard | William | 37 52.854 | 37 | 52 | 854 | 38.10389 |
Woolard | George | 37 51.742 | 37 | 51 | 742 | 38.05611 |
Woolard | Nancy | 37 51.742 | 37 | 51 | 742 | 38.05611 |
The weird coordinates like William Dorris had (524) were turned into NAs by this process, so I don’t need to worry about fixing them. The leading zeros were removed from the seconds data. For this application, it doesn’t matter. For others it might, and you could pad then back using str_pad()
. So that’s it for cleaning this variable.
Cleaning up Dates (strings)
Next, I’m going to clean up the dates. They imported as also imported as strings. I don’t think I’m going to use the dates in the map, but I might use them when I’m working with the web scraping data. Like I did with the GPS Data, I’m first going to cleanup the typos then convert to the format I want.
Viewing the Dates
class(tombstones$DOB)
[1] "character"
%>%
tombstones select(Surname, First.Name,DOB, DOD) %>%
gt() %>%
tab_options(container.height = px(300), container.padding.y = px(24))
Surname | First.Name | DOB | DOD |
---|---|---|---|
Anderson | Abraham | 10 Mar 1776 | 15 Aug 1838 |
Anderson | Elizabeth | 29 Jan 1782 | 13 Oct 1869 |
Anderson | Zady | 18 Apr 1812 | 12 Dec 1839 |
Anderson | Albert | 28 Nov 1809 | 5 Nov 1882 |
Anderson | Adesia | 17 Mar 1808 | 26 Spt 1864 |
Anderson | May | NA | 9 Aug 1887 |
Anderson | E | 23 Spt 1877 | 24 Oct 1899 |
Anderson | William | 26 Feb 1836 | 31 Dec 1895 |
Anderson | Nancy | 2 Spt 1836 | 18 Oct 1917 |
Appleton | Richard | 1 Aug 1817 | 6 Oct 1897 |
Baldwin | John | 25 Sep 1845 | NA |
Baldwin | William | NA | NA |
Baggett | Mahalia | 3 June 1832 | 6 June 1897 |
Beasley | E | 13 May 1826 | NA |
Beasley | Josephine | 23 Dec 1858 | 25 Aug 1899 |
Beasley | Fanning | NA | 25 Aug 1899 |
Bell | John | 1825 | 1872 |
Bell | Mary | 1826 | 1859 |
Brazelton | Wm | NA | 10 Dec 1858 |
Brazelton | Esther | NA | 20 Sept 1840 |
Brown | Elizabeth | 1 July 1807 | 2 Feb 1888 |
Brown | Joel | 20 Dec 1803 | NA |
Bundy | Hope | 1833 | 1918 |
Bundy | Clem | NA | NA |
Bundy | Nancy | NA | NA |
Bundy | W | NA | NA |
Bundy | Charles | NA | NA |
Bundy | Thomas | 4 Feb 1881 | 22 Apr 1892 |
Bundy | Octavia | 16 Apr 1839 | 10 Mar 1932 |
Bundy | George | 1864 | 1949 |
Bundy | Lora | 1870 | 1957 |
Burgess | W | 1844 | 17 Dec 1907 |
Burgess | Alzada | 1854 | 1931 |
Clayton | G | 9 Apr 1846 | 9 Mar 1919 |
Clayton | Ellen | 28 Mar 1852 | 23 May 1917 |
Clayton | L | NA | 25 May 1909 |
Clayton | Mary | NA | 18 Jan 1908 |
Chapman | Daniel | NA | NA |
Chapman | Elizabeth | NA | NA |
Chapman | Caroline | NA | 24 Jan 1861 |
Chapman | Daniel | 5 Jul 1863 | Feb 1842 |
Chapman | Lucretia | 23 Apr 1769 | Aug 1849 |
Chapman | Samuel | 14 May 1794 | 15 Apr 1863 |
Chapman | Elizabeth | 23 May 1796 | 30 Dec 1859 |
Chapman | Laura | NA | 14 June 1865 |
Chapman | Polly | NA | NA |
Crockett | Mandy | 3 Sep 1844 | 5 Nov 1914 |
Crockett | John | 4 Oct 1844 | 25 July 1919 |
Davis | Ezra | 1887 | 1949 |
Davis | Lizzie | 1886 | 1955 |
Davis | Fred | NA | NA |
Dolch | Catherine | 11 Feb 1793 | 14 Feb 1867 |
Dolch | Christian | 24 Dec 1794 | 29 Aug 1874 |
Dolch | Peter | NA | 16 July 1862 |
Doley | George | 25 July 1823 | 26 Jan 1868 |
Doley | Katie | 8 Jan 1833 | 15 Spt 1927 |
Doley | Mary E | 23 Aug 1851 | 18 July 1853 |
Doley | Henriettie | 28 Feb 1853 | 18 July 1854 |
NA | MED | NA | NA |
NA | HD | NA | NA |
NA | GD | NA | NA |
NA | Mother | NA | NA |
NA | Father | NA | NA |
Doley | George | 14 Mar 1856 | 26 Jan 1860 |
Doley | James | 21 Jan 1858 | 11 July 1926 |
Doley | May | 1869 | 1940 |
Doley | John | 1864 | [1949?] |
Doley | Maggie | 1868 | 1917 |
Doley | William | 20 Dec 1862 | 24 Nov 1934 |
Doley | Dora | 17 Feb 1864 | 9 Apr 1934 |
Doley | L[eaman] | 17 Apr 1889 | 19 Nov 1960 |
Doley | G[uilford] | 5 Aug 1894 | 1 June 1960 |
Doley | D[ora] | 1897 | 1983 |
Doley | Eugene | 8 Nov 1886 | 12 June 1958 |
Doley | Lou | 29 Oct 1895 | 26 June 1963 |
Dorris | J[oseph] | 12 Feb 1821 | 8 Jan 1893 |
Dorris | Joseph | 10 Feb 1779 | 5 Nov 1866 |
Dorris | Sarah | 2 June 1790 | 23 June 1862 |
Dorris | W | 2 June 1790 | 5 Nov 1873 |
Dorris | A | NA | 7 Aug 1853 |
Dorris | J | 15 Jan 1822 | 28 Nov 1911 |
Dorris | Elizabeth | 7 Jan 1833 | 27 Aug 1895 |
Dorris | Robert | 24 Mar 1818 | 22 Aug 1894 |
Dorris | Rebecca | 22 mar 1829 | 24 Apr 1910 |
Dorris | Monroe | 1848 | 1928 |
Dorris | Della | 1884 | 1934 |
Dorris | Mary M | 1855 | 1927 |
Dorris | Harve | 13 Aug. 1889 | 6 Apr 1966 |
Dorris | Carrie | 12 July 1892 | 7 Aug 1990 |
Dorris | Smith | NA | NA |
Dorris | Ada | NA | NA |
Dorris | William | NA | NA |
Dorris | Harvey | NA | NA |
Dorris | Cora | NA | NA |
Dorris | John | NA | NA |
Dorris | W | 1857 | 1936 |
Dorris | Gustavus | 2 Aug 1847 | 27 Feb 1923 |
Dorris | Sarah | NA | 17 Spt 1880 |
Dorris | Joseph | NA | NA |
Dorris | Della | NA | NA |
Dorris | William | 28 Nov 1818 | 17 Feb 1905 |
Dorris | Harriet | 28 Nov 1818 | 17 Feb 1905 |
Dorris | William | 1867 | 1942 |
Dorris | Mary | 1872 | 1960 |
Dorris | James | NA | 24 Mar 1877 |
Dorris | Sarah | 1 June 1837 | 6 Dec 1908 |
Dorris | W[illiam] | 2 Jan 1835 | 16 Feb 1898 |
Dorris | E[lisha] | 17 Dec 1853 | 7 May 1907 |
Dorris | Sarah | 28 Aug 1858 | 20 Feb 1922 |
Dorris | James | 1861 | 1921 |
Dorris | Georgia | 1860 | 1908 |
Dorris | William | NA | 10 Aug 1857 |
Dorris | Malinda | NA | 22 Mar 1850 |
Drake | Mary | 1843 | 1909 |
Dreisbach | Catherina | 21 Dec 1752 | 20 Spt 1819 |
Dreisbach | Johannes | 21 Aug 1752 | 20 Spt 1825 |
Everett | Semantha | NA | Apr 1845 |
Farris | Elizabeth | NA | 2 Feb 1866 |
Farris | Elizabeth | NA | 28 Spt 1858 |
Finch | Isaac | NA | 26 Nov 1813 |
Follis | Fawn | 9 Feb 1907 | 17 Nov. 1909 |
Follis | Ralph | 7 Apr 1901 | 27 Feb 1906 |
Follis | A | NA | 17 Oct 1887 |
Follis | Christian | NA | 21 Oct 1892 |
Follis | G | NA | 20 Aug 1899 |
Follis | Ralph | 6 Oct 1892 | 4 Nov 1892 |
Follis | E | NA | 15 Mar 1890 |
Follis | William | 20 Aug 1832 | 25 Aug 1900 |
Follis | Martha | 10 Aug 1841 | 15 Aug 1881 |
Follis | Jeff | 10 Sep 1873 | 25 July 1905 |
Ford | Florence | 3 Spt 1866 | 1 July 1940 |
Fox | Frances | 27 Spt 1904 | 18 Jan 1997 |
Frost | Ebenezer | NA | NA |
NA | NA | NA | NA |
Frost | NA | NA | NA |
Fuqua | William | NA | NA |
Gregory | Leonard | 15 Mar 1887 | 13 June 1963 |
Gregory | Lucille | 1899 | 1986 |
Hart | Parmelia | 14 May 1847 | 17 Sep 1878 |
Hess | Amalphus | NA | 27 Mar 1872 |
Hess | Adolphus | NA | 23 June 1871 |
Hess | Samuel | 23 Dec 1823 | 5 Mar 1901 |
Hess | Augusta | 17 Jan 1828 | NA |
Hess | Ulysses | 21 Dec 1867 | 16 Feb 1897 |
Hess | Ulysses | 867 | 1897 |
Hess | William | 9 Dec 1851 | 26 Apr 1900 |
Hess | William | 1851 | 1900 |
Hess | Jerome | 1849 | 1931 |
Hess | Franklin | 1857 | 1935 |
Hess | Samuel | 1854 | 1949 |
Hess | Bernice | 20 Sep 1894 | 27 Sep 1978 |
Hess | Catherine | 1856 | 1906 |
Hess | George | 27 Feb 1864 | 3 Dec 1942 |
Holt | Lucinda | 15 Aug 1832 | 28 Aug 1888 |
Holt | William | 14 May 1818 | 26 Aug 1903 |
Horlacher | Daniel | 4 Aug 1735 | 24 Spt 1804 |
Horlacher | Margaretha | 4 Jan 1741 | 22 Apr 1806 |
Horrall | Polly | NA | NA |
Horrall | James | NA | 15 Apr 1848 |
Horrall | William | NA | NA |
Hurt | Elizabeth | NA | NA |
Jacobs | Jeremiah | NA | 30 Dec 1824 |
Jacobs | Rebecca | NA | 18 July 1813 |
Johnson | James | 25 Feb 1837 | NA |
Johnson | Mary | 5 Spt, 1844 | 12 Nov 1895 |
Jones | Levi | 1828 | 1892 |
Jones | Hester | 1841 | 1910 |
Jones | Ridley | NA | 2 Mar 1863 |
Jones | James | 1863 | 1933 |
Jones | Tina | 1874 | 1918 |
Jones | Ezra | 15 Spt 1867 | 19 May 1942 |
Jones | Nannie | 18 July 1968 | 18 July 1957 |
Jones | Samuel | 11 Feb 1876 | 22 Nov 1915 |
Jones | Melverda | 1 Nov 1879 | 2 Apr 1924 |
Jones | John | NA | NA |
Karnes | Willard | 1 Dec 1911 | 27 Oct 1990 |
Karnes | Ruth | 8 Jan 1917 | 8 Jan 2000 |
Keith | James | NA | 6 Oct 1841 |
Keth | Nancy | NA | 17 Nov 1827 |
Kleppinger | Anna | 29 Spt 1748 | 19 June 1817 |
Lipsey | Joe | 1888 | 1928 |
Lockwood | Eugenia | 11 July 1910 | 5 Oct 1994 |
Lockwood | Leland | 19 Feb. 1914 | 24 Dec 1970 |
Loomis | Jon | NA | NA |
Mensch | Abraham | 6 Apr 1754 | 16 Mar 1826 |
Merrell | Azariah | 20 May 1777 | 25 Jan 1844 |
Merrell | Abigail | 1781 | 11 June 1844 |
Meredith | Eleandra | NA | 6 Oct 1875 |
Meredith | Micajah | NA | 26 Oct 1822 |
Meredith | Samuel | 8 Sept 1753 | 10 Oct 1825 |
Meredith | Elizabeth | 22 Jan 1757 | 6 Apr 1824 |
Meredith | Ruth | NA | 28 Aug 1856 |
Bell | Sarah | NA | 1860 |
John | Bell | 7 July 1812 | 4 July 1872 |
Meredith | Mary | 16 Apr 1793 | 11 Dec 1873 |
Meredith | Clarence | 1899 | 1979 |
Meredith | Cora | 1900 | 1983 |
Meredith | W | 1926 | 1945 |
Meredith | Susan | 28 July 1833 | 2 Mar 1919 |
Meredith | Hannah | 15 Nov 1830 | 21 Dec 1903 |
Meredith | Mary | NA | 11 Nov 1882 |
Meredith | Samuel | NA | 5 Jan 1884 |
Meredith | Belinda | NA | 4 Aug 1889 |
Tipton | Susannah | NA | 3 May 1852 |
Meredith | Thomas | NA | 20 Apr 1840 |
Meredith | Sarah | NA | 21 Mar 1830 |
Mildenberger | Anna | 29 Spt 1739 | 11 Oct. 1777 |
Mildenberger | Nicolaus | 15 Oct 1781 | 19 Oct 1856 |
Miller | Myrtie | 18 July 1896 | 7 Dec 1991 |
Minnich | Elizabeth | NA | NA |
Minnich | John | NA | NA |
Mory | Catherina | 8 May 1758 | 25 Aug 1837 |
Mory | Gotthard | 20 Mar 1752 | 26 May 1843 |
Mory | Magdelena | 17 Spt 1759 | 26 Nov 1827 |
Mory | Peter | 3 May 1757 | [grass blocks date] |
Nagel | Anna | 28 Jul 1761 | 27 Mar 1840 |
Nagel | Anna | 9 Feb 1725 | 9 Spt 1790 |
Nagel | Caty | NA | 4 May 1817 |
Nagel | Daniel | NA | 7 May 1866 |
Nagel | Frederick | 26 Apr 1759 | 10 Mar 1839 |
Nagel | Friedrich | 1713 | 22 Nov 1779 |
Nagel | Johann | 15 Feb 1746 | 3 June 1823 |
Nagel | Maria | NA | NA |
Nagle | John | NA | 23 Nov 1870 |
Nagle | Mary | NA | 18 Mar 1870 |
Nagel | Henry | 1834 | 1913 |
Nagel | Mary | 1836 | 1921 |
Nagel | Will | 1858 | 1916 |
Nagel | Adeline | 1860 | 1941 |
NA | NA | NA | NA |
Nutty | John | 1907 | 1977 |
Nutty | Beatrice | 1909 | 1989 |
Nutty | John | 1907 | 1977 |
Ritter | NA | NA | NA |
Ritter | NA | NA | NA |
Odom | Archibald | 20 Feb 1838 | 20 May 1915 |
Odom | Cynthia | NA | NA |
Odom | G | 1854 | 1943 |
Odom | Sarah | 14 June 1854 | 24 Oct 1910 |
NA | Thomas | NA | 15 Nov 1887 |
Odum | Britton | 1794 | 1863 |
Odum | Wiley | 21 May 1879 | 2 Mar 1937 |
Odum | Sallie A | 15 May 1881 | 6 May 1946 |
Peters | Daniel | NA | 20 Aug 1808 |
Peters | Charlotte | NA | 20 June 1880 |
Pickard | William | 19 Feb 1839 | 6 Jan 1907 |
Pickard | Harriet | 1843 | 1916 |
Pickard | Louise | 1868 | 1935 |
Pletz | Karl | 1914 | 1975 |
Russell | Caroline | 1917 | 1995 |
Pickard | William | 1839 | 1907 |
Pickard | Harriet | 1847 | 1916 |
Pickard | Louise | 1868 | 1935 |
Pulliam | Frieda | NA | NA |
Pulliam | Amos | NA | NA |
Rex | William | NA | 27 Apr 1909 |
Rex | Elmina | NA | 25 Jan 1900 |
Rex | Mamie | NA | 7 Aug 1892 |
Rex | George | NA | 8 Dec 1891 |
Rex | Bertie | NA | 5 Nov 1887 |
Rex | Lulie | 10 Oct 1877 | 2 Oct 1878 |
Rex | Lily | 21 Spt 1871 | 11 Oct 1872 |
Rex | Arthur | 11 Aug 1870 | 25 Aug 1870 |
Rex | George | NA | NA |
Rex | Jno | NA | NA |
Rex | Guy | 28 May 1889 | 14 Oct 1983 |
Rex | Harlie | 9 Sep 1890 | 10 May 1979 |
Richardson | Annabelle | 1916 | 1993 |
Richardson | Alfred | 1915 | 1987 |
Riegel | Solomon | NA | 18 May 1827 |
Riegel | Catherine | NA | 27 Dec 1882 |
Ritter | J | NA | 11 Nov 1917 |
Ritter | Mary | NA | 10 Mat 1903 |
Rockel | Balzer | 10 Nov 1707 | 9 June 1800 |
Rockel | Elisabetha | 24 June 1719 | 16 Oct 1794 |
Rockel | Johannes | 23 Mar 1749 | 4 Jan 1838 |
Rockel | Elizabeth | 7 Oct 1764 | 1 Mar 1835 |
Ross | George | 1 Aug 1885 | 8 June 1971 |
Ross | Euna | 2 July 1885 | 19 Mar 1938 |
Ruckel | Mary | NA | NA |
Ruckel | Melchir | 27 may 1769 | 15 Apr 1832 |
Russell | James | NA | NA |
Russell | Ana | NA | NA |
NA | NA | NA | NA |
NA | NA | NA | NA |
NA | NA | NA | NA |
Siliven | Jenniel | NA | 30 Aug 1873 |
Sinks | A | NA | NA |
Sinks | Francis | 20 Nov 1837 | 17 Jan 1909 |
Sinks | Delphia | 27 Dec 1838 | 11 Dec 1931 |
Sinks | Salem | NA | 26 Oct 1869 |
Sinks | Daniel | 1841 | 1923 |
Sinks | Martha | 1856 | 1924 |
Sinks | Roy | 1889 | 1908 |
Sinks | Elizabeth | 10 Mar 1835 | 10 Aug 1911 |
Sinks | infant son | 18 Dec. 1898 | 18 Dec. 1898 |
Sinks | John | NA | 20 Nov 1893 |
Sinks | Mary | NA | 23 Aug 1906 |
Sinks | William | NA | 27 Spt 1953 |
Sinks | Charlotte | NA | 31 Oct 1910 |
Sinks | Anna | NA | 5 Aug 1909 |
Sinks | Leonard | NA | 26 Oct 1919 |
Sinks | Etta Faye | 31 Mar 1910 | 19 Mar 1989 |
Sinks | John | NA | 28 Jan 1918 |
Sinks | Sena | 18 Spt 1888 | 13 Dec 1972 |
Sinks | William | 11 Spt 1914 | 30 Spt 1986 |
Sweet | Jewell | 1912 | 1990 |
Sinks | Francis | 1908 | 1988 |
Sinks | Arlie | 1883 | 1951 |
Sinks | Viola | 1882 | 1966 |
Sinks | Leonard | NA | NA |
Sinks | Mae | NA | NA |
Sinks | Bessie | 21 Nov 1889 | 24 Apr 1979 |
Sinks | Caroline | NA | 17 Apr 1876 |
Sinks | Arlie | 27 July 1916 | 4 July 2009 |
Sinks | Eva | 15 Jan 1914 | 8 Sep 1002 |
Solt | Conrad | 29 Spt. 1758 | 24 Dec 1825 |
Solt | Conrad | 20 Mar 1753 | 25 Aug 1830 |
Solt | Maria | 1760 | 23 Dec 1839 |
Sfafford | Trice | NA | NA |
Sfafford | Phebe | NA | NA |
Steen | Richard | NA | NA |
VanCleve | Martin | 1860 | 1946 |
VanCleve | Florence | 1867 | 1946 |
VanCleve | W | 1 Jan 1813 | 20 May 1886 |
VanCleve | Nancy | 26 Nov 1814 | 22 Feb 1902 |
VanCleve | J | 20 Dec 1850 | 31 Oct 1872 |
VanCleave | W | 29 June 1838 | 15 Jan 1914 |
VanCleave | Elizabeth | 3 Sep 1840 | 25 May 1901 |
Veach | Pleasant | 19 Dec 1845 | 23 Jan 1917 |
Veach | Victoria | 4 Apr 1849 | 10 Apr 1921 |
Veach | Ward | 1886 | 1916 |
Veach | Cynthia | 1861 | 1921 |
Veach | James | 1860 | 1890 |
Veach | James | 1857 | 1939 |
Veach | Nannie | 1863 | 1953 |
Veatch | John | 11 Nov 1776 | 17 Spt 1844 |
Veatch | Eleanor | 26 Oct 1773 | 22 July 1852 |
Veach | William | 1882 | 1957 |
Veach | James | NA | NA |
Veach | Rachel | NA | NA |
Veach | Pleasant | 1 Oct 1837 | 18 Jan 1895 |
Veach | Mary | NA | 16 Nov 1876 |
Veatch | Parmelia | 9 Feb 1808 | 31 Jan 1867 |
Veatch | Mary | 5 Feb 1842 | NA |
Veatch | Elnor | NA | 1834 |
Veatch | Frelin | 21 Mar 1825 | 18 Jan 1848 |
Veach-Nutty | NA | NA | NA |
Veach | John | 1863 | 1950 |
Veach | Rose | 1868 | 1948 |
Veach | Ruth | 1900 | 1901 |
Ware | Turner | 14 Feb 1817 | 5 Feb 1902 |
Ware | Martha | NA | 23 Mar 1881 |
Ware | Joseph | NA | NA |
Ware | Caroline | NA | NA |
Webber | Dick | 7 Aug 1881 | 3 Dec 1942 |
Webber | Pearl | 17 Apr 1889 | 10 Apr 1967 |
Weir | James | 1799 | 1879 |
Weir | Mary | 1801 | 1886 |
Wier | Leticia | 29 Dec 1836 | 26 Mar 1865 |
Whiteside | Lucinda | 21 Apr 1858 | 31 Jan 1936 |
Whiteside | John | 2 Apr 1852 | 30 Sep 1913 |
Willis | Matha | 10 Mar 1830 | 13 Oct 1925 |
Wilson | Jessie | 14 Apr 1924 | 16 May 1996 |
Wilson | Mary | 7 Dec 1927 | 2 June 1978 |
Wilson | Joseph | 26 Feb 1825 | Jan 1862 |
Wilson | Elisha | 24 May 1800 | 9 July 1873 |
Wilson | Sallie | 15 Oct 1807 | 4 Apr 1866 |
Wilson | Lutetita | 1826 | 1907 |
Wilson | Thomas | NA | 1922 |
Wilson | Sarah | NA | NA |
Wilson | Elisha | 1841 | 1907 |
Wilson | Martha | 1840 | 1929 |
Wilson | Charles | 23 Jan 1908 | 19 Apr 1955 |
Wilson | Zack | 1848 | 28 Sep 1918 |
Wilson | Juritha | 6 Aug 1854 | 28 Sep 1916 |
Wilson | Elisha | 1841 | 1907 |
Wilson | Drury | 10 Dec 1827 | 5 Feb 1911 |
Wilson | Mary | 17 Jan 1837 | 1 Mar 1900 |
Wilson | Sandifer | 15 Oct 1834 | 8 Jan 1929 |
Wilson | Nancy | 26 Oct 1842 | 1 Apr 1913 |
Wise | Luvena | NA | NA |
Wollard | John | 12 Jul 1846 | 9 Jan 1918 |
Woolard | Nettie | 8 Spt 1874 | 28 Nov 1910 |
Woolard | Millie | 13 Nov 1837 | 19 Apr 1932 |
Woolard | Lawrence | 25 Dec 1877 | 12 May 1923 |
Woolard | Etta | 14 July 1883 | 7 Dec 1945 |
Woolard | John | 21 Jan 1872 | 19 Oct 1936 |
Woolard | James | 30 June 1908 | 12 Oct 1966 |
Woolard | C | 2 Oct 1882 | 9 Dec 1953 |
Woolard | Blanche | 1 Jan 1885 | 25 July 1959 |
Woolard | L | 15 Spt. 1811 | 9 May 1878 |
Woolard | Ama | 23 May 1819 | 5 Dec 1891 |
Woolard | Robert | 1863 | 1938 |
Woolard | James | 12 Apr 1848 | 7 Spt 1888 |
Woolard | Romey | 23 May 1922 | 12 Spt. 1944 |
Woolard | Anna | NA | 30 Spt 1887 |
Woolard | James | NA | 27 Spt 1878 |
Woolard | Francis | NA | 20 Feb. 1867 |
Woolard | Turner | NA | 20 Oct 1861 |
Woolard | William | 22 Aug 1857 | 2 Jan 1858 |
Woolard | George | 28 Mar 1867 | 5 Jan 1935 |
Woolard | Nancy | 1869 | 1955 |
A few things to note on first glance. The majority of the dates use day month(abbreviated) year. Some entries only have the year. There are NAs. A few abbreviations have a period (Nov. in the Follis Fawn entry). September is abbreviated as both Spt and Sep. There are some dates that are guesses ([1949?] in the Doley John record.) I will replace Spt with Sep and the period with “” using str_replace_all() as above. I’m making the corrections in the original columns (DOB, DOD) but putting the date version in new variables. That way it is easy to check that the conversions were done correctly.
The date type must have a day, month, and year, so the years only records will become NAs. I don’t think this matters for my application, but if it did, I’d probably make an integer column for the year data. The dates would need to be split into numeric month, day, and year field or possibly date and numeric year fields.
Cleaning Typos and Converting to Dates
I’m using base R as.Date()
for the conversion. This let’s you specify the format. A detailed explanation of specifying dates and the code can be found at the Epidemiologist R Handbook. Various functions from tidyverse package lubridate could also be used to parse the dates. If I were planning on doing more with the dates I’d probably use lubridate functions exclusively.
<- tombstones %>%
tombstones mutate(DOB = str_replace_all(DOB, "Spt", "Sep")) %>%
mutate(DOD = str_replace_all(DOD, "Spt", "Sep")) %>%
mutate(DOB = str_replace_all(DOB, "\\.", "")) %>%
mutate(DOD = str_replace_all(DOD, "\\.", "")) %>%
mutate(DOB_date = as.Date(DOB, format = "%d %b %Y")) %>%
mutate(DOD_date = as.Date(DOD, format = "%d %b %Y"))
Checking that everything worked.
%>%
tombstones select(Surname, First.Name,DOB, DOD, DOB_date, DOD_date) %>%
gt() %>%
tab_options(container.height = px(300), container.padding.y = px(24))
Surname | First.Name | DOB | DOD | DOB_date | DOD_date |
---|---|---|---|---|---|
Anderson | Abraham | 10 Mar 1776 | 15 Aug 1838 | 1776-03-10 | 1838-08-15 |
Anderson | Elizabeth | 29 Jan 1782 | 13 Oct 1869 | 1782-01-29 | 1869-10-13 |
Anderson | Zady | 18 Apr 1812 | 12 Dec 1839 | 1812-04-18 | 1839-12-12 |
Anderson | Albert | 28 Nov 1809 | 5 Nov 1882 | 1809-11-28 | 1882-11-05 |
Anderson | Adesia | 17 Mar 1808 | 26 Sep 1864 | 1808-03-17 | 1864-09-26 |
Anderson | May | NA | 9 Aug 1887 | NA | 1887-08-09 |
Anderson | E | 23 Sep 1877 | 24 Oct 1899 | 1877-09-23 | 1899-10-24 |
Anderson | William | 26 Feb 1836 | 31 Dec 1895 | 1836-02-26 | 1895-12-31 |
Anderson | Nancy | 2 Sep 1836 | 18 Oct 1917 | 1836-09-02 | 1917-10-18 |
Appleton | Richard | 1 Aug 1817 | 6 Oct 1897 | 1817-08-01 | 1897-10-06 |
Baldwin | John | 25 Sep 1845 | NA | 1845-09-25 | NA |
Baldwin | William | NA | NA | NA | NA |
Baggett | Mahalia | 3 June 1832 | 6 June 1897 | 1832-06-03 | 1897-06-06 |
Beasley | E | 13 May 1826 | NA | 1826-05-13 | NA |
Beasley | Josephine | 23 Dec 1858 | 25 Aug 1899 | 1858-12-23 | 1899-08-25 |
Beasley | Fanning | NA | 25 Aug 1899 | NA | 1899-08-25 |
Bell | John | 1825 | 1872 | NA | NA |
Bell | Mary | 1826 | 1859 | NA | NA |
Brazelton | Wm | NA | 10 Dec 1858 | NA | 1858-12-10 |
Brazelton | Esther | NA | 20 Sept 1840 | NA | NA |
Brown | Elizabeth | 1 July 1807 | 2 Feb 1888 | 1807-07-01 | 1888-02-02 |
Brown | Joel | 20 Dec 1803 | NA | 1803-12-20 | NA |
Bundy | Hope | 1833 | 1918 | NA | NA |
Bundy | Clem | NA | NA | NA | NA |
Bundy | Nancy | NA | NA | NA | NA |
Bundy | W | NA | NA | NA | NA |
Bundy | Charles | NA | NA | NA | NA |
Bundy | Thomas | 4 Feb 1881 | 22 Apr 1892 | 1881-02-04 | 1892-04-22 |
Bundy | Octavia | 16 Apr 1839 | 10 Mar 1932 | 1839-04-16 | 1932-03-10 |
Bundy | George | 1864 | 1949 | NA | NA |
Bundy | Lora | 1870 | 1957 | NA | NA |
Burgess | W | 1844 | 17 Dec 1907 | NA | 1907-12-17 |
Burgess | Alzada | 1854 | 1931 | NA | NA |
Clayton | G | 9 Apr 1846 | 9 Mar 1919 | 1846-04-09 | 1919-03-09 |
Clayton | Ellen | 28 Mar 1852 | 23 May 1917 | 1852-03-28 | 1917-05-23 |
Clayton | L | NA | 25 May 1909 | NA | 1909-05-25 |
Clayton | Mary | NA | 18 Jan 1908 | NA | 1908-01-18 |
Chapman | Daniel | NA | NA | NA | NA |
Chapman | Elizabeth | NA | NA | NA | NA |
Chapman | Caroline | NA | 24 Jan 1861 | NA | 1861-01-24 |
Chapman | Daniel | 5 Jul 1863 | Feb 1842 | 1863-07-05 | NA |
Chapman | Lucretia | 23 Apr 1769 | Aug 1849 | 1769-04-23 | NA |
Chapman | Samuel | 14 May 1794 | 15 Apr 1863 | 1794-05-14 | 1863-04-15 |
Chapman | Elizabeth | 23 May 1796 | 30 Dec 1859 | 1796-05-23 | 1859-12-30 |
Chapman | Laura | NA | 14 June 1865 | NA | 1865-06-14 |
Chapman | Polly | NA | NA | NA | NA |
Crockett | Mandy | 3 Sep 1844 | 5 Nov 1914 | 1844-09-03 | 1914-11-05 |
Crockett | John | 4 Oct 1844 | 25 July 1919 | 1844-10-04 | 1919-07-25 |
Davis | Ezra | 1887 | 1949 | NA | NA |
Davis | Lizzie | 1886 | 1955 | NA | NA |
Davis | Fred | NA | NA | NA | NA |
Dolch | Catherine | 11 Feb 1793 | 14 Feb 1867 | 1793-02-11 | 1867-02-14 |
Dolch | Christian | 24 Dec 1794 | 29 Aug 1874 | 1794-12-24 | 1874-08-29 |
Dolch | Peter | NA | 16 July 1862 | NA | 1862-07-16 |
Doley | George | 25 July 1823 | 26 Jan 1868 | 1823-07-25 | 1868-01-26 |
Doley | Katie | 8 Jan 1833 | 15 Sep 1927 | 1833-01-08 | 1927-09-15 |
Doley | Mary E | 23 Aug 1851 | 18 July 1853 | 1851-08-23 | 1853-07-18 |
Doley | Henriettie | 28 Feb 1853 | 18 July 1854 | 1853-02-28 | 1854-07-18 |
NA | MED | NA | NA | NA | NA |
NA | HD | NA | NA | NA | NA |
NA | GD | NA | NA | NA | NA |
NA | Mother | NA | NA | NA | NA |
NA | Father | NA | NA | NA | NA |
Doley | George | 14 Mar 1856 | 26 Jan 1860 | 1856-03-14 | 1860-01-26 |
Doley | James | 21 Jan 1858 | 11 July 1926 | 1858-01-21 | 1926-07-11 |
Doley | May | 1869 | 1940 | NA | NA |
Doley | John | 1864 | [1949?] | NA | NA |
Doley | Maggie | 1868 | 1917 | NA | NA |
Doley | William | 20 Dec 1862 | 24 Nov 1934 | 1862-12-20 | 1934-11-24 |
Doley | Dora | 17 Feb 1864 | 9 Apr 1934 | 1864-02-17 | 1934-04-09 |
Doley | L[eaman] | 17 Apr 1889 | 19 Nov 1960 | 1889-04-17 | 1960-11-19 |
Doley | G[uilford] | 5 Aug 1894 | 1 June 1960 | 1894-08-05 | 1960-06-01 |
Doley | D[ora] | 1897 | 1983 | NA | NA |
Doley | Eugene | 8 Nov 1886 | 12 June 1958 | 1886-11-08 | 1958-06-12 |
Doley | Lou | 29 Oct 1895 | 26 June 1963 | 1895-10-29 | 1963-06-26 |
Dorris | J[oseph] | 12 Feb 1821 | 8 Jan 1893 | 1821-02-12 | 1893-01-08 |
Dorris | Joseph | 10 Feb 1779 | 5 Nov 1866 | 1779-02-10 | 1866-11-05 |
Dorris | Sarah | 2 June 1790 | 23 June 1862 | 1790-06-02 | 1862-06-23 |
Dorris | W | 2 June 1790 | 5 Nov 1873 | 1790-06-02 | 1873-11-05 |
Dorris | A | NA | 7 Aug 1853 | NA | 1853-08-07 |
Dorris | J | 15 Jan 1822 | 28 Nov 1911 | 1822-01-15 | 1911-11-28 |
Dorris | Elizabeth | 7 Jan 1833 | 27 Aug 1895 | 1833-01-07 | 1895-08-27 |
Dorris | Robert | 24 Mar 1818 | 22 Aug 1894 | 1818-03-24 | 1894-08-22 |
Dorris | Rebecca | 22 mar 1829 | 24 Apr 1910 | 1829-03-22 | 1910-04-24 |
Dorris | Monroe | 1848 | 1928 | NA | NA |
Dorris | Della | 1884 | 1934 | NA | NA |
Dorris | Mary M | 1855 | 1927 | NA | NA |
Dorris | Harve | 13 Aug 1889 | 6 Apr 1966 | 1889-08-13 | 1966-04-06 |
Dorris | Carrie | 12 July 1892 | 7 Aug 1990 | 1892-07-12 | 1990-08-07 |
Dorris | Smith | NA | NA | NA | NA |
Dorris | Ada | NA | NA | NA | NA |
Dorris | William | NA | NA | NA | NA |
Dorris | Harvey | NA | NA | NA | NA |
Dorris | Cora | NA | NA | NA | NA |
Dorris | John | NA | NA | NA | NA |
Dorris | W | 1857 | 1936 | NA | NA |
Dorris | Gustavus | 2 Aug 1847 | 27 Feb 1923 | 1847-08-02 | 1923-02-27 |
Dorris | Sarah | NA | 17 Sep 1880 | NA | 1880-09-17 |
Dorris | Joseph | NA | NA | NA | NA |
Dorris | Della | NA | NA | NA | NA |
Dorris | William | 28 Nov 1818 | 17 Feb 1905 | 1818-11-28 | 1905-02-17 |
Dorris | Harriet | 28 Nov 1818 | 17 Feb 1905 | 1818-11-28 | 1905-02-17 |
Dorris | William | 1867 | 1942 | NA | NA |
Dorris | Mary | 1872 | 1960 | NA | NA |
Dorris | James | NA | 24 Mar 1877 | NA | 1877-03-24 |
Dorris | Sarah | 1 June 1837 | 6 Dec 1908 | 1837-06-01 | 1908-12-06 |
Dorris | W[illiam] | 2 Jan 1835 | 16 Feb 1898 | 1835-01-02 | 1898-02-16 |
Dorris | E[lisha] | 17 Dec 1853 | 7 May 1907 | 1853-12-17 | 1907-05-07 |
Dorris | Sarah | 28 Aug 1858 | 20 Feb 1922 | 1858-08-28 | 1922-02-20 |
Dorris | James | 1861 | 1921 | NA | NA |
Dorris | Georgia | 1860 | 1908 | NA | NA |
Dorris | William | NA | 10 Aug 1857 | NA | 1857-08-10 |
Dorris | Malinda | NA | 22 Mar 1850 | NA | 1850-03-22 |
Drake | Mary | 1843 | 1909 | NA | NA |
Dreisbach | Catherina | 21 Dec 1752 | 20 Sep 1819 | 1752-12-21 | 1819-09-20 |
Dreisbach | Johannes | 21 Aug 1752 | 20 Sep 1825 | 1752-08-21 | 1825-09-20 |
Everett | Semantha | NA | Apr 1845 | NA | NA |
Farris | Elizabeth | NA | 2 Feb 1866 | NA | 1866-02-02 |
Farris | Elizabeth | NA | 28 Sep 1858 | NA | 1858-09-28 |
Finch | Isaac | NA | 26 Nov 1813 | NA | 1813-11-26 |
Follis | Fawn | 9 Feb 1907 | 17 Nov 1909 | 1907-02-09 | 1909-11-17 |
Follis | Ralph | 7 Apr 1901 | 27 Feb 1906 | 1901-04-07 | 1906-02-27 |
Follis | A | NA | 17 Oct 1887 | NA | 1887-10-17 |
Follis | Christian | NA | 21 Oct 1892 | NA | 1892-10-21 |
Follis | G | NA | 20 Aug 1899 | NA | 1899-08-20 |
Follis | Ralph | 6 Oct 1892 | 4 Nov 1892 | 1892-10-06 | 1892-11-04 |
Follis | E | NA | 15 Mar 1890 | NA | 1890-03-15 |
Follis | William | 20 Aug 1832 | 25 Aug 1900 | 1832-08-20 | 1900-08-25 |
Follis | Martha | 10 Aug 1841 | 15 Aug 1881 | 1841-08-10 | 1881-08-15 |
Follis | Jeff | 10 Sep 1873 | 25 July 1905 | 1873-09-10 | 1905-07-25 |
Ford | Florence | 3 Sep 1866 | 1 July 1940 | 1866-09-03 | 1940-07-01 |
Fox | Frances | 27 Sep 1904 | 18 Jan 1997 | 1904-09-27 | 1997-01-18 |
Frost | Ebenezer | NA | NA | NA | NA |
NA | NA | NA | NA | NA | NA |
Frost | NA | NA | NA | NA | NA |
Fuqua | William | NA | NA | NA | NA |
Gregory | Leonard | 15 Mar 1887 | 13 June 1963 | 1887-03-15 | 1963-06-13 |
Gregory | Lucille | 1899 | 1986 | NA | NA |
Hart | Parmelia | 14 May 1847 | 17 Sep 1878 | 1847-05-14 | 1878-09-17 |
Hess | Amalphus | NA | 27 Mar 1872 | NA | 1872-03-27 |
Hess | Adolphus | NA | 23 June 1871 | NA | 1871-06-23 |
Hess | Samuel | 23 Dec 1823 | 5 Mar 1901 | 1823-12-23 | 1901-03-05 |
Hess | Augusta | 17 Jan 1828 | NA | 1828-01-17 | NA |
Hess | Ulysses | 21 Dec 1867 | 16 Feb 1897 | 1867-12-21 | 1897-02-16 |
Hess | Ulysses | 867 | 1897 | NA | NA |
Hess | William | 9 Dec 1851 | 26 Apr 1900 | 1851-12-09 | 1900-04-26 |
Hess | William | 1851 | 1900 | NA | NA |
Hess | Jerome | 1849 | 1931 | NA | NA |
Hess | Franklin | 1857 | 1935 | NA | NA |
Hess | Samuel | 1854 | 1949 | NA | NA |
Hess | Bernice | 20 Sep 1894 | 27 Sep 1978 | 1894-09-20 | 1978-09-27 |
Hess | Catherine | 1856 | 1906 | NA | NA |
Hess | George | 27 Feb 1864 | 3 Dec 1942 | 1864-02-27 | 1942-12-03 |
Holt | Lucinda | 15 Aug 1832 | 28 Aug 1888 | 1832-08-15 | 1888-08-28 |
Holt | William | 14 May 1818 | 26 Aug 1903 | 1818-05-14 | 1903-08-26 |
Horlacher | Daniel | 4 Aug 1735 | 24 Sep 1804 | 1735-08-04 | 1804-09-24 |
Horlacher | Margaretha | 4 Jan 1741 | 22 Apr 1806 | 1741-01-04 | 1806-04-22 |
Horrall | Polly | NA | NA | NA | NA |
Horrall | James | NA | 15 Apr 1848 | NA | 1848-04-15 |
Horrall | William | NA | NA | NA | NA |
Hurt | Elizabeth | NA | NA | NA | NA |
Jacobs | Jeremiah | NA | 30 Dec 1824 | NA | 1824-12-30 |
Jacobs | Rebecca | NA | 18 July 1813 | NA | 1813-07-18 |
Johnson | James | 25 Feb 1837 | NA | 1837-02-25 | NA |
Johnson | Mary | 5 Sep, 1844 | 12 Nov 1895 | NA | 1895-11-12 |
Jones | Levi | 1828 | 1892 | NA | NA |
Jones | Hester | 1841 | 1910 | NA | NA |
Jones | Ridley | NA | 2 Mar 1863 | NA | 1863-03-02 |
Jones | James | 1863 | 1933 | NA | NA |
Jones | Tina | 1874 | 1918 | NA | NA |
Jones | Ezra | 15 Sep 1867 | 19 May 1942 | 1867-09-15 | 1942-05-19 |
Jones | Nannie | 18 July 1968 | 18 July 1957 | 1968-07-18 | 1957-07-18 |
Jones | Samuel | 11 Feb 1876 | 22 Nov 1915 | 1876-02-11 | 1915-11-22 |
Jones | Melverda | 1 Nov 1879 | 2 Apr 1924 | 1879-11-01 | 1924-04-02 |
Jones | John | NA | NA | NA | NA |
Karnes | Willard | 1 Dec 1911 | 27 Oct 1990 | 1911-12-01 | 1990-10-27 |
Karnes | Ruth | 8 Jan 1917 | 8 Jan 2000 | 1917-01-08 | 2000-01-08 |
Keith | James | NA | 6 Oct 1841 | NA | 1841-10-06 |
Keth | Nancy | NA | 17 Nov 1827 | NA | 1827-11-17 |
Kleppinger | Anna | 29 Sep 1748 | 19 June 1817 | 1748-09-29 | 1817-06-19 |
Lipsey | Joe | 1888 | 1928 | NA | NA |
Lockwood | Eugenia | 11 July 1910 | 5 Oct 1994 | 1910-07-11 | 1994-10-05 |
Lockwood | Leland | 19 Feb 1914 | 24 Dec 1970 | 1914-02-19 | 1970-12-24 |
Loomis | Jon | NA | NA | NA | NA |
Mensch | Abraham | 6 Apr 1754 | 16 Mar 1826 | 1754-04-06 | 1826-03-16 |
Merrell | Azariah | 20 May 1777 | 25 Jan 1844 | 1777-05-20 | 1844-01-25 |
Merrell | Abigail | 1781 | 11 June 1844 | NA | 1844-06-11 |
Meredith | Eleandra | NA | 6 Oct 1875 | NA | 1875-10-06 |
Meredith | Micajah | NA | 26 Oct 1822 | NA | 1822-10-26 |
Meredith | Samuel | 8 Sept 1753 | 10 Oct 1825 | NA | 1825-10-10 |
Meredith | Elizabeth | 22 Jan 1757 | 6 Apr 1824 | 1757-01-22 | 1824-04-06 |
Meredith | Ruth | NA | 28 Aug 1856 | NA | 1856-08-28 |
Bell | Sarah | NA | 1860 | NA | NA |
John | Bell | 7 July 1812 | 4 July 1872 | 1812-07-07 | 1872-07-04 |
Meredith | Mary | 16 Apr 1793 | 11 Dec 1873 | 1793-04-16 | 1873-12-11 |
Meredith | Clarence | 1899 | 1979 | NA | NA |
Meredith | Cora | 1900 | 1983 | NA | NA |
Meredith | W | 1926 | 1945 | NA | NA |
Meredith | Susan | 28 July 1833 | 2 Mar 1919 | 1833-07-28 | 1919-03-02 |
Meredith | Hannah | 15 Nov 1830 | 21 Dec 1903 | 1830-11-15 | 1903-12-21 |
Meredith | Mary | NA | 11 Nov 1882 | NA | 1882-11-11 |
Meredith | Samuel | NA | 5 Jan 1884 | NA | 1884-01-05 |
Meredith | Belinda | NA | 4 Aug 1889 | NA | 1889-08-04 |
Tipton | Susannah | NA | 3 May 1852 | NA | 1852-05-03 |
Meredith | Thomas | NA | 20 Apr 1840 | NA | 1840-04-20 |
Meredith | Sarah | NA | 21 Mar 1830 | NA | 1830-03-21 |
Mildenberger | Anna | 29 Sep 1739 | 11 Oct 1777 | 1739-09-29 | 1777-10-11 |
Mildenberger | Nicolaus | 15 Oct 1781 | 19 Oct 1856 | 1781-10-15 | 1856-10-19 |
Miller | Myrtie | 18 July 1896 | 7 Dec 1991 | 1896-07-18 | 1991-12-07 |
Minnich | Elizabeth | NA | NA | NA | NA |
Minnich | John | NA | NA | NA | NA |
Mory | Catherina | 8 May 1758 | 25 Aug 1837 | 1758-05-08 | 1837-08-25 |
Mory | Gotthard | 20 Mar 1752 | 26 May 1843 | 1752-03-20 | 1843-05-26 |
Mory | Magdelena | 17 Sep 1759 | 26 Nov 1827 | 1759-09-17 | 1827-11-26 |
Mory | Peter | 3 May 1757 | [grass blocks date] | 1757-05-03 | NA |
Nagel | Anna | 28 Jul 1761 | 27 Mar 1840 | 1761-07-28 | 1840-03-27 |
Nagel | Anna | 9 Feb 1725 | 9 Sep 1790 | 1725-02-09 | 1790-09-09 |
Nagel | Caty | NA | 4 May 1817 | NA | 1817-05-04 |
Nagel | Daniel | NA | 7 May 1866 | NA | 1866-05-07 |
Nagel | Frederick | 26 Apr 1759 | 10 Mar 1839 | 1759-04-26 | 1839-03-10 |
Nagel | Friedrich | 1713 | 22 Nov 1779 | NA | 1779-11-22 |
Nagel | Johann | 15 Feb 1746 | 3 June 1823 | 1746-02-15 | 1823-06-03 |
Nagel | Maria | NA | NA | NA | NA |
Nagle | John | NA | 23 Nov 1870 | NA | 1870-11-23 |
Nagle | Mary | NA | 18 Mar 1870 | NA | 1870-03-18 |
Nagel | Henry | 1834 | 1913 | NA | NA |
Nagel | Mary | 1836 | 1921 | NA | NA |
Nagel | Will | 1858 | 1916 | NA | NA |
Nagel | Adeline | 1860 | 1941 | NA | NA |
NA | NA | NA | NA | NA | NA |
Nutty | John | 1907 | 1977 | NA | NA |
Nutty | Beatrice | 1909 | 1989 | NA | NA |
Nutty | John | 1907 | 1977 | NA | NA |
Ritter | NA | NA | NA | NA | NA |
Ritter | NA | NA | NA | NA | NA |
Odom | Archibald | 20 Feb 1838 | 20 May 1915 | 1838-02-20 | 1915-05-20 |
Odom | Cynthia | NA | NA | NA | NA |
Odom | G | 1854 | 1943 | NA | NA |
Odom | Sarah | 14 June 1854 | 24 Oct 1910 | 1854-06-14 | 1910-10-24 |
NA | Thomas | NA | 15 Nov 1887 | NA | 1887-11-15 |
Odum | Britton | 1794 | 1863 | NA | NA |
Odum | Wiley | 21 May 1879 | 2 Mar 1937 | 1879-05-21 | 1937-03-02 |
Odum | Sallie A | 15 May 1881 | 6 May 1946 | 1881-05-15 | 1946-05-06 |
Peters | Daniel | NA | 20 Aug 1808 | NA | 1808-08-20 |
Peters | Charlotte | NA | 20 June 1880 | NA | 1880-06-20 |
Pickard | William | 19 Feb 1839 | 6 Jan 1907 | 1839-02-19 | 1907-01-06 |
Pickard | Harriet | 1843 | 1916 | NA | NA |
Pickard | Louise | 1868 | 1935 | NA | NA |
Pletz | Karl | 1914 | 1975 | NA | NA |
Russell | Caroline | 1917 | 1995 | NA | NA |
Pickard | William | 1839 | 1907 | NA | NA |
Pickard | Harriet | 1847 | 1916 | NA | NA |
Pickard | Louise | 1868 | 1935 | NA | NA |
Pulliam | Frieda | NA | NA | NA | NA |
Pulliam | Amos | NA | NA | NA | NA |
Rex | William | NA | 27 Apr 1909 | NA | 1909-04-27 |
Rex | Elmina | NA | 25 Jan 1900 | NA | 1900-01-25 |
Rex | Mamie | NA | 7 Aug 1892 | NA | 1892-08-07 |
Rex | George | NA | 8 Dec 1891 | NA | 1891-12-08 |
Rex | Bertie | NA | 5 Nov 1887 | NA | 1887-11-05 |
Rex | Lulie | 10 Oct 1877 | 2 Oct 1878 | 1877-10-10 | 1878-10-02 |
Rex | Lily | 21 Sep 1871 | 11 Oct 1872 | 1871-09-21 | 1872-10-11 |
Rex | Arthur | 11 Aug 1870 | 25 Aug 1870 | 1870-08-11 | 1870-08-25 |
Rex | George | NA | NA | NA | NA |
Rex | Jno | NA | NA | NA | NA |
Rex | Guy | 28 May 1889 | 14 Oct 1983 | 1889-05-28 | 1983-10-14 |
Rex | Harlie | 9 Sep 1890 | 10 May 1979 | 1890-09-09 | 1979-05-10 |
Richardson | Annabelle | 1916 | 1993 | NA | NA |
Richardson | Alfred | 1915 | 1987 | NA | NA |
Riegel | Solomon | NA | 18 May 1827 | NA | 1827-05-18 |
Riegel | Catherine | NA | 27 Dec 1882 | NA | 1882-12-27 |
Ritter | J | NA | 11 Nov 1917 | NA | 1917-11-11 |
Ritter | Mary | NA | 10 Mat 1903 | NA | NA |
Rockel | Balzer | 10 Nov 1707 | 9 June 1800 | 1707-11-10 | 1800-06-09 |
Rockel | Elisabetha | 24 June 1719 | 16 Oct 1794 | 1719-06-24 | 1794-10-16 |
Rockel | Johannes | 23 Mar 1749 | 4 Jan 1838 | 1749-03-23 | 1838-01-04 |
Rockel | Elizabeth | 7 Oct 1764 | 1 Mar 1835 | 1764-10-07 | 1835-03-01 |
Ross | George | 1 Aug 1885 | 8 June 1971 | 1885-08-01 | 1971-06-08 |
Ross | Euna | 2 July 1885 | 19 Mar 1938 | 1885-07-02 | 1938-03-19 |
Ruckel | Mary | NA | NA | NA | NA |
Ruckel | Melchir | 27 may 1769 | 15 Apr 1832 | 1769-05-27 | 1832-04-15 |
Russell | James | NA | NA | NA | NA |
Russell | Ana | NA | NA | NA | NA |
NA | NA | NA | NA | NA | NA |
NA | NA | NA | NA | NA | NA |
NA | NA | NA | NA | NA | NA |
Siliven | Jenniel | NA | 30 Aug 1873 | NA | 1873-08-30 |
Sinks | A | NA | NA | NA | NA |
Sinks | Francis | 20 Nov 1837 | 17 Jan 1909 | 1837-11-20 | 1909-01-17 |
Sinks | Delphia | 27 Dec 1838 | 11 Dec 1931 | 1838-12-27 | 1931-12-11 |
Sinks | Salem | NA | 26 Oct 1869 | NA | 1869-10-26 |
Sinks | Daniel | 1841 | 1923 | NA | NA |
Sinks | Martha | 1856 | 1924 | NA | NA |
Sinks | Roy | 1889 | 1908 | NA | NA |
Sinks | Elizabeth | 10 Mar 1835 | 10 Aug 1911 | 1835-03-10 | 1911-08-10 |
Sinks | infant son | 18 Dec 1898 | 18 Dec 1898 | 1898-12-18 | 1898-12-18 |
Sinks | John | NA | 20 Nov 1893 | NA | 1893-11-20 |
Sinks | Mary | NA | 23 Aug 1906 | NA | 1906-08-23 |
Sinks | William | NA | 27 Sep 1953 | NA | 1953-09-27 |
Sinks | Charlotte | NA | 31 Oct 1910 | NA | 1910-10-31 |
Sinks | Anna | NA | 5 Aug 1909 | NA | 1909-08-05 |
Sinks | Leonard | NA | 26 Oct 1919 | NA | 1919-10-26 |
Sinks | Etta Faye | 31 Mar 1910 | 19 Mar 1989 | 1910-03-31 | 1989-03-19 |
Sinks | John | NA | 28 Jan 1918 | NA | 1918-01-28 |
Sinks | Sena | 18 Sep 1888 | 13 Dec 1972 | 1888-09-18 | 1972-12-13 |
Sinks | William | 11 Sep 1914 | 30 Sep 1986 | 1914-09-11 | 1986-09-30 |
Sweet | Jewell | 1912 | 1990 | NA | NA |
Sinks | Francis | 1908 | 1988 | NA | NA |
Sinks | Arlie | 1883 | 1951 | NA | NA |
Sinks | Viola | 1882 | 1966 | NA | NA |
Sinks | Leonard | NA | NA | NA | NA |
Sinks | Mae | NA | NA | NA | NA |
Sinks | Bessie | 21 Nov 1889 | 24 Apr 1979 | 1889-11-21 | 1979-04-24 |
Sinks | Caroline | NA | 17 Apr 1876 | NA | 1876-04-17 |
Sinks | Arlie | 27 July 1916 | 4 July 2009 | 1916-07-27 | 2009-07-04 |
Sinks | Eva | 15 Jan 1914 | 8 Sep 1002 | 1914-01-15 | 1002-09-08 |
Solt | Conrad | 29 Sep 1758 | 24 Dec 1825 | 1758-09-29 | 1825-12-24 |
Solt | Conrad | 20 Mar 1753 | 25 Aug 1830 | 1753-03-20 | 1830-08-25 |
Solt | Maria | 1760 | 23 Dec 1839 | NA | 1839-12-23 |
Sfafford | Trice | NA | NA | NA | NA |
Sfafford | Phebe | NA | NA | NA | NA |
Steen | Richard | NA | NA | NA | NA |
VanCleve | Martin | 1860 | 1946 | NA | NA |
VanCleve | Florence | 1867 | 1946 | NA | NA |
VanCleve | W | 1 Jan 1813 | 20 May 1886 | 1813-01-01 | 1886-05-20 |
VanCleve | Nancy | 26 Nov 1814 | 22 Feb 1902 | 1814-11-26 | 1902-02-22 |
VanCleve | J | 20 Dec 1850 | 31 Oct 1872 | 1850-12-20 | 1872-10-31 |
VanCleave | W | 29 June 1838 | 15 Jan 1914 | 1838-06-29 | 1914-01-15 |
VanCleave | Elizabeth | 3 Sep 1840 | 25 May 1901 | 1840-09-03 | 1901-05-25 |
Veach | Pleasant | 19 Dec 1845 | 23 Jan 1917 | 1845-12-19 | 1917-01-23 |
Veach | Victoria | 4 Apr 1849 | 10 Apr 1921 | 1849-04-04 | 1921-04-10 |
Veach | Ward | 1886 | 1916 | NA | NA |
Veach | Cynthia | 1861 | 1921 | NA | NA |
Veach | James | 1860 | 1890 | NA | NA |
Veach | James | 1857 | 1939 | NA | NA |
Veach | Nannie | 1863 | 1953 | NA | NA |
Veatch | John | 11 Nov 1776 | 17 Sep 1844 | 1776-11-11 | 1844-09-17 |
Veatch | Eleanor | 26 Oct 1773 | 22 July 1852 | 1773-10-26 | 1852-07-22 |
Veach | William | 1882 | 1957 | NA | NA |
Veach | James | NA | NA | NA | NA |
Veach | Rachel | NA | NA | NA | NA |
Veach | Pleasant | 1 Oct 1837 | 18 Jan 1895 | 1837-10-01 | 1895-01-18 |
Veach | Mary | NA | 16 Nov 1876 | NA | 1876-11-16 |
Veatch | Parmelia | 9 Feb 1808 | 31 Jan 1867 | 1808-02-09 | 1867-01-31 |
Veatch | Mary | 5 Feb 1842 | NA | 1842-02-05 | NA |
Veatch | Elnor | NA | 1834 | NA | NA |
Veatch | Frelin | 21 Mar 1825 | 18 Jan 1848 | 1825-03-21 | 1848-01-18 |
Veach-Nutty | NA | NA | NA | NA | NA |
Veach | John | 1863 | 1950 | NA | NA |
Veach | Rose | 1868 | 1948 | NA | NA |
Veach | Ruth | 1900 | 1901 | NA | NA |
Ware | Turner | 14 Feb 1817 | 5 Feb 1902 | 1817-02-14 | 1902-02-05 |
Ware | Martha | NA | 23 Mar 1881 | NA | 1881-03-23 |
Ware | Joseph | NA | NA | NA | NA |
Ware | Caroline | NA | NA | NA | NA |
Webber | Dick | 7 Aug 1881 | 3 Dec 1942 | 1881-08-07 | 1942-12-03 |
Webber | Pearl | 17 Apr 1889 | 10 Apr 1967 | 1889-04-17 | 1967-04-10 |
Weir | James | 1799 | 1879 | NA | NA |
Weir | Mary | 1801 | 1886 | NA | NA |
Wier | Leticia | 29 Dec 1836 | 26 Mar 1865 | 1836-12-29 | 1865-03-26 |
Whiteside | Lucinda | 21 Apr 1858 | 31 Jan 1936 | 1858-04-21 | 1936-01-31 |
Whiteside | John | 2 Apr 1852 | 30 Sep 1913 | 1852-04-02 | 1913-09-30 |
Willis | Matha | 10 Mar 1830 | 13 Oct 1925 | 1830-03-10 | 1925-10-13 |
Wilson | Jessie | 14 Apr 1924 | 16 May 1996 | 1924-04-14 | 1996-05-16 |
Wilson | Mary | 7 Dec 1927 | 2 June 1978 | 1927-12-07 | 1978-06-02 |
Wilson | Joseph | 26 Feb 1825 | Jan 1862 | 1825-02-26 | NA |
Wilson | Elisha | 24 May 1800 | 9 July 1873 | 1800-05-24 | 1873-07-09 |
Wilson | Sallie | 15 Oct 1807 | 4 Apr 1866 | 1807-10-15 | 1866-04-04 |
Wilson | Lutetita | 1826 | 1907 | NA | NA |
Wilson | Thomas | NA | 1922 | NA | NA |
Wilson | Sarah | NA | NA | NA | NA |
Wilson | Elisha | 1841 | 1907 | NA | NA |
Wilson | Martha | 1840 | 1929 | NA | NA |
Wilson | Charles | 23 Jan 1908 | 19 Apr 1955 | 1908-01-23 | 1955-04-19 |
Wilson | Zack | 1848 | 28 Sep 1918 | NA | 1918-09-28 |
Wilson | Juritha | 6 Aug 1854 | 28 Sep 1916 | 1854-08-06 | 1916-09-28 |
Wilson | Elisha | 1841 | 1907 | NA | NA |
Wilson | Drury | 10 Dec 1827 | 5 Feb 1911 | 1827-12-10 | 1911-02-05 |
Wilson | Mary | 17 Jan 1837 | 1 Mar 1900 | 1837-01-17 | 1900-03-01 |
Wilson | Sandifer | 15 Oct 1834 | 8 Jan 1929 | 1834-10-15 | 1929-01-08 |
Wilson | Nancy | 26 Oct 1842 | 1 Apr 1913 | 1842-10-26 | 1913-04-01 |
Wise | Luvena | NA | NA | NA | NA |
Wollard | John | 12 Jul 1846 | 9 Jan 1918 | 1846-07-12 | 1918-01-09 |
Woolard | Nettie | 8 Sep 1874 | 28 Nov 1910 | 1874-09-08 | 1910-11-28 |
Woolard | Millie | 13 Nov 1837 | 19 Apr 1932 | 1837-11-13 | 1932-04-19 |
Woolard | Lawrence | 25 Dec 1877 | 12 May 1923 | 1877-12-25 | 1923-05-12 |
Woolard | Etta | 14 July 1883 | 7 Dec 1945 | 1883-07-14 | 1945-12-07 |
Woolard | John | 21 Jan 1872 | 19 Oct 1936 | 1872-01-21 | 1936-10-19 |
Woolard | James | 30 June 1908 | 12 Oct 1966 | 1908-06-30 | 1966-10-12 |
Woolard | C | 2 Oct 1882 | 9 Dec 1953 | 1882-10-02 | 1953-12-09 |
Woolard | Blanche | 1 Jan 1885 | 25 July 1959 | 1885-01-01 | 1959-07-25 |
Woolard | L | 15 Sep 1811 | 9 May 1878 | 1811-09-15 | 1878-05-09 |
Woolard | Ama | 23 May 1819 | 5 Dec 1891 | 1819-05-23 | 1891-12-05 |
Woolard | Robert | 1863 | 1938 | NA | NA |
Woolard | James | 12 Apr 1848 | 7 Sep 1888 | 1848-04-12 | 1888-09-07 |
Woolard | Romey | 23 May 1922 | 12 Sep 1944 | 1922-05-23 | 1944-09-12 |
Woolard | Anna | NA | 30 Sep 1887 | NA | 1887-09-30 |
Woolard | James | NA | 27 Sep 1878 | NA | 1878-09-27 |
Woolard | Francis | NA | 20 Feb 1867 | NA | 1867-02-20 |
Woolard | Turner | NA | 20 Oct 1861 | NA | 1861-10-20 |
Woolard | William | 22 Aug 1857 | 2 Jan 1858 | 1857-08-22 | 1858-01-02 |
Woolard | George | 28 Mar 1867 | 5 Jan 1935 | 1867-03-28 | 1935-01-05 |
Woolard | Nancy | 1869 | 1955 | NA | NA |
Cleaning up Cemetery Names (strings)
The string/ character data is more difficult to clean, since the possibilities are endless. Almost anything could be correct. Correcting this data requires some subject matter expertise. I think I would like to group tombstones by cemetery in my map, so I do want to clean this up. However, I’m generally going to take a light touch with this.
Figuring Out the Types of Typos
Here, I expect that there is a limited set of cemeteries, so I can group the data to look for typos.
<- tombstones %>%
cems_unique distinct(Cemetery) %>%
arrange(Cemetery)
%>%
cems_unique gt() %>%
tab_options(container.height = px(300), container.padding.y = px(24))
Cemetery |
---|
Baggett Cem |
Baldwin Cem |
Bethlehem |
Bethlehem Baptist Church |
Bethlehem Cem |
Blockhouse |
Boner Cem |
Britton Odum Farm Commerce |
Campground Cem |
Casey Springs |
Casey Springs Cem |
Chapman-Veatch Cemeteryem |
Christ Church |
Cole Cem |
Corinth Zion Meth Cem |
Crocker Cem |
Denning |
Ebenezer Bapt. Ch. Cem. |
Ebenezer Cumberland Presbyterian Church Cem |
Eld SM Williams |
Ewing Cem |
Follis Cemetery |
Fredonia |
Friedensville Evangelican Lutheran Church |
Gladdish-Anderson |
Goshen Cumberland Presbyterian Church Cem |
Grapevine Cem |
Green Cem |
Greenbrier Cem |
Greenlawn |
Hanover Green |
Harmony Free Will Baptist Ch Cem |
Harmony Free Will Baptist Church |
Hartwell |
Hidlay Cem. |
Hudgens Cem. |
Jersey Baptis Ch |
Johnson Cem |
Johnston City Cem |
Lake Creek Cem. |
Lebanon Cem |
Lebanon Cumberland Presbyterian Ch |
Lebanon Cumberland Presbyterian Church |
Liberty Meth Cem |
Maplewood Cem |
Masonic |
Masonic & Oddfellows Cem |
Miller Cem |
Mt Olive Cem |
Mt Olive Meth Ch Cem |
Mt. Olive Cem. |
Nashville National Cem. |
New Chapel Methodist Church |
Oak Grove Cem |
Oddfellows |
Oddfellows Cem |
Old Bethel Cem |
Orlinda |
Orlinda Cem |
Pleasant Valley Bapt. Church Cem. |
Robinson Cem |
Rose Hill |
Russell Cem |
Spillertown Cem. |
Spring Hill |
St. John's Lutheran Church |
St. Paul's Lutheran "Blue" Church |
Trinity |
Trinity Methodist Church |
Union Grove Cem |
Vicksburg National Cem |
Vienna Fraternal Cem |
Webber Campground |
Weber-Campground |
Weir Cem. |
White Ash |
Williams Prairie Baptist Ch Cem |
Zion Stone Ch |
NA |
There are 79 unique names. I see some obvious issues. Sometimes the word cemetery is added to the end (or an abbreviation like Cem and Cem.). Sometimes it isn’t. Church is rendered as Ch. in some cases.
Cleaning the Easy Typos and Inconsistencies
Since all of these are cemeteries, I’m just going to remove that from the name. I will change all Ch. to Church and I will clean out all periods. Using str_replace_all()
as usual.
Order of operation matters here, I think. I want to replace Ch and Ch. with Church but I don’t want the Church to be replaced. So I either need a regex or I need to include something else in my pattern. I think I can use “Ch” and “Ch.” as the pattern. I can’t just clear out the periods and replace them with spaces and use “Ch” because some of the periods occur in the middle of the string and I’ll end up with extra spaces I need to clear out. Similarly, Cemetary
should be replaced before Cem
, otherwise I’ll have to clean out etary
. Meth
should be Methodist. Bapt.
should be Baptist. There is a Cemeterem
, which is clearly a typo
(The fact that I’m putting this in a new dataframe is a hint that I’m not as clever as I thought. Can you spot what I did wrong?)
<- tombstones %>%
tombstones2 mutate(Cemetery = str_replace_all(Cemetery, "Ch ", "Church")) %>%
mutate(Cemetery = str_replace_all(Cemetery, "Ch.", "Church")) %>%
mutate(Cemetery = str_replace_all(Cemetery, "\\.", "")) %>%
mutate(Cemetery = str_replace_all(Cemetery, "Cemeteryem", "")) %>%
mutate(Cemetery = str_replace_all(Cemetery, "Cemetery", "")) %>%
mutate(Cemetery = str_replace_all(Cemetery, "Cem", "")) %>%
mutate(Cemetery = str_replace_all(Cemetery, "Meth ", "Methodist ")) %>%
mutate(Cemetery = str_replace_all(Cemetery, "Bapt ", "Baptist "))
See how I did…
<- tombstones2 %>%
cems_unique distinct(Cemetery) %>%
arrange(Cemetery)
%>%
cems_unique gt() %>%
tab_options(container.height = px(300), container.padding.y = px(24))
Cemetery |
---|
Baggett |
Baldwin |
Bethlehem |
Bethlehem |
Bethlehem Baptist Churchrch |
Blockhouse |
Boner |
Britton Odum Farm Commerce |
Campground |
Casey Springs |
Casey Springs |
Churchist Churchrch |
Churchpman-Veatch |
Cole |
Corinth Zion Methodist |
Crocker |
Denning |
Ebenezer Baptist Church |
Ebenezer Cumberland Presbyterian Churchrch |
Eld SM Williams |
Ewing |
Follis |
Fredonia |
Friedensville Evangelican Lutheran Churchrch |
Gladdish-Anderson |
Goshen Cumberland Presbyterian Churchrch |
Grapevine |
Green |
Greenbrier |
Greenlawn |
Hanover Green |
Harmony Free Will Baptist Churchrch |
Hartwell |
Hidlay |
Hudgens |
Jersey Baptis Ch |
Johnson |
Johnston City |
Lake Creek |
Lebanon |
Lebanon Cumberland Presbyterian Ch |
Lebanon Cumberland Presbyterian Churchrch |
Liberty Methodist |
Maplewood |
Masonic |
Masonic & Oddfellows |
Miller |
Mt Olive |
Mt Olive Methodist Churchrch |
Nashville National |
New Churchpel Methodist Churchrch |
Oak Grove |
Oddfellows |
Oddfellows |
Old Bethel |
Orlinda |
Orlinda |
Pleasant Valley Baptist Churchrch |
Robinson |
Rose Hill |
Russell |
Spillertown |
Spring Hill |
St John's Lutheran Churchrch |
St Paul's Lutheran "Blue" Churchrch |
Trinity |
Trinity Methodist Churchrch |
Union Grove |
Vicksburg National |
Vienna Fraternal |
Webber Campground |
Weber-Campground |
Weir |
White Ash |
Williams Prairie Baptist Churchrch |
Zion Stone Ch |
NA |
Badly! Almost like I should read my own digression about regex and escape characters. Even the period in Ch.
needs to be escaped, not just single periods or leading periods. So "Ch\\."
not "Ch."
as the pattern.
<- tombstones %>%
tombstones3 mutate(Cemetery = str_replace_all(Cemetery, "Ch ", "Church")) %>%
mutate(Cemetery = str_replace_all(Cemetery, "Ch\\.", "Church")) %>%
mutate(Cemetery = str_replace_all(Cemetery, "\\.", "")) %>%
mutate(Cemetery = str_replace_all(Cemetery, "Cemeteryem", "")) %>%
mutate(Cemetery = str_replace_all(Cemetery, "Cemetery", "")) %>%
mutate(Cemetery = str_replace_all(Cemetery, "Cem", "")) %>%
mutate(Cemetery = str_replace_all(Cemetery, "Meth ", "Methodist ")) %>%
mutate(Cemetery = str_replace_all(Cemetery, "Bapt ", "Baptist "))
Did that work?
<- tombstones3 %>%
cems_unique distinct(Cemetery) %>%
arrange(Cemetery)
%>%
cems_unique gt() %>%
tab_options(container.height = px(300), container.padding.y = px(24))
Cemetery |
---|
Baggett |
Baldwin |
Bethlehem |
Bethlehem |
Bethlehem Baptist Church |
Blockhouse |
Boner |
Britton Odum Farm Commerce |
Campground |
Casey Springs |
Casey Springs |
Chapman-Veatch |
Christ Church |
Cole |
Corinth Zion Methodist |
Crocker |
Denning |
Ebenezer Baptist Church |
Ebenezer Cumberland Presbyterian Church |
Eld SM Williams |
Ewing |
Follis |
Fredonia |
Friedensville Evangelican Lutheran Church |
Gladdish-Anderson |
Goshen Cumberland Presbyterian Church |
Grapevine |
Green |
Greenbrier |
Greenlawn |
Hanover Green |
Harmony Free Will Baptist Church |
Hartwell |
Hidlay |
Hudgens |
Jersey Baptis Ch |
Johnson |
Johnston City |
Lake Creek |
Lebanon |
Lebanon Cumberland Presbyterian Ch |
Lebanon Cumberland Presbyterian Church |
Liberty Methodist |
Maplewood |
Masonic |
Masonic & Oddfellows |
Miller |
Mt Olive |
Mt Olive Methodist Church |
Nashville National |
New Chapel Methodist Church |
Oak Grove |
Oddfellows |
Oddfellows |
Old Bethel |
Orlinda |
Orlinda |
Pleasant Valley Baptist Church |
Robinson |
Rose Hill |
Russell |
Spillertown |
Spring Hill |
St John's Lutheran Church |
St Paul's Lutheran "Blue" Church |
Trinity |
Trinity Methodist Church |
Union Grove |
Vicksburg National |
Vienna Fraternal |
Webber Campground |
Weber-Campground |
Weir |
White Ash |
Williams Prairie Baptist Church |
Zion Stone Ch |
NA |
Better, but some names that look the same are coming up as distinct entries, like Bethlehem. This probably means that there are extraneous spaces floating around. These can be removed from the front and back of a string using the stringr function str_trim() with the side set to both.
<- tombstones3 %>%
tombstones3 mutate(Cemetery = str_trim(Cemetery, side = c("both")))
Checking…
<- tombstones3 %>%
cems_unique distinct(Cemetery) %>%
arrange(Cemetery)
%>%
cems_unique gt() %>%
tab_options(container.height = px(300), container.padding.y = px(24))
Cemetery |
---|
Baggett |
Baldwin |
Bethlehem |
Bethlehem Baptist Church |
Blockhouse |
Boner |
Britton Odum Farm Commerce |
Campground |
Casey Springs |
Chapman-Veatch |
Christ Church |
Cole |
Corinth Zion Methodist |
Crocker |
Denning |
Ebenezer Baptist Church |
Ebenezer Cumberland Presbyterian Church |
Eld SM Williams |
Ewing |
Follis |
Fredonia |
Friedensville Evangelican Lutheran Church |
Gladdish-Anderson |
Goshen Cumberland Presbyterian Church |
Grapevine |
Green |
Greenbrier |
Greenlawn |
Hanover Green |
Harmony Free Will Baptist Church |
Hartwell |
Hidlay |
Hudgens |
Jersey Baptis Ch |
Johnson |
Johnston City |
Lake Creek |
Lebanon |
Lebanon Cumberland Presbyterian Ch |
Lebanon Cumberland Presbyterian Church |
Liberty Methodist |
Maplewood |
Masonic |
Masonic & Oddfellows |
Miller |
Mt Olive |
Mt Olive Methodist Church |
Nashville National |
New Chapel Methodist Church |
Oak Grove |
Oddfellows |
Old Bethel |
Orlinda |
Pleasant Valley Baptist Church |
Robinson |
Rose Hill |
Russell |
Spillertown |
Spring Hill |
St John's Lutheran Church |
St Paul's Lutheran "Blue" Church |
Trinity |
Trinity Methodist Church |
Union Grove |
Vicksburg National |
Vienna Fraternal |
Webber Campground |
Weber-Campground |
Weir |
White Ash |
Williams Prairie Baptist Church |
Zion Stone Ch |
NA |
Using Regex Anchors to Help Clean
The abbreviation Ch at the end of the line doesn’t have a space after it, so it isn’t replaced. Here we can use an anchor in our regex to specify that we want to match the pattern at the end of a string. “Ch$” will match at the end and “^Ch” will match at the start of a string.
<- tombstones3 %>%
tombstones3 mutate(Cemetery = str_replace_all(Cemetery, "Ch$", "Church"))
Checking again.
<- tombstones3 %>%
cems_unique distinct(Cemetery) %>%
arrange(Cemetery)
%>%
cems_unique gt() %>%
tab_options(container.height = px(300), container.padding.y = px(24))
Cemetery |
---|
Baggett |
Baldwin |
Bethlehem |
Bethlehem Baptist Church |
Blockhouse |
Boner |
Britton Odum Farm Commerce |
Campground |
Casey Springs |
Chapman-Veatch |
Christ Church |
Cole |
Corinth Zion Methodist |
Crocker |
Denning |
Ebenezer Baptist Church |
Ebenezer Cumberland Presbyterian Church |
Eld SM Williams |
Ewing |
Follis |
Fredonia |
Friedensville Evangelican Lutheran Church |
Gladdish-Anderson |
Goshen Cumberland Presbyterian Church |
Grapevine |
Green |
Greenbrier |
Greenlawn |
Hanover Green |
Harmony Free Will Baptist Church |
Hartwell |
Hidlay |
Hudgens |
Jersey Baptis Church |
Johnson |
Johnston City |
Lake Creek |
Lebanon |
Lebanon Cumberland Presbyterian Church |
Liberty Methodist |
Maplewood |
Masonic |
Masonic & Oddfellows |
Miller |
Mt Olive |
Mt Olive Methodist Church |
Nashville National |
New Chapel Methodist Church |
Oak Grove |
Oddfellows |
Old Bethel |
Orlinda |
Pleasant Valley Baptist Church |
Robinson |
Rose Hill |
Russell |
Spillertown |
Spring Hill |
St John's Lutheran Church |
St Paul's Lutheran "Blue" Church |
Trinity |
Trinity Methodist Church |
Union Grove |
Vicksburg National |
Vienna Fraternal |
Webber Campground |
Weber-Campground |
Weir |
White Ash |
Williams Prairie Baptist Church |
Zion Stone Church |
NA |
Using the Geographic Data to Match Cemeteries
Now, I do have both geographic data and a subject matter expert (my father) to further refine this list. Let’s start with geography. Bethlehem and Bethlehem Baptist Church could be the same place.
%>%
tombstones3 filter(Cemetery == "Bethlehem" |
== "Bethlehem Baptist Church") %>%
Cemetery select(Cemetery, City, State, lat, long) %>%
gt() %>%
tab_options(container.height = px(300), container.padding.y = px(24))
Cemetery | City | State | lat | long |
---|---|---|---|---|
Bethlehem | Robertson Co | TN | 36.58917 | -87.02361 |
Bethlehem | Robertson Co | TN | 36.59000 | -87.02333 |
Bethlehem | Springfield | TN | 36.68833 | -86.76972 |
Bethlehem | Springfield | TN | 36.69194 | -86.76889 |
Bethlehem | Springfield | TN | 36.69222 | -86.76889 |
Bethlehem | Springfield | TN | 36.69222 | -86.76889 |
Bethlehem | Springfield | TN | 36.69250 | -86.76861 |
Bethlehem | Springfield | TN | NA | NA |
Bethlehem | Springfield | TN | NA | NA |
Bethlehem Baptist Church | Springfield | TN | 36.69000 | -86.76861 |
Bethlehem Baptist Church | Springfield | TN | 36.69222 | -86.77306 |
Bethlehem Baptist Church | Springfield | TN | 36.68972 | -86.76861 |
Bethlehem Baptist Church | Springfield | TN | NA | NA |
Bethlehem | Robertson Co. | TN | NA | NA |
Bethlehem | Robertson Co. | TN | NA | NA |
It turns out that Roberston County’s county seat is Springfield, so these entries are all the same. I’d say this is a pretty common type of problem that you might run into. The spreadsheet column says City, but what it really means is something like “the local level of geography that is meaningful to me”. So thinking about what the data represents rather than how it is labeled can help guide how you handle it.
So, perhaps you could check if cemeteries were the same by calculating the distance between the two sets of coordinates. There is a calculator here, which gives a distance as <0.2 mi for one comparison between Bethlehem and Bethlehem Baptist Church. You could also calculate the distance programmatically, see code block 14 in this blog post. You could pull in other geographic data and correct/ add data so you have both city and county if applicable.
For now, I’m going to make the correction so all entries say Bethlehem Baptist Church. I’m not going to correct the city/county issue.
<- tombstones3 %>%
tombstones3 mutate(Cemetery = str_replace_all(Cemetery, "Bethlehem$", "Bethlehem Baptist Church"))
%>% filter(Cemetery == "Bethlehem" |
tombstones3 == "Bethlehem Baptist Church") %>%
Cemetery select(Cemetery, City, State, lat, long) %>%
gt() %>%
tab_options(container.height = px(300), container.padding.y = px(24))
Cemetery | City | State | lat | long |
---|---|---|---|---|
Bethlehem Baptist Church | Robertson Co | TN | 36.58917 | -87.02361 |
Bethlehem Baptist Church | Robertson Co | TN | 36.59000 | -87.02333 |
Bethlehem Baptist Church | Springfield | TN | 36.68833 | -86.76972 |
Bethlehem Baptist Church | Springfield | TN | 36.69194 | -86.76889 |
Bethlehem Baptist Church | Springfield | TN | 36.69222 | -86.76889 |
Bethlehem Baptist Church | Springfield | TN | 36.69222 | -86.76889 |
Bethlehem Baptist Church | Springfield | TN | 36.69250 | -86.76861 |
Bethlehem Baptist Church | Springfield | TN | NA | NA |
Bethlehem Baptist Church | Springfield | TN | NA | NA |
Bethlehem Baptist Church | Springfield | TN | 36.69000 | -86.76861 |
Bethlehem Baptist Church | Springfield | TN | 36.69222 | -86.77306 |
Bethlehem Baptist Church | Springfield | TN | 36.68972 | -86.76861 |
Bethlehem Baptist Church | Springfield | TN | NA | NA |
Bethlehem Baptist Church | Robertson Co. | TN | NA | NA |
Bethlehem Baptist Church | Robertson Co. | TN | NA | NA |
There aren’t that many possible duplicates.
Lebanon |
Lebanon Cumberland Presbyterian Church |
Mt Olive |
Mt Olive Methodist Church |
Trinity |
Trinity Methodist Church |
Webber Campground |
Weber-Campground |
Talking to the Subject Matter Expert
Sometimes it is easier just to go talk to the subject matter expert than it is to come up with complicated programmatic solutions. Sometimes that isn’t possible; maybe you don’t even know who generated the data. Or perhaps their time is much more valuable than yours. That’s why I also came up with a solution (using geographic data to match cemeteries) that could be done independently, even though I can ask the expert.
For me, a 5 minute conversation eliminated the need to write lots more code. It turns out that Webber (correct spelling) Campground Cemetery, Weber Campground and Campground are all the same. I wouldn’t have caught Campground as being a match, but my father mentioned it. These entries also switch between city and county in the location part of the table, making it even trickier. It turns out all of these pairs are the same, so I’m going to correct them. I’m also writing this back to the main tombstones dataframe, rather than continuing with intermediate variables. My father also told me that Johnston City Cemetery is actually Shakerag Masonic Cemetery. He also stated that Oddfellows was Oddfellow and Masonic and Oddfellows was Masonic and Odd Fellows. So I’ll correct these too.
<- tombstones3 %>%
tombstones mutate(Cemetery = str_replace_all(Cemetery, "Lebanon$", "Lebanon Cumberland Presbyterian Church")) %>%
mutate(Cemetery = str_replace_all(Cemetery, "Mt Olive$", "Mt Olive Methodist Church")) %>%
mutate(Cemetery = str_replace_all(Cemetery, "Trinity$", "Trinity Methodist Church")) %>%
mutate(Cemetery = str_replace_all(Cemetery, "Weber-Campground", "Webber Campground")) %>%
mutate(Cemetery = str_replace_all(Cemetery, "^Campground", "Webber Campground")) %>%
mutate(Cemetery = str_replace_all(Cemetery, "Johnston City", "Shakerag Masonic")) %>%
mutate(Cemetery = str_replace_all(Cemetery, "Masonic & Oddfellows", "Masonic & Odd Fellows")) %>%
mutate(Cemetery = str_replace_all(Cemetery, "Oddfellows", "Oddfellow"))
Final check.
<- tombstones %>%
cems_unique distinct(Cemetery) %>%
arrange(Cemetery)
%>%
cems_unique gt() %>%
tab_options(container.height = px(300), container.padding.y = px(24))
Cemetery |
---|
Baggett |
Baldwin |
Bethlehem Baptist Church |
Blockhouse |
Boner |
Britton Odum Farm Commerce |
Casey Springs |
Chapman-Veatch |
Christ Church |
Cole |
Corinth Zion Methodist |
Crocker |
Denning |
Ebenezer Baptist Church |
Ebenezer Cumberland Presbyterian Church |
Eld SM Williams |
Ewing |
Follis |
Fredonia |
Friedensville Evangelican Lutheran Church |
Gladdish-Anderson |
Goshen Cumberland Presbyterian Church |
Grapevine |
Green |
Greenbrier |
Greenlawn |
Hanover Green |
Harmony Free Will Baptist Church |
Hartwell |
Hidlay |
Hudgens |
Jersey Baptis Church |
Johnson |
Lake Creek |
Lebanon Cumberland Presbyterian Church |
Liberty Methodist |
Maplewood |
Masonic |
Masonic & Odd Fellows |
Miller |
Mt Olive Methodist Church |
Nashville National |
New Chapel Methodist Church |
Oak Grove |
Oddfellow |
Old Bethel |
Orlinda |
Pleasant Valley Baptist Church |
Robinson |
Rose Hill |
Russell |
Shakerag Masonic |
Spillertown |
Spring Hill |
St John's Lutheran Church |
St Paul's Lutheran "Blue" Church |
Trinity Methodist Church |
Union Grove |
Vicksburg National |
Vienna Fraternal |
Webber Campground |
Weir |
White Ash |
Williams Prairie Baptist Church |
Zion Stone Church |
NA |
Lastly, I’m going to replace the NAs with a blank. This will make things nicer if I use this for a label.
<- tombstones %>%
tombstones mutate(Cemetery = ifelse(is.na(Cemetery), "", Cemetery))
Cleaning Up Names (strings)
In addition to cleaning up typos and such from the names, I’m going to construct a some compound names following my father’s photo naming pattern.
My father’s photo naming convention is mostly “last name first name middle initial”. He often includes other information, like if the photo was a close up or distance or if the stone was rubbed with chalk to enhance. So I can’t use exact pattern matching, but rather need to use partial pattern matching.
This is another situation where having some clear guidelines about how data will be handled is important. I have decided to prioritize correct matches over more matches. Take my name- Louise Elaine Sinks. It could be rendered as LE Sinks, L E Sinks, L Sinks, Louise E Sinks, and Louise Sinks. I could capture all occurrences of my name by matching on all of these variations, but I’m also more likely to get false matches. In the context of this dataset, names and name patterns are often reused within a family, making false matches even more likely. (I inherited my grandfather’s nameplate and have it on my desk- L E Sinks Jr- since I share the same name pattern with him.)
I will create the following patterns for matching:
- Full Name with whatever is in the Middle Name column (e.g. Louise Elaine Sinks)
- If First and Middle name are just initials, I will create a version with and without spaces, since I see it done both ways in the photo names (e.g. LE Sinks and L E Sinks)
- If no middle name is available, I will use First and Last Name (e.g. Louise Sinks)
I will not:
- Omit the middle name if it is available. (e.g. Louise Elaine Sinks will never be Louise Sinks)
- Truncate a name to an initial. (e.g. If the entry says Louise Elaine Sinks, I will not use the pattern L E Sinks or Louise E Sinks)
These rules seem complicated, but when you have a set of unique individuals with names like A T Sinks, Arlie T Sinks and Arlie Sinks, you need to think very carefully about how you will distinguish them when you are doing partial matching.
With that said, let’s clean up some names.
Cleaning up First Names
Look at what we are dealing with.
%>%
tombstones_raw select(First.Name) %>%
gt() %>%
tab_options(container.height = px(300), container.padding.y = px(24))
First.Name |
---|
Abraham |
Elizabeth |
Zady |
Albert |
Adesia |
May |
E |
William |
Nancy |
Richard |
John |
William |
Mahalia |
E |
Josephine |
Fanning |
John |
Mary |
Wm |
Esther |
Elizabeth |
Joel |
Hope |
Clem |
Nancy |
W |
Charles |
Thomas |
Octavia |
George |
Lora |
W |
Alzada |
G |
Ellen |
L |
Mary |
Daniel |
Elizabeth |
Caroline |
Daniel |
Lucretia |
Samuel |
Elizabeth |
Laura |
Polly |
Mandy |
John |
Ezra |
Lizzie |
Fred |
Catherine |
Christian |
Peter |
George |
Katie |
Mary E |
Henriettie |
MED |
HD |
GD |
Mother |
Father |
George |
James |
May |
John |
Maggie |
William |
Dora |
L[eaman] |
G[uilford] |
D[ora] |
Eugene |
Lou |
J[oseph] |
Joseph |
Sarah |
W |
A |
J |
Elizabeth |
Robert |
Rebecca |
Monroe |
Della |
Mary M |
Harve |
Carrie |
Smith |
Ada |
William |
Harvey |
Cora |
John |
W |
Gustavus |
Sarah |
Joseph |
Della |
William |
Harriet |
William |
Mary |
James |
Sarah |
W[illiam] |
E[lisha] |
Sarah |
James |
Georgia |
William |
Malinda |
Mary |
Catherina |
Johannes |
Semantha |
Elizabeth |
Elizabeth |
Isaac |
Fawn |
Ralph |
A |
Christian |
G |
Ralph |
E |
William |
Martha |
Jeff |
Florence |
Frances |
Ebenezer |
NA |
NA |
William |
Leonard |
Lucille |
Parmelia |
Amalphus |
Adolphus |
Samuel |
Augusta |
Ulysses |
Ulysses |
William |
William |
Jerome |
Franklin |
Samuel |
Bernice |
Catherine |
George |
Lucinda |
William |
Daniel |
Margaretha |
Polly |
James |
William |
Elizabeth |
Jeremiah |
Rebecca |
James |
Mary |
Levi |
Hester |
Ridley |
James |
Tina |
Ezra |
Nannie |
Samuel |
Melverda |
John |
Willard |
Ruth |
James |
Nancy |
Anna |
Joe |
Eugenia |
Leland |
Jon |
Abraham |
Azariah |
Abigail |
Eleandra |
Micajah |
Samuel |
Elizabeth |
Ruth |
Sarah |
Bell |
Mary |
Clarence |
Cora |
W |
Susan |
Hannah |
Mary |
Samuel |
Belinda |
Susannah |
Thomas |
Sarah |
Anna |
Nicolaus |
Myrtie |
Elizabeth |
John |
Catherina |
Gotthard |
Magdelena |
Peter |
Anna |
Anna |
Caty |
Daniel |
Frederick |
Friedrich |
Johann |
Maria |
John |
Mary |
Henry |
Mary |
Will |
Adeline |
NA |
John |
Beatrice |
John |
NA |
NA |
Archibald |
Cynthia |
G |
Sarah |
Thomas |
Britton |
Wiley |
Sallie A |
Daniel |
Charlotte |
William |
Harriet |
Louise |
Karl |
Caroline |
William |
Harriet |
Louise |
Frieda |
Amos |
William |
Elmina |
Mamie |
George |
Bertie |
Lulie |
Lily |
Arthur |
George |
Jno |
Guy |
Harlie |
Annabelle |
Alfred |
Solomon |
Catherine |
J |
Mary |
Balzer |
Elisabetha |
Johannes |
Elizabeth |
George |
Euna |
Mary |
Melchir |
James |
Ana |
NA |
NA |
NA |
Jenniel |
A |
Francis |
Delphia |
Salem |
Daniel |
Martha |
Roy |
Elizabeth |
infant son |
John |
Mary |
William |
Charlotte |
Anna |
Leonard |
Etta Faye |
John |
Sena |
William |
Jewell |
Francis |
Arlie |
Viola |
Leonard |
Mae |
Bessie |
Caroline |
Arlie |
Eva |
Conrad |
Conrad |
Maria |
Trice |
Phebe |
Richard |
Martin |
Florence |
W |
Nancy |
J |
W |
Elizabeth |
Pleasant |
Victoria |
Ward |
Cynthia |
James |
James |
Nannie |
John |
Eleanor |
William |
James |
Rachel |
Pleasant |
Mary |
Parmelia |
Mary |
Elnor |
Frelin |
NA |
John |
Rose |
Ruth |
Turner |
Martha |
Joseph |
Caroline |
Dick |
Pearl |
James |
Mary |
Leticia |
Lucinda |
John |
Matha |
Jessie |
Mary |
Joseph |
Elisha |
Sallie |
Lutetita |
Thomas |
Sarah |
Elisha |
Martha |
Charles |
Zack |
Juritha |
Elisha |
Drury |
Mary |
Sandifer |
Nancy |
Luvena |
John |
Nettie |
Millie |
Lawrence |
Etta |
John |
James |
C |
Blanche |
L |
Ama |
Robert |
James |
Romey |
Anna |
James |
Francis |
Turner |
William |
George |
Nancy |
Sometimes the tombstone only has the initial, but my father knows the full name through other means. This would be rendered as L[ouise]. The photos will usually use L not Louise, so I’m just removing this extra data and storing it elsewhere. I do this with str_extract() and then I replace it with a blank using str_replace_all(). There are also extra spaces that I will trim off as before with str_trim().
Using Regex Quantity Codes
The regex for getting rid of the bracketed information is a bit more complicated than what I’ve used before. I want something that looks for [ followed by any number of other characters followed by ]. The brackets are special characters, so they need to be escaped. So that would be “\\[ \\]” for the front and back of the pattern. The middle can be anything. This is where regex is so powerful. As I mentioned before, an un-escaped period will match to any character. There are also quantity codes- a plus sign will match one or more. So I use “\\[.+\\]” which will match one or more characters inside brackets. (I could use the quantity symbol * which matches 0 or more, but there will never be empty brackets in this dataset.)
<- tombstones %>%
tombstones mutate(Extra_First_Name = str_extract(First.Name, "\\[.+\\]")) %>%
mutate(First.Name = str_replace(First.Name, "\\[.+\\]", "")) %>%
mutate(First.Name = str_replace(First.Name, "\\.", "")) %>%
mutate(First.Name = str_trim(First.Name, side = c("both"))) %>%
drop_na(First.Name)
Look at the data.
%>%
tombstones select(First.Name) %>%
gt() %>%
tab_options(container.height = px(300), container.padding.y = px(24))
First.Name |
---|
Abraham |
Elizabeth |
Zady |
Albert |
Adesia |
May |
E |
William |
Nancy |
Richard |
John |
William |
Mahalia |
E |
Josephine |
Fanning |
John |
Mary |
Wm |
Esther |
Elizabeth |
Joel |
Hope |
Clem |
Nancy |
W |
Charles |
Thomas |
Octavia |
George |
Lora |
W |
Alzada |
G |
Ellen |
L |
Mary |
Daniel |
Elizabeth |
Caroline |
Daniel |
Lucretia |
Samuel |
Elizabeth |
Laura |
Polly |
Mandy |
John |
Ezra |
Lizzie |
Fred |
Catherine |
Christian |
Peter |
George |
Katie |
Mary E |
Henriettie |
MED |
HD |
GD |
Mother |
Father |
George |
James |
May |
John |
Maggie |
William |
Dora |
L |
G |
D |
Eugene |
Lou |
J |
Joseph |
Sarah |
W |
A |
J |
Elizabeth |
Robert |
Rebecca |
Monroe |
Della |
Mary M |
Harve |
Carrie |
Smith |
Ada |
William |
Harvey |
Cora |
John |
W |
Gustavus |
Sarah |
Joseph |
Della |
William |
Harriet |
William |
Mary |
James |
Sarah |
W |
E |
Sarah |
James |
Georgia |
William |
Malinda |
Mary |
Catherina |
Johannes |
Semantha |
Elizabeth |
Elizabeth |
Isaac |
Fawn |
Ralph |
A |
Christian |
G |
Ralph |
E |
William |
Martha |
Jeff |
Florence |
Frances |
Ebenezer |
William |
Leonard |
Lucille |
Parmelia |
Amalphus |
Adolphus |
Samuel |
Augusta |
Ulysses |
Ulysses |
William |
William |
Jerome |
Franklin |
Samuel |
Bernice |
Catherine |
George |
Lucinda |
William |
Daniel |
Margaretha |
Polly |
James |
William |
Elizabeth |
Jeremiah |
Rebecca |
James |
Mary |
Levi |
Hester |
Ridley |
James |
Tina |
Ezra |
Nannie |
Samuel |
Melverda |
John |
Willard |
Ruth |
James |
Nancy |
Anna |
Joe |
Eugenia |
Leland |
Jon |
Abraham |
Azariah |
Abigail |
Eleandra |
Micajah |
Samuel |
Elizabeth |
Ruth |
Sarah |
Bell |
Mary |
Clarence |
Cora |
W |
Susan |
Hannah |
Mary |
Samuel |
Belinda |
Susannah |
Thomas |
Sarah |
Anna |
Nicolaus |
Myrtie |
Elizabeth |
John |
Catherina |
Gotthard |
Magdelena |
Peter |
Anna |
Anna |
Caty |
Daniel |
Frederick |
Friedrich |
Johann |
Maria |
John |
Mary |
Henry |
Mary |
Will |
Adeline |
John |
Beatrice |
John |
Archibald |
Cynthia |
G |
Sarah |
Thomas |
Britton |
Wiley |
Sallie A |
Daniel |
Charlotte |
William |
Harriet |
Louise |
Karl |
Caroline |
William |
Harriet |
Louise |
Frieda |
Amos |
William |
Elmina |
Mamie |
George |
Bertie |
Lulie |
Lily |
Arthur |
George |
Jno |
Guy |
Harlie |
Annabelle |
Alfred |
Solomon |
Catherine |
J |
Mary |
Balzer |
Elisabetha |
Johannes |
Elizabeth |
George |
Euna |
Mary |
Melchir |
James |
Ana |
Jenniel |
A |
Francis |
Delphia |
Salem |
Daniel |
Martha |
Roy |
Elizabeth |
infant son |
John |
Mary |
William |
Charlotte |
Anna |
Leonard |
Etta Faye |
John |
Sena |
William |
Jewell |
Francis |
Arlie |
Viola |
Leonard |
Mae |
Bessie |
Caroline |
Arlie |
Eva |
Conrad |
Conrad |
Maria |
Trice |
Phebe |
Richard |
Martin |
Florence |
W |
Nancy |
J |
W |
Elizabeth |
Pleasant |
Victoria |
Ward |
Cynthia |
James |
James |
Nannie |
John |
Eleanor |
William |
James |
Rachel |
Pleasant |
Mary |
Parmelia |
Mary |
Elnor |
Frelin |
John |
Rose |
Ruth |
Turner |
Martha |
Joseph |
Caroline |
Dick |
Pearl |
James |
Mary |
Leticia |
Lucinda |
John |
Matha |
Jessie |
Mary |
Joseph |
Elisha |
Sallie |
Lutetita |
Thomas |
Sarah |
Elisha |
Martha |
Charles |
Zack |
Juritha |
Elisha |
Drury |
Mary |
Sandifer |
Nancy |
Luvena |
John |
Nettie |
Millie |
Lawrence |
Etta |
John |
James |
C |
Blanche |
L |
Ama |
Robert |
James |
Romey |
Anna |
James |
Francis |
Turner |
William |
George |
Nancy |
There is an unnamed child (infant son) and maybe a typo’d name (Jno). I’ll keep these cases in mind as I move forward.
Cleaning up Middle Names
The middle name column is very problematic. Sometimes it is the middle name. Sometimes it is a note like “shared family tombstone”. Sometimes it is the middle initial from the gravestone, but with the full name filled in with brackets like A[lvis]. Sometimes it is a maiden name or the name of the spouse.
Most of this stuff is similar to what I’ve done with other fields. I’ll strip out extra spaces, deal with the bracketed info, periods and the abbreviation ux (Latin for wife). I’m taking out everything after ux, since that should be the name of the wife.
I’m also changing all blanks to NAs and I will use this to partition my dataset when I am doing the photo matching. na_if()
from dplyr will let you replace any specific value with NA.
<- tombstones %>%
tombstones mutate(Extra_Middle_Name1 = str_extract(Middle.Name, "ux .+")) %>%
mutate(Middle.Name = str_replace(Middle.Name, "ux .+", "")) %>%
mutate(Extra_Middle_Name2 = str_extract(Middle.Name, "\\[.+\\]")) %>%
mutate(Middle.Name = str_replace(Middle.Name, "\\[.+\\]", "")) %>%
mutate(Middle.Name = str_replace(Middle.Name, "\\.", "")) %>%
mutate(Middle.Name = str_trim(Middle.Name, side = c("both"))) %>%
mutate(Middle.Name = na_if(Middle.Name, ""))
This fixes most of the issues, but not all. I’m going to see how the rest of this going and then come back and fix problem cases if needed. (Turns out this is fine.)
Now I’m making the full name with Middle Name/initial (e.g. Sinks Louise Elaine). His photo naming format is almost always last name first. Nothing too exotic here. I do use paste()
to glue together my pieces, using space as the separator.
<- tombstones %>%
tombstones mutate(full_name_MI = ifelse(
is.na(Middle.Name) == TRUE,
Middle.Name,paste(Surname, First.Name, Middle.Name, sep = " ")
%>%
)) mutate(full_name = paste(Surname, First.Name, sep = " "))
Now to deal with the case where first and middle names that are only letters, sometimes the photo name is Sinks LE rather than Sinks L E (which is created above) . So I need a second middle name column for these cases. (There are also cases with initials in the middle name itself. I don’t know if I need to clean that up. See Hess Ulysses S G.)
This may not be elegant, but I’m just slamming together first and middle name without spaces. So I’m getting names like Sinks LouiseElaine also.
<- tombstones %>%
tombstones mutate(full_name_MI_no_space = ifelse(
is.na(Middle.Name) == TRUE,
Middle.Name,paste(Surname, paste0(First.Name, Middle.Name), sep = " ")
))
And here I filter out first + middle name combo that is longer than three characters and setting the middle name without spaces column to NA. So LouiseElaine gets set to NA but LE remains. I chose 3 because I did see some folks with double initials in the middle name (e.g. Ulysses SG Hess) and I wasn’t sure if there were some fully initialed names that might be like that. I think in practice 2 is fine. This whole thing is a bit sloppy. The more rigorous way to do this is to check the patterns of the first and middle name and only created the combined name if they matched the pattern of being a single letter or two single letters separated by a space. Sometimes though, quick and dirty gets the job done.
<- tombstones %>%
tombstones mutate(full_name_MI_no_space = ifelse(
nchar(paste0(First.Name, Middle.Name)) > 3,
" ",
full_name_MI_no_space%>%
)) mutate(full_name_MI_no_space = na_if(full_name_MI_no_space, " "))
There is another part of this project that I’m working on separately, so I’m saving a copy of the cleaned dataset to be used in that module.
saveRDS(tombstones, "tombstones_cleaned.RDS")
Matching to Photos
As I mentioned above, I decided that it was more important to have completely correct matching, rather than more complete matching. The order of the matching matters too. I’m starting with the most complete names, matching them, and then moving them out of the unmatched photo folder. Since I’m doing partial matching, Sinks A will match to both Sinks A (correct) and Sinks A T (incorrect). So I need to match Sinks A T first, so it is not available for matching by the time I get to matching with Sinks A. This took a lot of iterations to get the correct logic and resulting flow. I also prototyped by just matching full names and not using any functions. I’m not showing that here, but I think it is often easier to do that, and then go back and convert certain chunks to functions.
I’m not showing all the iterations and mistakes in full here like I did with the Cemetery cleaning section, because this section is pretty slow to run. But don’t think I am super clever and came up with the logic/scheme first go. I did also try the methods that I said I wouldn’t do, just to see how many false matches I got and how many more (probably) correct matches I got.
Reset Photo Folders
Testing the matching is an iterative process. I also ended up going back and adding more cleaning steps based on what I was seeing during the matching. There are about 500 photos and they get sorted into various folders. I was manually resetting all the folders, but that got old quickly. I wrote a code chunk to reset the folders. I move everything into a folder called trash- please do this rather than actually deleting the files until you are sure your code works! Then I copy the originals into a starting folder (Unmatched Photos)
How this works is I generate a list of files in each folder using list.files()
and then I pass that list to the file.rename()
or file.copy()
functions. The file functions report TRUE for each file they correct find an act on and FALSE otherwise. The output should be all TRUEs.
(I thought I could make this a function, since I end up using it again below, but it doesn’t work. I didn’t return which is invalid and I ended up locking up Studio. Then I returned TRUE, but that seems to not work either. I ended up terminating R. It seems to hang when copying the files back in.)
# first moving all the photos to the trash folder, folder by folder
<-
my_file_list list.files(here(blog_folder, photo_folder, "Unmatched Photos"))
file.rename(
from = here(blog_folder, photo_folder, "Unmatched Photos", my_file_list),
to = here(blog_folder, photo_folder, "Trash", my_file_list)
)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[91] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[106] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[121] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[136] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[151] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[166] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[181] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[196] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
<-
my_file_list list.files(here(blog_folder, photo_folder, "Archived Photos"))
file.rename(
from = here(blog_folder, photo_folder, "Archived Photos", my_file_list),
to = here(blog_folder, photo_folder, "Trash", my_file_list)
)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[91] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[106] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[121] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[136] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[151] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
<-
my_file_list list.files(here(blog_folder, photo_folder, "Matched_Round1"))
file.rename(
from = here(blog_folder, photo_folder, "Matched_Round1", my_file_list),
to = here(blog_folder, photo_folder, "Trash", my_file_list)
)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE
<-
my_file_list list.files(here(blog_folder, photo_folder, "Matched_Round2"))
file.rename(
from = here(blog_folder, photo_folder, "Matched_Round2", my_file_list),
to = here(blog_folder, photo_folder, "Trash", my_file_list)
)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
<-
my_file_list list.files(here(blog_folder, photo_folder, "Matched_Round3"))
file.rename(
from = here(blog_folder, photo_folder, "Matched_Round3", my_file_list),
to = here(blog_folder, photo_folder, "Trash", my_file_list)
)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
<-
my_file_list list.files(here(blog_folder, photo_folder, "Matched_Round4"))
file.rename(
from = here(blog_folder, photo_folder, "Matched_Round4", my_file_list),
to = here(blog_folder, photo_folder, "Trash", my_file_list)
)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[91] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
<-
my_file_list list.files(here(blog_folder, photo_folder, "Map"))
file.rename(
from = here(blog_folder, photo_folder, "Map", my_file_list),
to = here(blog_folder, photo_folder, "Trash", my_file_list)
)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[91] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[106] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[121] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[136] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[151] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[166] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[181] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[196] TRUE TRUE TRUE TRUE
# now I move a copy of all the photos into the starting folder.
<-
my_file_list list.files(here(blog_folder, photo_folder, "Original Photos"))
file.copy(
from = here(blog_folder, photo_folder, "Original Photos", my_file_list),
to = here(blog_folder, photo_folder, unmatched_folder, my_file_list)
)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[91] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[106] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[121] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[136] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[151] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[166] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[181] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[196] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[211] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[226] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[241] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[256] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[271] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[286] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[301] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[316] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[331] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[346] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[361] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[376] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[391] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[406] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[421] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[436] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[451] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[466] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[481] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[496] TRUE TRUE TRUE TRUE
Split out Middle Names and No Middle Names
Okay, now I am creating the two main data sets- with and without middle name entries.
<- tombstones %>%
unmatched_middle filter(is.na(Surname) == FALSE) %>%
filter(is.na(Middle.Name) == FALSE)
<- tombstones %>%
unmatched_no_middle filter(is.na(Surname) == FALSE) %>%
filter(is.na(Middle.Name) == TRUE)
Now a function to do the matching.
Matching_Photos()
Function
This function is a beast and should probably be broken into smaller functions. When you have to comment all the different steps in a function, then it is likely doing too many things! As I mentioned, I originally just tested this in non-function form, and I just took the whole chunk and put it in a function. I might clean this up later, but since this is a one off project, it may not be worth the time.
There are a few things to note about this function. First, for debugging inside a quarto file, use browser()
inside the function to allow you to step through and observe the variables in the function. Not all of the regular debugging tools work in quarto (compared to a regular R file).
Secondly, accessing the value of the parameters inside the function required using functions like get()
. Filter on name_to_match and nothing matches, because nothing has that name. Using get(name_to_match)
actual returns the column name to match on.
Oh, and I use loops for everything! If I were to clean this up, I’d definitely get rid of loops using purrr or something.
So what does this function do?
It takes in a dataframe (df_to_match) and what column of names it should be matching against (name_to_match). It also takes in which match round this is.
- Generates the list of unmatched photos.
- Finds duplicated names.
- and filters them out so that matching only occurs on the unique names in the set.
- Does the matching with a fuzzyjoin. More about fuzzyjoin here and here.
- Move all the photos that matched to the appropriate folder for the round.
- Remove all the unmatched names from the dataframe.
- Generate a list of people who have multiple photos associated with their name.
- Make a panel/composite photo from the individual photos for each person with multiples and move the original to the folder archive. The panel photos are made using the magick package. I got started using this solution from stack overflow.
- Return the dataframe with the matches. This is useful for troubleshooting, but it is not the final results dataframe because it has the original (multiple) photos listed and not the new panel photos. Functions in R must return something, and I did use this returned dataframe extensively when troubleshooting, but in other cases, I’d probably silence the output or just return TRUE/FALSE (with some error checking) to reflect if the function executed properly. It could also return the duplicate names df which could be sent to my father to manually match photos to those names. Or it could call
Update_Photo_Names()
as the last part, which would then letMatching_Photos()
return the dataframe with the correct file names.
<-
Matching_Photos function(df_to_match, name_to_match, match_folder) {
# browser()
#Step 1: list of unmatched photos
= list.files(here(blog_folder, photo_folder, unmatched_folder))
photo_names = as.data.frame(photo_names)
photo_df #Step 2: generate duplicate names
<- df_to_match %>%
duplicate_names group_by_at(name_to_match) %>% count(sort = TRUE) %>% filter(n > 1)
# return(duplicate_names) #this was for troubleshooting
#Step 3: remove duplicate names
<- df_to_match %>%
tombstones_unique_names anti_join(duplicate_names)
#step 4: do the matching
<-
tombstones_merged fuzzy_right_join(
photo_df,
tombstones_unique_names,by = c("photo_names" = name_to_match),
match_fun = str_detect
)#step 5: Moving all the photos that match to the correct match folder
<- inner_join(photo_df, tombstones_merged)
matched_this_round <- 1
index for (index in seq(1:nrow(matched_this_round))) {
file.rename(
from = here(
blog_folder,
photo_folder,
unmatched_folder,$photo_names[index]
matched_this_round
),to = here(
blog_folder,
photo_folder,
match_folder,$photo_names[index]
matched_this_round
)
)
}#step 6: Remove any unmatched names
<- tombstones_merged %>%
tombstones_merged drop_na(photo_names)
#step 7: generate the list of folks with multiple photos
<-
multiple_photos %>% group_by_at(name_to_match) %>% count(sort = TRUE) %>% filter(n > 1)
tombstones_merged #step 8: make the panel photos and move the originals to archive. (function?)
if (nrow(multiple_photos) > 0) {
<- 1
index for (index in seq(1:nrow(multiple_photos))) {
<- tombstones_merged %>%
df_temp filter(get(name_to_match) == multiple_photos[[name_to_match]][index])
<-
these as.list(here(
blog_folder,
photo_folder,
match_folder,$photo_names
df_temp
))<-
photo_panel image_append(do.call("c", lapply(these, image_read)))
image_write(
photo_panel,path = here(
blog_folder,
photo_folder,
match_folder,paste0(df_temp[[name_to_match]][1], "_panel.png")
#paste0(df_temp$full_name_MI[1], "_panel.png")
),format = "jpeg"
)<- 1
index2 for (index2 in seq(1:nrow(df_temp))) {
file.rename(
from = here(
blog_folder,
photo_folder,
match_folder,$photo_names[index2]
df_temp
),to = here(
blog_folder,
photo_folder,
archive_folder,$photo_names[index2]
df_temp
)
)
}
}
}#Step 9: return
return(tombstones_merged)
}
A False Start
Does the function work?
<- Matching_Photos(unmatched_middle, "full_name_MI", match1) tester
Joining with `by = join_by(full_name_MI)`
Joining with `by = join_by(photo_names)`
Warning in file.rename(from = here(blog_folder, photo_folder, unmatched_folder,
: cannot rename file 'C:/Users/drsin/OneDrive/Documents/R
Projects/lsinks.github.io/posts/2023-08-04-data-cleaning-tombstone/Photos/Unmatched
Photos/Jones Levi A Jones Hester J.JPG' to 'C:/Users/drsin/OneDrive/Documents/R
Projects/lsinks.github.io/posts/2023-08-04-data-cleaning-tombstone/Photos/Matched_Round1/Jones
Levi A Jones Hester J.JPG', reason 'The system cannot find the file specified'
Warning in file.rename(from = here(blog_folder, photo_folder, unmatched_folder,
: cannot rename file 'C:/Users/drsin/OneDrive/Documents/R
Projects/lsinks.github.io/posts/2023-08-04-data-cleaning-tombstone/Photos/Unmatched
Photos/Pickard William Epps.JPG' to 'C:/Users/drsin/OneDrive/Documents/R
Projects/lsinks.github.io/posts/2023-08-04-data-cleaning-tombstone/Photos/Matched_Round1/Pickard
William Epps.JPG', reason 'The system cannot find the file specified'
Yes, it works! However, there are a couple of warnings kicked up. The warnings are generated when the file to be moved doesn’t exist. This means that it has matched to another name and has already been moved. So these aren’t a problem with the function, but rather a problem with my logic/ partitioning of data into the different rounds.
The first message is about “Jones Levi A Jones Hester J.JPG”. The tombstone is for both the husband and wife and it should match to two names- Levi A Jones and Hester J Jones. So this is okay and not an error. The second one error arises from “Pickard William Epps.jpg”. This is a problem. There is apparently both a William E Pickard and a William Epps Pickard. The strings that are being matched against the photos are Pickard William E and Pickard William Epps. The first will match to the second when doing a partial match. Notice that if the naming convention were First Middle Last and not Last First Middle there would not be a problem. William E Pickard and William Epps Pickard will not match with each other in my method. This also means I don’t need to worry about a similar scenario for the first name, only the middle name/initial because it is at the end of the string.
Reset Photos Again
I’m suppressing the output, so you won’t see pages of TRUES.
# first moving all the photos to the trash folder, folder by folder
<-
my_file_list list.files(here(blog_folder, photo_folder, "Unmatched Photos"))
file.rename(
from = here(blog_folder, photo_folder, "Unmatched Photos", my_file_list),
to = here(blog_folder, photo_folder, "Trash", my_file_list)
)<-
my_file_list list.files(here(blog_folder, photo_folder, "Archived Photos"))
file.rename(
from = here(blog_folder, photo_folder, "Archived Photos", my_file_list),
to = here(blog_folder, photo_folder, "Trash", my_file_list)
)<-
my_file_list list.files(here(blog_folder, photo_folder, "Matched_Round1"))
file.rename(
from = here(blog_folder, photo_folder, "Matched_Round1", my_file_list),
to = here(blog_folder, photo_folder, "Trash", my_file_list)
)<-
my_file_list list.files(here(blog_folder, photo_folder, "Matched_Round2"))
file.rename(
from = here(blog_folder, photo_folder, "Matched_Round2", my_file_list),
to = here(blog_folder, photo_folder, "Trash", my_file_list)
)<-
my_file_list list.files(here(blog_folder, photo_folder, "Matched_Round3"))
file.rename(
from = here(blog_folder, photo_folder, "Matched_Round3", my_file_list),
to = here(blog_folder, photo_folder, "Trash", my_file_list)
)<-
my_file_list list.files(here(blog_folder, photo_folder, "Matched_Round4"))
file.rename(
from = here(blog_folder, photo_folder, "Matched_Round4", my_file_list),
to = here(blog_folder, photo_folder, "Trash", my_file_list)
)
<-
my_file_list list.files(here(blog_folder, photo_folder, "Map"))
file.rename(
from = here(blog_folder, photo_folder, "Map", my_file_list),
to = here(blog_folder, photo_folder, "Trash", my_file_list)
)
# now I move a copy of all the photos into the starting folder.
<-
my_file_list list.files(here(blog_folder, photo_folder, "Original Photos"))
file.copy(
from = here(blog_folder, photo_folder, "Original Photos", my_file_list),
to = here(blog_folder, photo_folder, unmatched_folder, my_file_list)
)
The Full Scheme: A Flowchart
So to fix this, I can split this into two new middle name dataframes. One with middle names longer than one character (or maybe 2) and those with initials for middle names. Again, the more complete version needs to be matched first.
I made a nice flowchart to explain. (I worked off of handwritten charts while I was coding. This was too complex not to have a reference.) Dark blue squares are the various dataframes as they are filtered and split before matching. Light blue squares are the results of the various rounds of matching. The magenta boxes are the matching protocols for each round. At the end, all the results_r# dataframes will be merged.
Round 1: Long Middle Names
Splitting : Long and Initials Middle Names
<- unmatched_middle %>%
middle_names_long filter(nchar(Middle.Name) > 2)
<- unmatched_middle %>%
middle_names_short filter(nchar(Middle.Name) <= 2)
Matching Round 1: Full Middle Names
<- Matching_Photos(middle_names_long, "full_name_MI", match1) tester
Joining with `by = join_by(full_name_MI)`
Joining with `by = join_by(photo_names)`
No errors or warnings. Now I’m generating the new list of photos. This will have the panel photos instead of the individual photos. I’m matching back to the original input dataframe to the match function, not the output (tester). The output of the renaming function is our results dataframe for this round.
Update_Photo_Names()
: Rename and Moves Photos Function
This function updates the photo list to include the panel photos instead of the original files. It also moves all the photos from this round into the proper folder for the round.
<- function(df, name_to_match, match_folder) {
Update_Photo_Names <-
photo_names list.files(here(blog_folder, photo_folder, match_folder))
= as.data.frame(photo_names)
photo_df <- fuzzy_right_join(photo_df,
df_updated
df,by = c("photo_names" = name_to_match),
match_fun = str_detect)
#new_name <- paste("photos", match_folder, sep = "_") this was for debugging
<- df_updated %>%
df_updated rename_with(.fn = ~ paste("photos", match_folder, sep = "_"),
.cols = photo_names)
return(df_updated)
}
Rename Round 1: Generate first results DF
<- Update_Photo_Names(middle_names_long, "full_name_MI", match1) results_r1
And now we just go through this again for each round.
Round 2: Short Middle Names
<- Matching_Photos(middle_names_short, "full_name_MI", match2) tester2
Joining with `by = join_by(full_name_MI)`
Joining with `by = join_by(photo_names)`
Warning in file.rename(from = here(blog_folder, photo_folder, unmatched_folder,
: cannot rename file 'C:/Users/drsin/OneDrive/Documents/R
Projects/lsinks.github.io/posts/2023-08-04-data-cleaning-tombstone/Photos/Unmatched
Photos/Jones Levi A Jones Hester J.JPG' to 'C:/Users/drsin/OneDrive/Documents/R
Projects/lsinks.github.io/posts/2023-08-04-data-cleaning-tombstone/Photos/Matched_Round2/Jones
Levi A Jones Hester J.JPG', reason 'The system cannot find the file specified'
Only get the Hester and Levi Jones warning, which is fine.
<- Update_Photo_Names(middle_names_short, "full_name_MI", match2) results_r2
Round 3: Middle Names with No Space
Now I need to pull out the names that are only initials and match the no space version. These should only be contained in the results_r2
dataframe (since the other dataframe (r1) only has long middle names anyway.)
<- results_r2 %>%
middle_name_initials mutate(full_name_MI_no_space = na_if(full_name_MI_no_space, " ")) %>%
filter(is.na(full_name_MI_no_space) == FALSE) %>%
filter(is.na(photos_Matched_Round2) == TRUE)
Okay, 17 didn’t match.
<- Matching_Photos(middle_name_initials, "full_name_MI_no_space", match3) tester
Joining with `by = join_by(full_name_MI_no_space)`
Joining with `by = join_by(photo_names)`
No warnings or errors.
<-
results_r3 Update_Photo_Names(middle_name_initials, "full_name_MI_no_space", match3)
Round 4: No Middle Initials
Almost done. Now I’m matching the folks with no middle names. This group has more panel pictures than the others and it runs much slower the other match rounds.
<- Matching_Photos(unmatched_no_middle, "full_name", match4) tester3
Joining with `by = join_by(full_name)`
Joining with `by = join_by(photo_names)`
Now we generate the new names.
<- Update_Photo_Names(unmatched_no_middle, "full_name", match4) results_r4
Join all the Results
Now I full_join all the results dataframes together.
<- results_r1 %>%
tombstones_matched full_join(results_r2) %>%
full_join(results_r3) %>%
full_join(results_r4)
Now generate the final photo list. Because of the way I segmented the data and how careful I was about order of matching, there should be only one match per name. And this is true.
However, by also being careful about what order I create the final photo list, I can put in a safeguard against incorrect matching for future work. The file name should always be taking from the earliest match round if for some reason it matches in multiple rounds. This might be cleaner written as a case statement, but I wrote this section when I only had two rounds of matching and the nested ifelse was very clear.
<- tombstones_matched %>%
tombstones_matched_final mutate(photo_list = ifelse(
is.na(photos_Matched_Round1) == TRUE,
ifelse(
is.na(photos_Matched_Round2) == TRUE,
ifelse(
is.na(photos_Matched_Round3) == TRUE,
photos_Matched_Round4,
photos_Matched_Round3
),
photos_Matched_Round2
)
,
photos_Matched_Round1 ))
Now I just double check that the photo_list was generated properly.
<- tombstones_matched_final %>%
tombstones_matched_checker select(photo_list, contains("photos_Matched"))
Cleaning Up in Prep for Mapping
Move to all Photos to the Map Folder
Lastly, I move all the matched photos to a single folder. I’m putting it in a folder called “Map” since I’m using this to make my leaflet map. Again, I’ve suppressed the output for this block, but when troubleshooting you do want to make sure you are getting all trues.
<-
my_file_list list.files(here(blog_folder, photo_folder, match1))
file.copy(
from = here(blog_folder, photo_folder, match1, my_file_list),
to = here(blog_folder, photo_folder, "Map", my_file_list)
)
<-
my_file_list list.files(here(blog_folder, photo_folder, match2))
file.copy(
from = here(blog_folder, photo_folder, match2, my_file_list),
to = here(blog_folder, photo_folder, "Map", my_file_list)
)
<-
my_file_list list.files(here(blog_folder, photo_folder, match3))
file.copy(
from = here(blog_folder, photo_folder, match3, my_file_list),
to = here(blog_folder, photo_folder, "Map", my_file_list)
)
<-
my_file_list list.files(here(blog_folder, photo_folder, match4))
file.copy(
from = here(blog_folder, photo_folder, match4, my_file_list),
to = here(blog_folder, photo_folder, "Map", my_file_list)
)
Making Labels for the Map
Making a Complete Name Field
Now I need to make a column of the most complete name. This will be used as the label in the map. If the person has a middle name entry, I use the full name with middle name, and if not, I use the first and last. Note that I’m generating this by pasting together the individual name parts rather than using all the name variations I generated to match with. I want this label to be in conventional order and not the last name first format that I used for matching.
<- tombstones_matched_final %>%
tombstones_matched_final mutate(complete_name = ifelse(
is.na(full_name_MI) == FALSE,
paste(First.Name, Middle.Name, Surname, sep = " "),
paste(First.Name, Surname, sep = " "))
)
Making a Cemetery Label
<- tombstones_matched_final %>%
tombstones_matched_final mutate(cemetery_name = ifelse(Cemetery == "", "", paste(Cemetery, "Cemetery", sep = " "))
)
Saving the file for use in another part of this project.
saveRDS(tombstones_matched_final, "tombstones_matched_final.RDS")
Converting to Geo Data
First, I’m dropping any NAs in lat/long. I convert it to a SF geo object. I talk about that process in a TidyTuesday post, where I also generate a map.
I’m adding some jitter to the coordinates with sf_jitter()
; tombstones that are very close to each other have the same coordinates and show up as overlapping on the map. I’m not really happy with how far apart they are jittered, but I’ll play with that in the leaflet tutorial.
<- tombstones_matched_final %>% drop_na(lat) %>% drop_na(long)
tombstones_matched_final
<- st_as_sf(tombstones_matched_final, coords = c("long", "lat"), crs = 4326)
tombstones_geo
<- tombstones_geo %>% drop_na(photo_list)
tombstones_geo
<- st_jitter(tombstones_geo, factor = 0.0001) tombstones_geo_jittered
Making the Leaflet Map
This is just a draft. I’d like to link to some other material and color and group by cemetery. I’ll post a detailed tutorial on that later. I do have a simple walk through of making an interactive leaflet map here. I found datacamp’s course on leaflet very good also.
The draft map is another opportunity to check for obvious errors or bugs. I’m commenting out the section that generates the photo pop-up window. This creates a file larger than github prefers. I’ll have a link to the final map with photos in the leaflet tutorial.
<- tombstones_geo_jittered$photo_list
image_list
leaflet() %>%
addTiles() %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addScaleBar() %>%
addCircleMarkers(
data = tombstones_geo,
label = ~ (paste(complete_name, cemetery_name, sep = " ")),
clusterOptions = markerClusterOptions(),
opacity = 1,
radius = 10,
color = "blue",
stroke = NA,
group = "group1"
# %>% )
# leafpop::addPopupImages(
# image = paste0(here(blog_folder, photo_folder, "Map" ),"/", image_list),
# src = local,
# group = "group1", width = 400,
# maxHeight = 300, maxWidth = 400
# )
Why is blockhouse cemetery in the ocean?
%>% filter(Cemetery == "Blockhouse") %>%
tombstones_geo_jittered select(Cemetery, City, State, N, W, geometry)
Simple feature collection with 1 feature and 5 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -37.48555 ymin: 44.75355 xmax: -37.48555 ymax: 44.75355
Geodetic CRS: WGS 84
Cemetery City State N W
1 Blockhouse Peru, Clinton Co. NY 44 34.662 37 27.129
geometry
1 POINT (-37.48555 44.75355)
A quick Google reveals that “The latitude and longitude coordinates (GPS waypoint) of Peru are 44.578379 (North), -73.5268028 (West) and the approximate elevation is 335 feet (102 meters)”.
So, this is a simple transposition typo- instead of 73 he typed 37. I could have put some sort of error handling in the cleaning GPS data section. For example, I could have flagged lats or longs that were outside the US.
I’m not entirely happy with the sf_jitter parameters, so I’m correcting this in two places- the jittered dataframe I use for the mapping here, and the unjittered version as well. I will be optimizing on the unjittered version when I create the final map. Making the same correction twice isn’t great, but I wanted to produce a final map here independent of making the pretty leaflet map.
st_geometry(tombstones_geo_jittered[tombstones_geo$Cemetery == "Blockhouse", ]) <- st_sfc(st_point(c(-73.48583,44.75056)))
%>% filter(Cemetery == "Blockhouse") %>%
tombstones_geo_jittered select(Cemetery, N, W, geometry)
Simple feature collection with 1 feature and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -73.48583 ymin: 44.75056 xmax: -73.48583 ymax: 44.75056
Geodetic CRS: WGS 84
Cemetery N W geometry
1 Blockhouse 44 34.662 37 27.129 POINT (-73.48583 44.75056)
st_geometry(tombstones_geo[tombstones_geo$Cemetery == "Blockhouse", ]) <- st_sfc(st_point(c(-73.48583,44.75056)))
%>% filter(Cemetery == "Blockhouse") %>%
tombstones_geo select(Cemetery, N, W, geometry)
Simple feature collection with 1 feature and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -73.48583 ymin: 44.75056 xmax: -73.48583 ymax: 44.75056
Geodetic CRS: WGS 84
Cemetery N W geometry
1 Blockhouse 44 34.662 37 27.129 POINT (-73.48583 44.75056)
Now re-check the map.
leaflet() %>%
addTiles() %>%
addScaleBar() %>%
addCircleMarkers(
data = tombstones_geo,
label = ~ (paste(complete_name, "-", Cemetery, "Cemetery", sep = " ")),
clusterOptions = markerClusterOptions(),
opacity = 1,
radius = 10,
color = "blue",
stroke = NA,
group = "group1"
# %>% )
# leafpop::addPopupImages(
# image = paste0(here(blog_folder, photo_folder, "Map" ),"/", image_list),
# src = local,
# group = "group1", width = 400,
# maxHeight = 300, maxWidth = 600
# )
Blockhouse is now correctly located in upstate New York.
I assume I can save this file as a RDS file for use in the leaflet mapper module, but if not, I’ll let you know.
saveRDS(tombstones_geo, "tombstones_geo.RDS")
Conclusions
Despite fairly intensive efforts with data cleaning, only 194 names and photos were included in the final map. A handful of matched names and photos didn’t make it on the map because there was a problem with the geographic data and got dropped for not having latitude or longitude.
Hand correction of the files names would be needed to increase the matches in the current dataset. Moving forward, photos should include an additional identifier, perhaps birth or death year to disambiguate people with identical names.
Citation
@online{sinks2023,
author = {Sinks, Louise E.},
title = {Data {Cleaning} for the {Tombstone} {Project}},
date = {2023-08-04},
url = {https://lsinks.github.io/posts/2023-08-04-data-cleaning-tombstone/tombstone_data_cleaning.html},
langid = {en}
}