Fix merge_surnames.R and documentation
kkprinceton committed May 4, 2017
1 parent 0cf803e commit af04518
Showing 4 changed files with 22 additions and 16 deletions.
1 change: 1 addition & 0 deletions ChangeLog
@@ -4,3 +4,4 @@ Date Version Comment
 2016-12-13 0.1-1 New function to pre-download Census data and other minor improvements
 2017-03-03 0.1-2 Updated surname handling, enhanced demographics option, and improved error handling and documentation
 2017-04-10 0.1-3 Allows Census data download at level user prefers (block, tract, or county)
+2017-05-03 0.1-4 Fixed error in merge_surnames.R and updated relevant documentation
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,6 +1,6 @@
 Package: wru
-Version: 0.1-3
-Date: 2017-4-10
+Version: 0.1-4
+Date: 2017-5-3
 Title: Who are You? Bayesian Prediction of Racial Category Using Surname and
 Geolocation
 Author: Kabir Khanna [aut, cre], Kosuke Imai [aut, cre], Hubert Jin [ctb]
19 changes: 12 additions & 7 deletions R/merge_surnames.R
@@ -7,8 +7,7 @@
 #' Census Surname List (from 2000 or 2010) and Spanish Surname List to obtain
 #' Pr(Race | Surname) for each of the five major racial groups.
 #'
-#' By default, the function matches surnames to the Census list as follows
-#' (each step only applies to surnames not matched in previous steps):
+#' By default, the function matches surnames to the Census list as follows:
 #' 1) Search raw surnames in Census surname list;
 #' 2) Remove any punctuation and search again;
 #' 3) Remove any spaces and search again;
@@ -18,6 +17,9 @@
 #' 7) For any remaining names, impute probabilities using distribution
 #'    for all names not appearing on Census list.
 #'
+#' Each step only applies to surnames not matched in a previous step.
+#' Steps 2 through 7 are not applied if \code{clean.surname} is FALSE.
+#'
 #' Note: Any name appearing only on the Spanish Surname List is assigned a
 #' probability of 1 for Hispanics/Latinos and 0 for all other racial groups.
 #'
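The matching cascade documented above can be sketched in R as follows. This is an illustrative assumption, not the package's actual internals: the function name `match_surname_cascade`, its arguments, and the coverage (only steps 1-3 are shown; steps 4-7 are omitted) are all hypothetical.

```r
# Hypothetical sketch of steps 1-3 of the surname-matching cascade:
# each candidate form is tried only if the previous forms found no match.
match_surname_cascade <- function(surname, census_surnames) {
  candidates <- c(
    toupper(surname),                                   # 1) raw surname
    gsub("[[:punct:]]", "", toupper(surname)),          # 2) punctuation removed
    gsub("[[:punct:][:space:]]", "", toupper(surname))  # 3) spaces removed too
  )
  hit <- candidates[candidates %in% census_surnames]
  if (length(hit) > 0) hit[1] else NA_character_        # steps 4-7 omitted
}

match_surname_cascade("o'brien", c("OBRIEN", "SMITH"))  # matches at step 2
```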
@@ -27,11 +29,9 @@
 #' Census Surname List is from. Accepted values are \code{2010} and \code{2000}.
 #' Default is \code{2010}.
 #' @param clean.surname A \code{TRUE}/\code{FALSE} object. If \code{TRUE},
-#' \code{clean.surname} function will be run to clean raw surnames in
-#' \code{\var{voter.file}} before matching them with Census lists,
-#' in order to increase the chance of finding a match.
-#' See \code{clean.surname} documentation for details.
-#' Default is \code{TRUE}.
+#' any surnames in \code{\var{voter.file}} that cannot initially be matched
+#' to surname lists will be cleaned, according to U.S. Census specifications,
+#' in order to increase the chance of finding a match. Default is \code{TRUE}.
 #' @param impute.missing A \code{TRUE}/\code{FALSE} object. If \code{TRUE},
 #' race/ethnicity probabilities will be imputed for unmatched names using
 #' race/ethnicity distribution for all other names (i.e., not on Census List).
@@ -75,6 +75,11 @@ merge_surnames <- function(voter.file, surname.year = 2010, clean.surname = T, i

 ## Merge Surnames with Census List (No Cleaning Yet)
 df <- merge(df[names(df) %in% p_eth == F], surnames[c("surname", p_eth)], by.x = "surname.match", by.y = "surname", all.x = TRUE)
+
+if (nrow(df[df$surname.upper %in% surnames$surname == F, ]) == 0) {
+  return(df[order(df$caseid), c(names(voter.file), "surname.match", p_eth)])
+}
+
 df[df$surname.upper %in% surnames$surname == F, ]$surname.match <- ""
 
 df1 <- df[df$surname.upper %in% surnames$surname, ] #Matched surnames
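The early return added in the hunk above guards against the case where every surname already matched. A minimal, self-contained illustration (with hypothetical data, not the package's voter file) of the failure mode: in base R, assigning a length-one replacement into a zero-row data.frame subset raises an error, which is what happened when no surnames were left unmatched.

```r
# Hypothetical two-row data.frame in which every surname is already matched.
df <- data.frame(surname.upper = c("SMITH", "JONES"),
                 surname.match = c("SMITH", "JONES"),
                 stringsAsFactors = FALSE)
matched <- c("SMITH", "JONES")

# The pre-fix assignment: with zero unmatched rows, `$<-.data.frame`
# errors ("replacement has 1 row, data has 0"), so we capture it.
res <- tryCatch({
  df[df$surname.upper %in% matched == F, ]$surname.match <- ""
  "ok"
}, error = function(e) "error")
```

Returning early before this assignment, as the commit does, sidesteps the zero-row replacement entirely.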
14 changes: 7 additions & 7 deletions man/merge_surnames.Rd

Some generated files are not rendered by default.
