Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add rm_dirs_from_zip_paths fn #25

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 17 additions & 0 deletions R/import_gtfs.R
Original file line number Diff line number Diff line change
Expand Up @@ -116,6 +116,7 @@ import_gtfs <- function(path,
# from the text files to reference them without it in messages and errors

files_in_gtfs <- zip::zip_list(path)$filename
files_in_gtfs <- rm_dirs_from_zip_paths (files_in_gtfs)

non_text_files <- files_in_gtfs[!grepl("\\.txt$", files_in_gtfs)]

Expand Down Expand Up @@ -213,6 +214,22 @@ import_gtfs <- function(path,

}

# GTFS zip archives may embed files in an internal directory, in which case
# `zip::file_list()` returns the directory name as one "file". This function
# remotes such instances, but leaves intact any non-".txt" files for appropriate
# issuing of messages.
rm_dirs_from_zip_paths <- function (files_in_gtfs) {

# https://github.com/r-lib/fs/blob/4cc4b56c26b9d7f177a676fbb331133bb2584b86/R/path.R # nolint
f <- strsplit(files_in_gtfs, "^(?=/)(?!//)|(?<!^)(?<!^/)/", perl = TRUE)
lens <- vapply(f, length, integer(1))
if (length(unique(lens)) > 1L) {
files_in_gtfs <- files_in_gtfs[which(lens == median(lens))]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mpadge, thanks for the PR. It looks fine to me, although I'm a bit worried about licensing, since the regex match is copied from {fs}. I don't really have much experience with licensing, so I'm not sure how to proceed here. Should we just explicitly give attribution to this line of code to {fs} authors?

Two other things that raised my attention:

  • Perhaps the subsetting condition should be changed to which(lens == max(lens))? You mentioned that this solution could handle feeds nested at arbitrary depth, but it seems to me that it would fail if it was nested inside many directories, which could change the value of the median length. If we use max() here we make sure that we're using the "flattest" files.
  • I've also come across some weird "GTFS" files that were actually a zipped file containing many directories, each one of them containing a GTFS (think of a gtfs.zip file that contained two directories, dir1 and dir2, and each one contained a list of gtfs tables). In such case, we would potentially have tables with the same name inside the final GTFS object, if, for example, both of these directories contained an agency table. I know this case is an exception, but perhaps we should handle it in this function? Something along the lines of, if two or more of the splitted path have the same length, and this length is not the max length (i.e. not the "flattest" files), we throw an error saying that two or more directories have been found inside the zip?

Happy to hear your thoughts, and thanks again for submitting this PR. If you agree with the changes I propose but don't have the time to make them please let me know and I'll accept the PR and make them myself.

Copy link
Collaborator

@dhersz dhersz Dec 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS: I literally wrote this review on the date you sent the PR but didn't realize I had not "finished" the review... 🤦

Sorry about this.

(there was a huge "Pending" label right next to this comment, but I thought it meant there were pending changes, not pending submission lol)

}

return(files_in_gtfs)
}



#' Read a GTFS text file
Expand Down