Alternate filename formatting for split
#1365
johnkerl changed the title from "alternate filename formatting for split" to "Alternate filename formatting for split" on Aug 21, 2023
@sloanlance I'd love to look at a PR! :)
sloanlance pushed a commit to sloanlance/miller that referenced this issue on Aug 21, 2023
sloanlance added a commit to sloanlance/miller that referenced this issue on Aug 21, 2023
* Don't use joiner string when prefix is empty.
* Add option to specify joiner string.
* Add option to not URL-escape file names.
sloanlance added a commit to sloanlance/miller that referenced this issue on Aug 21, 2023
sloanlance added a commit to sloanlance/miller that referenced this issue on Aug 22, 2023
I **_thought_** it'd be cool to apply URL-escaping to the file name prefix as well, just in case it included spaces or other characters. I forgot that a common use for the prefix is to specify a directory path that will contain the file. When the slashes ("`/`") of the path are URL-escaped, they become "`%2F`" and the directories will not be created. So, I moved the prefix handling code to come after the URL-escaping.
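The effect described in this commit message can be reproduced with a minimal Go sketch. It assumes the escaping is done with the standard library's `url.QueryEscape`; the thread does not name the function Miller actually calls. The helpers `escapeAll` and `escapeNameOnly` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// escapeAll is a hypothetical helper (not Miller's code): it URL-escapes the
// prefix and the name together, so slashes in a directory prefix are escaped too.
func escapeAll(prefix, name string) string {
	return url.QueryEscape(prefix + name)
}

// escapeNameOnly is a hypothetical helper (not Miller's code): it escapes only
// the name and prepends the prefix afterwards, so a directory path survives.
func escapeNameOnly(prefix, name string) string {
	return prefix + url.QueryEscape(name)
}

func main() {
	// Escaping everything turns "/" into "%2F", so no directories are implied.
	fmt.Println(escapeAll("out/dir/", "a b.tsv"))
	// Escaping only the name keeps the directory path intact.
	fmt.Println(escapeNameOnly("out/dir/", "a b.tsv"))
}
```

Under that assumption, the first call yields a single flat file name containing `%2F`, while the second keeps `out/dir/` as a real path.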
sloanlance added a commit to sloanlance/miller that referenced this issue on Aug 22, 2023
sloanlance added a commit to sloanlance/miller that referenced this issue on Aug 22, 2023
Trying to make the `return` statement cleaner, I thought it'd be good to add the file name suffix immediately after the file name is URL-escaped. I'd forgotten that the suffix will not be added if the new `-e` option is used to skip URL-escaping. So, I put the suffix back where I had it.
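The ordering that this commit message settles on (escape the name, then add the prefix, then add the suffix regardless of escaping) might look roughly like the sketch below. It is not Miller's actual implementation: the function name `buildSplitFileName`, its parameters, the choice to escape each value before joining, and the "." placed before the suffix are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildSplitFileName is a hypothetical sketch, not Miller's code.
// values are the grouping-field values; escape=false models the -e option
// mentioned in the thread (skip URL-escaping).
func buildSplitFileName(prefix, joiner, suffix string, values []string, escape bool) string {
	parts := make([]string, len(values))
	for i, v := range values {
		if escape {
			// Assumption: url.QueryEscape, which turns spaces into "+".
			v = url.QueryEscape(v)
		}
		parts[i] = v
	}
	name := strings.Join(parts, joiner)

	// The prefix is added after escaping so a directory path keeps its
	// slashes, and the joiner is skipped when the prefix is empty.
	if prefix != "" {
		name = prefix + joiner + name
	}

	// The suffix is appended outside the escape branch, so it is added
	// whether or not escaping was requested.
	return name + "." + suffix
}

func main() {
	values := []string{"col a value", "col b value"}
	fmt.Println(buildSplitFileName("", "_", "tsv", values, true))
	fmt.Println(buildSplitFileName("", "_", "tsv", values, false))
	fmt.Println(buildSplitFileName("out/split", "_", "tsv", values, true))
}
```

Keeping the suffix step last and unconditional is what avoids the bug described above, where skipping the escape step also skipped the suffix.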
sloanlance added a commit to sloanlance/miller that referenced this issue on Aug 22, 2023
Not strictly part of this issue, but as I was checking for docs that I should update as a result of my changes, I noticed this document showed how to split data using the `put` and `tee` combination, but not about the `split` verb.
sloanlance added a commit to sloanlance/miller that referenced this issue on Aug 22, 2023
When I ran `make dev`, generating `data-diving-examples.md` failed. The two `manpage.txt` files ended up empty, but `mlr.1` seems to be correct.
johnkerl pushed a commit that referenced this issue on Aug 23, 2023
* #1365 - filename options for `split`

* Don't use joiner string when prefix is empty.
* Add option to specify joiner string.
* Add option to not URL-escape file names.

* #1365 - update documentation

* #1365 - don't URL-escape file name prefix

I **_thought_** it'd be cool to apply URL-escaping to the file name prefix as well, just in case it included spaces or other characters. I forgot that a common use for the prefix is to specify a directory path that will contain the file. When the slashes ("`/`") of the path are URL-escaped, they become "`%2F`" and the directories will not be created. So, I moved the prefix handling code to come after the URL-escaping.

* #1365 - new `split` options for CLI help output

* #1365 - fix escape/suffix logic error

Trying to make the `return` statement cleaner, I thought it'd be good to add the file name suffix immediately after the file name is URL-escaped. I'd forgotten that the suffix will not be added if the new `-e` option is used to skip URL-escaping. So, I put the suffix back where I had it.

* #1365 - add `split` to the "10 minutes" document

Not strictly part of this issue, but as I was checking for docs that I should update as a result of my changes, I noticed this document showed how to split data using the `put` and `tee` combination, but not about the `split` verb.

* #1365 - updated manpage

When I ran `make dev`, generating `data-diving-examples.md` failed. The two `manpage.txt` files ended up empty, but `mlr.1` seems to be correct.

---------

Co-authored-by: Mr. Lance E Sloan (sloanlance) <[email protected]>
I recently enjoyed using `split` to break a large TSV file into several smaller ones using the `-g` option to group by field values. Using basic shell tools took a few steps to do this, but `mlr` could do it all in one. However, when `mlr` was done, I still had to write a loop to rename all the output files. It is…

What I'd like to see `split` do better is…

1. … `--prefix ''` is specified, DO NOT start the output filenames with underscore. I.e., do not name the files `_col+a+value_col+b+value.tsv`.

… `+` signs.

To address those, I would change how the prefix is handled (item 1) and add an option to not change spaces to other characters (items 2 and 3).
I'm not very experienced with go, but I'm keen to learn, so I would be happy to work on a solution for this and open a PR. If this is deemed to be a legitimate issue, that is.