Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filename options for split (iss. #1365) #1366

Merged
merged 7 commits into from
Aug 23, 2023

Conversation

sloanlance
Copy link
Contributor

@sloanlance sloanlance commented Aug 21, 2023

  • Don't use joiner string when prefix is empty.
  • Add -j option to specify joiner string.
  • Add -e option to not URL-escape file names.

Resolves #1365.

* Don't use joiner string when prefix is empty.
* Add option to specify joiner string.
* Add option to not URL-escape file names.
@sloanlance
Copy link
Contributor Author

@johnkerl, I will review the development and contribution documents to make sure I've met all the requirements for contributing to the project. If you want to point out anything I should focus on, please do. Early comments on my code are welcome, too.

I like code reviews. I appreciate constructive criticism. FYI: This is my first time coding with Go.

@johnkerl
Copy link
Owner

johnkerl commented Aug 21, 2023

make sure I've met all the requirements for contributing to the project

All good, and thanks for asking! It's pretty chill TBH. If anything's wrong or missing we can chat about it.

Developer notes (such as they are) are here: https://github.com/johnkerl/miller/blob/main/README-dev.md

But that's just an attempt to be helpful & describe some of the issues/prereqs.

Over the years Miller has benefited from a huge number of people writing in with ideas & feature requests; also from many people contributing cloud-infra things like automatically building multi-platform binaries on a release, etc etc -- but for whatever reason, there hasn't been very much contribution on development work. So this is lightly trodden ground. :)

My foremost ask is, tell me if https://github.com/johnkerl/miller/blob/main/README-dev.md is a genuinely helpful getting-started guide or not, & if anything is missing/unclear/etc.

FYI: This is my first time coding with Go.

I hope you like it! I do, quite a bit, & I found it pretty intuitive to use given my background. Let me know if you've any questions though; I'm always happy to chat about such things.

I like code reviews. I appreciate constructive criticism.

Awesome! https://github.com/johnkerl/miller/pull/1366/files looks good so far 😎

@johnkerl
Copy link
Owner

Oops, in addition to https://github.com/johnkerl/miller/blob/main/README-dev.md:

https://miller.readthedocs.io/en/latest/contributing/

@sloanlance
Copy link
Contributor Author

I thought I should update the docs wherever I could. It's difficult to be sure whether I found all the references. Besides this split verb, there are several split functions and splitting examples using put and tee. Did I miss anything?

Do you have any thoughts about putting docs inside/beside the source code so there's one set of documentation?

I **_thought_** it'd be cool to apply URL-escaping to the file name prefix as well, just in case it included spaces or other characters.  I forgot that a common use for the prefix is to specify a directory path that will contain the file.  When the slashes ("`/`") of the path are URL-escaped, they become "`%2F`" and the directories will not be created.  So, I moved the prefix handling code to come after the URL-escaping.
Trying to make the `return` statement cleaner, I thought it'd be good to add the file name suffix immediately after the file name is URL-escaped.  I'd forgotten that the suffix will not be added if the new `-e` option is used to skip URL-escaping.  So, I put the suffix back where I had it.
@sloanlance
Copy link
Contributor Author

My foremost ask is, tell me if main/README-dev.md is a genuinely helpful getting-started guide or not, & if anything is missing/unclear/etc.

Feedback on dev. documents…

  • test/README.md
    • I see it's rough, so I tried some of the commands listed there. The problems I saw were that the -c and -g options for regtest do not work/exist.
  • README-dev.md
    • The very first bullet point was useful. make check was exactly what I needed, the results of which led me to test/README.md.
    • The sections about porting from C to Go are somewhat interesting. I was left wondering how long ago did that happen? Was it recent? Maybe the date of the port should be added to the document. If it's far enough in the past, maybe the sections about porting and justifying the choice of Go should be moved to a different document.
    • The order of the sections is a little confusing. The Miller per se section seems general and should be closer to the top of the document.
  • docs/src/contributing.md
    • Helpful. A little repetitive, given that I'd already found much of that information in the other two docs listed above. Maybe this file should be combined with the previous one.

Not strictly part of this issue, but as I was checking for docs that I should update as a result of my changes, I noticed this document showed how to split data using the `put` and `tee` combination, but not about the `split` verb.
When I ran `make dev`, generating `data-diving-examples.md` failed.  The two `manpage.txt` files ended up empty, but `mlr.1` seems to be correct.
@sloanlance sloanlance changed the title 🚧 DRAFT 🚧 — filename options for split (iss. #1365) filename options for split (iss. #1365) Aug 22, 2023
@sloanlance sloanlance marked this pull request as ready for review August 22, 2023 03:21
Copy link
Owner

@johnkerl johnkerl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just the one change requested -- thank you!!! :)

internal/pkg/transformers/split.go Show resolved Hide resolved
@johnkerl
Copy link
Owner

Did I miss anything?

I think this is fine. If we see anything later we can always add it.

Do you have any thoughts about putting docs inside/beside the source code so there's one set of documentation?

Yeah it's definitely separate ... not sure how to best to implement this ... 🤔

@johnkerl
Copy link
Owner

johnkerl commented Aug 23, 2023

test/README.md
I see it's rough, so I tried some of the commands listed there. The problems I saw were that the -c and -g options for regtest do not work/exist.

Thanks! I will fix.

README-dev.md
The very first bullet point was useful. make check was exactly what I needed, the results of which led me to test/README.md.

The sections about porting from C to Go are somewhat interesting. I was left wondering how long ago did that happen? Was it recent? Maybe the date of the port should be added to the document. If it's far enough in the past, maybe the sections about porting and justifying the choice of Go should be moved to a different document.
The order of the sections is a little confusing. The Miller per se section seems general and should be closer to the top of the document.

Thanks! Yes, I wrote that during the Go port which has been (wow!) a couple years ago. I was porting from C to Go for about a year (late 2020 and all of 2021 IIRC); and Miller 6 came out in January 2022. So, yes, time to acknowledge the status quo and update that doc. :)

Thanks again for the great feedback!!

@johnkerl
Copy link
Owner

Also @sloanlance if you rebase on current main the spurious-not-your-fault Codespell red CI will go away

@johnkerl
Copy link
Owner

Actually @sloanlance I'll accept this since it's a single-letter mod, and then I'll follow up after with a separate commit. No need to hold this fine PR open for such a little thing. :)

@johnkerl johnkerl self-requested a review August 23, 2023 20:08
@johnkerl johnkerl merged commit e233819 into johnkerl:main Aug 23, 2023
5 of 6 checks passed
@sloanlance sloanlance deleted the 1365-split-filename-options branch August 23, 2023 20:12
@sloanlance
Copy link
Contributor Author

sloanlance commented Aug 23, 2023

@johnkerl, thanks! I was willing to make the change you suggested, but I've been busy with other work today.

Anyway, it was great working with you to make this change. I would be happy to work on other issues, time permitting. Have any issues you'd like to recommend?

I had some other questions about the code that I thought we might address before the merge, but they can be discussed anytime, really.

I was wondering: Rather than the option-handling conditionals, would it make sense to use the flags module from the Go standard library or add on a third-party module like getopts? From my experience with other languages, that often seems to be the way to go, to make sure that option handling is consistent.

In a way, I felt a little uneasy about the -e option for a "negative" feature. Thinking of POSIX options, I considered making that the +e option instead, but I wasn't sure how well that would've been received.

@johnkerl
Copy link
Owner

Commits: deda2a9 4405f73

@johnkerl
Copy link
Owner

@sloanlance happy to discuss task ideas!

I actively avoided the flags library because of three things: one is with things like mlr sort -f a -n b -f c I really want to do my own handling of -f twice. It's weird but it's a Miller thing and it's here to stay, and flags didn't play well with it. The second is there are multiple flags, for main mlr as well as verbs: mlr -x -y verb1 --foo --bar then verb2 --this --that. I wanted complete control over that. The third reason is, well, I've done CLI parsing in a dozen languages over dozens of decades and I have my own opinions about it -- which don't mesh well with this particular flags library's particular opinions.

@johnkerl
Copy link
Owner

Re task ideas there are a bunch labelled help-wanted
https://github.com/johnkerl/miller/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22
but I haven't applied that in a while -- basically any open feature-request is fair game:
https://github.com/johnkerl/miller/issues?q=is%3Aopen+is%3Aissue+label%3Afeature-request

What I actually work on at any given time is a function of how complex the ask is & how much time I have -- I'm rather deeply invested in my current day job and I don't spend quite as much time on Miller as I used to. So you can see the backlog has grown a bit. :)

@sloanlance
Copy link
Contributor Author

I just remembered another question I had…

Was it cool for me to replace the usage of a buffer with string arrays instead? Being new to Go, I didn't know whether there was a specific reason for using buffers.

I went with an array because I just wanted to join the file name parts together and not throw in an extraneous one if the prefix is empty. As it turns out, I couldn't just join the prefix with the other parts and escape it all. The escaping would change slash characters ("/"), too! So I ended up not using the joining the way I had first imagined it.

(And as I write this, I now think of a case where the prefix is not empty, but the file could end up beginning with the joiner string anyway. 😆 Oh, well… I can open another PR later.)

@johnkerl
Copy link
Owner

Was it cool for me to replace the usage of a buffer with string arrays instead? Being new to Go, I didn't know whether there was a specific reason for using buffers.

My understanding is buffers are more efficient at concatenating.

But for this which is just a one-time setup of output-file names, it's not even a measurable difference.

The really important thing (which I did measure a while back) is the things that get done on every single record. There's some info at the end of https://github.com/johnkerl/miller/blob/main/README-dev.md.

@johnkerl
Copy link
Owner

Anyway, it was great working with you to make this change. I would be happy to work on other issues, time permitting. Have any issues you'd like to recommend?

@sloanlance here are a few ideas if you're interested -- happy to chat more if you like: #965 #1082 #1184 #1325

In particular, #1325 should be the quickest as there exist other verbs with -r support which should be relatively easy to imitate.

@johnkerl johnkerl changed the title filename options for split (iss. #1365) Filename options for split (iss. #1365) Aug 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Alternate filename formatting for split
2 participants