Parse "Content-Disposition" headers #187
Comments
Poking around in curl source (curl curl, not this package 🙂) does make me wonder ... could this package somehow access the plain old
Yes, you would have to parse it out of the header as the curl tool does.
I am parsing Content-Disposition in the guess_basename() helper here: https://github.com/dmi3kno/polite/blob/master/inst/templates/polite_template.R
I might change my mind in the future, but for now I won't implement this in the
If I were to write such a function I would recommend doing it in R, basically as in tidy_download. I.e. first download to a temporary file and then, after the download has succeeded, check the response headers from the handle and possibly rename the file. I do not recommend writing directly to the filename suggested by the content-disposition in C, because it is difficult to implement and makes malformed content-disposition headers possibly result in a crash or security vulnerability.
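To make the "download to a temporary file, then rename" idea concrete, here is a minimal sketch in Python rather than R. The names safe_basename() and finalize_download() and the fallback name are hypothetical, chosen only for illustration; this is not how tidy_download or the curl package does it. It assumes the download has already finished and that the suggested name was taken from the response headers.

```python
import os
import re
import shutil


def safe_basename(suggested, fallback="download.bin"):
    """Reduce a server-suggested name to a bare file name (hypothetical helper)."""
    # Treat both "/" and "\" as separators, whatever the local OS is,
    # so a name like "../../x" cannot escape the destination directory.
    name = suggested.replace("\\", "/").split("/")[-1]
    # Drop control characters and surrounding whitespace.
    name = re.sub(r"[\x00-\x1f\x7f]", "", name).strip()
    # Refuse empty names and the special directory entries.
    if name in ("", ".", ".."):
        return fallback
    return name


def finalize_download(tmp_path, suggested, dest_dir):
    """Move a completed temporary download into dest_dir under a safe name."""
    target = os.path.join(dest_dir, safe_basename(suggested))
    shutil.move(tmp_path, target)
    return target
```

The point of the design is that the header only influences the final rename. A malformed or hostile value can at worst produce the fallback name, and nothing is written to a server-chosen path while the transfer is in progress.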
We have a pair of functions in usethis (use_course() and, in the dev version, use_zip()) that download a ZIP file and unpack it. They have two user-friendly features:
For an hour or two this weekend, I aspired to make a PR to curl to parse the "Content-Disposition" header, paving the way towards a curl_download() variant that could determine destfile for itself. But I gave up, feeling convinced that to do this with the rigor necessary in the curl package (vs a usethis function for interactive use) is too much work.
Leaving a few notes here for discussion, since I bothered to research this. Maybe one day the landscape will change and the cost-benefit analysis looks better. Ideally, an existing parser could just be used/embedded.
Concrete examples of how the header looks in cases I see all the time:
Problem is, lots of weird edge cases are possible, in terms of quoting and specifying the encoding.
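As a rough illustration of the "use an existing parser" idea, the sketch below leans on Python's standard email package. That parser was written for MIME mail headers, not HTTP, so treat it as an approximation that may not follow RFC 6266 in every corner. The function name disposition_filename() and the header values in the usage lines are made up for the example; they are not the real-world headers referred to above.

```python
from email.message import Message


def disposition_filename(header_value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() looks up the "filename" parameter and strips the
    # surrounding quotes from a quoted-string value.
    return msg.get_filename()


# Illustrative inputs only (not taken from the issue):
print(disposition_filename('attachment; filename="foo bar.zip"'))
print(disposition_filename("attachment; filename=foo.zip"))
print(disposition_filename("inline"))
```

Even with a parser doing the splitting, the result is still untrusted input: it can contain path separators or be missing entirely, so the caller needs a fallback and a sanitising step before using it as a file name.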
curl itself even punts on the filename* field:
https://github.com/curl/curl/blob/3538026f6f145b2811f4d515992565d6cbe969b0/src/tool_cb_hdr.c#L109-L110
and dealing with this properly is an official TODO:
https://curl.haxx.se/docs/todo.html#UTF_8_filenames_in_Content_Dispo
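For reference, the value of a filename* parameter has the shape charset'language'percent-encoded-text. A minimal decoding sketch in Python follows; the function name decode_ext_value() is hypothetical, and restricting the charset to UTF-8 and ISO-8859-1 is an assumption made to keep the example small. It decodes only the parameter value and does not parse the whole header.

```python
from urllib.parse import unquote


def decode_ext_value(ext_value):
    """Decode a filename* value of the form charset'language'pct-encoded."""
    # Raises ValueError if the two single quotes are missing.
    charset, _language, encoded = ext_value.split("'", 2)
    if charset.lower() not in ("utf-8", "iso-8859-1"):
        raise ValueError("unsupported charset: " + charset)
    # errors="strict" makes invalid byte sequences fail loudly
    # instead of being silently replaced.
    return unquote(encoded, encoding=charset, errors="strict")


# Example value from RFC 6266, section 5:
print(decode_ext_value("UTF-8''%e2%82%ac%20rates"))  # prints "€ rates"
```

RFC 6266 says that when both filename and filename* are present, recipients should prefer filename*, so a complete implementation would try this decoding first and fall back to the plain filename parameter.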
Main resource on how to use this header:
RFC 6266 Use of the Content-Disposition Header Field in the Hypertext Transfer Protocol (HTTP)
https://tools.ietf.org/html/rfc6266
An extensive test suite exploring lots of edge cases and how various browsers handle them (or, rather, fail to):
http://test.greenbytes.de/tech/tc2231/
Some Python efforts that convey a general sense of neglect/abandonment:
@jeroen it's fine if you want to just close this, w/ or w/o making some comments. I just wanted to put these links somewhere.