Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove file parameter from UploadFile #1406

Closed
wants to merge 6 commits into from

Conversation

adriangb
Copy link
Member

@adriangb adriangb commented Jan 11, 2022

As per discussion in #1405 and #1312, this parameter is not used in Starlette and is not documented as part of the public API. Having this parameters makes it harder to implement features like the one suggested in #1405.

This PR would remove this parameter.
This may break users tests, but should not break their production code since Starlette doesn't currently use that feature.
For a user to experience a breakage at runtime in their production code, they would have to have been using an undocumented constructor parameter to UploadFile outside of the context of the rest of Starlette, which I'm guessing is no one.

@tomchristie
Copy link
Member

Hrm. So the parameters in the constructor here are meaningful/useful only from a perspective of "UploadFile() should be a nice stand-alone testable interface".

Interestingly after reviewing this, if I was refactoring this, I'd probably do this the other way around...

class UploadFile:
    def __init__(
         self,
         file: typing.Optional[typing.BinaryIO],
         *,
         filename: str = "",
         content_type: str = "",
         headers: typing.Optional[Headers] = None,
    ) -> None:
        ...

In this style the form parsing would be responsible for opening a temporary file.

I kinda prefer this style because it makes sense the actual file I/O and eg. temporary file naming styles policy etc to be outside of this class, and for this to actually be a small wrapper onto "I'm a (binary-only) file-like object, that also has HTTP headers attached."

There's some finer details to talk around about that, but on the whole I think that way around makes more sense. Keep the I/O out of the datastructure itself.

@adriangb
Copy link
Member Author

I see your point Tom. I think that would certainly be an improvement since there'd only be 1 execution path instead of 2 and we'd actually use that 1 execution path in Starlette. I'm happy to make a PR for that instead if we settle on it.

Some practical questions: how will this play with #1405? Does UploadFile assume that the size of the file is 0? Do we add an additional initial_size parameter? Or can the size track "the size written"?

@tomchristie
Copy link
Member

Some practical questions: how will this play with #1405? Does UploadFile assume that the size of the file is 0? Do we add an additional initial_size parameter? Or can the size track "the size written"?

Depends - one option here would be to drop the filename and content_type arguments entirely at that point, and instead have them as properties. (They're derived from the headers anyways so that makes sense to me.) If we did that then it'd also be reasonable to have an optional content_length property too, mirroring what Flask does.

The nice thing then is that the UploadFile() data structure is then just UploadFile(file, headers), which properly reflects the underlying data it actually holds. But that also has a couple of properties...

  • filename - From the Content-Disposition header, with a fallback to file.name.
  • content_type - From the Content-Type header, or None.
  • content_length - An int from the Content-Length header, or None.

@adriangb
Copy link
Member Author

Yup I like that!

@adriangb
Copy link
Member Author

A couple possible issues with this design Tom:

  1. Like you say, file comes from the Content-Disposition field header options. We'd need to duplicate code in the parser and in UploadFile to pull out that option. This may be okay, but I just wanted to call it out.
  2. I don't think clients include a Content-Length header for fields (they do for the overall request, but it's the total number of bytes in the body, not split up by field). There also doesn't seem to be a directive for the Content-Disposition header for size (ref), and TestClient/requests at least does not include anything indicating the size of the field. So we'd be back to having to keep track of the size as we stream the request body as proposed in Feature: Add size attribute to UploadFile #1405

@tomchristie
Copy link
Member

Like you say, file comes from the Content-Disposition field header options. We'd need to duplicate code in the parser and in UploadFile to pull out that option. This may be okay, but I just wanted to call it out.

The filename (as specified by the client performing the upload) comes from the Content-Disposition. I don't think(?) we need that in the parser at all. The path that we use for the file is just whatever the temporary file is given. (right?)

Correct. It's possible that browser clients might include it (since it allows servers to determine up front if they want to accept or reject the upload), but I've no indication if they do or not - it'd need looking into. Even if Content-Length is included by some clients, you still need to verify that it's correct. I'm not necessarily absolutely sold on us including the size parameter (or necessarily absolutely against it) but we can have that conversation separately.

@adriangb
Copy link
Member Author

The filename (as specified by the client performing the upload) comes from the Content-Disposition. I don't think(?) we need that in the parser at all. The path that we use for the file is just whatever the temporary file is given. (right?)

We use the filename to determine if it's a file or form field:

if file is None:
items.append((field_name, _user_safe_decode(data, charset)))
else:
await file.seek(0)
items.append((field_name, file))

I'm not necessarily absolutely sold on us including the size parameter (or necessarily absolutely against it) but we can have that conversation separately.

Fair enough, we can keep that as a separate discussion.

@adriangb adriangb added the refactor Refactor code label Feb 2, 2022
@adriangb
Copy link
Member Author

adriangb commented Feb 3, 2022

Closing in favor of #1413

@adriangb adriangb closed this Feb 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactor Refactor code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants