actix-multipart: Feature: Add typed multipart form extractor #2883

Merged
merged 14 commits into from
Feb 26, 2023

jacob-pro
Contributor

@jacob-pro jacob-pro commented Sep 20, 2022

PR Type

Feature

PR Checklist

  • Tests for the changes have been added / updated.
  • Documentation comments have been added / updated.
  • A changelog entry has been made for the appropriate packages.
  • Format code with the latest stable rustfmt.
  • (Team) Label with affected crates and semver status.

Overview

This introduces a new multipart/form-data extractor for receiving a multipart upload into a struct.

I believe this should be enough to cover all the various features of the existing similar crates as per the discussion here: #2849

An example form looks like:

#[derive(MultipartForm)]
struct Upload {
    description: Option<Text<String>>,
    timestamp: Text<i64>,
    #[multipart(rename = "image_set[]")]
    image_set: Vec<Tempfile>,
}

async fn route(form: MultipartForm<Upload>) -> impl Responder {
    ...
}

The key feature is that each field just needs to implement the FieldReader trait. This trait lets the user provide their own arbitrary async handler for processing a field; for example, they may want to stream the data to S3.

I look forward to your feedback! @robjtede @asonix @JSH32 @e-rhodes

List of features:

  • Optional fields
  • Lists/Vec fields (see RFC)
  • Field renaming
  • User customisable field handlers
  • Global and field level data limits
  • Configurable action on duplicate fields
  • Allow denying unknown fields
  • User customisable error messages
  • Stream fields to temporary files
  • Deserialize integers, floats, enums from plain text fields
  • Deserialize complex structs from JSON fields

Update: If anyone wants to use these features right now, I have backported them to my existing library actix-easy-multipart v3.0.0

@JSH32

JSH32 commented Sep 20, 2022

Looks good! What is the purpose of using TextField instead of String directly? Can you show an example of how someone might implement their own field or make their own field compatible with TextField? (just for making it easy to reference later on)

@jacob-pro
Contributor Author

jacob-pro commented Sep 20, 2022

@JSH32 yep that is a good question.

We want to allow reading into not just String itself, but also things like integers, floats, enums, i.e. Text<T: DeserializeOwned>. This leads to two problems:

Trait Conflicts

We can't simultaneously implement FieldReader for any T: DeserializeOwned and still let the user use the native Vec and Option types, which leaves us two choices:

Choice 1 (what I have implemented)

#[derive(MultipartForm)]
struct Upload {
    numbers: Vec<Text<i64>>,
}

Choice 2 (we would have to use Vec and Option wrapper types)

#[derive(MultipartForm)]
struct Upload {
    numbers: VecWrapper<i64>,
}

This is because of conflicting trait implementations (although one day this might be solved by specialization)

impl<'t, T: DeserializeOwned> FieldReader<'t> for T 

error[E0119]: conflicting implementations of trait `form::FieldGroupReader<'_>` for type `std::option::Option<_>`
   --> actix-multipart\src\form\mod.rs:233:1
    |
158 | / impl<'t, T> FieldGroupReader<'t> for Option<T>
159 | | where
160 | |     T: FieldReader<'t>,
161 | | {
...   |
194 | |     }
195 | | }
    | |_- first implementation here
...
233 | / impl<'t, T> FieldGroupReader<'t> for T
234 | | where
235 | |     T: FieldReader<'t>,
236 | | {
...   |
272 | |     }
273 | | }
    | |_^ conflicting implementation for `std::option::Option<_>`

For more information about this error, try `rustc --explain E0119`.

(Technically we could implement directly for String itself (rather than generic T), but it seems pointless since we would still need Text<T: DeserializeOwned> to work also)
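To make Choice 1 concrete, here is a minimal std-only sketch of why the wrapper avoids the conflict. It is not the PR's code: the traits below are simplified, synchronous stand-ins that reuse the names FieldReader and FieldGroupReader, and FromStr stands in for DeserializeOwned. The point is only that, once FieldReader is implemented for the local Text<T> wrapper rather than blanket-implemented, the three group impls no longer overlap.

```rust
use std::str::FromStr;

/// Hypothetical, synchronous stand-in for the PR's `FieldReader`.
trait FieldReader: Sized {
    fn read(raw: &str) -> Result<Self, String>;
}

/// Local wrapper type: `FieldReader` is implemented only for this wrapper,
/// never as a blanket impl over every parseable `T`.
#[derive(Debug, PartialEq)]
struct Text<T>(T);

impl<T: FromStr> FieldReader for Text<T> {
    fn read(raw: &str) -> Result<Self, String> {
        raw.parse::<T>()
            .map(Text)
            .map_err(|_| format!("cannot parse `{}`", raw))
    }
}

/// Hypothetical stand-in for `FieldGroupReader`: receives every value
/// that was sent under one field name.
trait FieldGroupReader: Sized {
    fn read_group(name: &str, values: &[&str]) -> Result<Self, String>;
}

// `Option<_>` and `Vec<_>` do not implement `FieldReader`, so these two impls
// cannot collide with the blanket impl for `T: FieldReader` further down.
impl<T: FieldReader> FieldGroupReader for Option<T> {
    fn read_group(_name: &str, values: &[&str]) -> Result<Self, String> {
        values.first().map(|v| T::read(v)).transpose()
    }
}

impl<T: FieldReader> FieldGroupReader for Vec<T> {
    fn read_group(_name: &str, values: &[&str]) -> Result<Self, String> {
        values.iter().map(|v| T::read(v)).collect()
    }
}

// A bare field is required: a missing value is an error.
impl<T: FieldReader> FieldGroupReader for T {
    fn read_group(name: &str, values: &[&str]) -> Result<Self, String> {
        match values.first() {
            Some(v) => T::read(v),
            None => Err(format!("missing field `{}`", name)),
        }
    }
}
```

Swapping the Text<T> impl for a blanket `impl<T: FromStr> FieldReader for T` should bring back the E0119 overlap, because Option<_> could then satisfy the T: FieldReader bound as well.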

Deserialization Ambiguity

The multipart standard doesn't define any serialization format for the data within the fields themselves. So even though we may want to use serde to automatically deserialize the contents of a field, there is no correct answer to which serde backend to use.

Instead, using the Text type lets the user specifically opt in to serde_plain. Alternatively, they can use the Json type, and the text is deserialized using serde_json instead.

For example:

#[derive(MultipartForm)]
struct DeserializationMethods {
    json: Json<HashMap<String, String>>,
    plain: Text<String>,
}

async fn send() {
    let mut form = multipart::Form::default();
    // We can send the exact same input, but it is up to the server to choose how to deserialize it
    form.add_text("json", "{\"key1\": \"value1\", \"key2\": \"value2\"}");
    form.add_text("plain", "{\"key1\": \"value1\", \"key2\": \"value2\"}");
    ...
}
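The same idea can be shown without serde at all. The sketch below is a toy, std-only illustration: Decode, Plain and JsonString are hypothetical names, and the JSON handling is deliberately minimal (string literals only, escapes rejected). It only demonstrates that identical field bytes decode differently depending on which wrapper type the server chose.

```rust
/// Hypothetical trait: each wrapper type picks its own decoding rule.
trait Decode: Sized {
    fn decode(raw: &str) -> Result<Self, String>;
}

/// Keeps the field text verbatim (loosely analogous to `Text<String>`).
#[derive(Debug, PartialEq)]
struct Plain(String);

/// Treats the field as a JSON string literal (loosely analogous to `Json<String>`).
/// Toy decoder: escape sequences are rejected rather than handled.
#[derive(Debug, PartialEq)]
struct JsonString(String);

impl Decode for Plain {
    fn decode(raw: &str) -> Result<Self, String> {
        Ok(Plain(raw.to_string()))
    }
}

impl Decode for JsonString {
    fn decode(raw: &str) -> Result<Self, String> {
        // A JSON string literal must be wrapped in double quotes.
        let inner = raw
            .trim()
            .strip_prefix('"')
            .and_then(|rest| rest.strip_suffix('"'))
            .ok_or_else(|| format!("not a JSON string literal: {}", raw))?;
        // Anything needing real JSON escaping is out of scope for this toy.
        if inner.contains('"') || inner.contains('\\') {
            return Err(format!("escapes are not supported: {}", raw));
        }
        Ok(JsonString(inner.to_string()))
    }
}
```

Given the input `"hello"` with the quotes included, Plain keeps the quotes while JsonString strips them, and the unquoted input `hello` is valid for Plain but an error for JsonString.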

Compatibility

Can you show an example of how someone might implement their own field or make their own field compatible with TextField

I'm not 100% sure what you mean by this - but the idea is that Text works with any types that implement Deserialize (provided that they are compatible with serde_plain)

If you wanted to use a more complex type e.g. arbitrary structs & complex enums you could use Json instead.

If you wanted to implement your own field type, then you just need to impl<'t> FieldReader<'t> for YourField - have a look at Bytes, Json, Text, or Tempfile as examples
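As a rough illustration of what a custom field type could look like, here is a std-only sketch. It does not use the real trait: the actual FieldReader is async and lifetime-parameterised, whereas this hypothetical version is synchronous and takes an iterator of chunks. Digest and MAX_LEN are made-up names for the example.

```rust
/// Hypothetical, synchronous stand-in for `FieldReader`.
/// The field body arrives as a sequence of chunks.
trait FieldReader: Sized {
    fn read_field<I>(chunks: I) -> Result<Self, String>
    where
        I: Iterator<Item = Vec<u8>>;
}

/// Assumed per-field size limit for this example.
const MAX_LEN: usize = 1024;

/// Custom field type: records the size and a simple checksum of the body
/// while streaming, without keeping the body in memory.
#[derive(Debug, PartialEq)]
struct Digest {
    len: usize,
    checksum: u32,
}

impl FieldReader for Digest {
    fn read_field<I>(chunks: I) -> Result<Self, String>
    where
        I: Iterator<Item = Vec<u8>>,
    {
        let mut len = 0usize;
        let mut checksum = 0u32;
        for chunk in chunks {
            len += chunk.len();
            // Stop as soon as the limit is exceeded.
            if len > MAX_LEN {
                return Err(format!("field exceeds {} bytes", MAX_LEN));
            }
            for byte in chunk {
                checksum = checksum.wrapping_add(u32::from(byte));
            }
        }
        Ok(Digest { len, checksum })
    }
}
```

A handler that streams to object storage would follow the same shape, forwarding each chunk instead of folding it into a checksum.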

Contributor

@junbl junbl left a comment

Mostly nitpicks from me. Great work, I'm excited to use this! ❤️

actix-multipart/src/form/json.rs (outdated, resolved)
actix-multipart/src/form/json.rs (outdated, resolved)
actix-multipart/src/form/bytes.rs (outdated, resolved)
@jacob-pro
Contributor Author

@robjtede when you have time, please can you review this PR - and let me know what you think?

.or_insert_with(|| T::limit(field.name()));
limits.field_limit_remaining = entry.to_owned();

T::handle_field(&req, field, &mut limits, &mut state).await?;
Contributor

I realize this might be difficult with the &mut limits and &mut state in here, but would there be any way to race handle_field against payload.try_next()?

In my crate, actix-form-data, I have a FuturesUnordered that I push the field handler futures into. This allows field handlers to drop their Field stream upon completion and run their remaining work concurrently with the next field handlers.

Contributor Author

Hmm, that is interesting; I never even thought about that. I think to do that we would have to synchronise on the limits and state map, which is doable.

I do wonder, though, whether it actually brings any noticeable performance improvement. Since the multipart as a whole is received as a single stream, presumably the next part can't be received until all of the previous part has been buffered. I guess it would depend on how much buffering actix-multipart does internally?

Contributor

If a handler does something like

// `bytes` must be mutable so each chunk can be appended
let mut bytes = Vec::new();
while let Some(res) = field.next().await {
    bytes.extend_from_slice(&res?);
}
// The field is fully read, so release it before the slow upload
drop(field);
let result = send_these_bytes_somewhere(bytes).await;

then send_these_bytes_somewhere can run concurrently with the next field handler, since the field was completely read and dropped before it is executed.

I use this in my application pict-rs to better allow concurrently uploading files to object storage: https://git.asonix.dog/asonix/pict-rs/src/branch/main/src/store/object_store.rs#L246
This allows making additional requests to object storage without waiting for the existing in-flight requests to complete

Contributor Author

oh I see - you're effectively adding your own internal buffer to the handler

Contributor Author

@jacob-pro jacob-pro Sep 28, 2022

Does this mean your server memory usage is bounded only by the overall form limit? E.g. if send_these_bytes_somewhere / your object storage backend is running quite slowly while the multipart upload is very fast, then nearly all of the contents could be read into memory before getting uploaded.

Contributor

Yes, that is true. actix-form-data has per-field size limits and total field count limits, so the maximum possible memory usage is max_field_size * max_field_count, and in theory that limit could be hit by slow object storage.
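For a feel of the numbers, the bound can be written as a one-line helper. This is only a worked-arithmetic sketch: max_buffered_bytes is a hypothetical function, not part of actix-form-data, and the limits used below are illustrative values rather than either crate's defaults.

```rust
/// Worst-case number of bytes held in memory when every field is fully
/// buffered before its upload completes. Returns `None` on overflow.
fn max_buffered_bytes(max_field_size: usize, max_field_count: usize) -> Option<usize> {
    max_field_size.checked_mul(max_field_count)
}
```

With an assumed 10 MiB per-field limit and 20 fields, the worst case is 10 * 1024 * 1024 * 20 = 209,715,200 bytes (200 MiB) per in-flight request.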

Contributor

I've done some thinking on this and I think I can get away with using spawn to make this concurrent without changing the design here.
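The comment does not say which spawn is meant, so the following is only a rough std-only analogue of the idea, with OS threads standing in for async tasks. upload and handle_fields are hypothetical names: each field is buffered in full, then its upload is handed to a spawned thread so the loop can move straight on to the next field.

```rust
use std::thread;

/// Hypothetical sink standing in for object storage.
/// Returns the field name and the number of bytes "stored".
fn upload(name: String, bytes: Vec<u8>) -> (String, usize) {
    (name, bytes.len())
}

/// Buffers each field fully, then spawns its upload so that reading the
/// next field does not wait for the previous upload to finish.
fn handle_fields<I>(fields: I) -> Vec<(String, usize)>
where
    I: IntoIterator<Item = (String, Vec<Vec<u8>>)>,
{
    let mut uploads = Vec::new();
    for (name, chunks) in fields {
        // Read the whole field first, chunk by chunk.
        let mut bytes = Vec::new();
        for chunk in chunks {
            bytes.extend_from_slice(&chunk);
        }
        // The upload now runs concurrently with the next loop iteration.
        uploads.push(thread::spawn(move || upload(name, bytes)));
    }
    // Wait for every upload; results keep the order the fields arrived in.
    uploads
        .into_iter()
        .map(|handle| handle.join().expect("upload thread panicked"))
        .collect()
}
```

As discussed above, the trade-off is memory: every buffered field stays alive until its upload finishes.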

@robjtede robjtede added C-feature Category: new functionality A-multipart project: actix-multipart labels Nov 25, 2022
@robjtede robjtede added B-semver-major breaking change requiring a major version bump and removed C-feature Category: new functionality labels Nov 25, 2022
@robjtede
Member

Sorry for the mega delay; finally getting around to reviewing this. Excited by what I've gone through so far :)

@jacob-pro
Contributor Author

Thanks for looking at this @robjtede - I'm glad you figured out some improvements to the trait name 😆 !

@vbocan

vbocan commented Dec 21, 2022

What is the status of this PR? This is excellent work and I'd really love to have an "official" way to handle forms that have text fields as well as binary files. I am currently evaluating actix-easy-multipart v3.0.0 but I'd rather use actix-multipart with the same features. Thanks @jacob-pro for the hard work!

/// Unknown field
#[display(fmt = "Unsupported field `{}`", _0)]
#[from(ignore)]
UnsupportedField(#[error(not(source))] String),
}

/// Return `BadRequest` for `MultipartError`
impl ResponseError for MultipartError {
fn status_code(&self) -> StatusCode {
Contributor Author

@robjtede you may also want to add an override to the error_response() function as well

@DuckyBlender

Any status update?

@robjtede
Member

Thanks for the nudge @DuckyBlender.

Added some trybuild tests. Once CI passes we'll get this merged and released 🎉

@robjtede robjtede enabled auto-merge (squash) February 26, 2023 03:12
@robjtede robjtede merged commit d4b833c into actix:master Feb 26, 2023
@jacob-pro jacob-pro deleted the multipart-forms branch February 26, 2023 10:27
@robjtede robjtede linked an issue Feb 26, 2023 that may be closed by this pull request
Labels: A-multipart (project: actix-multipart), B-semver-major (breaking change requiring a major version bump)
Linked issue: Support for multipart form extractors
7 participants