document usage of #[serde(flatten)] more thoroughly #151
Aw darn, this also fails with strings like … I made a failing test in a local branch, and after about 10 minutes of poking around, this feels like it might be to blame: lines 205 to 224 in 2169628.
Great bug report! Thanks, and thank you for the good reproduction. As far as I can tell, the implementation of `#[serde(flatten)]` works by asking the deserializer to infer the type of the next value. For things like JSON, this works out great because the type of a value is determined by the JSON syntax itself. But CSV is completely untyped, so we have to effectively "guess" what the value is by trying various things.

So, popping up a level, it's now actually possible to see a solution. The problem is that the csv crate is detecting a number in the data, but you are asking for a `String`. One way out is to deserialize the values into an untagged enum that accepts any of the types csv might infer:

```rust
#[derive(Deserialize, Debug, PartialEq)]
#[serde(rename_all = "snake_case")]
#[serde(untagged)]
enum Value {
    Bool(bool),
    U64(u64),
    I64(i64),
    Float(f64),
    String(String),
}
```

Bringing it all together:

```rust
extern crate csv; // 1.0.5
extern crate serde_derive; // 1.0.88
extern crate serde; // 1.0.88

use std::collections::HashMap;

use serde_derive::Deserialize;

#[derive(Debug, Deserialize)]
struct SomethingEntry {
    name: String,
    // Every column other than `name` ends up in this map.
    #[serde(flatten)]
    values: HashMap<String, Value>,
}

// Untagged: serde tries each variant in order against the inferred value.
#[derive(Deserialize, Debug, PartialEq)]
#[serde(rename_all = "snake_case")]
#[serde(untagged)]
enum Value {
    Bool(bool),
    U64(u64),
    I64(i64),
    Float(f64),
    String(String),
}

fn main() {
    let source = r#"name,stat
name,Benjamin
maxHealth,300"#;
    let mut rdr = csv::Reader::from_reader(source.as_bytes());
    let records: Vec<SomethingEntry> = rdr.deserialize()
        .map(Result::unwrap)
        .collect();
    println!("{:?}", records);
}
```

And the output is:

Popping up a level, it does kind of feel like the flatten feature of Serde should be able to figure this out. With that said, I am a mere user of Serde, and its constraints are complex and not something I am an expert in. With respect to the csv crate, I'm going to leave this issue open, because I think this would make a great example to add to the cookbook. Namely, "parse some subset of known fields, but put everything else in a catch-all map" feels like a pretty useful tool to have.
Thanks for the super quick response. This gives me a way forward for my project, which makes me really happy. I guess I also have the option of implementing … I agree that this seems like it should work intuitively with just flatten. I wonder if there's room for a trait to implement on keyed collections like …
Oh, this has some other interesting fallout: things that parse as numbers will be rounded when they are forced into those numeric types. Given the example above, we can plug in this CSV:
The number value will turn into:
...and thus turns back into a string in the rounded form. Luckily, that's enough to represent the largest phone numbers I'm aware of, but I'm afraid this will cause subtle behavior changes in other places that people aren't expecting. Similarly, though less likely, I'm afraid of rounding affecting values that look like floating-point numbers.

EDIT: 😱 strings with leading zeroes have those zeroes stripped, so some phone numbers get clobbered when they are treated as numbers!!
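To see why both the rounding and the lost leading zeroes happen, here is a std-only sketch of what "infer a type, then turn the value back into text" does to a field. It is a simplified model, not the csv crate's actual inference code: the names `Guessed`, `guess` and `round_trip` are hypothetical, and the integer-then-float-then-string order is an assumption for illustration.

```rust
// Hypothetical, simplified model of "infer a type, then turn it back into
// text". This is NOT the csv crate's implementation; it only mirrors the
// idea of guessing integer, then float, then string.
#[derive(Debug, PartialEq)]
enum Guessed {
    U64(u64),
    F64(f64),
    Str(String),
}

fn guess(field: &str) -> Guessed {
    if let Ok(n) = field.parse::<u64>() {
        // "0123" parses as 123: the leading zero is gone.
        Guessed::U64(n)
    } else if let Ok(f) = field.parse::<f64>() {
        // Too big for u64, or has a fraction: precision may be lost.
        Guessed::F64(f)
    } else {
        Guessed::Str(field.to_string())
    }
}

fn round_trip(field: &str) -> String {
    match guess(field) {
        Guessed::U64(n) => n.to_string(),
        Guessed::F64(f) => f.to_string(),
        Guessed::Str(s) => s,
    }
}

fn main() {
    for field in ["0123", "300", "1.50", "123456789012345678901234567890", "Benjamin"] {
        println!("{:?} -> {:?}", field, round_trip(field));
    }
}
```

Keeping such columns as `String` from start to finish, so that no type is ever guessed, avoids both problems.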
I don't really think there's anything I can do about this, honestly. This is really a matter for serde's flatten API. It fundamentally works by asking the deserializer to infer the type of the next value, and the csv deserializer does the best it can. But it can never be perfect, because CSV is completely untyped. So as long as you're using serde functionality that relies on inferring types from CSV data, it will always be fundamentally flawed. My suggestion is that this is either something you live with, or something you file a bug about against serde. With that said, don't forget that there is always another choice, although an unsatisfactory one: don't use serde. Another choice is to combine serde and raw record parsing. It's a bit circuitous, but you can: …
I've run into a similar problem with … I tried patching rust-csv to change the implementation of …
and I have verified that it fixes both my issue and the test case that @LPGhatguy provided above. It seems legitimate to want to consider a CSV row as simply a …
@bchallenor Could you please provide a complete Rust program that compiles, along with any relevant input, your expected output and the actual output? If possible, please briefly describe the problem you're trying to solve. (To be honest, I think it might be better to open a new issue, unless you're 100% confident that this one is the same.) Your suggested workaround of adding a crate feature to disable inference is probably a non-starter, because it likely breaks other functionality. This sort of subtle interaction between features via Cargo features is not the kind of complexity I'd like to add to the crate. Instead, I'd like to look at your problem from first principles. However, I do not see a complete Rust program from you here or in the linked issue.
Sure, I'll put together an example in a new issue, thanks.
I feel like this is a related issue. In fact, I'd consider it more concrete, since …

```rust
const FILE: &[u8] = br#"root_num,root_string,num,string
1,a,1,a
,,1,
1,a,,a
"#;

#[derive(Debug, serde::Deserialize)]
struct Row {
    root_num: Option<u32>,
    root_string: Option<String>,
    // `nest` is flattened, so its fields read the `num` and `string` columns.
    #[serde(flatten)]
    nest: Nest,
}

#[derive(Debug, serde::Deserialize)]
struct Nest {
    num: Option<u32>,
    string: Option<String>,
}

fn main() {
    for x in csv::Reader::from_reader(std::io::Cursor::new(FILE)).into_deserialize() {
        match x {
            Ok(r @ Row { .. }) => println!("{:?}", r),
            Err(err) => println!("{}", err),
        }
    }
}
```

It outputs:

Look at it go!
I am trying out the csv writer with a flattened field that uses a prefix:

```rust
use csv::Writer;
use serde::Serialize;
use serde_with::with_prefix;

#[derive(Serialize, Default)]
struct B {
    field: usize,
}

#[derive(Serialize, Default)]
struct A {
    other_field: usize,
    #[serde(flatten, with = "b")]
    b: B,
}

// Generates a module `b` that adds the "b." prefix to the fields of `B`.
with_prefix!(b "b.");

fn main() {
    let data = A::default();
    let mut wtr = Writer::from_writer(vec![]);
    wtr.serialize(data).unwrap();
    let csv_line = String::from_utf8(wtr.into_inner().unwrap()).unwrap();
    println!("{}", csv_line);
}
```

Neither struct has any dynamic fields, but I'm getting an error: …
@twitu does it work without the `with_prefix`?
```rust
use csv::Writer;
use serde::Serialize;

#[derive(Serialize, Default)]
struct B {
    field: usize,
}

#[derive(Serialize, Default)]
struct A {
    other_field: usize,
    #[serde(flatten)]
    b: B,
}

fn main() {
    let data = A::default();
    let mut wtr = Writer::from_writer(vec![]);
    wtr.serialize(data).unwrap();
    let csv_line = String::from_utf8(wtr.into_inner().unwrap()).unwrap();
    println!("{}", csv_line);
}
```

Nope, this is failing as well. So using flatten with rust-csv is not possible at all?
It should have been supported better. I guess a workaround is to use `serde_json::Value` as an intermediate value and convert it to custom structs.
I got a cool bug report on a project of mine that uses the csv crate: rojo-rbx/rojo#145

I deserialize files into a `Vec` of a struct with a shape like this:

The actual structure uses `#[serde(flatten)]` to capture extra columns, since it's used in a localization format for a game engine.

What version of the `csv` crate are you using?

1.0.5

Briefly describe the question, bug or feature request.

Deserializing a field that uses `#[serde(flatten)]` on a `HashMap<String, String>` fails if the value in a record looks like a number.

Include a complete program demonstrating a problem.

Playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8d7a007589ba375a79a55b1d825ffdb5

What is the observed behavior of the code above?

The program panics, since csv returns an error:

What is the expected or desired behavior of the code above?

The first entry should have a name of `name` and `values` set to `{"stat": "Benjamin"}`. The second entry should have a name of `maxHealth` and `values` set to `{"stat": "300"}`.

Since the code works for most inputs, I don't expect it to start failing to parse just because the user put something into the spreadsheet that looks like a number.