csv reader: too big number causing a parse error when infer_types is on #333

Closed
1 task done
Flow86 opened this issue Sep 15, 2021 · 3 comments

@Flow86
Contributor

Flow86 commented Sep 15, 2021

I'm parsing a CSV file:

Drive,DeviceName,State,Temp(C),LoopState,LoopState
0:0,5000C50093B7DF54,Normal,29,OK,OK
8:0,5001173100E7BDFC,Normal,33,OK,OK

with:

jsoncons::csv::csv_options options;
options.assume_header(true);
options.field_delimiter(true);
const auto csv = jsoncons::csv::decode_csv<jsoncons::json>(data, options); // data holds the CSV text above
// ...

It throws a parse error exception whose line/column information points to the comma after "FC" (on the last line of the CSV).

It seems to interpret 5001173100E7BDFC as a number in scientific notation (5001173100 E 7BDFC), which of course can't be parsed as one.

Shouldn't it fall back to "string" if it can't infer a numeric type?

The parse error was really unexpected there, and it took us a while to figure out what was wrong.

What compiler, architecture, and operating system?

  • Compiler: MSVC, clang
  • Architecture (e.g. x86, x64): x64 and x86
  • Operating system: Windows 10, Ubuntu

What jsoncons library version?

  • Latest release 0.167.1
Flow86 added the Bug label Sep 15, 2021
@danielaparker
Owner

danielaparker commented Sep 15, 2021

This

options.field_delimiter(true);

isn't right; it should be

options.field_delimiter(',');

I'm assuming that's a typo, though.

Your example should work. This test on Windows 10 produces the same results with both VS and clang:

#include <iostream>
#include <string>
#include <jsoncons/json.hpp>
#include <jsoncons_ext/csv/csv.hpp>

using jsoncons::json;

int main()
{
    std::string data = R"(Drive,DeviceName,State,Temp(C),LoopState,LoopState
0:0,5000C50093B7DF54,Normal,29,OK,OK
8:0,5001173100E7BDFC,Normal,33,OK,OK
)";

    jsoncons::csv::csv_options options;
    options.assume_header(true);
    options.field_delimiter(',');

    try
    {
        const auto csv = jsoncons::csv::decode_csv<jsoncons::json>(data, options);

        std::cout << pretty_print(csv) << "\n";
    }
    catch (const std::exception& e)
    {
        std::cout << e.what() << "\n";
    }
}

Output:

[
    {
        "DeviceName": "5000C50093B7DF54",
        "Drive": "0:0",
        "LoopState": "OK",
        "State": "Normal",
        "Temp(C)": 29
    },
    {
        "DeviceName": "5001173100E7BDFC",
        "Drive": "8:0",
        "LoopState": "OK",
        "State": "Normal",
        "Temp(C)": 33
    }
]

@Flow86
Contributor Author

Flow86 commented Sep 16, 2021

Oh yes, that was a copy-and-paste mistake.

I checked again and found out I was accidentally on 0.157.1.
After upgrading to the latest version, I still get an error.

Perhaps I stripped down the data too much:

Drive,DeviceName,State,Temp(C),LoopState,LoopState
0:0,5000C50093B7DF54,Normal,29,OK,OK
1:0,5000C50093B808FC,Normal,29,OK,OK
2:0,5000C50093B83414,Normal,31,OK,OK
3:0,5000C50093B7F87C,Normal,29,OK,OK
4:0,5000C50093B84928,Normal,31,OK,OK
5:0,5000C50093B83250,Normal,31,OK,OK
6:0,5000C50093B846C0,Normal,30,OK,OK
7:0,5000C50093B84720,Normal,30,OK,OK
8:0,5001173100E7BDFC,Normal,33,OK,OK
9:0,5001173100E95978,Normal,32,OK,OK
10:0,5001173100E7C37C,Normal,34,OK,OK
11:0,5001173100E79700,Normal,34,OK,OK

I now get:

The received JSON data could not be parsed: Invalid digit at line 11 and column 21

On the old version, it broke at line 10, column 21.

                    V
9:0,5001173100E95978,Normal,32,OK,OK

@danielaparker
Owner

danielaparker commented Sep 16, 2021

Okay, it's failing on line 11, on this value:

5001173100E95978

It infers that this is a number. I think this is a situation where you have to specify the column types explicitly.

There's still a defect, though: the parser recognized 5001173100E95978 as a JSON number but tried to convert it to an integer rather than a float, because there was no decimal point. With the E, it should have gone with the float. We'll fix that in the next release.

If it's more convenient, you can specify just the first two column types, and it'll infer the rest:

#include <iostream>
#include <string>
#include <jsoncons/json.hpp>
#include <jsoncons_ext/csv/csv.hpp>

using jsoncons::json;

int main()
{
    std::string data = R"(Drive,DeviceName,State,Temp(C),LoopState,LoopState
8:0,5001173100E7BDFC,Normal,33,OK,OK
9:0,5001173100E95978,Normal,32,OK,OK
)";

    jsoncons::csv::csv_options options;
    options.assume_header(true)
           .field_delimiter(',')
           .column_types("string,string");

    try
    {
        const auto csv = jsoncons::csv::decode_csv<jsoncons::json>(data, options);

        std::cout << pretty_print(csv) << "\n";
    }
    catch (const std::exception& e)
    {
        std::cout << e.what() << "\n";
    }
}

Output:

[
    {
        "DeviceName": "5001173100E7BDFC",
        "Drive": "8:0",
        "LoopState": "OK",
        "State": "Normal",
        "Temp(C)": 33
    },
    {
        "DeviceName": "5001173100E95978",
        "Drive": "9:0",
        "LoopState": "OK",
        "State": "Normal",
        "Temp(C)": 32
    }
]
