Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read_csv does not work for species name with a comma #1372

Closed
BangShiuh opened this issue Aug 23, 2022 · 13 comments · Fixed by #1508
Closed

read_csv does not work for species name with a comma #1372

BangShiuh opened this issue Aug 23, 2022 · 13 comments · Fixed by #1508
Labels
bug help wanted Species Issues related to species definitions

Comments

@BangShiuh
Copy link
Contributor

Problem description

read_csv does not properly handle a comma in a species name.
Ex. CH(1,3).

gas = ct.Solution("gri30_mod.yaml")
states = ct.SolutionArray(gas)
gas.TPX = 300, ct.one_atm, "O2:1.0"
states.append(gas.state)
states.write_csv("output.csv")
states2 = ct.SolutionArray(gas)
states2.read_csv("output.csv")

Line #2 (got "x" columns instead of "x+1")
x is the number of variables.

System information

  • Cantera version: 3.0.0

Attachments

Additional context

@ischoegl
Copy link
Member

ischoegl commented Aug 23, 2022

@BangShiuh ... I think that csv being a 'comma-separated-list' by definition/default, this behavior is expected. So I do not believe that this is improper handling of output, but a problematic (i.e. difficult-to-parse) choice of species name.

I'd recommend using a different output format (although even YAML may face similar issues; HDF may not), or replace the comma in the species name with a different character.

@bryanwweber
Copy link
Member

I'm surprised this isn't handled by the write_csv method. You can quote values with commas in CSV files so that this doesn't happen. Doesn't write_csv use NumPy or pandas underneath?

@ischoegl
Copy link
Member

ischoegl commented Aug 23, 2022

write_csv uses the csv package, while read_csv uses numpy.genfromtxt (with delimiter set to ','). I believe there are other internal writers in use beyond what is defined in composite.py, so there isn't a uniform approach to CSV. I.e. I don't think that there can be any guarantees for a species name that contains a comma at the moment.

@bryanwweber
Copy link
Member

I think this is worth fixing... This is a common enough convention for species naming to indicate a particular shape of the molecule, so I don't see "don't use those species" as a long term solution. As I said this is a known/solved problem if you have to embed the delimiter in the value for any *sv type file, so it may just take a little more configuration to resolve on the write side.

@ischoegl
Copy link
Member

I think this is worth fixing

I don’t have an issue with this, but would like to see solutions that are not API-centric.

@bryanwweber bryanwweber added bug help wanted Species Issues related to species definitions labels Sep 26, 2022
@corykinney
Copy link
Contributor

I think this is worth fixing

I don’t have an issue with this, but would like to see solutions that are not API-centric.

Could you clarify what you mean when you say API-centric?

I think this could be easily solved by switching from np.genfromtxt to pd.read_csv, as the latter supports commas enclosed in quotes, which is the widely accepted approach (see point 6 in the definition of the CSV format) to formatting fields with commas in them.

@bryanwweber
Copy link
Member

Thanks for the link @cory-kinney! I think we probably don't want to introduce a hard dependency on pandas to this code. This link suggests a fix using the csv module which seems like it would work, although we'd probably want to adapt it for our specific use: https://stackoverflow.com/a/41930090

I'm not sure what Ingmar meant about API specific. I suspect he means that ideally the fix would be able to work for all the code that needs to read csv input, not just this one place in the Python interface.

@ischoegl
Copy link
Member

ischoegl commented Dec 29, 2022

I'm not sure what Ingmar meant about API specific. I suspect he means that ideally the fix would be able to work for all the code that needs to read csv input, not just this one place in the Python interface.

#1385 introduces a new C++ SolutionArray that will work for any API. At the moment, it only covers Sim1D for HDF and YAML formats, but follow-up work will establish a link to the Python loaders, where support for CSV will be relatively straight-forward.

PS: likewise thanks for the link … I haven’t found a simple CSV library for C++ (and I’m not sure that it’s worth introducing a dependency for a relatively simple task), so this document is a nice reference

@bryanwweber
Copy link
Member

I haven’t found a simple CSV library for C++

Looks like it should be fairly simple with boost: https://bravenewmethod.com/2016/09/17/quick-and-robust-c-csv-reader-with-boost/

@corykinney
Copy link
Contributor

corykinney commented Dec 29, 2022

Of course! I found the link on Stack Overflow, naturally.

Looks like it should be fairly simple with boost: https://bravenewmethod.com/2016/09/17/quick-and-robust-c-csv-reader-with-boost/

This looks like it would cover the main edge cases that we would encounter. No need to reinvent the wheel.

@ischoegl
Copy link
Member

This looks like it would cover the main edge cases that we would encounter. No need to reinvent the wheel.

Yup. PR’s are always welcome 😜

@speth
Copy link
Member

speth commented Jan 3, 2023

There should be no need to use Boost for this, by the way -- the C++ standard library now includes <regex> (which we use elsewhere) and has I believe essentially the same API as the Boost version. See https://en.cppreference.com/w/cpp/regex. Anything available in C++14 or older is fair game as far as Cantera is concerned.

@ischoegl
Copy link
Member

Now that #1426 is merged, all the infrastructure is in place to handle this in C++ as part of SolutionArray. I created a new feature request in Cantera/enhancements#163, as I currently don't have the bandwidth to tackle this myself.

@speth speth moved this from Triage to Deferred in Cantera 3.0 Release Planning Jun 14, 2023
@ischoegl ischoegl moved this from Deferred to In Progress in Cantera 3.0 Release Planning Jun 20, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in Cantera 3.0 Release Planning Jun 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug help wanted Species Issues related to species definitions
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

5 participants