CSV file containing empty line isn't parsed #13

Open
mb21 opened this issue Jul 11, 2015 · 6 comments

@mb21

mb21 commented Jul 11, 2015

$ cat a.md 

```{.table caption="capt" source="b.csv"}
```

$ cat b.csv 
foo,bar
,
foo,bar

$ pandoc --filter pandoc-csv2table a.md 
<p>+-------+-------+ | foo | bar | +=======+=======+ +-------+-------+ | foo | bar | +-------+-------+</p>
<p>Table: capt</p>
@baig
Owner

baig commented Jul 11, 2015

Your CSV is invalid. It should be like this:

foo,bar
foo,bar

@mb21
Author

mb21 commented Jul 11, 2015

Well, CSV isn't a particularly well-defined format. But every spreadsheet application I know of would parse my CSV file as one containing an empty line (in fact, it was generated by Google Sheets). So I would expect:

| foo  | bar |
|------|-----|
|      |     |
| foo  | bar |

Actually, the problem isn't even in Text.CSV. In ghci:

Prelude Text.CSV> parseCSVFromFile "b.csv"
Right [["foo","bar"],["",""],["foo","bar"],[""]]
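
For illustration, the parse result above could be tidied before rendering: the trailing `[""]` comes from the final newline, while `["",""]` is the empty line that should stay. This is a minimal sketch, not code from the filter; `dropTrailingBlank`, `padRows`, and `normalize` are hypothetical names.

```haskell
-- Hypothetical helpers, not part of pandoc-csv2table.

-- Drop only a final [""] record (what the ghci session above shows
-- for the trailing newline). Records such as ["",""] are kept.
dropTrailingBlank :: [[String]] -> [[String]]
dropTrailingBlank rows = case reverse rows of
  ([""] : rest) -> reverse rest
  _             -> rows

-- Pad (or trim) every record to the width of the widest record.
padRows :: [[String]] -> [[String]]
padRows rows = map pad rows
  where
    width = maximum (0 : map length rows)
    pad r = take width (r ++ repeat "")

normalize :: [[String]] -> [[String]]
normalize = padRows . dropTrailingBlank
```

Applied to the list from the ghci session, `normalize` gives `[["foo","bar"],["",""],["foo","bar"]]`, which has the three rows of the expected table.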

I wonder why the filter decides to print the rendered table as markdown wrapped in a paragraph of all things... you wrote earlier "as an intermediate step it pipes the CSV contents through Pandoc's Markdown Reader." I still don't understand that design decision: why not convert the list of lists we got from Text.CSV directly to a Text.Pandoc.Definition.Table?

@baig
Owner

baig commented Jul 11, 2015

Then this seems like a CSV parser issue. The filter uses an external CSV parser which implements CSV parsing as defined in RFC 4180.

@mb21
Author

mb21 commented Jul 11, 2015

See my updated message above. Also, I was curious, so I checked out the RFC's BNF grammar. It defines a record as one or more comma-separated fields, and a field as escaped or non-escaped where non-escaped is zero or more TEXTDATA, so the file is valid...
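
To make the grammar argument concrete, here is a minimal sketch of the rule `record = field *(COMMA field)` restricted to non-escaped fields. It ignores quoted fields entirely, and `splitRecord` is a hypothetical name, not a function from Text.CSV.

```haskell
-- Hypothetical helper for illustration only.
-- Splits one record on commas, assuming no field is quoted.
-- A non-escaped field may be empty (zero TEXTDATA characters),
-- so an empty string between or around commas is still a field.
splitRecord :: String -> [String]
splitRecord line = case break (== ',') line of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitRecord rest
```

With this, `splitRecord ","` yields `["",""]` and `splitRecord "foo,bar"` yields `["foo","bar"]`, matching the records Text.CSV returned for `b.csv`.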

@baig
Owner

baig commented Jul 11, 2015

> I still don't understand that design decision: why not convert the list of lists we got from Text.CSV directly to a Text.Pandoc.Definition.Table?

Because pandoc tables allow markdown inside their cells.
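
One possible reading of the output at the top of this issue, not verified against the filter's source, is that the empty record contributes no content line, leaving two separator lines back to back so that the Markdown reader no longer recognizes a grid table. The sketch below is a hypothetical renderer (`gridTable` is not the filter's function) that always emits a content line per record and passes cell text through verbatim, so Markdown inside cells is left for the reader to parse. It assumes single-line cells and treats the first record as the header.

```haskell
import Data.List (intercalate)

-- Hypothetical renderer, not the filter's actual code.
-- Builds grid-table text from CSV records; the first record is the header.
gridTable :: [[String]] -> String
gridTable [] = ""
gridTable (header : body) =
  unlines ([sep '-', row header, sep '='] ++ concatMap (\r -> [row r, sep '-']) body)
  where
    rows     = header : body
    ncols    = maximum (map length rows)
    -- Missing cells in short records are treated as empty.
    cell r i = if i < length r then r !! i else ""
    -- Each column is at least one character wide, so an all-empty
    -- column or row still produces a visible cell.
    widths   = [ maximum (1 : [length (cell r i) | r <- rows]) | i <- [0 .. ncols - 1] ]
    sep c    = "+" ++ intercalate "+" [replicate (w + 2) c | w <- widths] ++ "+"
    row r    = "|" ++ intercalate "|" [" " ++ pad w (cell r i) ++ " " | (i, w) <- zip [0 ..] widths] ++ "|"
    pad w s  = s ++ replicate (w - length s) ' '
```

For `[["foo","bar"],["",""],["foo","bar"]]` the fourth output line is `|     |     |`, so the empty record keeps its own row between separators. Whether Pandoc's reader accepts the result in every case would still need checking.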

I'll have to see where the filter is going wrong but don't hold your breath in the meantime. I am finalizing my dissertation and it might be a while before I look into it.

Also, pull requests are welcome and appreciated.

baig added the bug label Jul 11, 2015
@mb21
Author

mb21 commented Jul 12, 2015

okay :)
