-
I would like to get some statistics on an irregular CSV file in which some rows have fewer fields than others. The error below is useful for knowing that some rows have fewer fields, but it prevents collecting statistics such as the actual number of fields, NF, per row:

(I need to identify which rows have fewer fields so that I can insert blank columns, i.e., two consecutive commas, at the appropriate location in the row, not at the end of the row.)

So I'm trying to treat the CSV file as NIDX (number-indexed), which avoids the error above but returns incorrect, inflated per-row field counts, because comma is my NIDX field delimiter and it also appears inside values enclosed in double quotes.

I found the discussion below from 2020, where it's proposed that the Miller parser may respect double quotes for NIDX in a future version:

Did that ever happen? Or is there another way to collect a row-by-row count of the number of fields per row?

Here's the mlr expression I'm using, treating the irregular CSV file as NIDX to avoid the mismatch error:

Thanks!
-
Hi,
You could run
to get
@osevill you could add the row number in a second step: first use my script to create the output.
Then, at the end, use awk to prepend the row number to each line:
awk -F',' '{print NR "," $0}' file.csv >output.csv
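If what you need is the per-row field count itself, and commas inside double-quoted values must not be counted, a plain awk loop that tracks quote state may be enough. This is only a sketch, not Miller behavior: `count_fields` is a hypothetical helper name, and it assumes that no quoted value contains a line break.

```shell
# count_fields: print "row_number,field_count" for each input line.
# Hypothetical helper (not part of Miller); assumes quoted values never span lines.
count_fields() {
  awk '{
    n = 1                      # a line with no separators still has one field
    inq = 0                    # 1 while inside a double-quoted value
    len = length($0)
    for (i = 1; i <= len; i++) {
      c = substr($0, i, 1)
      if (c == "\"") inq = !inq          # an escaped "" toggles twice, so state is kept
      else if (c == "," && !inq) n++     # count only commas outside quotes
    }
    print NR "," n
  }' "$@"
}

# Example: row 2 is one field short, row 3 has a comma inside quotes.
printf '%s\n' 'a,b,c' '1,2' '3,"x,y",5' | count_fields
# prints:
# 1,3
# 2,2
# 3,3
```

Rows whose count differs from the header's count are the ones that need blank columns inserted. You can also pass a file name instead of piping, e.g. `count_fields file.csv`.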