Commit
codespell
johnkerl committed Aug 20, 2022
1 parent 7c9d0e2 commit d8be06b
Showing 4 changed files with 26 additions and 33 deletions.
3 changes: 0 additions & 3 deletions .github/workflows/codespell.yml
@@ -33,7 +33,4 @@ jobs:
 with:
 check_filenames: true
 ignore_words_file: .codespellignore
-# ignore_words_list: denom,inout,iput,nd,nin,numer,te,wee
-# There is a word "RO" in docs/src/shapes-of-data.md.in and docs/src/shapes-of-data.md
-# which is listed in .codespellignore but which codespell refuses to ignore. Not sure why.
 skip: "*.csv,*.dkvp,*.txt,*.js,*.html,*.map,./tags,./test/cases,./docs/src/shapes-of-data.md.in,./docs/src/shapes-of-data.md"
2 changes: 1 addition & 1 deletion docs/src/data/colours.csv
@@ -1,3 +1,3 @@
-KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR
+KEY;DE;EN;ES;FI;FR;IT;NL;PL;TO;TR
 masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz
 masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah
29 changes: 13 additions & 16 deletions docs/src/shapes-of-data.md
@@ -36,7 +36,7 @@ Use the `file` command to see if there are CR/LF terminators (in this case, ther
 <b>file data/colours.csv </b>
 </pre>
 <pre class="pre-non-highlight-in-pair">
-data/colours.csv: UTF-8 Unicode text
+data/colours.csv: Unicode text, UTF-8 text
 </pre>

 Look at the file to find names of fields:
@@ -45,18 +45,15 @@ Look at the file to find names of fields:
 <b>cat data/colours.csv </b>
 </pre>
 <pre class="pre-non-highlight-in-pair">
-KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR
-masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Witter;Biały;Alb;Beyaz
+KEY;DE;EN;ES;FI;FR;IT;NL;PL;TO;TR
+masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz
 masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah
 </pre>

 Extract a few fields:

-<pre class="pre-highlight-in-pair">
-<b>mlr --csv cut -f KEY,PL,RO data/colours.csv </b>
-</pre>
-<pre class="pre-non-highlight-in-pair">
-(only blank lines appear)
+<pre class="pre-highlight-non-pair">
+<b>mlr --csv cut -f KEY,PL,TO data/colours.csv </b>
 </pre>

 Use XTAB output format to get a sharper picture of where records/fields are being split:
@@ -65,12 +62,12 @@ Use XTAB output format to get a sharper picture of where records/fields are bein
 <b>mlr --icsv --oxtab cat data/colours.csv </b>
 </pre>
 <pre class="pre-non-highlight-in-pair">
-KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Witter;Biały;Alb;Beyaz
+KEY;DE;EN;ES;FI;FR;IT;NL;PL;TO;TR masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz

-KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah
+KEY;DE;EN;ES;FI;FR;IT;NL;PL;TO;TR masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah
 </pre>

-Using XTAB output format makes it clearer that `KEY;DE;...;RO;TR` is being treated as a single field name in the CSV header, and likewise each subsequent line is being treated as a single field value. This is because the default field separator is a comma but we have semicolons here. Use XTAB again with different field separator (`--fs semicolon`):
+Using XTAB output format makes it clearer that `KEY;DE;...;TR` is being treated as a single field name in the CSV header, and likewise each subsequent line is being treated as a single field value. This is because the default field separator is a comma but we have semicolons here. Use XTAB again with different field separator (`--fs semicolon`):

 <pre class="pre-highlight-in-pair">
 <b>mlr --icsv --ifs semicolon --oxtab cat data/colours.csv </b>
@@ -83,9 +80,9 @@ ES Blanco
 FI Valkoinen
 FR Blanc
 IT Bianco
-NL Witter
+NL Wit
 PL Biały
-RO Alb
+TO Alb
 TR Beyaz

 KEY masterdata_colourcode_2
@@ -97,17 +94,17 @@ FR Noir
 IT Nero
 NL Zwart
 PL Czarny
-RO Negru
+TO Negru
 TR Siyah
 </pre>

 Using the new field-separator, retry the cut:

 <pre class="pre-highlight-in-pair">
-<b>mlr --csv --fs semicolon cut -f KEY,PL,RO data/colours.csv </b>
+<b>mlr --csv --fs semicolon cut -f KEY,PL,TO data/colours.csv </b>
 </pre>
 <pre class="pre-non-highlight-in-pair">
-KEY;PL;RO
+KEY;PL;TO
 masterdata_colourcode_1;Biały;Alb
 masterdata_colourcode_2;Czarny;Negru
 </pre>
25 changes: 12 additions & 13 deletions docs/src/shapes-of-data.md.in
@@ -18,35 +18,34 @@ Use the `file` command to see if there are CR/LF terminators (in this case, ther

 GENMD-CARDIFY-HIGHLIGHT-ONE
 file data/colours.csv
-data/colours.csv: UTF-8 Unicode text
+data/colours.csv: Unicode text, UTF-8 text
 GENMD-EOF

 Look at the file to find names of fields:

 GENMD-CARDIFY-HIGHLIGHT-ONE
 cat data/colours.csv
-KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR
-masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Witter;Biały;Alb;Beyaz
+KEY;DE;EN;ES;FI;FR;IT;NL;PL;TO;TR
+masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz
 masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah
 GENMD-EOF

 Extract a few fields:

 GENMD-CARDIFY-HIGHLIGHT-ONE
-mlr --csv cut -f KEY,PL,RO data/colours.csv
-(only blank lines appear)
+mlr --csv cut -f KEY,PL,TO data/colours.csv
 GENMD-EOF

 Use XTAB output format to get a sharper picture of where records/fields are being split:

 GENMD-CARDIFY-HIGHLIGHT-ONE
 mlr --icsv --oxtab cat data/colours.csv
-KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Witter;Biały;Alb;Beyaz
+KEY;DE;EN;ES;FI;FR;IT;NL;PL;TO;TR masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz

-KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah
+KEY;DE;EN;ES;FI;FR;IT;NL;PL;TO;TR masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah
 GENMD-EOF

-Using XTAB output format makes it clearer that `KEY;DE;...;RO;TR` is being treated as a single field name in the CSV header, and likewise each subsequent line is being treated as a single field value. This is because the default field separator is a comma but we have semicolons here. Use XTAB again with different field separator (`--fs semicolon`):
+Using XTAB output format makes it clearer that `KEY;DE;...;TR` is being treated as a single field name in the CSV header, and likewise each subsequent line is being treated as a single field value. This is because the default field separator is a comma but we have semicolons here. Use XTAB again with different field separator (`--fs semicolon`):

 GENMD-CARDIFY-HIGHLIGHT-ONE
 mlr --icsv --ifs semicolon --oxtab cat data/colours.csv
@@ -57,9 +56,9 @@ ES Blanco
 FI Valkoinen
 FR Blanc
 IT Bianco
-NL Witter
+NL Wit
 PL Biały
-RO Alb
+TO Alb
 TR Beyaz

 KEY masterdata_colourcode_2
@@ -71,15 +70,15 @@ FR Noir
 IT Nero
 NL Zwart
 PL Czarny
-RO Negru
+TO Negru
 TR Siyah
 GENMD-EOF

 Using the new field-separator, retry the cut:

 GENMD-CARDIFY-HIGHLIGHT-ONE
-mlr --csv --fs semicolon cut -f KEY,PL,RO data/colours.csv
-KEY;PL;RO
+mlr --csv --fs semicolon cut -f KEY,PL,TO data/colours.csv
+KEY;PL;TO
 masterdata_colourcode_1;Biały;Alb
 masterdata_colourcode_2;Czarny;Negru
 GENMD-EOF
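
Note: the separator mismatch that the changed docs walk through can be reproduced outside Miller. The following is a minimal sketch using Python's standard `csv` module, not Miller itself; the helper names `read_records` and `cut` are hypothetical and exist only for illustration. It assumes the post-commit contents of `docs/src/data/colours.csv`.

```python
import csv
import io

# Contents of docs/src/data/colours.csv after this commit (header uses TO, not RO).
DATA = (
    "KEY;DE;EN;ES;FI;FR;IT;NL;PL;TO;TR\n"
    "masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz\n"
    "masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah\n"
)


def read_records(text, delimiter):
    """Parse text into a list of dicts keyed by the header line (hypothetical helper)."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def cut(records, fields):
    """Keep only the named fields that are present in each record (hypothetical helper)."""
    return [{f: r[f] for f in fields if f in r} for r in records]


# With a comma delimiter, the whole header line becomes one field name,
# so none of the requested fields can be found.
comma_records = read_records(DATA, ",")
print(list(comma_records[0].keys()))
print(cut(comma_records, ["KEY", "PL", "TO"]))

# With a semicolon delimiter, the fields split as intended.
semicolon_records = read_records(DATA, ";")
print(cut(semicolon_records, ["KEY", "PL", "TO"]))
```

With the comma delimiter each record has a single key, the entire header line, which mirrors the single-field XTAB output shown in the docs; switching the delimiter to a semicolon makes `KEY`, `PL`, and `TO` addressable.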