Document and unit-test regex-capture reset logic #1451

Merged 2 commits on Dec 19, 2023
40 changes: 15 additions & 25 deletions docs/src/manpage.md
@@ -50,7 +50,7 @@ MILLER(1) MILLER(1)
insertion-ordered hash map. This encompasses a variety of data
formats, including but not limited to the familiar CSV, TSV, and JSON.
(Miller can handle positionally-indexed data as a special case.) This
manpage documents mlr 6.10.0.
manpage documents mlr 6.10.0-dev.

1mEXAMPLES0m
mlr --icsv --opprint cat example.csv
@@ -220,19 +220,18 @@ MILLER(1) MILLER(1)
is_numeric is_present is_string joink joinkv joinv json_parse json_stringify
kurtosis latin1_to_utf8 leafcount leftpad length localtime2gmt localtime2nsec
localtime2sec log log10 log1p logifit lstrip madd mapdiff mapexcept mapselect
mapsum match matchx max maxlen md5 mean meaneb median mexp min minlen mmul
mode msub nsec2gmt nsec2gmtdate nsec2localdate nsec2localtime null_count os
percentile percentiles pow qnorm reduce regextract regextract_or_else rightpad
round roundm rstrip sec2dhms sec2gmt sec2gmtdate sec2hms sec2localdate
sec2localtime select sgn sha1 sha256 sha512 sin sinh skewness sort
sort_collection splita splitax splitkv splitkvx splitnv splitnvx sqrt ssub
stddev strfntime strfntime_local strftime strftime_local string strip strlen
strpntime strpntime_local strptime strptime_local sub substr substr0 substr1
sum sum2 sum3 sum4 sysntime system systime systimeint tan tanh tolower toupper
truncate typeof unflatten unformat unformatx upntime uptime urand urand32
urandelement urandint urandrange utf8_to_latin1 variance version ! != !=~ % &
&& * ** + - . .* .+ .- ./ / // < << <= <=> == =~ > >= >> >>> ?: ?? ??? ^ ^^ |
|| ~
mapsum max maxlen md5 mean meaneb median mexp min minlen mmul mode msub
nsec2gmt nsec2gmtdate nsec2localdate nsec2localtime null_count os percentile
percentiles pow qnorm reduce regextract regextract_or_else rightpad round
roundm rstrip sec2dhms sec2gmt sec2gmtdate sec2hms sec2localdate sec2localtime
select sgn sha1 sha256 sha512 sin sinh skewness sort sort_collection splita
splitax splitkv splitkvx splitnv splitnvx sqrt ssub stddev strfntime
strfntime_local strftime strftime_local string strip strlen strpntime
strpntime_local strptime strptime_local sub substr substr0 substr1 sum sum2
sum3 sum4 sysntime system systime systimeint tan tanh tolower toupper truncate
typeof unflatten unformat unformatx upntime uptime urand urand32 urandelement
urandint urandrange utf8_to_latin1 variance version ! != !=~ % & && * ** + - .
.* .+ .- ./ / // < << <= <=> == =~ > >= >> >>> ?: ?? ??? ^ ^^ | || ~

1mCOMMENTS-IN-DATA FLAGS0m
Miller lets you put comments in your data, such as
@@ -569,6 +568,7 @@ MILLER(1) MILLER(1)
since direct-to-screen output for large files has its
own overhead.
--no-hash-records See --hash-records.
--norc Do not load a .mlrrc file.
--nr-progress-mod {m} With m a positive integer: print filename and record
count to os.Stderr every m input records.
--ofmt {format} E.g. `%.18f`, `%.0f`, `%9.6e`. Please use
@@ -2651,16 +2651,6 @@ MILLER(1) MILLER(1)
1mmapsum0m
(class=collections #args=variadic) With 0 args, returns empty map. With >= 1 arg, returns a map with key-value pairs from all arguments. Rightmost collisions win, e.g. 'mapsum({1:2,3:4},{1:5})' is '{1:5,3:4}'.

1mmatch0m
(class=string #args=2) TODO: WRITE ME
Example:
TODO: WRITE ME

1mmatchx0m
(class=string #args=2) TODO: WRITE ME
Example:
TODO: WRITE ME

1mmax0m
(class=math #args=variadic) Max of n numbers; null loses. The min and max functions also recurse into arrays and maps, so they can be used to get min/max stats on array/map values.

@@ -3660,5 +3650,5 @@ MILLER(1) MILLER(1)



2023-12-16 MILLER(1)
2023-12-19 MILLER(1)
</pre>
40 changes: 15 additions & 25 deletions docs/src/manpage.txt
@@ -29,7 +29,7 @@ MILLER(1) MILLER(1)
insertion-ordered hash map. This encompasses a variety of data
formats, including but not limited to the familiar CSV, TSV, and JSON.
(Miller can handle positionally-indexed data as a special case.) This
manpage documents mlr 6.10.0.
manpage documents mlr 6.10.0-dev.

1mEXAMPLES0m
mlr --icsv --opprint cat example.csv
@@ -199,19 +199,18 @@ MILLER(1) MILLER(1)
is_numeric is_present is_string joink joinkv joinv json_parse json_stringify
kurtosis latin1_to_utf8 leafcount leftpad length localtime2gmt localtime2nsec
localtime2sec log log10 log1p logifit lstrip madd mapdiff mapexcept mapselect
mapsum match matchx max maxlen md5 mean meaneb median mexp min minlen mmul
mode msub nsec2gmt nsec2gmtdate nsec2localdate nsec2localtime null_count os
percentile percentiles pow qnorm reduce regextract regextract_or_else rightpad
round roundm rstrip sec2dhms sec2gmt sec2gmtdate sec2hms sec2localdate
sec2localtime select sgn sha1 sha256 sha512 sin sinh skewness sort
sort_collection splita splitax splitkv splitkvx splitnv splitnvx sqrt ssub
stddev strfntime strfntime_local strftime strftime_local string strip strlen
strpntime strpntime_local strptime strptime_local sub substr substr0 substr1
sum sum2 sum3 sum4 sysntime system systime systimeint tan tanh tolower toupper
truncate typeof unflatten unformat unformatx upntime uptime urand urand32
urandelement urandint urandrange utf8_to_latin1 variance version ! != !=~ % &
&& * ** + - . .* .+ .- ./ / // < << <= <=> == =~ > >= >> >>> ?: ?? ??? ^ ^^ |
|| ~
mapsum max maxlen md5 mean meaneb median mexp min minlen mmul mode msub
nsec2gmt nsec2gmtdate nsec2localdate nsec2localtime null_count os percentile
percentiles pow qnorm reduce regextract regextract_or_else rightpad round
roundm rstrip sec2dhms sec2gmt sec2gmtdate sec2hms sec2localdate sec2localtime
select sgn sha1 sha256 sha512 sin sinh skewness sort sort_collection splita
splitax splitkv splitkvx splitnv splitnvx sqrt ssub stddev strfntime
strfntime_local strftime strftime_local string strip strlen strpntime
strpntime_local strptime strptime_local sub substr substr0 substr1 sum sum2
sum3 sum4 sysntime system systime systimeint tan tanh tolower toupper truncate
typeof unflatten unformat unformatx upntime uptime urand urand32 urandelement
urandint urandrange utf8_to_latin1 variance version ! != !=~ % & && * ** + - .
.* .+ .- ./ / // < << <= <=> == =~ > >= >> >>> ?: ?? ??? ^ ^^ | || ~

1mCOMMENTS-IN-DATA FLAGS0m
Miller lets you put comments in your data, such as
@@ -548,6 +547,7 @@ MILLER(1) MILLER(1)
since direct-to-screen output for large files has its
own overhead.
--no-hash-records See --hash-records.
--norc Do not load a .mlrrc file.
--nr-progress-mod {m} With m a positive integer: print filename and record
count to os.Stderr every m input records.
--ofmt {format} E.g. `%.18f`, `%.0f`, `%9.6e`. Please use
@@ -2630,16 +2630,6 @@ MILLER(1) MILLER(1)
1mmapsum0m
(class=collections #args=variadic) With 0 args, returns empty map. With >= 1 arg, returns a map with key-value pairs from all arguments. Rightmost collisions win, e.g. 'mapsum({1:2,3:4},{1:5})' is '{1:5,3:4}'.

1mmatch0m
(class=string #args=2) TODO: WRITE ME
Example:
TODO: WRITE ME

1mmatchx0m
(class=string #args=2) TODO: WRITE ME
Example:
TODO: WRITE ME

1mmax0m
(class=math #args=variadic) Max of n numbers; null loses. The min and max functions also recurse into arrays and maps, so they can be used to get min/max stats on array/map values.

@@ -3639,4 +3629,4 @@ MILLER(1) MILLER(1)



2023-12-16 MILLER(1)
2023-12-19 MILLER(1)
1 change: 1 addition & 0 deletions docs/src/reference-main-flag-list.md
@@ -278,6 +278,7 @@ These are flags which don't fit into any other category.
* `--no-dedupe-field-names`: By default, if an input record has a field named `x` and another also named `x`, the second will be renamed `x_2`, and so on. With this flag provided, the second `x`'s value will replace the first `x`'s value when the record is read. This flag has no effect on JSON input records, where duplicate keys always result in the last one's value being retained.
* `--no-fflush`: Let buffered output not be written after every output record. The default is flush output after every record if the output is to the terminal, or less often if the output is to a file or a pipe. The default is a significant performance optimization for large files. Use this flag to allow less-frequent updates when output is to the terminal. This is unlikely to be a noticeable performance improvement, since direct-to-screen output for large files has its own overhead.
* `--no-hash-records`: See --hash-records.
* `--norc`: Do not load a .mlrrc file.
* `--nr-progress-mod {m}`: With m a positive integer: print filename and record count to os.Stderr every m input records.
* `--ofmt {format}`: E.g. `%.18f`, `%.0f`, `%9.6e`. Please use sprintf-style codes (https://pkg.go.dev/fmt) for floating-point numbers. If not specified, default formatting is used. See also the `fmtnum` function and the `format-values` verb.
* `--ofmte {n}`: Use --ofmte 6 as shorthand for --ofmt %.6e, etc.
40 changes: 40 additions & 0 deletions docs/src/reference-main-regular-expressions.md.in
@@ -78,6 +78,46 @@ GENMD-EOF

* Up to nine matches are supported: `\1` through `\9`, while `\0` is the entire match string; `\15` is treated as `\1` followed by an unrelated `5`.

## Resetting captures

If you use `(...)` in your regular expression, then up to 9 matches are supported for the `=~`
operator, and an arbitrary number of matches are supported for the `match` DSL function.

* Before any match is done, `"\1"` etc. in a string evaluate to themselves.
* After a successful match is done, `"\1"` etc. in a string evaluate to the matched substring.
* After an unsuccessful match is done, `"\1"` etc. in a string evaluate to the empty string.
* You can match against `null` to reset to the original state.

GENMD-CARDIFY-HIGHLIGHT-ONE
mlr repl

[mlr] "\1:\2"
"\1:\2"

[mlr] "abc" =~ "..."
true

[mlr] "\1:\2"
":"

[mlr] "abc" =~ "(.).(.)"
true

[mlr] "\1:\2"
"a:c"

[mlr] "abc" =~ "(.)x(.)"
false

[mlr] "\1:\2"
":"

[mlr] "abc" =~ null

[mlr] "\1:\2"
"\1:\2"
GENMD-EOF
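
The state transitions listed above can be modeled with a small sketch. This is only an illustration of the documented behavior, not Miller's implementation: the `CaptureState` class and its method names are hypothetical, and it uses Python's `re` module in place of Go's `regexp` package, with an unanchored search standing in for `=~`.

```python
import re


class CaptureState:
    """Toy model of the capture states described above (hypothetical, not Miller's code)."""

    def __init__(self):
        # None means no match has been attempted yet: "\1" etc. are left untouched.
        self.captures = None

    def match(self, text, pattern):
        """Mimic `text =~ pattern`, updating the stored captures."""
        if pattern is None:
            # Matching against null resets to the original state.
            self.captures = None
            return None
        m = re.search(pattern, text)
        if m is None:
            # Unsuccessful match: "\0" through "\9" all become the empty string.
            self.captures = [""] * 10
            return False
        # Slot 0 is the entire match; groups that did not participate become "".
        groups = [m.group(0)] + [g or "" for g in m.groups()]
        # Pad or truncate to exactly ten slots, "\0" through "\9".
        self.captures = (groups + [""] * 10)[:10]
        return True

    def interpolate(self, s):
        """Replace "\0" through "\9" in s according to the current state."""
        if self.captures is None:
            return s
        # Only a single digit is consumed, so "\15" is "\1" followed by a literal "5".
        return re.sub(r"\\([0-9])", lambda m: self.captures[int(m.group(1))], s)
```

Replaying the REPL session above against this model gives the same sequence: `interpolate(r"\1:\2")` returns the string unchanged before any match, `":"` after matching `"..."` or after a failed match, `"a:c"` after matching `"(.).(.)"`, and the unchanged string again after `match("abc", None)`.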

## More information

Regular expressions are those supported by the [Go regexp package](https://pkg.go.dev/regexp), which in turn are of type [RE2](https://github.com/google/re2/wiki/Syntax) except for `\C`:
40 changes: 15 additions & 25 deletions man/manpage.txt
@@ -29,7 +29,7 @@ MILLER(1) MILLER(1)
insertion-ordered hash map. This encompasses a variety of data
formats, including but not limited to the familiar CSV, TSV, and JSON.
(Miller can handle positionally-indexed data as a special case.) This
manpage documents mlr 6.10.0.
manpage documents mlr 6.10.0-dev.

1mEXAMPLES0m
mlr --icsv --opprint cat example.csv
@@ -199,19 +199,18 @@ MILLER(1) MILLER(1)
is_numeric is_present is_string joink joinkv joinv json_parse json_stringify
kurtosis latin1_to_utf8 leafcount leftpad length localtime2gmt localtime2nsec
localtime2sec log log10 log1p logifit lstrip madd mapdiff mapexcept mapselect
mapsum match matchx max maxlen md5 mean meaneb median mexp min minlen mmul
mode msub nsec2gmt nsec2gmtdate nsec2localdate nsec2localtime null_count os
percentile percentiles pow qnorm reduce regextract regextract_or_else rightpad
round roundm rstrip sec2dhms sec2gmt sec2gmtdate sec2hms sec2localdate
sec2localtime select sgn sha1 sha256 sha512 sin sinh skewness sort
sort_collection splita splitax splitkv splitkvx splitnv splitnvx sqrt ssub
stddev strfntime strfntime_local strftime strftime_local string strip strlen
strpntime strpntime_local strptime strptime_local sub substr substr0 substr1
sum sum2 sum3 sum4 sysntime system systime systimeint tan tanh tolower toupper
truncate typeof unflatten unformat unformatx upntime uptime urand urand32
urandelement urandint urandrange utf8_to_latin1 variance version ! != !=~ % &
&& * ** + - . .* .+ .- ./ / // < << <= <=> == =~ > >= >> >>> ?: ?? ??? ^ ^^ |
|| ~
mapsum max maxlen md5 mean meaneb median mexp min minlen mmul mode msub
nsec2gmt nsec2gmtdate nsec2localdate nsec2localtime null_count os percentile
percentiles pow qnorm reduce regextract regextract_or_else rightpad round
roundm rstrip sec2dhms sec2gmt sec2gmtdate sec2hms sec2localdate sec2localtime
select sgn sha1 sha256 sha512 sin sinh skewness sort sort_collection splita
splitax splitkv splitkvx splitnv splitnvx sqrt ssub stddev strfntime
strfntime_local strftime strftime_local string strip strlen strpntime
strpntime_local strptime strptime_local sub substr substr0 substr1 sum sum2
sum3 sum4 sysntime system systime systimeint tan tanh tolower toupper truncate
typeof unflatten unformat unformatx upntime uptime urand urand32 urandelement
urandint urandrange utf8_to_latin1 variance version ! != !=~ % & && * ** + - .
.* .+ .- ./ / // < << <= <=> == =~ > >= >> >>> ?: ?? ??? ^ ^^ | || ~

1mCOMMENTS-IN-DATA FLAGS0m
Miller lets you put comments in your data, such as
@@ -548,6 +547,7 @@ MILLER(1) MILLER(1)
since direct-to-screen output for large files has its
own overhead.
--no-hash-records See --hash-records.
--norc Do not load a .mlrrc file.
--nr-progress-mod {m} With m a positive integer: print filename and record
count to os.Stderr every m input records.
--ofmt {format} E.g. `%.18f`, `%.0f`, `%9.6e`. Please use
@@ -2630,16 +2630,6 @@ MILLER(1) MILLER(1)
1mmapsum0m
(class=collections #args=variadic) With 0 args, returns empty map. With >= 1 arg, returns a map with key-value pairs from all arguments. Rightmost collisions win, e.g. 'mapsum({1:2,3:4},{1:5})' is '{1:5,3:4}'.

1mmatch0m
(class=string #args=2) TODO: WRITE ME
Example:
TODO: WRITE ME

1mmatchx0m
(class=string #args=2) TODO: WRITE ME
Example:
TODO: WRITE ME

1mmax0m
(class=math #args=variadic) Max of n numbers; null loses. The min and max functions also recurse into arrays and maps, so they can be used to get min/max stats on array/map values.

@@ -3639,4 +3629,4 @@ MILLER(1) MILLER(1)



2023-12-16 MILLER(1)
2023-12-19 MILLER(1)
54 changes: 16 additions & 38 deletions man/mlr.1
@@ -2,12 +2,12 @@
.\" Title: mlr
.\" Author: [see the "AUTHOR" section]
.\" Generator: ./mkman.rb
.\" Date: 2023-12-16
.\" Date: 2023-12-19
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "MILLER" "1" "2023-12-16" "\ \&" "\ \&"
.TH "MILLER" "1" "2023-12-19" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Portability definitions
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -47,7 +47,7 @@ on integer-indexed fields: if the natural data structure for the latter is the
array, then Miller's natural data structure is the insertion-ordered hash map.
This encompasses a variety of data formats, including but not limited to the
familiar CSV, TSV, and JSON. (Miller can handle positionally-indexed data as
a special case.) This manpage documents mlr 6.10.0.
a special case.) This manpage documents mlr 6.10.0-dev.
.SH "EXAMPLES"
.sp

@@ -246,19 +246,18 @@ is_nonempty_map is_not_array is_not_empty is_not_map is_not_null is_null
is_numeric is_present is_string joink joinkv joinv json_parse json_stringify
kurtosis latin1_to_utf8 leafcount leftpad length localtime2gmt localtime2nsec
localtime2sec log log10 log1p logifit lstrip madd mapdiff mapexcept mapselect
mapsum match matchx max maxlen md5 mean meaneb median mexp min minlen mmul
mode msub nsec2gmt nsec2gmtdate nsec2localdate nsec2localtime null_count os
percentile percentiles pow qnorm reduce regextract regextract_or_else rightpad
round roundm rstrip sec2dhms sec2gmt sec2gmtdate sec2hms sec2localdate
sec2localtime select sgn sha1 sha256 sha512 sin sinh skewness sort
sort_collection splita splitax splitkv splitkvx splitnv splitnvx sqrt ssub
stddev strfntime strfntime_local strftime strftime_local string strip strlen
strpntime strpntime_local strptime strptime_local sub substr substr0 substr1
sum sum2 sum3 sum4 sysntime system systime systimeint tan tanh tolower toupper
truncate typeof unflatten unformat unformatx upntime uptime urand urand32
urandelement urandint urandrange utf8_to_latin1 variance version ! != !=~ % &
&& * ** + - . .* .+ .- ./ / // < << <= <=> == =~ > >= >> >>> ?: ?? ??? ^ ^^ |
|| ~
mapsum max maxlen md5 mean meaneb median mexp min minlen mmul mode msub
nsec2gmt nsec2gmtdate nsec2localdate nsec2localtime null_count os percentile
percentiles pow qnorm reduce regextract regextract_or_else rightpad round
roundm rstrip sec2dhms sec2gmt sec2gmtdate sec2hms sec2localdate sec2localtime
select sgn sha1 sha256 sha512 sin sinh skewness sort sort_collection splita
splitax splitkv splitkvx splitnv splitnvx sqrt ssub stddev strfntime
strfntime_local strftime strftime_local string strip strlen strpntime
strpntime_local strptime strptime_local sub substr substr0 substr1 sum sum2
sum3 sum4 sysntime system systime systimeint tan tanh tolower toupper truncate
typeof unflatten unformat unformatx upntime uptime urand urand32 urandelement
urandint urandrange utf8_to_latin1 variance version ! != !=~ % & && * ** + - .
\&.* .+ .- ./ / // < << <= <=> == =~ > >= >> >>> ?: ?? ??? ^ ^^ | || ~
.fi
.if n \{\
.RE
@@ -667,6 +666,7 @@ These are flags which don't fit into any other category.
since direct-to-screen output for large files has its
own overhead.
--no-hash-records See --hash-records.
--norc Do not load a .mlrrc file.
--nr-progress-mod {m} With m a positive integer: print filename and record
count to os.Stderr every m input records.
--ofmt {format} E.g. `%.18f`, `%.0f`, `%9.6e`. Please use
@@ -3939,28 +3939,6 @@ localtime2sec("2001-02-03 04:05:06", "Asia/Istanbul") = 981165906"
.fi
.if n \{\
.RE
.SS "match"
.if n \{\
.RS 0
.\}
.nf
(class=string #args=2) TODO: WRITE ME
Example:
TODO: WRITE ME
.fi
.if n \{\
.RE
.SS "matchx"
.if n \{\
.RS 0
.\}
.nf
(class=string #args=2) TODO: WRITE ME
Example:
TODO: WRITE ME
.fi
.if n \{\
.RE
.SS "max"
.if n \{\
.RS 0