diff --git a/docs/src/manpage.md b/docs/src/manpage.md
index a59a96a5cd..bbb6447b17 100644
--- a/docs/src/manpage.md
+++ b/docs/src/manpage.md
@@ -2170,11 +2170,11 @@ FUNCTIONS FOR FILTER/PUT
 (class=math #args=1) e**x - 1.

 flatten
- (class=collections #args=2,3) Flattens multi-level maps to single-level ones. Useful for nested JSON-like structures for non-JSON file formats like CSV.
+ (class=collections #args=2,3) Flattens multi-level maps to single-level ones. Useful for nested JSON-like structures for non-JSON file formats like CSV. With two arguments, the first argument is a map (maybe $*) and the second argument is the flatten separator. With three arguments, the first argument is prefix, the second is the flatten separator, and the third argument is a map, and flatten($*, ".") is the same as flatten("", ".", $*). See "Flatten/unflatten: converting between JSON and tabular formats" at https://miller.readthedocs.io for more information.
 Examples:
+ flatten({"a":[1,2],"b":3}, ".") is {"a.1": 1, "a.2": 2, "b": 3}.
 flatten("a", ".", {"b": { "c": 4 }}) is {"a.b.c" : 4}.
 flatten("", ".", {"a": { "b": 3 }}) is {"a.b" : 3}.
- Two-argument version: flatten($*, ".") is the same as flatten("", ".", $*).

 float
 (class=conversion #args=1) Convert int/float/bool/string to float.
@@ -2222,7 +2222,7 @@ FUNCTIONS FOR FILTER/PUT
 gmt2sec("2001-02-03T04:05:06Z") = 981173106

 gsub
- (class=string #args=3) '$name=gsub($name, "old", "new")' (replace all).
+ (class=string #args=3) '$name = gsub($name, "old", "new")': replace all, with support for regular expressions. Capture groups \1 through \9 in the new part are matched from (...) in the old part, and must be used within the same call to gsub -- they don't persist for subsequent DSL statements. See also =~ and regextract. See also "Regular expressions" at https://miller.readthedocs.io.
 Examples:
 gsub("ababab", "ab", "XY") gives "XYXYXY"
 gsub("abc.def", ".", "X") gives "XXXXXXX"
@@ -2414,10 +2414,16 @@ FUNCTIONS FOR FILTER/PUT
 Map example: reduce({"a":1, "b":3, "c": 5}, func(acck,accv,ek,ev) {return {"sum_of_squares": accv + ev**2}}) returns {"sum_of_squares": 35}.

 regextract
- (class=string #args=2) '$name=regextract($name, "[A-Z]{3}[0-9]{2}")'
+ (class=string #args=2) Extracts a substring (the first, if there are multiple matches), matching a regular expression, from the input. Does not use capture groups; see also the =~ operator which does.
+ Examples:
+ regextract("index ab09 file", "[a-z][a-z][0-9][0-9]") gives "ab09"
+ regextract("index a999 file", "[a-z][a-z][0-9][0-9]") gives (absent), which will result in an assignment not happening.

 regextract_or_else
- (class=string #args=3) '$name=regextract_or_else($name, "[A-Z]{3}[0-9]{2}", "default")'
+ (class=string #args=3) Like regextract but the third argument is the return value in case the input string (first argument) doesn't match the pattern (second argument).
+ Examples:
+ regextract_or_else("index ab09 file", "[a-z][a-z][0-9][0-9]", "nonesuch") gives "ab09"
+ regextract_or_else("index a999 file", "[a-z][a-z][0-9][0-9]", "nonesuch") gives "nonesuch"

 round
 (class=math #args=1) Round to nearest integer.
@@ -2529,7 +2535,7 @@ FUNCTIONS FOR FILTER/PUT
 ssub("abc.def", ".", "X") gives "abcXdef"

 strftime
- (class=time #args=2) Formats seconds since the epoch as timestamp. Format strings are as in the C library (please see "man strftime" on your system), with the Miller-specific addition of "%1S" through "%9S" which format the seconds with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.) See also strftime_local.
+ (class=time #args=2) Formats seconds since the epoch as timestamp. Format strings are mostly as in the C library (see "man strftime" on your system), with the Miller-specific addition of "%1S" through "%9S" which format the seconds with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.) See also strftime_local. See also "DSL datetime/timezone functions" at https://miller.readthedocs.io for more information on the differences from the C library.
 Examples:
 strftime(1440768801.7,"%Y-%m-%dT%H:%M:%SZ") = "2015-08-28T13:33:21Z"
 strftime(1440768801.7,"%Y-%m-%dT%H:%M:%3SZ") = "2015-08-28T13:33:21.700Z"
@@ -2567,7 +2573,7 @@ FUNCTIONS FOR FILTER/PUT
 strptime_local("2015-08-28 13:33:21", "%Y-%m-%d %H:%M:%S", "Asia/Istanbul") = 1440758001

 sub
- (class=string #args=3) '$name=sub($name, "old", "new")' (replace once).
+ (class=string #args=3) '$name = sub($name, "old", "new")': replace once (first match, if there are multiple matches), with support for regular expressions. Capture groups \1 through \9 in the new part are matched from (...) in the old part, and must be used within the same call to sub -- they don't persist for subsequent DSL statements. See also =~ and regextract. See also "Regular expressions" at https://miller.readthedocs.io.
 Examples:
 sub("ababab", "ab", "XY") gives "XYabab"
 sub("abc.def", ".", "X") gives "Xbc.def"
@@ -2612,7 +2618,7 @@ FUNCTIONS FOR FILTER/PUT
 (class=typing #args=1) Convert argument to type of argument (e.g. "str"). For debug.

 unflatten
- (class=collections #args=2) Reverses flatten. Useful for nested JSON-like structures for non-JSON file formats like CSV. See also arrayify.
+ (class=collections #args=2) Reverses flatten. Useful for nested JSON-like structures for non-JSON file formats like CSV. The first argument is a map, and the second argument is the flatten separator. See also arrayify. See "Flatten/unflatten: converting between JSON and tabular formats" at https://miller.readthedocs.io for more information.
 Example:
 unflatten({"a.b.c" : 4}, ".") is {"a": "b": { "c": 4 }}.

@@ -2720,7 +2726,10 @@ FUNCTIONS FOR FILTER/PUT
 (class=boolean #args=2) String/numeric equality. Mixing number and string results in string compare.

 =~
- (class=boolean #args=2) String (left-hand side) matches regex (right-hand side), e.g. '$name =~ "^a.*b$"'.
+ (class=boolean #args=2) String (left-hand side) matches regex (right-hand side), e.g. '$name =~ "^a.*b$"'. Capture groups \1 through \9 are matched from (...) in the right-hand side, and can be used within subsequent DSL statements. See also "Regular expressions" at https://miller.readthedocs.io.
+ Examples:
+ With if-statement: if ($url =~ "http.*com") { ... }
+ Without if-statement: given $line = "index ab09 file", and $line =~ "([a-z][a-z])([0-9][0-9])", then $label = "[\1:\2]", $label is "[ab:09]"

 >
 (class=boolean #args=2) String/numeric greater-than. Mixing number and string results in string compare.
diff --git a/docs/src/manpage.txt b/docs/src/manpage.txt
index a595e2ba23..c6fc7a2e61 100644
--- a/docs/src/manpage.txt
+++ b/docs/src/manpage.txt
@@ -2149,11 +2149,11 @@ FUNCTIONS FOR FILTER/PUT
 (class=math #args=1) e**x - 1.

 flatten
- (class=collections #args=2,3) Flattens multi-level maps to single-level ones. Useful for nested JSON-like structures for non-JSON file formats like CSV.
+ (class=collections #args=2,3) Flattens multi-level maps to single-level ones. Useful for nested JSON-like structures for non-JSON file formats like CSV. With two arguments, the first argument is a map (maybe $*) and the second argument is the flatten separator. With three arguments, the first argument is prefix, the second is the flatten separator, and the third argument is a map, and flatten($*, ".") is the same as flatten("", ".", $*). See "Flatten/unflatten: converting between JSON and tabular formats" at https://miller.readthedocs.io for more information.
 Examples:
+ flatten({"a":[1,2],"b":3}, ".") is {"a.1": 1, "a.2": 2, "b": 3}.
 flatten("a", ".", {"b": { "c": 4 }}) is {"a.b.c" : 4}.
 flatten("", ".", {"a": { "b": 3 }}) is {"a.b" : 3}.
- Two-argument version: flatten($*, ".") is the same as flatten("", ".", $*).

 float
 (class=conversion #args=1) Convert int/float/bool/string to float.
@@ -2201,7 +2201,7 @@ FUNCTIONS FOR FILTER/PUT
 gmt2sec("2001-02-03T04:05:06Z") = 981173106

 gsub
- (class=string #args=3) '$name=gsub($name, "old", "new")' (replace all).
+ (class=string #args=3) '$name = gsub($name, "old", "new")': replace all, with support for regular expressions. Capture groups \1 through \9 in the new part are matched from (...) in the old part, and must be used within the same call to gsub -- they don't persist for subsequent DSL statements. See also =~ and regextract. See also "Regular expressions" at https://miller.readthedocs.io.
 Examples:
 gsub("ababab", "ab", "XY") gives "XYXYXY"
 gsub("abc.def", ".", "X") gives "XXXXXXX"
@@ -2393,10 +2393,16 @@ FUNCTIONS FOR FILTER/PUT
 Map example: reduce({"a":1, "b":3, "c": 5}, func(acck,accv,ek,ev) {return {"sum_of_squares": accv + ev**2}}) returns {"sum_of_squares": 35}.

 regextract
- (class=string #args=2) '$name=regextract($name, "[A-Z]{3}[0-9]{2}")'
+ (class=string #args=2) Extracts a substring (the first, if there are multiple matches), matching a regular expression, from the input. Does not use capture groups; see also the =~ operator which does.
+ Examples:
+ regextract("index ab09 file", "[a-z][a-z][0-9][0-9]") gives "ab09"
+ regextract("index a999 file", "[a-z][a-z][0-9][0-9]") gives (absent), which will result in an assignment not happening.

 regextract_or_else
- (class=string #args=3) '$name=regextract_or_else($name, "[A-Z]{3}[0-9]{2}", "default")'
+ (class=string #args=3) Like regextract but the third argument is the return value in case the input string (first argument) doesn't match the pattern (second argument).
+ Examples:
+ regextract_or_else("index ab09 file", "[a-z][a-z][0-9][0-9]", "nonesuch") gives "ab09"
+ regextract_or_else("index a999 file", "[a-z][a-z][0-9][0-9]", "nonesuch") gives "nonesuch"

 round
 (class=math #args=1) Round to nearest integer.
@@ -2508,7 +2514,7 @@ FUNCTIONS FOR FILTER/PUT
 ssub("abc.def", ".", "X") gives "abcXdef"

 strftime
- (class=time #args=2) Formats seconds since the epoch as timestamp. Format strings are as in the C library (please see "man strftime" on your system), with the Miller-specific addition of "%1S" through "%9S" which format the seconds with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.) See also strftime_local.
+ (class=time #args=2) Formats seconds since the epoch as timestamp. Format strings are mostly as in the C library (see "man strftime" on your system), with the Miller-specific addition of "%1S" through "%9S" which format the seconds with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.) See also strftime_local. See also "DSL datetime/timezone functions" at https://miller.readthedocs.io for more information on the differences from the C library.
 Examples:
 strftime(1440768801.7,"%Y-%m-%dT%H:%M:%SZ") = "2015-08-28T13:33:21Z"
 strftime(1440768801.7,"%Y-%m-%dT%H:%M:%3SZ") = "2015-08-28T13:33:21.700Z"
@@ -2546,7 +2552,7 @@ FUNCTIONS FOR FILTER/PUT
 strptime_local("2015-08-28 13:33:21", "%Y-%m-%d %H:%M:%S", "Asia/Istanbul") = 1440758001

 sub
- (class=string #args=3) '$name=sub($name, "old", "new")' (replace once).
+ (class=string #args=3) '$name = sub($name, "old", "new")': replace once (first match, if there are multiple matches), with support for regular expressions. Capture groups \1 through \9 in the new part are matched from (...) in the old part, and must be used within the same call to sub -- they don't persist for subsequent DSL statements. See also =~ and regextract. See also "Regular expressions" at https://miller.readthedocs.io.
 Examples:
 sub("ababab", "ab", "XY") gives "XYabab"
 sub("abc.def", ".", "X") gives "Xbc.def"
@@ -2591,7 +2597,7 @@ FUNCTIONS FOR FILTER/PUT
 (class=typing #args=1) Convert argument to type of argument (e.g. "str"). For debug.

 unflatten
- (class=collections #args=2) Reverses flatten. Useful for nested JSON-like structures for non-JSON file formats like CSV. See also arrayify.
+ (class=collections #args=2) Reverses flatten. Useful for nested JSON-like structures for non-JSON file formats like CSV. The first argument is a map, and the second argument is the flatten separator. See also arrayify. See "Flatten/unflatten: converting between JSON and tabular formats" at https://miller.readthedocs.io for more information.
 Example:
 unflatten({"a.b.c" : 4}, ".") is {"a": "b": { "c": 4 }}.

@@ -2699,7 +2705,10 @@ FUNCTIONS FOR FILTER/PUT
 (class=boolean #args=2) String/numeric equality. Mixing number and string results in string compare.

 =~
- (class=boolean #args=2) String (left-hand side) matches regex (right-hand side), e.g. '$name =~ "^a.*b$"'.
+ (class=boolean #args=2) String (left-hand side) matches regex (right-hand side), e.g. '$name =~ "^a.*b$"'. Capture groups \1 through \9 are matched from (...) in the right-hand side, and can be used within subsequent DSL statements. See also "Regular expressions" at https://miller.readthedocs.io.
+ Examples:
+ With if-statement: if ($url =~ "http.*com") { ... }
+ Without if-statement: given $line = "index ab09 file", and $line =~ "([a-z][a-z])([0-9][0-9])", then $label = "[\1:\2]", $label is "[ab:09]"

 >
 (class=boolean #args=2) String/numeric greater-than. Mixing number and string results in string compare.
diff --git a/docs/src/reference-dsl-builtin-functions.md b/docs/src/reference-dsl-builtin-functions.md
index 8eef038d8b..533198b4ff 100644
--- a/docs/src/reference-dsl-builtin-functions.md
+++ b/docs/src/reference-dsl-builtin-functions.md
@@ -304,7 +304,10 @@ pow (class=arithmetic #args=2) Exponentiation. Same as **, but as a function.
 ### =~
-=~ (class=boolean #args=2) String (left-hand side) matches regex (right-hand side), e.g. '$name =~ "^a.*b$"'.
+=~ (class=boolean #args=2) String (left-hand side) matches regex (right-hand side), e.g. '$name =~ "^a.*b$"'. Capture groups \1 through \9 are matched from (...) in the right-hand side, and can be used within subsequent DSL statements. See also "Regular expressions" at https://miller.readthedocs.io.
+Examples:
+With if-statement: if ($url =~ "http.*com") { ... }
+Without if-statement: given $line = "index ab09 file", and $line =~ "([a-z][a-z])([0-9][0-9])", then $label = "[\1:\2]", $label is "[ab:09]"
@@ -389,11 +392,11 @@ depth (class=collections #args=1) Prints maximum depth of map/array. Scalars ha
 ### flatten
-flatten (class=collections #args=2,3) Flattens multi-level maps to single-level ones. Useful for nested JSON-like structures for non-JSON file formats like CSV.
+flatten (class=collections #args=2,3) Flattens multi-level maps to single-level ones. Useful for nested JSON-like structures for non-JSON file formats like CSV. With two arguments, the first argument is a map (maybe $*) and the second argument is the flatten separator. With three arguments, the first argument is prefix, the second is the flatten separator, and the third argument is a map, and flatten($*, ".") is the same as flatten("", ".", $*). See "Flatten/unflatten: converting between JSON and tabular formats" at https://miller.readthedocs.io for more information.
 Examples:
+flatten({"a":[1,2],"b":3}, ".") is {"a.1": 1, "a.2": 2, "b": 3}.
 flatten("a", ".", {"b": { "c": 4 }}) is {"a.b.c" : 4}.
 flatten("", ".", {"a": { "b": 3 }}) is {"a.b" : 3}.
-Two-argument version: flatten($*, ".") is the same as flatten("", ".", $*).
@@ -465,7 +468,7 @@ mapsum (class=collections #args=variadic) With 0 args, returns empty map. With
 ### unflatten
-unflatten (class=collections #args=2) Reverses flatten. Useful for nested JSON-like structures for non-JSON file formats like CSV. See also arrayify.
+unflatten (class=collections #args=2) Reverses flatten. Useful for nested JSON-like structures for non-JSON file formats like CSV. The first argument is a map, and the second argument is the flatten separator. See also arrayify. See "Flatten/unflatten: converting between JSON and tabular formats" at https://miller.readthedocs.io for more information.
 Example:
 unflatten({"a.b.c" : 4}, ".") is {"a": "b": { "c": 4 }}.
@@ -939,7 +942,7 @@ format("{}:{}:{}", 1,2,3,4) gives "1:2:3".
 ### gsub
-gsub (class=string #args=3) '$name=gsub($name, "old", "new")' (replace all).
+gsub (class=string #args=3) '$name = gsub($name, "old", "new")': replace all, with support for regular expressions. Capture groups \1 through \9 in the new part are matched from (...) in the old part, and must be used within the same call to gsub -- they don't persist for subsequent DSL statements. See also =~ and regextract. See also "Regular expressions" at https://miller.readthedocs.io.
 Examples:
 gsub("ababab", "ab", "XY") gives "XYXYXY"
 gsub("abc.def", ".", "X") gives "XXXXXXX"
@@ -957,13 +960,19 @@ lstrip (class=string #args=1) Strip leading whitespace from string.
 ### regextract
-regextract (class=string #args=2) '$name=regextract($name, "[A-Z]{3}[0-9]{2}")'
+regextract (class=string #args=2) Extracts a substring (the first, if there are multiple matches), matching a regular expression, from the input. Does not use capture groups; see also the =~ operator which does.
+Examples:
+regextract("index ab09 file", "[a-z][a-z][0-9][0-9]") gives "ab09"
+regextract("index a999 file", "[a-z][a-z][0-9][0-9]") gives (absent), which will result in an assignment not happening.
 ### regextract_or_else
-regextract_or_else (class=string #args=3) '$name=regextract_or_else($name, "[A-Z]{3}[0-9]{2}", "default")'
+regextract_or_else (class=string #args=3) Like regextract but the third argument is the return value in case the input string (first argument) doesn't match the pattern (second argument).
+Examples:
+regextract_or_else("index ab09 file", "[a-z][a-z][0-9][0-9]", "nonesuch") gives "ab09"
+regextract_or_else("index a999 file", "[a-z][a-z][0-9][0-9]", "nonesuch") gives "nonesuch"
@@ -995,7 +1004,7 @@ strlen (class=string #args=1) String length.
 ### sub
-sub (class=string #args=3) '$name=sub($name, "old", "new")' (replace once).
+sub (class=string #args=3) '$name = sub($name, "old", "new")': replace once (first match, if there are multiple matches), with support for regular expressions. Capture groups \1 through \9 in the new part are matched from (...) in the old part, and must be used within the same call to sub -- they don't persist for subsequent DSL statements. See also =~ and regextract. See also "Regular expressions" at https://miller.readthedocs.io.
 Examples:
 sub("ababab", "ab", "XY") gives "XYabab"
 sub("abc.def", ".", "X") gives "Xbc.def"
@@ -1219,7 +1228,7 @@ sec2localtime(1234567890.123456, 6, "Asia/Istanbul") = "2009-02-14 01:31:30.1234
 ### strftime
-strftime (class=time #args=2) Formats seconds since the epoch as timestamp. Format strings are as in the C library (please see "man strftime" on your system), with the Miller-specific addition of "%1S" through "%9S" which format the seconds with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.) See also strftime_local.
+strftime (class=time #args=2) Formats seconds since the epoch as timestamp. Format strings are mostly as in the C library (see "man strftime" on your system), with the Miller-specific addition of "%1S" through "%9S" which format the seconds with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.) See also strftime_local. See also "DSL datetime/timezone functions" at https://miller.readthedocs.io for more information on the differences from the C library.
 Examples:
 strftime(1440768801.7,"%Y-%m-%dT%H:%M:%SZ") = "2015-08-28T13:33:21Z"
 strftime(1440768801.7,"%Y-%m-%dT%H:%M:%3SZ") = "2015-08-28T13:33:21.700Z"
diff --git a/docs/src/reference-dsl-time.md b/docs/src/reference-dsl-time.md
index e1ca1a6745..c20e453c83 100644
--- a/docs/src/reference-dsl-time.md
+++ b/docs/src/reference-dsl-time.md
@@ -332,11 +332,11 @@ Thursday, January 1, 1970
 You can get the seconds since the Miller process start using [uptime](reference-dsl-builtin-functions.md#uptime):

--color shape flag k index quantity rate u
+mlr --c2p --from example.csv put '$u=uptime()'
+color shape flag k index quantity rate u
 yellow triangle true 1 11 43.6498 9.8870 0.0011110305786132812
 red square true 2 15 79.2778 0.0130 0.0011241436004638672
 red circle true 3 16 13.8103 2.9010 0.0011250972747802734
diff --git a/docs/src/reference-dsl-time.md.in b/docs/src/reference-dsl-time.md.in
index 7646cc3ce2..1e35084c41 100644
--- a/docs/src/reference-dsl-time.md.in
+++ b/docs/src/reference-dsl-time.md.in
@@ -235,8 +235,8 @@ GENMD-EOF

 You can get the seconds since the Miller process start using [uptime](reference-dsl-builtin-functions.md#uptime):

-
 GENMD-CARDIFY-HIGHLIGHT-ONE
+mlr --c2p --from example.csv put '$u=uptime()'
 color shape flag k index quantity rate u
 yellow triangle true 1 11 43.6498 9.8870 0.0011110305786132812
 red square true 2 15 79.2778 0.0130 0.0011241436004638672
diff --git a/docs/src/reference-main-regular-expressions.md b/docs/src/reference-main-regular-expressions.md
index 75cfa4593b..0df53e0595 100644
--- a/docs/src/reference-main-regular-expressions.md
+++ b/docs/src/reference-main-regular-expressions.md
@@ -20,6 +20,8 @@ Miller lets you use regular expressions (of the [types accepted by Go](https://p

 * In `mlr filter` with `=~` or `!=~`, e.g. `mlr filter '$url =~ "http.*com"'`

+* In `mlr put` with `regextract`, e.g. `mlr put '$output = regextract($input, "[a-z][a-z][0-9][0-9]")`
+
 * In `mlr put` with `sub` or `gsub`, e.g. `mlr put '$url = sub($url, "http.*com", "")'`

 * In `mlr having-fields`, e.g. `mlr having-fields --any-matching '^sda[0-9]'`
diff --git a/docs/src/reference-main-regular-expressions.md.in b/docs/src/reference-main-regular-expressions.md.in
index 6272a4b0c1..e81f245528 100644
--- a/docs/src/reference-main-regular-expressions.md.in
+++ b/docs/src/reference-main-regular-expressions.md.in
@@ -4,6 +4,8 @@ Miller lets you use regular expressions (of the [types accepted by Go](https://p

 * In `mlr filter` with `=~` or `!=~`, e.g. `mlr filter '$url =~ "http.*com"'`

+* In `mlr put` with `regextract`, e.g. `mlr put '$output = regextract($input, "[a-z][a-z][0-9][0-9]")`
+
 * In `mlr put` with `sub` or `gsub`, e.g. `mlr put '$url = sub($url, "http.*com", "")'`

 * In `mlr having-fields`, e.g. `mlr having-fields --any-matching '^sda[0-9]'`
diff --git a/internal/pkg/dsl/cst/builtin_function_manager.go b/internal/pkg/dsl/cst/builtin_function_manager.go
index de74392500..407df3cdcf 100644
--- a/internal/pkg/dsl/cst/builtin_function_manager.go
+++ b/internal/pkg/dsl/cst/builtin_function_manager.go
@@ -317,9 +317,16 @@ func makeBuiltinFunctionLookupTable() []BuiltinFunctionInfo {
 	},

 	{
-		name: "=~",
-		class: FUNC_CLASS_BOOLEAN,
-		help: `String (left-hand side) matches regex (right-hand side), e.g. '$name =~ "^a.*b$"'.`,
+		name: "=~",
+		class: FUNC_CLASS_BOOLEAN,
+		help: `String (left-hand side) matches regex (right-hand side), e.g.
+'$name =~ "^a.*b$"'.
+Capture groups \1 through \9 are matched from (...) in the right-hand side, and can be
+used within subsequent DSL statements. See also "Regular expressions" at ` + lib.DOC_URL + `.`,
+		examples: []string{
+			`With if-statement: if ($url =~ "http.*com") { ... }`,
+			`Without if-statement: given $line = "index ab09 file", and $line =~ "([a-z][a-z])([0-9][0-9])", then $label = "[\1:\2]", $label is "[ab:09]"`,
+		},
 		regexCaptureBinaryFunc: bifs.BIF_string_matches_regexp,
 	},

@@ -411,17 +418,27 @@ func makeBuiltinFunctionLookupTable() []BuiltinFunctionInfo {
 	},

 	{
-		name: "regextract",
-		class: FUNC_CLASS_STRING,
-		help: `'$name=regextract($name, "[A-Z]{3}[0-9]{2}")'`,
+		name: "regextract",
+		class: FUNC_CLASS_STRING,
+		help: `Extracts a substring (the first, if there are multiple matches), matching a
+regular expression, from the input. Does not use capture groups; see also the =~ operator which does.`,
 		binaryFunc: bifs.BIF_regextract,
+		examples: []string{
+			`regextract("index ab09 file", "[a-z][a-z][0-9][0-9]") gives "ab09"`,
+			`regextract("index a999 file", "[a-z][a-z][0-9][0-9]") gives (absent), which will result in an assignment not happening.`,
+		},
 	},

 	{
-		name: "regextract_or_else",
-		class: FUNC_CLASS_STRING,
-		help: `'$name=regextract_or_else($name, "[A-Z]{3}[0-9]{2}", "default")'`,
+		name: "regextract_or_else",
+		class: FUNC_CLASS_STRING,
+		help: `Like regextract but the third argument is the return value in case the input string (first
+argument) doesn't match the pattern (second argument).`,
 		ternaryFunc: bifs.BIF_regextract_or_else,
+		examples: []string{
+			`regextract_or_else("index ab09 file", "[a-z][a-z][0-9][0-9]", "nonesuch") gives "ab09"`,
+			`regextract_or_else("index a999 file", "[a-z][a-z][0-9][0-9]", "nonesuch") gives "nonesuch"`,
+		},
 	},

 	{
@@ -456,9 +473,12 @@ func makeBuiltinFunctionLookupTable() []BuiltinFunctionInfo {
 	},

 	{
-		name: "sub",
-		class: FUNC_CLASS_STRING,
-		help: `'$name=sub($name, "old", "new")' (replace once).`,
+		name: "sub",
+		class: FUNC_CLASS_STRING,
+		help: `'$name = sub($name, "old", "new")': replace once (first match, if there are multiple matches),
+with support for regular expressions. Capture groups \1 through \9 in the new part are matched from (...) in
+the old part, and must be used within the same call to sub -- they don't persist for subsequent DSL
+statements. See also =~ and regextract. See also "Regular expressions" at ` + lib.DOC_URL + `.`,
 		ternaryFunc: bifs.BIF_sub,
 		examples: []string{
 			`sub("ababab", "ab", "XY") gives "XYabab"`,
@@ -470,9 +490,12 @@ func makeBuiltinFunctionLookupTable() []BuiltinFunctionInfo {
 	},

 	{
-		name: "gsub",
-		class: FUNC_CLASS_STRING,
-		help: `'$name=gsub($name, "old", "new")' (replace all).`,
+		name: "gsub",
+		class: FUNC_CLASS_STRING,
+		help: `'$name = gsub($name, "old", "new")': replace all, with support for regular expressions.
+Capture groups \1 through \9 in the new part are matched from (...) in the old part, and must be
+used within the same call to gsub -- they don't persist for subsequent DSL statements. See also
+=~ and regextract. See also "Regular expressions" at ` + lib.DOC_URL + `.`,
 		ternaryFunc: bifs.BIF_gsub,
 		examples: []string{
 			`gsub("ababab", "ab", "XY") gives "XYXYXY"`,
@@ -983,10 +1006,11 @@ is supplied.`,
 	{
 		name: "strftime",
 		class: FUNC_CLASS_TIME,
-		help: `Formats seconds since the epoch as timestamp. Format strings are as in the C library
-(please see "man strftime" on your system), with the Miller-specific addition of "%1S" through "%9S" which
+		help: `Formats seconds since the epoch as timestamp. Format strings are mostly as in the C library
+(see "man strftime" on your system), with the Miller-specific addition of "%1S" through "%9S" which
 format the seconds with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.) See also
-strftime_local.`,
+strftime_local. See also "DSL datetime/timezone functions" at ` + lib.DOC_URL + ` for more information on the
+differences from the C library.`,
 		examples: []string{
 			`strftime(1440768801.7,"%Y-%m-%dT%H:%M:%SZ") = "2015-08-28T13:33:21Z"`,
 			`strftime(1440768801.7,"%Y-%m-%dT%H:%M:%3SZ") = "2015-08-28T13:33:21.700Z"`,
@@ -1574,17 +1598,34 @@ single-element arrays.`,
 		name: "flatten",
 		class: FUNC_CLASS_COLLECTIONS,
 		help: `Flattens multi-level maps to single-level ones. Useful for nested JSON-like structures
-for non-JSON file formats like CSV.`,
+for non-JSON file formats like CSV. With two arguments, the first argument is a map (maybe $*) and
+the second argument is the flatten separator. With three arguments, the first argument is prefix,
+the second is the flatten separator, and the third argument is a map, and flatten($*, ".") is the
+same as flatten("", ".", $*). See "Flatten/unflatten: converting between JSON and tabular formats"
+at ` + lib.DOC_URL + ` for more information.`,
 		examples: []string{
+			`flatten({"a":[1,2],"b":3}, ".") is {"a.1": 1, "a.2": 2, "b": 3}.`,
 			`flatten("a", ".", {"b": { "c": 4 }}) is {"a.b.c" : 4}.`,
 			`flatten("", ".", {"a": { "b": 3 }}) is {"a.b" : 3}.`,
-			`Two-argument version: flatten($*, ".") is the same as flatten("", ".", $*).`,
 		},
 		binaryFunc: bifs.BIF_flatten_binary,
 		ternaryFunc: bifs.BIF_flatten,
 		hasMultipleArities: true,
 	},

+	{
+		name: "unflatten",
+		class: FUNC_CLASS_COLLECTIONS,
+		help: `Reverses flatten. Useful for nested JSON-like structures for non-JSON file formats like CSV.
+The first argument is a map, and the second argument is the flatten separator. See also arrayify.
+See "Flatten/unflatten: converting between JSON and tabular formats" at ` + lib.DOC_URL + ` for more
+information.`,
+		examples: []string{
+			`unflatten({"a.b.c" : 4}, ".") is {"a": "b": { "c": 4 }}.`,
+		},
+		binaryFunc: bifs.BIF_unflatten,
+	},
+
 	{
 		name: "get_keys",
 		class: FUNC_CLASS_COLLECTIONS,
@@ -1674,17 +1715,6 @@ from all arguments. Rightmost collisions win, e.g. 'mapsum({1:2,3:4},{1:5})' is
 		variadicFunc: bifs.BIF_mapsum,
 	},

-	{
-		name: "unflatten",
-		class: FUNC_CLASS_COLLECTIONS,
-		help: `Reverses flatten. Useful for nested JSON-like structures for non-JSON file formats like CSV.
-See also arrayify.`,
-		examples: []string{
-			`unflatten({"a.b.c" : 4}, ".") is {"a": "b": { "c": 4 }}.`,
-		},
-		binaryFunc: bifs.BIF_unflatten,
-	},
-
 	// ----------------------------------------------------------------
 	// FUNC_CLASS_HOFS

diff --git a/man/manpage.txt b/man/manpage.txt
index a595e2ba23..c6fc7a2e61 100644
--- a/man/manpage.txt
+++ b/man/manpage.txt
@@ -2149,11 +2149,11 @@ FUNCTIONS FOR FILTER/PUT
 (class=math #args=1) e**x - 1.

 flatten
- (class=collections #args=2,3) Flattens multi-level maps to single-level ones. Useful for nested JSON-like structures for non-JSON file formats like CSV.
+ (class=collections #args=2,3) Flattens multi-level maps to single-level ones. Useful for nested JSON-like structures for non-JSON file formats like CSV. With two arguments, the first argument is a map (maybe $*) and the second argument is the flatten separator. With three arguments, the first argument is prefix, the second is the flatten separator, and the third argument is a map, and flatten($*, ".") is the same as flatten("", ".", $*). See "Flatten/unflatten: converting between JSON and tabular formats" at https://miller.readthedocs.io for more information.
 Examples:
+ flatten({"a":[1,2],"b":3}, ".") is {"a.1": 1, "a.2": 2, "b": 3}.
 flatten("a", ".", {"b": { "c": 4 }}) is {"a.b.c" : 4}.
 flatten("", ".", {"a": { "b": 3 }}) is {"a.b" : 3}.
- Two-argument version: flatten($*, ".") is the same as flatten("", ".", $*).

 float
 (class=conversion #args=1) Convert int/float/bool/string to float.
@@ -2201,7 +2201,7 @@ FUNCTIONS FOR FILTER/PUT
 gmt2sec("2001-02-03T04:05:06Z") = 981173106

 gsub
- (class=string #args=3) '$name=gsub($name, "old", "new")' (replace all).
+ (class=string #args=3) '$name = gsub($name, "old", "new")': replace all, with support for regular expressions. Capture groups \1 through \9 in the new part are matched from (...) in the old part, and must be used within the same call to gsub -- they don't persist for subsequent DSL statements. See also =~ and regextract. See also "Regular expressions" at https://miller.readthedocs.io.
 Examples:
 gsub("ababab", "ab", "XY") gives "XYXYXY"
 gsub("abc.def", ".", "X") gives "XXXXXXX"
@@ -2393,10 +2393,16 @@ FUNCTIONS FOR FILTER/PUT
 Map example: reduce({"a":1, "b":3, "c": 5}, func(acck,accv,ek,ev) {return {"sum_of_squares": accv + ev**2}}) returns {"sum_of_squares": 35}.

 regextract
- (class=string #args=2) '$name=regextract($name, "[A-Z]{3}[0-9]{2}")'
+ (class=string #args=2) Extracts a substring (the first, if there are multiple matches), matching a regular expression, from the input. Does not use capture groups; see also the =~ operator which does.
+ Examples:
+ regextract("index ab09 file", "[a-z][a-z][0-9][0-9]") gives "ab09"
+ regextract("index a999 file", "[a-z][a-z][0-9][0-9]") gives (absent), which will result in an assignment not happening.

 regextract_or_else
- (class=string #args=3) '$name=regextract_or_else($name, "[A-Z]{3}[0-9]{2}", "default")'
+ (class=string #args=3) Like regextract but the third argument is the return value in case the input string (first argument) doesn't match the pattern (second argument).
+ Examples:
+ regextract_or_else("index ab09 file", "[a-z][a-z][0-9][0-9]", "nonesuch") gives "ab09"
+ regextract_or_else("index a999 file", "[a-z][a-z][0-9][0-9]", "nonesuch") gives "nonesuch"

 round
 (class=math #args=1) Round to nearest integer.
@@ -2508,7 +2514,7 @@ FUNCTIONS FOR FILTER/PUT
 ssub("abc.def", ".", "X") gives "abcXdef"

 strftime
- (class=time #args=2) Formats seconds since the epoch as timestamp. Format strings are as in the C library (please see "man strftime" on your system), with the Miller-specific addition of "%1S" through "%9S" which format the seconds with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.) See also strftime_local.
+ (class=time #args=2) Formats seconds since the epoch as timestamp. Format strings are mostly as in the C library (see "man strftime" on your system), with the Miller-specific addition of "%1S" through "%9S" which format the seconds with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.) See also strftime_local. See also "DSL datetime/timezone functions" at https://miller.readthedocs.io for more information on the differences from the C library.
 Examples:
 strftime(1440768801.7,"%Y-%m-%dT%H:%M:%SZ") = "2015-08-28T13:33:21Z"
 strftime(1440768801.7,"%Y-%m-%dT%H:%M:%3SZ") = "2015-08-28T13:33:21.700Z"
@@ -2546,7 +2552,7 @@ FUNCTIONS FOR FILTER/PUT
 strptime_local("2015-08-28 13:33:21", "%Y-%m-%d %H:%M:%S", "Asia/Istanbul") = 1440758001

 sub
- (class=string #args=3) '$name=sub($name, "old", "new")' (replace once).
+ (class=string #args=3) '$name = sub($name, "old", "new")': replace once (first match, if there are multiple matches), with support for regular expressions. Capture groups \1 through \9 in the new part are matched from (...) in the old part, and must be used within the same call to sub -- they don't persist for subsequent DSL statements. See also =~ and regextract. See also "Regular expressions" at https://miller.readthedocs.io.
 Examples:
 sub("ababab", "ab", "XY") gives "XYabab"
 sub("abc.def", ".", "X") gives "Xbc.def"
@@ -2591,7 +2597,7 @@ FUNCTIONS FOR FILTER/PUT
 (class=typing #args=1) Convert argument to type of argument (e.g. "str"). For debug.

 unflatten
- (class=collections #args=2) Reverses flatten. Useful for nested JSON-like structures for non-JSON file formats like CSV. See also arrayify.
+ (class=collections #args=2) Reverses flatten. Useful for nested JSON-like structures for non-JSON file formats like CSV. The first argument is a map, and the second argument is the flatten separator. See also arrayify. See "Flatten/unflatten: converting between JSON and tabular formats" at https://miller.readthedocs.io for more information.
 Example:
 unflatten({"a.b.c" : 4}, ".") is {"a": "b": { "c": 4 }}.

@@ -2699,7 +2705,10 @@ FUNCTIONS FOR FILTER/PUT
 (class=boolean #args=2) String/numeric equality. Mixing number and string results in string compare.

 =~
- (class=boolean #args=2) String (left-hand side) matches regex (right-hand side), e.g. '$name =~ "^a.*b$"'.
+ (class=boolean #args=2) String (left-hand side) matches regex (right-hand side), e.g. '$name =~ "^a.*b$"'. Capture groups \1 through \9 are matched from (...) in the right-hand side, and can be used within subsequent DSL statements. See also "Regular expressions" at https://miller.readthedocs.io.
+ Examples:
+ With if-statement: if ($url =~ "http.*com") { ... }
+ Without if-statement: given $line = "index ab09 file", and $line =~ "([a-z][a-z])([0-9][0-9])", then $label = "[\1:\2]", $label is "[ab:09]"

 >
 (class=boolean #args=2) String/numeric greater-than. Mixing number and string results in string compare.
diff --git a/man/mlr.1 b/man/mlr.1
index a92c084be5..19ed125202 100644
--- a/man/mlr.1
+++ b/man/mlr.1
@@ -2974,11 +2974,11 @@ Map example: every({"a": "foo", "b": "bar"}, func(k,v) {return $[k] == v})
 .RS 0
 .\}
 .nf
- (class=collections #args=2,3) Flattens multi-level maps to single-level ones. Useful for nested JSON-like structures for non-JSON file formats like CSV.
+ (class=collections #args=2,3) Flattens multi-level maps to single-level ones. Useful for nested JSON-like structures for non-JSON file formats like CSV. With two arguments, the first argument is a map (maybe $*) and the second argument is the flatten separator.
With three arguments, the first argument is prefix, the second is the flatten separator, and the third argument is a map, and flatten($*, ".") is the same as flatten("", ".", $*). See "Flatten/unflatten: converting between JSON and tabular formats" at https://miller.readthedocs.io for more information. Examples: +flatten({"a":[1,2],"b":3}, ".") is {"a.1": 1, "a.2": 2, "b": 3}. flatten("a", ".", {"b": { "c": 4 }}) is {"a.b.c" : 4}. flatten("", ".", {"a": { "b": 3 }}) is {"a.b" : 3}. -Two-argument version: flatten($*, ".") is the same as flatten("", ".", $*). .fi .if n \{\ .RE @@ -3098,7 +3098,7 @@ gmt2sec("2001-02-03T04:05:06Z") = 981173106 .RS 0 .\} .nf - (class=string #args=3) '$name=gsub($name, "old", "new")' (replace all). + (class=string #args=3) '$name = gsub($name, "old", "new")': replace all, with support for regular expressions. Capture groups \e1 through \e9 in the new part are matched from (...) in the old part, and must be used within the same call to gsub -- they don't persist for subsequent DSL statements. See also =~ and regextract. See also "Regular expressions" at https://miller.readthedocs.io. Examples: gsub("ababab", "ab", "XY") gives "XYXYXY" gsub("abc.def", ".", "X") gives "XXXXXXX" @@ -3626,7 +3626,10 @@ Map example: reduce({"a":1, "b":3, "c": 5}, func(acck,accv,ek,ev) {return {"sum_ .RS 0 .\} .nf - (class=string #args=2) '$name=regextract($name, "[A-Z]{3}[0-9]{2}")' + (class=string #args=2) Extracts a substring (the first, if there are multiple matches), matching a regular expression, from the input. Does not use capture groups; see also the =~ operator which does. +Examples: +regextract("index ab09 file", "[a-z][a-z][0-9][0-9]") gives "ab09" +regextract("index a999 file", "[a-z][a-z][0-9][0-9]") gives (absent), which will result in an assignment not happening. 
.fi .if n \{\ .RE @@ -3635,7 +3638,10 @@ Map example: reduce({"a":1, "b":3, "c": 5}, func(acck,accv,ek,ev) {return {"sum_ .RS 0 .\} .nf - (class=string #args=3) '$name=regextract_or_else($name, "[A-Z]{3}[0-9]{2}", "default")' + (class=string #args=3) Like regextract but the third argument is the return value in case the input string (first argument) doesn't match the pattern (second argument). +Examples: +regextract_or_else("index ab09 file", "[a-z][a-z][0-9][0-9]", "nonesuch") gives "ab09" +regextract_or_else("index a999 file", "[a-z][a-z][0-9][0-9]", "nonesuch") gives "nonesuch" .fi .if n \{\ .RE @@ -3903,7 +3909,7 @@ ssub("abc.def", ".", "X") gives "abcXdef" .RS 0 .\} .nf - (class=time #args=2) Formats seconds since the epoch as timestamp. Format strings are as in the C library (please see "man strftime" on your system), with the Miller-specific addition of "%1S" through "%9S" which format the seconds with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.) See also strftime_local. + (class=time #args=2) Formats seconds since the epoch as timestamp. Format strings are mostly as in the C library (see "man strftime" on your system), with the Miller-specific addition of "%1S" through "%9S" which format the seconds with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.) See also strftime_local. See also "DSL datetime/timezone functions" at https://miller.readthedocs.io for more information on the differences from the C library. Examples: strftime(1440768801.7,"%Y-%m-%dT%H:%M:%SZ") = "2015-08-28T13:33:21Z" strftime(1440768801.7,"%Y-%m-%dT%H:%M:%3SZ") = "2015-08-28T13:33:21.700Z" @@ -3983,7 +3989,7 @@ strptime_local("2015-08-28 13:33:21", "%Y-%m-%d %H:%M:%S", "Asia/Istanbul") .RS 0 .\} .nf - (class=string #args=3) '$name=sub($name, "old", "new")' (replace once). + (class=string #args=3) '$name = sub($name, "old", "new")': replace once (first match, if there are multiple matches), with support for regular expressions. 
Capture groups \e1 through \e9 in the new part are matched from (...) in the old part, and must be used within the same call to sub -- they don't persist for subsequent DSL statements. See also =~ and regextract. See also "Regular expressions" at https://miller.readthedocs.io. Examples: sub("ababab", "ab", "XY") gives "XYabab" sub("abc.def", ".", "X") gives "Xbc.def" @@ -4106,7 +4112,7 @@ sub("prefix4529:suffix8567", "suffix([0-9]+)", "name\e1") gives "prefix4529:name .RS 0 .\} .nf - (class=collections #args=2) Reverses flatten. Useful for nested JSON-like structures for non-JSON file formats like CSV. See also arrayify. + (class=collections #args=2) Reverses flatten. Useful for nested JSON-like structures for non-JSON file formats like CSV. The first argument is a map, and the second argument is the flatten separator. See also arrayify. See "Flatten/unflatten: converting between JSON and tabular formats" at https://miller.readthedocs.io for more information. Example: unflatten({"a.b.c" : 4}, ".") is {"a": "b": { "c": 4 }}. .fi @@ -4406,7 +4412,10 @@ Int-valued example: '$n=floor(20+urand()*11)'. .RS 0 .\} .nf - (class=boolean #args=2) String (left-hand side) matches regex (right-hand side), e.g. '$name =~ "^a.*b$"'. + (class=boolean #args=2) String (left-hand side) matches regex (right-hand side), e.g. '$name =~ "^a.*b$"'. Capture groups \e1 through \e9 are matched from (...) in the right-hand side, and can be used within subsequent DSL statements. See also "Regular expressions" at https://miller.readthedocs.io. +Examples: +With if-statement: if ($url =~ "http.*com") { ... } +Without if-statement: given $line = "index ab09 file", and $line =~ "([a-z][a-z])([0-9][0-9])", then $label = "[\e1:\e2]", $label is "[ab:09]" .fi .if n \{\ .RE diff --git a/todo.txt b/todo.txt index 41ad4345f6..7f9369b2a0 100644 --- a/todo.txt +++ b/todo.txt @@ -19,6 +19,17 @@ RELEASES * plan 6.2.0 ? 
YAML +================================================================ +908 + +* is_not_empty (class=typing #args=1) False if field is present in input with empty value, true + otherwise -> That means that an absent value will return true. While technically it makes sense, + many people will ask if "is not empty" to get the green flag before processing the value, which will + be nonexistent if the field is absent. Would it make sense to consider absent fields as empty ones? + Probably you already thought about those concerns, but just in case. + + -> engage in issue + ================================================================ FEATURES