Releases: johnkerl/miller

Allow scientific notation in DSL literals; mlr bar --auto

27 Nov 02:46
  • Miller has always supported scientific notation in field values, e.g. x=1e6. However, it never supported scientific notation in DSL literals, e.g. mlr put '$y = $x + 1e6'. This release fixes that.
  • Additionally, mlr bar now has an --auto flag, which holds all records in memory and computes limits from the data, so you don't have to compute them separately and pass them in via --lo and --hi.

Integer and float arithmetic, improved documentation, minor feature enhancements

24 Nov 03:46

Integer/float arithmetic

The key feature of the 3.0.0 release, and the reason for the major version increment, concerns number handling. Previously, all numbers were scanned into mlr put and mlr filter functions as floating-point and recast to integer only as necessary for integer operations. Since IEEE doubles have 53 bits of precision (52 mantissa bits plus an implicit leading one) while 64-bit integers have 64, full 64-bit integer significance could not be passed through Miller functions.
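
The precision gap can be seen in any language that uses IEEE doubles. The following is a minimal Python sketch, not Miller code; the variable names are illustrative only.

```python
# IEEE-754 doubles carry 53 significant bits, so not every 64-bit integer
# survives a round trip through floating point.
big = 2**53 + 1              # needs 54 significant bits

as_float = float(big)        # rounds to the nearest representable double
round_trip = int(as_float)   # convert back to an integer

print(big)          # 9007199254740993
print(round_trip)   # 9007199254740992: the low-order bit was lost
```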

As of the 3.0.0 release, numbers in Miller are int (64 bits) or float (double-precision). Numbers scannable as integers are treated as integers. The sum, difference, and product of two integers are again integers, except when overflow would occur, in which case a floating-point result is produced. Integer division is pythonic: 7/2 is 3.5 and 7//2 is 3. Mixed integer/float operations produce float. Bitwise operators are now supported.
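
These rules can be modelled roughly as follows. This is a Python sketch under stated assumptions, not Miller's implementation: Python integers never overflow, so the 64-bit range check is made explicit, and the helper names (fits_int64, add, times) are hypothetical.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

def fits_int64(n):
    """True if n is representable as a signed 64-bit integer."""
    return INT64_MIN <= n <= INT64_MAX

def add(a, b):
    """int + int stays int unless the result overflows 64 bits."""
    if isinstance(a, int) and isinstance(b, int):
        exact = a + b                    # exact, since Python ints are unbounded
        return exact if fits_int64(exact) else float(exact)
    return float(a) + float(b)           # mixed int/float produces float

def times(a, b):
    """int * int stays int unless the result overflows 64 bits."""
    if isinstance(a, int) and isinstance(b, int):
        exact = a * b
        return exact if fits_int64(exact) else float(exact)
    return float(a) * float(b)

print(add(1, 2))            # 3 (int)
print(add(INT64_MAX, 1))    # a float, since the sum exceeds 64 bits
print(add(1, 2.0))          # 3.0 (float)
print(7 / 2, 7 // 2)        # 3.5 3
```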

You now have more control over arithmetic, not less. The only real compatibility change is that some numbers will now print as 123 rather than 123.0000.

For full details please see http://johnkerl.org/miller/doc/reference.html#Arithmetic.

New functions for filter and put

  • Since integers are now fully supported in mlr put and mlr filter, it is now possible to have the bitwise operators | ^ & << >>. These operate on 64-bit integers and produce 64-bit-integer results.
  • Modular arithmetic is implemented by madd, msub, mmul, and mexp.
  • urandint and urand32 join the existing urand.
  • sgn complements abs.
  • strftime and strptime are generalizations of sec2gmt and gmt2sec. They are pass-throughs to the system strftime and strptime; see your local manpages for available time-formatting options.
  • Please see http://johnkerl.org/miller/doc/reference.html#Functions_for_filter_and_put for more information.
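
As a rough illustration of what the modular and bitwise operations compute, here is a Python sketch. The function names are borrowed from the list above, but these are Python stand-ins rather than Miller's functions; the argument order (operands first, modulus last) and the treatment of negative results are assumptions.

```python
def madd(a, b, m):
    """(a + b) mod m."""
    return (a + b) % m

def msub(a, b, m):
    """(a - b) mod m."""
    return (a - b) % m

def mmul(a, b, m):
    """(a * b) mod m."""
    return (a * b) % m

def mexp(a, e, m):
    """(a ** e) mod m, computed without building the full power."""
    return pow(a, e, m)

print(madd(5, 3, 7), msub(5, 6, 7), mmul(3, 4, 7), mexp(2, 10, 7))  # 1 6 5 2

# Bitwise operators on integers: or, xor, and, left shift, right shift.
print(5 | 3, 5 ^ 3, 5 & 3, 1 << 4, 16 >> 2)  # 7 6 1 16 4
```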

Verbs

I/O options

  • mlr --xvright for XTAB output
  • mlr --headerless-csv-output for CSV/CSV-lite output

Documentation

Iterative stats, exclude-filter, implicit-CSV-header, and other features

27 Oct 01:55
  • mlr stats1 and stats2 now support a -s feature in which means, linear regressions, etc. evolve record-by-record as new records appear over time. This is particularly useful in tail -f contexts. See also http://johnkerl.org/miller/doc/reference.html#stats1 and http://johnkerl.org/miller/doc/reference.html#stats2.
  • mlr filter now supports a -x flag to negate the sense of the filter: instead of editing logic expressions e.g. from mlr filter '$x < 10 || $x > 20' to mlr filter '$x >= 10 && $x <= 20', you can simply do mlr filter -x '$x < 10 || $x > 20'. See also http://johnkerl.org/miller/doc/reference.html#filter.
  • If a CSV file lacks a header line, you can use mlr --implicit-csv-header to supply positional field names 1,2,3,.... You can then rename those as desired using mlr label. See also http://johnkerl.org/miller/doc/reference.html#label.
  • Heterogeneity support is improved for sort, stats1, stats2, step, head, tail, top, sample, uniq, and count-distinct. See also #79.
  • mlr stats2 now has a logistic-regression feature, but I recommend treating it as experimental until some numerical-stability issues involving my naïve Newton-Raphson solver are worked out -- namely, it doesn't converge in all cases.
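
The -x idea for mlr filter can be sketched in Python: keep the predicate as written and invert the selection, rather than rewriting the logic by hand. The function names and the exclude parameter are hypothetical and only mirror the example expressions above.

```python
def original(x):
    """The filter as first written: x < 10 || x > 20."""
    return x < 10 or x > 20

def hand_negated(x):
    """The manually rewritten negation: x >= 10 && x <= 20."""
    return x >= 10 and x <= 20

def filter_records(records, predicate, exclude=False):
    """Keep records matching the predicate, or the non-matching ones if exclude is set."""
    return [r for r in records if predicate(r["x"]) != exclude]

records = [{"x": v} for v in (5, 10, 15, 20, 25)]

by_flag = filter_records(records, original, exclude=True)
by_hand = filter_records(records, hand_negated)

print(by_flag == by_hand)           # True
print([r["x"] for r in by_flag])    # [10, 15, 20]
```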

http://johnkerl.org/miller/releases/miller-2.3.2/doc/

Bug fix for mlr top -a

19 Oct 03:49

Memory management was incorrect in mlr top -a.

Regex support, gsub, reservoir sampling, iterative stats, and other features

17 Oct 23:05

Regex support

gsub function

gsub complements the existing sub function: it replaces all matches, where sub replaces only the first. It includes regex support.
http://johnkerl.org/miller/doc/reference.html#Functions_for_filter_and_put

Reservoir sampling

http://johnkerl.org/miller/doc/reference.html#sample
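
For readers unfamiliar with the technique, the textbook form of reservoir sampling (often called Algorithm R) can be sketched in Python as below. This is the standard algorithm, not necessarily how mlr sample is implemented; the function name and parameters are illustrative.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Return up to k items chosen uniformly at random from a stream of unknown length."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)    # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample

picked = reservoir_sample(range(1000), 5, rng=random.Random(0))
print(len(picked))   # 5
```

Only k items are held in memory at any time, which is what makes the approach suitable for streaming input.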

Iterative stats1/stats2

Use mlr stats1 -s ... or mlr stats2 -s ... to print averages, min/max, correlation, etc. on every record. Useful in tail -f contexts when you want to see statistics evolving as the data evolve in time.
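
The record-by-record behaviour can be pictured with a small Python generator that emits updated statistics after every value. This is a sketch of the general idea of streaming statistics, not Miller's code; the name running_stats and the output keys are hypothetical.

```python
def running_stats(values):
    """Yield count, mean, min, and max after each value, without storing the values."""
    count = 0
    mean = 0.0
    lo = hi = None
    for x in values:
        count += 1
        mean += (x - mean) / count          # incremental mean update
        lo = x if lo is None else min(lo, x)
        hi = x if hi is None else max(hi, x)
        yield {"count": count, "mean": mean, "min": lo, "max": hi}

for row in running_stats([2, 4, 9]):
    print(row)
# {'count': 1, 'mean': 2.0, 'min': 2, 'max': 2}
# {'count': 2, 'mean': 3.0, 'min': 2, 'max': 4}
# {'count': 3, 'mean': 5.0, 'min': 2, 'max': 9}
```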

http://johnkerl.org/miller/doc/reference.html#stats1
http://johnkerl.org/miller/doc/reference.html#stats2

Minor

  • The initial delta for mlr step -a delta is now 0, matching the initial 1 for mlr step -a ratio
  • Usage messages consistently go to stdout when asked for via -h, and stderr in case of command-line syntax errors
  • Online help is confined to 80-character column width, except for mlr -f which is all single-line greppable
  • Header/data length mismatch error messages for CSV/CSV-lite now include file/line context

Autoconfig support

24 Sep 02:24

Multi-character RS,FS,PS

21 Sep 01:34

You can process CRLF-terminated DKVP files with mlr --dkvp --rs crlf.
You can process LF-terminated CSV files with mlr --csv --rs lf.
You can process TSV using mlr --fs tab; you can convert TSV to CSV using mlr --ifs tab --ofs comma.
Many more combinations are possible; please see mlr -h for more information.
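
To make the roles of the record, field, and pair separators (RS, FS, PS) concrete, here is a simplified Python sketch of splitting DKVP text with arbitrary, possibly multi-character separators. It is not Miller's parser; the function name and defaults are assumptions for illustration.

```python
def parse_dkvp(text, rs="\n", fs=",", ps="="):
    """Split text into records on rs, fields on fs, and key/value pairs on ps."""
    records = []
    for line in text.split(rs):
        if not line:
            continue                         # skip the empty piece after a trailing rs
        record = {}
        for field in line.split(fs):
            key, _, value = field.partition(ps)
            record[key] = value
        records.append(record)
    return records

# CRLF-terminated records with the default field and pair separators.
print(parse_dkvp("a=1,b=2\r\nc=3\r\n", rs="\r\n"))
# [{'a': '1', 'b': '2'}, {'c': '3'}]

# Multi-character separators throughout.
print(parse_dkvp("a:=1;;b:=2|c:=3|", rs="|", fs=";;", ps=":="))
# [{'a': '1', 'b': '2'}, {'c': '3'}]
```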

There is one minor, backward-incompatible change which I felt did not warrant calling this release 3.0.0: the default field separator for NIDX format is now space, not comma.

Improved read performance for RFC4180 CSV

08 Sep 02:58

Resolves #51

RFC-compliant CSV input is now about 60% faster than at initial feature release (https://github.com/johnkerl/miller/releases/tag/v2.0.0). It remains about 50% slower than CSV-lite.

Reduce tar-file size

06 Sep 02:16

Incremental read-performance increase for CSV format

02 Sep 00:57

While #51 is still underway, already there is nearly a 2x read-performance increase in v2.1.1 over v2.1.0.