Skip to content
/ outlier Public

A command-line tool for detecting outliers in text files

License

Notifications You must be signed in to change notification settings

gomore/outlier

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

outlier

A CLI for detecting outliers in text files. The utility loads a reference file and a target file of text strings. The frequencies are compared and the string of the target file are presented back ordered according to how over-represented they are compared to the reference-file. The result can be reversed using tac or sed to find strings that are under-represented:

 outlier foo bar | sed '1!G;h;$!d'

Building

Requires lein to build. To build an executable binary called outlier run

lein bin 

This will also copy the binary to ~/bin.

Usage

 outlier reference-file.txt target-file.txt

The output format is documented in lein --help.

TODO

  • Read the lines from a stream using lazy-seq, reducers, or core.async. Just avoid loading everything into mem!
  • Print result lazily and terminate if stream closes
  • Add the possibility of reading target file from standard in.

License

Copyright © 2017 GoMore

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.

About

A command-line tool for detecting outliers in text files

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published