Skip to content

t-cordonnier/Culter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

39 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DESCRIPTION:
============

Culter is a set of Ruby classes to divide text into phrases (segments).
Currently it implements only a very simple algorithm.

USAGE:
======

This is a library to be used inside Ruby programs.
See bin/culter.rb for example of usage.

The script bin/culter.rb will act as an Unix text filter: 
STDIN will contain paragraphs, and STDOUT will contain them cut by phrases.
Takes two parameters: segmentation rules file (see samples and specs for supported formats), and language.
Without parameter it will run the simple segmenter.
Option -v gives more details also in STDOUT during segmentation (for debug)

The script bin/culter-conv.rb enables to convert between all supported file formats (SRX, CSCX, CSEX).
If you convert from CS[CE] to SRX, templates will be converted to their equivalent in SRX;
you can select between human mode (each combination in separate rule, easier to read)
and machine mode (all possibilities combined to a single rule, faster to parse by computer but harder to read by humans).
If you convert from CSE to lower format, specific features are simply removed.

The script bin/culter-join.rb enables to use the join feature of CSCE[XY] formats:
in these formats you can specify whenever spaces should be added or not when joining segments to a paragraph.

The script bin/ensis.rb opens a GUI editor compliant with all rule file formats.
Parameter: name of the file.
Works with JRuby and IronRuby without any more library. Works with standard CRuby but you need to install Gtk2 for Ruby.

LICENSE:
========

Please read the file LICENSE for the very last version of conditions applied to this code.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages