Commit

Refactor with final changes before 1.0.0
*   Update docs;
*   Remove benchmark.

Closes GH-23.
wooorm committed Sep 16, 2015
1 parent 707686d commit da92a14
Showing 4 changed files with 80 additions and 156 deletions.
88 changes: 0 additions & 88 deletions benchmark.js

This file was deleted.

2 changes: 1 addition & 1 deletion bower.json
@@ -26,8 +26,8 @@
"build/",
"components/",
"coverage/",
"lib/",
"node_modules/",
"benchmark.js",
"build.js",
"index.js",
"test.js",
4 changes: 1 addition & 3 deletions package.json
@@ -32,7 +32,6 @@
"istanbul": "^0.3.0",
"jscs": "^2.0.0",
"jscs-jsdoc": "^1.0.0",
"matcha": "^0.6.0",
"mdast": "^1.0.0",
"mdast-comment-config": "^1.0.0",
"mdast-github": "^1.0.0",
@@ -54,7 +53,6 @@
"build-bundle": "browserify index.js -s Retext > retext.js",
"postbuild-bundle": "esmangle retext.js > retext.min.js",
"build-md": "mdast . --quiet",
"build": "npm run build-md && npm run build-bundle",
"benchmark": "matcha benchmark.js"
"build": "npm run build-md && npm run build-bundle"
}
}
142 changes: 78 additions & 64 deletions readme.md
@@ -2,23 +2,19 @@

[![Build Status](https://img.shields.io/travis/wooorm/retext.svg)](https://travis-ci.org/wooorm/retext) [![Coverage Status](https://img.shields.io/codecov/c/github/wooorm/retext.svg)](https://codecov.io/github/wooorm/retext) [![Code Climate](http://img.shields.io/codeclimate/github/wooorm/retext.svg)](https://codeclimate.com/github/wooorm/retext)

> **Retext is going to [change
> soon](https://github.com/wooorm/retext/issues/23). You probably want to use
> the [next, stable, version](https://github.com/wooorm/retext/tree/feature/stable).**
**retext** is an extensible natural language system—by default using
[**parse-latin**](https://github.com/wooorm/parse-latin) to transform natural
language into **[NLCST](https://github.com/wooorm/nlcst/)**.
**Retext** provides a pluggable system for analysing and manipulating natural
language in JavaScript. NodeJS and the browser. Tests provide 100% coverage.
**retext** is an extensible natural language processor with support for
multiple languages. **Retext** provides a pluggable system for analysing
and manipulating natural language in JavaScript. Node and the browser.
100% coverage.

> Rather than being a do-all library for Natural Language Processing (such as
> [NLTK](http://www.nltk.org) or [OpenNLP](https://opennlp.apache.org)),
> **retext** aims to be useful for more practical use cases (such as censoring
> profane words or decoding emoticons, but the possibilities are endless)
> instead of more academic goals (research purposes).
> **retext** aims to be useful for more practical use cases (such as checking
> for [insensitive words](https://github.com/wooorm/alex) or decoding
> [emoticons](https://github.com/wooorm/retext-emoji)) instead of more academic
> goals (research purposes).
> **retext** is inherently modular—it uses plugins (similar to
> [rework](https://github.com/reworkcss/rework/) for CSS) instead of providing
> [mdast](https://github.com/wooorm/mdast/) for markdown) instead of providing
> everything out of the box (such as
> [Natural](https://github.com/NaturalNode/natural)). This makes **retext** a
> viable tool for use on the web.
@@ -38,8 +34,8 @@ globals module, [uncompressed](retext.js) and [compressed](retext.min.js).
## Usage

The following example uses [**retext-emoji**](https://github.com/wooorm/retext-emoji)
(to show emoji) and [**retext-smartypants**](https://github.com/wooorm/retext-smartypants)
(for smart punctuation).
to show emoji and [**retext-smartypants**](https://github.com/wooorm/retext-smartypants)
for smart punctuation.

Require dependencies:

@@ -60,28 +56,27 @@ var processor = retext().use(smartypants).use(emoji, {
Process a document:

```javascript
var doc = processor.process(
'The three wise monkeys [. . .] sometimes called the ' +
'three mystic apes--are a pictorial maxim. Together ' +
'they embody the proverbial principle to ("see no evil, ' +
'hear no evil, speak no evil"). The three monkeys are ' +
'Mizaru (:see_no_evil:), covering his eyes, who sees no ' +
'evil; Kikazaru (:hear_no_evil:), covering his ears, ' +
'who hears no evil; and Iwazaru (:speak_no_evil:), ' +
'covering his mouth, who speaks no evil.'
);
var doc = processor.process([
'The three wise monkeys [. . .] sometimes called the three mystic',
'apes--are a pictorial maxim. Together they embody the proverbial',
'principle to ("see no evil, hear no evil, speak no evil"). The',
'three monkeys are Mizaru (:see_no_evil:), covering his eyes, who',
'sees no evil; Kikazaru (:hear_no_evil:), covering his ears, who',
'hears no evil; and Iwazaru (:speak_no_evil:), covering his mouth,',
'who speaks no evil.'
].join('\n'));
```

Yields (you need a browser which supports emoji to see them):

```text
The three wise monkeys […] sometimes called the three
mystic apes—are a pictorial maxim. Together they
embody the proverbial principle to (“see no evil,
hear no evil, speak no evil”). The three monkeys are
Mizaru (🙈), covering his eyes, who sees no evil;
Kikazaru (🙉), covering his ears, who hears no evil;
and Iwazaru (🙊), covering his mouth, who speaks no evil.
The three wise monkeys […] sometimes called the three mystic
apes—are a pictorial maxim. Together they embody the proverbial
principle to (“see no evil, hear no evil, speak no evil”). The
three monkeys are Mizaru (🙈), covering his eyes, who
sees no evil; Kikazaru (🙉), covering his ears, who
hears no evil; and Iwazaru (🙊), covering his mouth,
who speaks no evil.
```

## API
@@ -106,13 +101,13 @@ Change the way [**retext**](#api) works by using a [plugin](#plugin).

**Returns**

`Object`: an instance of Retext: The returned object functions just like
`Object` an instance of Retext: The returned object functions just like
**retext** (it has the same methods), but caches the `use`d plugins. This
provides the ability to chain `use` calls to use multiple plugins, but
ensures the functioning of the **retext** module does not change for other
dependents.

### [retext](#api).process(value\[, done\])
### [retext](#api).process(value\[, [done](#function-doneerr-file-doc)\])

Parse a text document, apply plugins to it, and compile it into
something else.
@@ -123,30 +118,47 @@ something else.

**Parameters**

* `value` (`string`) — Text document;
* `value` ([`VFile`](https://github.com/wooorm/vfile) or `string`)
— Text document;

* `done` (`function(err, doc, file)`, optional) — Callback invoked when the
output is generated with either an error, or a result. Only strictly
needed when async plugins are used.
* `done` ([`Function`](#function-doneerr-file-doc), optional).

**Returns**

`string` or `null`: A document. Formatted in whatever plugins generate.
The result is `null` if a plugin is asynchronous, in which case the callback
`done` should’ve been passed (don’t worry: plugin creators make sure you know
it’s async).
`string?`: A document. Formatted in whatever plugins generate. The result is
`null` if a plugin is asynchronous, in which case the callback `done` should’ve
been passed (don’t worry: plugin creators make sure you know it’s async).

### function done(err, [file](https://github.com/wooorm/vfile), doc)

Callback invoked when the output is generated with either an error, or the
processed document (represented as a virtual file and a string).

**Parameters**

* `err` (`Error?`) — Reason of failure;
* `file` ([`VFile?`](https://github.com/wooorm/vfile)) — Virtual file;
* `doc` (`string?`) — Generated document.
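
As a rough illustration of the callback shape described above, the sketch below defines a `done` handler and invokes it by hand with stand-in values. The `results` array and the plain object standing in for a virtual file are assumptions made for this example; they are not part of retext or vfile.

```javascript
// Hypothetical handler following the `done(err, file, doc)` shape.
// `results` only exists so the outcomes can be inspected afterwards.
var results = [];

function done(err, file, doc) {
    if (err) {
        // Processing failed: `file` and `doc` may be missing.
        results.push('failed: ' + err.message);
        return;
    }

    // Processing succeeded: `doc` holds the generated document.
    results.push('ok: ' + doc.length + ' characters');
}

// Stand-in calls. With retext, the handler would instead be passed
// along as `processor.process(value, done)`.
done(null, {}, 'Hello, world.');
done(new Error('Something went wrong'), null, null);

console.log(results.join('\n'));
```

Checking `err` first and returning early keeps the success path free of null checks on `file` and `doc`.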

## Plugin

### function attacher([retext](#api)\[, options\])

A plugin is a function, which takes the **Retext** instance a user attached
the plugin on as a first parameter and optional configuration as a second
parameter.

A plugin can return a `transformer`.

### plugin
### function transformer([node](https://github.com/wooorm/nlcst), [file](https://github.com/wooorm/vfile)\[, next\])

A plugin is simply a function, with `function(retext[, options])` as its
signature. The first argument is the **Retext** instance a user attached the
plugin to. The plugin is invoked when a user `use`s the plugin (not when a
document is parsed) and enables the plugin to modify retext.
A transformer changes the provided document (represented as a node and a
virtual file).

The plugin can return another function: `function(NLCSTNode, file[, next])`.
This function is invoked when a document is parsed.
Transformers can be asynchronous, in which case `next` must be invoked
(optionally with an error) when done.
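
The attacher and transformer signatures above could be sketched roughly as follows. The `countWords` plugin, its `type` option, and the `wordCount` property set on the file are hypothetical names chosen for illustration. The tree is a hand-built stand-in using NLCST node types, so the example runs without retext installed.

```javascript
// Hypothetical attacher: counts nodes of a given type. The `type` option
// and the `wordCount` property are illustrative assumptions.
function countWords(retext, options) {
    var type = (options || {}).type || 'WordNode';

    // The returned transformer runs once per processed document.
    return function (node, file, next) {
        var count = 0;

        // Walk the tree depth-first, counting matching nodes.
        function visit(current) {
            if (current.type === type) {
                count++;
            }

            (current.children || []).forEach(visit);
        }

        visit(node);
        file.wordCount = count;

        // `next` is declared, so completion must be signalled
        // (optionally with an error).
        next();
    };
}

// Hand-built stand-in for the syntax tree of "Hello world".
var tree = {
    type: 'RootNode',
    children: [{
        type: 'ParagraphNode',
        children: [{
            type: 'SentenceNode',
            children: [
                {type: 'WordNode', children: [{type: 'TextNode', value: 'Hello'}]},
                {type: 'WhiteSpaceNode', value: ' '},
                {type: 'WordNode', children: [{type: 'TextNode', value: 'world'}]}
            ]
        }]
    }]
};

var file = {};
var finished = false;

// Invoke the attacher by hand, then the transformer it returns.
countWords(null)(tree, file, function (err) {
    finished = !err;
});

console.log(file.wordCount, finished);
```

In real use the attacher would presumably be registered with `retext().use(countWords)` rather than called directly.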

## Plugins
## List of Plugins

* [retext-directionality](https://github.com/wooorm/retext-directionality)
— (**[demo](http://wooorm.github.io/retext-directionality/)**)
@@ -160,10 +172,19 @@ This function is invoked when a document is parsed.
— (**[demo](http://wooorm.github.io/retext-double-metaphone/)**)
— Implementation of the Double Metaphone algorithm;

* [retext-dutch](https://github.com/wooorm/retext-dutch)
— Dutch language support;

* [retext-english](https://github.com/wooorm/retext-english)
— English language support;

* [retext-emoji](https://github.com/wooorm/retext-emoji)
— (**[demo](http://wooorm.github.io/retext-emoji/)**)
— Encode or decode [Gemojis](https://github.com/github/gemoji);

* [retext-equality](https://github.com/wooorm/retext-equality)
— Warn about possible insensitive, inconsiderate language;

* [retext-keywords](https://github.com/wooorm/retext-keywords)
— (**[demo](http://wooorm.github.io/retext-keywords/)**)
— Extract keywords and keyphrases;
@@ -206,37 +227,30 @@ This function is invoked when a document is parsed.

## List of Utilities

Although not **retext** plug-ins, the following projects are useful when
working with the [CST](https://github.com/wooorm/nlcst):
The following projects are useful when working with the syntax tree,
[NLCST](https://github.com/wooorm/nlcst):

* [wooorm/nlcst-to-string](https://github.com/wooorm/nlcst-to-string)
— Stringify a node;

* [wooorm/nlcst-is-literal](https://github.com/wooorm/nlcst-is-literal)
— Check whether a node is meant literally;

* [wooorm/nlcst-test](https://github.com/wooorm/nlcst-test)
— Validate a NLCST node;

In addition, see [`wooorm/unist`](https://github.com/wooorm/unist#unist-node-utilties)
for other utilities which work with **retext** nodes, but also with
[**mdast**](https://github.com/wooorm/mdast) nodes.

And finally, see [`wooorm/vfile`](https://github.com/wooorm/vfile#related-tools)
for a list of utilities for working with virtual files.

## Benchmark

On a MacBook Air, it parses about 2 big articles, 25 sections, or 230
paragraphs per second.

```text
retext.parse(value, callback);
325 op/s » A paragraph (5 sentences, 100 words)
33 op/s » A section (10 paragraphs, 50 sentences, 1,000 words)
3 op/s » An article (100 paragraphs, 500 sentences, 10,000 words)
```

## Related

* [nlcst](https://github.com/wooorm/nlcst)
* [unist](https://github.com/wooorm/unist)
* [unified](https://github.com/wooorm/unified)

## License
