Text models
Text models are used to process text. They can be used for sentiment analysis, text classification, translation, summarization, and other tasks. Text models are available in the @visheratin/web-ai/text subpackage.
Sequence-to-sequence models transform text into other text. Examples of such transformations are translation, summarization, and grammar correction. Sequence-to-sequence models have the type ModelType.Seq2Seq.
Using the model identifier:
```typescript
import { TextModel } from "@visheratin/web-ai/text";

const result = await TextModel.create("grammar-t5-efficient-tiny");
const model = result.model;
```
Using the model metadata:
```typescript
import { Seq2SeqModel, Metadata } from "@visheratin/web-ai/text";

const metadata: Metadata = {
  modelPaths: new Map<string, string>([
    [
      "encoder",
      "https://huggingface.co/visheratin/t5-efficient-tiny-grammar-correction/resolve/main/encoder_model.onnx",
    ],
    [
      "decoder",
      "https://huggingface.co/visheratin/t5-efficient-tiny-grammar-correction/resolve/main/decoder_with_past_model.onnx",
    ],
  ]),
  tokenizerPath:
    "https://huggingface.co/visheratin/t5-efficient-tiny-grammar-correction/resolve/main/tokenizer.json",
};
const model = new Seq2SeqModel(metadata);
const elapsed = await model.init();
console.log(elapsed);
```
The processing is done using the process() method. Seq2Seq models output text:
```typescript
const input = "Test text input";
const output = await model.process(input);
console.log(output.text);
console.log(`Sentence of length ${input.length} (${output.tokensNum} tokens) was processed in ${output.elapsed} seconds`);
```
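The tokensNum and elapsed fields from the result above can also be used to estimate throughput. A minimal sketch, assuming elapsed is measured in seconds (the tokensPerSecond helper is hypothetical and not part of web-ai):

```typescript
// Hypothetical helper: tokens processed per second, computed from the
// tokensNum and elapsed (seconds) fields of a process() result.
function tokensPerSecond(tokensNum: number, elapsed: number): number {
  if (elapsed <= 0) {
    throw new Error("elapsed must be a positive number of seconds");
  }
  return tokensNum / elapsed;
}

console.log(tokensPerSecond(12, 0.5)); // 24
```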
Seq2Seq models also support output streaming via the processStream() method:
```typescript
const input = "Test text input";
let output = "";
for await (const piece of model.processStream(input)) {
  output = output.concat(piece);
}
console.log(output);
```
If a Seq2Seq model supports task-specific prefixes (e.g., summarize or translate), you can use them to specify what kind of processing is needed:
```typescript
const input = "Test text input";
const output = await model.process(input, "summarize");
console.log(output.text);
console.log(`Sentence of length ${input.length} (${output.tokensNum} tokens) was processed in ${output.elapsed} seconds`);
```
If the model does not support the specified prefix, an error will be thrown.
Feature extraction models transform text into an array of numbers, called an embedding. The generated vectors are useful for semantic search or cluster analysis because embeddings of semantically similar texts are close to each other and can be compared using cosine similarity.
Feature extraction models have the type ModelType.FeatureExtraction.
Using the model identifier:
```typescript
import { TextModel } from "@visheratin/web-ai/text";

const result = await TextModel.create("mini-lm-v2");
const model = result.model;
```
Using the model metadata:
```typescript
import { FeatureExtractionModel, Metadata } from "@visheratin/web-ai/text";

const metadata: Metadata = {
  modelPaths: new Map<string, string>([
    [
      "encoder",
      "https://web-ai-models.org/text/feature-extraction/miniLM-v2/model.onnx.gz",
    ],
  ]),
  tokenizerPath:
    "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/tokenizer.json",
};
const model = new FeatureExtractionModel(metadata);
const elapsed = await model.init();
console.log(elapsed);
```
FeatureExtraction models output a numeric array:

```typescript
const input = "Test text input";
const output = await model.process(input);
console.log(output.result);
console.log(`Sentence of length ${input.length} (${output.tokensNum} tokens) was processed in ${output.elapsed} seconds`);
```
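Two embeddings produced this way can be compared with cosine similarity. A minimal sketch of such a comparison (the cosineSimilarity helper below is not part of web-ai; it operates on plain number arrays like output.result):

```typescript
// Hypothetical helper (not part of @visheratin/web-ai): cosine similarity
// between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Identical vectors have similarity 1, orthogonal vectors 0.
console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```

Higher values mean the two input texts are more semantically similar.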