Work with graphemes, words, and sentences with a small, simple, and fast API using `Intl.Segmenter`.
```shell
npm install segmenter
```
- `Intl.Segmenter` is supported in all major browsers, and 94% of users have it available. It's time for adoption.
- If you have a use case other than iterating over all graphemes/words/sentences in a text, `Intl.Segmenter` can be awkward to work with.
- In many cases, working with graphemes is preferable to working with characters, because graphemes are what the end user sees. For example, the emoji 👨‍🔧️ is a single grapheme but consists of 6 UTF-16 code units (what `string.length` counts). An indexed `for` loop makes 6 iterations over it, and `for...of` makes 4 (one per code point). That's confusing; just use graphemes.
- Before `Intl.Segmenter`, working with graphemes required libraries like `graphemer`, which is 94KB in size.
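The code-unit, code-point, and grapheme counts for the mechanic emoji can be checked directly. This is a minimal sketch, assuming a runtime that ships `Intl.Segmenter` (and, for TypeScript, a `lib` setting of ES2022 or later); the emoji is spelled with escapes so the invisible zero-width joiner is explicit.

```typescript
// 👨‍🔧️ = MAN + ZERO WIDTH JOINER + WRENCH + VARIATION SELECTOR-16
const mechanic = "\u{1F468}\u200D\u{1F527}\uFE0F";

// An indexed `for` loop runs once per UTF-16 code unit, i.e. `length` times.
const codeUnits = mechanic.length;

// Array.from uses the string iterator, the same one `for...of` uses (code points).
const codePoints = Array.from(mechanic).length;

// Intl.Segmenter with grapheme granularity yields user-perceived characters.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const graphemeCount = Array.from(segmenter.segment(mechanic)).length;

console.log(codeUnits, codePoints, graphemeCount); // 6 4 1
```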
```typescript
import { graphemeAt, graphemeRangeAt, wordAt, wordRangeAt } from "segmenter";

graphemeAt("👨‍🔧️ the fixer", 3); // "👨‍🔧️"
graphemeRangeAt("👨‍🔧️ the fixer", 3); // { start: 0, end: 6 }
wordAt("hello-world", 0); // "hello"
wordRangeAt("hello-world", 0); // { start: 0, end: 5 }
```
Get the grapheme at `position` in `string`. Returns `undefined` if `position` is out of bounds or `string` is empty.
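A lookup like this maps naturally onto `Segments.prototype.containing()`, which returns the segment covering a given UTF-16 index, or `undefined` when the index is negative or past the end (which also covers the empty string). The helper below is a hypothetical `graphemeAtSketch` illustrating the documented behavior, not the package's source; it assumes a runtime with `Intl.Segmenter`.

```typescript
// Hypothetical helper illustrating the documented behavior; not the package's code.
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

function graphemeAtSketch(text: string, position: number): string | undefined {
  // containing() returns undefined when position < 0 or position >= text.length.
  return graphemeSegmenter.segment(text).containing(position)?.segment;
}

const fixer = "\u{1F468}\u200D\u{1F527}\uFE0F the fixer";
console.log(graphemeAtSketch(fixer, 3)); // the whole emoji (6 code units), not a fragment
console.log(graphemeAtSketch(fixer, 7)); // "t"
console.log(graphemeAtSketch("", 0)); // undefined
```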
Get the `start` and `end` positions (UTF-16 offsets, `end` exclusive) of the grapheme at `position` in `string`. Returns `undefined` if `position` is out of bounds or `string` is empty.
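The range can come from the same `containing()` call: `SegmentData.index` is the start offset, and adding the segment's `length` gives an exclusive end, which matches the `{ start: 0, end: 6 }` in the usage example. A hypothetical sketch (`graphemeRangeAtSketch` and `SegmentRange` are illustrative names), assuming a runtime with `Intl.Segmenter`:

```typescript
interface SegmentRange {
  start: number; // inclusive UTF-16 offset
  end: number; // exclusive UTF-16 offset
}

const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper: start/end offsets of the grapheme covering `position`.
function graphemeRangeAtSketch(text: string, position: number): SegmentRange | undefined {
  const data = graphemeSegmenter.segment(text).containing(position);
  if (data === undefined) return undefined;
  return { start: data.index, end: data.index + data.segment.length };
}

const fixer = "\u{1F468}\u200D\u{1F527}\uFE0F the fixer";
console.log(graphemeRangeAtSketch(fixer, 3)); // { start: 0, end: 6 }
```

Because `end` is exclusive, `fixer.slice(start, end)` returns exactly the grapheme.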
Get all graphemes in `string` as an `Array`.
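Collecting every grapheme is a one-liner on top of the segmenter's iterator. A hypothetical `graphemesSketch`, assuming a runtime with `Intl.Segmenter`; the input uses a decomposed "é" (letter plus combining accent) to show that combining marks stay attached to their base letter:

```typescript
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper: map each SegmentData to its text.
function graphemesSketch(text: string): string[] {
  return Array.from(graphemeSegmenter.segment(text), (data) => data.segment);
}

// "e\u0301" is "e" followed by COMBINING ACUTE ACCENT.
console.log(graphemesSketch("e\u0301!")); // two graphemes: "e\u0301" and "!"
```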
Get the word at `position` in `string`. Returns `undefined` if `position` is out of bounds or `string` is empty.
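With `granularity: "word"`, `Intl.Segmenter` also emits the punctuation and whitespace between words as separate segments, flagged with `isWordLike: false`. The hypothetical `wordAtSketch` below returns whatever segment covers `position`; whether the package returns such non-word segments or skips them is not stated here, so treat that behavior as an assumption. It assumes a runtime with `Intl.Segmenter`.

```typescript
const wordSegmenter = new Intl.Segmenter(undefined, { granularity: "word" });

// Hypothetical helper: the word-granularity segment covering `position`,
// which may be a non-word segment such as "-" (see isWordLike).
function wordAtSketch(text: string, position: number): string | undefined {
  return wordSegmenter.segment(text).containing(position)?.segment;
}

console.log(wordAtSketch("hello-world", 0)); // "hello"
console.log(wordAtSketch("hello-world", 6)); // "world"
console.log(wordAtSketch("hello-world", 5)); // "-"
console.log(wordSegmenter.segment("hello-world").containing(5)?.isWordLike); // false
```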
Get the `start` and `end` positions (UTF-16 offsets, `end` exclusive) of the word at `position` in `string`. Returns `undefined` if `position` is out of bounds or `string` is empty.
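The grapheme, word, and sentence range lookups differ only in granularity, so one helper parameterized by granularity can express all of them. This hypothetical `rangeAtSketch` is an illustrative design, not necessarily how the package is organized, and assumes a runtime with `Intl.Segmenter`.

```typescript
type Granularity = "grapheme" | "word" | "sentence";

interface SegmentRange {
  start: number; // inclusive UTF-16 offset
  end: number; // exclusive UTF-16 offset
}

// Hypothetical helper: one code path for grapheme, word, and sentence ranges.
function rangeAtSketch(
  text: string,
  position: number,
  granularity: Granularity,
): SegmentRange | undefined {
  const segmenter = new Intl.Segmenter(undefined, { granularity });
  const data = segmenter.segment(text).containing(position);
  if (data === undefined) return undefined;
  return { start: data.index, end: data.index + data.segment.length };
}

console.log(rangeAtSketch("hello-world", 0, "word")); // { start: 0, end: 5 }
console.log(rangeAtSketch("hello-world", 8, "word")); // { start: 6, end: 11 }
console.log(rangeAtSketch("hello-world", 11, "word")); // undefined
```

Constructing a segmenter on every call keeps the sketch short; caching one instance per granularity would avoid the repeated setup cost.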
Get all words in `string` as an `Array`.
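When punctuation and spaces should be dropped from a word list, the `isWordLike` flag is the natural filter. Whether the package's word list filters this way is not stated here, so the filter in this hypothetical `wordsSketch` is an assumption; it also assumes a runtime with `Intl.Segmenter`.

```typescript
const wordSegmenter = new Intl.Segmenter(undefined, { granularity: "word" });

// Hypothetical helper: keep only the segments marked as word-like.
function wordsSketch(text: string): string[] {
  const words: string[] = [];
  for (const data of wordSegmenter.segment(text)) {
    if (data.isWordLike) words.push(data.segment);
  }
  return words;
}

console.log(wordsSketch("hello-world, hi")); // [ 'hello', 'world', 'hi' ]
```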
Note: `Intl.Segmenter` doesn't do a perfect job of detecting sentences. For example, `I went to Dr. Smith's office` will be split into two sentences, because the period after the abbreviation "Dr." is treated as the end of a sentence.
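The abbreviation problem is easy to reproduce. Per the note, the output should show two pieces split after "Dr. "; the exact result comes from the ICU data bundled with the runtime, so only the property that the pieces cover the whole input is checked. Assumes a runtime with `Intl.Segmenter`.

```typescript
const sentenceSegmenter = new Intl.Segmenter("en", { granularity: "sentence" });
const sample = "I went to Dr. Smith's office";

// Collect the raw sentence segments for inspection.
const pieces = Array.from(sentenceSegmenter.segment(sample), (data) => data.segment);
console.log(pieces.length, pieces);
```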
Get the sentence at `position` in `string`. Returns `undefined` if `position` is out of bounds or `string` is empty.
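Sentence segments include the whitespace after the terminating punctuation, so a sentence looked up by position can carry a trailing space. This hypothetical `sentenceAtSketch` returns the raw segment; whether the package trims it is not stated here. Assumes a runtime with `Intl.Segmenter`.

```typescript
const sentenceSegmenter = new Intl.Segmenter("en", { granularity: "sentence" });

// Hypothetical helper: the raw sentence segment covering `position`.
function sentenceAtSketch(text: string, position: number): string | undefined {
  return sentenceSegmenter.segment(text).containing(position)?.segment;
}

const paragraph = "Hello there. How are you?";
console.log(JSON.stringify(sentenceAtSketch(paragraph, 0))); // "Hello there. "
console.log(JSON.stringify(sentenceAtSketch(paragraph, 13))); // "How are you?"
console.log(sentenceAtSketch(paragraph, 99)); // undefined
```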
Get the `start` and `end` positions (UTF-16 offsets, `end` exclusive) of the sentence at `position` in `string`. Returns `undefined` if `position` is out of bounds or `string` is empty.
Get all sentences in `string` as an `Array`.
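The segments produced by `Intl.Segmenter` are contiguous and cover the entire input, so joining every sentence reproduces the original string exactly, whitespace included. A hypothetical `sentencesSketch` showing this, assuming a runtime with `Intl.Segmenter`:

```typescript
const sentenceSegmenter = new Intl.Segmenter("en", { granularity: "sentence" });

// Hypothetical helper: every sentence segment, in order.
function sentencesSketch(text: string): string[] {
  return Array.from(sentenceSegmenter.segment(text), (data) => data.segment);
}

const paragraph = "Hello there. How are you? Fine!";
const list = sentencesSketch(paragraph);
console.log(list); // [ 'Hello there. ', 'How are you? ', 'Fine!' ]
console.log(list.join("") === paragraph); // true
```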