idna: optimize punycode handling, add amortized allocation API, improve error reporting #653

djc · 2020-11-27T21:55:15Z

Benchmarks before:

test to_ascii_merged         ... bench:       2,102 ns/iter (+/- 309)
test to_ascii_puny_label     ... bench:       1,214 ns/iter (+/- 220)
test to_ascii_simple         ... bench:         246 ns/iter (+/- 34)
test to_unicode_ascii        ... bench:          74 ns/iter (+/- 14)
test to_unicode_merged_label ... bench:       2,127 ns/iter (+/- 400)
test to_unicode_puny_label   ... bench:       1,328 ns/iter (+/- 353)

Benchmarks after:

test to_ascii_merged         ... bench:       1,937 ns/iter (+/- 497)
test to_ascii_puny_label     ... bench:       1,147 ns/iter (+/- 254)
test to_ascii_simple         ... bench:         247 ns/iter (+/- 74)
test to_unicode_ascii        ... bench:          78 ns/iter (+/- 14)
test to_unicode_merged_label ... bench:       1,848 ns/iter (+/- 239)
test to_unicode_puny_label   ... bench:       1,158 ns/iter (+/- 312)

Across a few runs, I see performance improvements like this:

to_ascii_merged: 4%
to_ascii_puny_label: 12%
to_ascii_simple: 3%
to_unicode_ascii: -5% (but on a much smaller base)
to_unicode_merged_label: 14%
to_unicode_puny_label: 13%

These gains come mostly from the first four commits. Each commit is logically separate and can be reviewed as such, although some of the early commits don't pay off in performance until the fourth one. The fifth commit adds a separate API that allows reusing allocations; any performance benefit from that isn't reflected in the benchmarks, because each benchmark performs only one operation. The final three commits improve error handling (although they don't change the API).
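To illustrate the general idea behind an API with amortized allocation, here is a minimal sketch using only the standard library. It is not the code from this PR: `Converter`, `convert`, `scratch_capacity`, and `convert_once` are hypothetical names, and lowercasing stands in for the real mapping and normalization work. The point is only that a converter which owns its scratch buffer and writes into a caller-supplied `String` can be reused across many domains without allocating again once the buffers have grown large enough.

```rust
/// Hypothetical converter that owns a scratch buffer and reuses it across
/// calls. Illustrative only; this is not the idna crate's actual type.
#[derive(Default)]
pub struct Converter {
    scratch: String,
}

impl Converter {
    pub fn new() -> Self {
        Self::default()
    }

    /// Lowercases `domain` into the internal scratch buffer, then appends
    /// the result to `out`. Lowercasing is a stand-in for real processing.
    pub fn convert(&mut self, domain: &str, out: &mut String) {
        // `clear` resets the length but keeps the allocated capacity,
        // so later calls can reuse the same heap buffer.
        self.scratch.clear();
        for c in domain.chars() {
            self.scratch.extend(c.to_lowercase());
        }
        out.push_str(&self.scratch);
    }

    /// Exposes the scratch capacity so that reuse can be observed.
    pub fn scratch_capacity(&self) -> usize {
        self.scratch.capacity()
    }
}

/// One-shot convenience wrapper: allocates fresh buffers on every call.
pub fn convert_once(domain: &str) -> String {
    let mut out = String::new();
    Converter::new().convert(domain, &mut out);
    out
}
```

A caller processing many domains would create one `Converter` and one output `String`, calling `out.clear()` between conversions. A benchmark that performs a single conversion per iteration, like the ones above, would not show any difference between the two entry points.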

All of this comes at the cost of +300 SLOC in the idna crate.

I'm using this for my work project, so I would like to get something like this merged.

djc · 2020-11-27T21:56:52Z

Also happy to split this into separate PRs if that helps.

valenting · 2020-12-01T08:54:57Z

Great work on this one. I'll merge it ASAP.

djc added 7 commits November 27, 2020 22:37

idna: refactor mapping as an iterator API

f4bed62

idna: gather punycode insertions separately to avoid memcpys

f98cb61

idna: separate decoding of encoded characters and merging

d3ce8ff

idna: use iterator interface to yield characters from punycode decoder

3e12bd3

idna: add Idna API with amortized allocation

0a7bcaa

idna: factor out is_err() method for Errors

ac9bbd6

idna: more concise Debug output for Errors

75af776
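The last two commits in the list above concern an `is_err()` method and a more concise `Debug` output for the error type. As a rough sketch of what such a design can look like (the struct layout and field names here are assumptions for illustration, not taken from the PR), a set of boolean error flags can expose a single `is_err()` check and print only the flags that are actually set:

```rust
use std::fmt;

/// Hypothetical set of error flags; the field names are illustrative.
#[derive(Default, Clone, Copy, PartialEq, Eq)]
pub struct Errors {
    pub punycode: bool,
    pub check_hyphens: bool,
    pub too_long: bool,
}

impl Errors {
    /// Returns true if any flag is set.
    pub fn is_err(&self) -> bool {
        self.punycode || self.check_hyphens || self.too_long
    }
}

impl fmt::Debug for Errors {
    /// Prints only the flags that are set, rather than every field.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let fields = [
            ("punycode", self.punycode),
            ("check_hyphens", self.check_hyphens),
            ("too_long", self.too_long),
        ];
        f.write_str("Errors {")?;
        let mut first = true;
        for &(name, set) in fields.iter() {
            if set {
                f.write_str(if first { " " } else { ", " })?;
                f.write_str(name)?;
                first = false;
            }
        }
        f.write_str(if first { "}" } else { " }" })
    }
}
```

With this shape, callers can write `if errors.is_err() { ... }` instead of comparing against a default value, and a failing assertion shows only the relevant flags.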

djc changed the title ~~No alloc idna~~ idna: optimize punycode handling, add amortized allocation API, improve error reporting Nov 27, 2020

valenting reviewed Dec 1, 2020

idna: split validity criteria into more specific error variants

dd9cffe

djc force-pushed the no-alloc-idna branch from 9e81e26 to dd9cffe Compare December 1, 2020 08:57

valenting merged commit 91409d6 into servo:master Dec 1, 2020

djc mentioned this pull request Jun 11, 2024

idna to_unicode() API has degraded in 1.0 #938

Open