- last number not identified in an isolated value pair

  I am using the Lexer to parse JSON chunks that I get on stdin one line at a time. I appreciate that my use case isn't the intended one for this lib, as all the tests deal with fully formed JSON objects and nowhere does it state that this lib will work with chunks. But since it works perfectly for my use case, I thought others might benefit from this fix. Feel free to ignore it, though.

  This PR fixes a problem where the last number value won't be identified as a number, because numbers don't have a terminator the way strings do. In normal circumstances the next token would serve as the terminator (comma, curly bracket, etc.), but if the number line is the last one in a JSON object, the chunk simply ends, the object's closing curly brace arrives on the next line, and the number gets ignored.

  This fix checks whether the last processed token was a Number and returns the token type accordingly.

  Happy to add any further improvements or additional tests that would be beneficial.
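The described failure mode can be sketched with a toy chunk lexer. All names here (`Token`, `lex_chunk`, `at_end`) are illustrative, not this crate's actual API; the `at_end` flush plays the role of the fix: a number that ends the input has no terminating token, so it must be emitted explicitly.

```rust
// Hypothetical sketch, NOT the crate's real lexer.
#[derive(Debug, PartialEq)]
enum Token {
    Number(String),
    Comma,
    CurlyOpen,
    CurlyClose,
}

// Lex a single chunk. `at_end` signals that no further input follows.
fn lex_chunk(chunk: &str, at_end: bool) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut num = String::new();
    for c in chunk.chars() {
        match c {
            '0'..='9' | '-' | '.' => num.push(c),
            _ => {
                // Any non-number character terminates a pending number.
                if !num.is_empty() {
                    tokens.push(Token::Number(std::mem::take(&mut num)));
                }
                match c {
                    ',' => tokens.push(Token::Comma),
                    '{' => tokens.push(Token::CurlyOpen),
                    '}' => tokens.push(Token::CurlyClose),
                    _ => {} // whitespace, quotes, etc. ignored in this sketch
                }
            }
        }
    }
    // The fix: without this flush, a number ending the chunk is dropped.
    if at_end && !num.is_empty() {
        tokens.push(Token::Number(num));
    }
    tokens
}
```

Without the final flush, `lex_chunk("42", true)` would return no tokens at all, which is exactly the bug this PR addresses.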
- 8 commits contributed to the release over the course of 1635 calendar days.
- 1635 days passed between releases.
- 1 commit was understood as conventional.
- 0 issues like '(#ID)' were seen in commit messages
- Uncategorized
    - Avoid using the 'alpha' suffix (1b9c1d7)
    - Last number not identified in an isolated value pair (927c987)
    - Isolated value pairs fix (2b47444)
    - Optimize includes (3fe2df2)
    - Add new badges (3c9fb9e)
    - Add clippy and cargo-fmt lints (6129e80)
    - Create rust.yml (ee18ffa)
    - (cargo-release) start next development iteration 1.1.3-alpha.0 (0668d4a)
- 2 commits contributed to the release.
- 0 commits were understood as conventional.
- 0 issues like '(#ID)' were seen in commit messages
- v1.0.1
- README misses BufferType argument to Lexer
- change description
- handle more string escapes
- handle numbers with exponents
- syntax error
- only run benches on nightly
- 16 commits contributed to the release over the course of 1124 calendar days.
- 1124 days passed between releases.
- 7 commits were understood as conventional.
- 0 issues like '(#ID)' were seen in commit messages
- Uncategorized
    - Use criterion for benchmarks (2be30e9)
    - Improve readme (0b036d6)
    - Clippy (08057a6)
    - Simplification and modernization (4f168d4)
    - Merge pull request #12 from FauxFaux/patch-1 (520763f)
    - README misses BufferType argument to Lexer (bcaaebb)
    - Cut 1.1.0 (5be61b2)
    - Merge pull request #11 from heycam/num-str-fixes (df6d401)
    - Handle more string escapes (df37dc8)
    - Handle numbers with exponents (e1afcbc)
    - Add crates badge (b53e873)
    - Apply latest rustfmt (0750473)
    - Syntax error (1235550)
    - Only run benches on nightly (067e93a)
    - V1.0.1 (79ae678)
    - Change description (cc24d29)
- This adds support for Number tokens with exponents and String tokens with the full range of escapes that are allowed.
- Changed the headline of the crate to be more descriptive.
- travis: more rust versions and no travis-cargo (fd4cd367)
  The latter is only useful for doc-uploading, yet we don't need that anymore in the age of docs.rs.
- format everything
- remove do-not-edit note
  A left-over of the original [skip ci]
- benchmark: remove old code (f969498a)
  It didn't compile anymore, and also probably wasn't really testing us anyway.
- 8 commits contributed to the release over the course of 559 calendar days.
- 604 days passed between releases.
- 4 commits were understood as conventional.
- 0 issues like '(#ID)' were seen in commit messages
- Uncategorized
    - Format everything (fb0758e)
    - Remove old code (f969498)
    - More rust versions and no travis-cargo (fd4cd36)
    - Merge pull request #10 from cmr/master (da0d2c8)
    - Relicense to dual MIT/Apache-2.0 (853e77a)
    - Merge pull request #8 from jnicholls/master (9303648)
    - Rust 1.x stable compatibility fix (d2f6a43)
    - Remove do-not-edit note (030f580)
- use `IntoIterator`
  We use `IntoIterator` in place of `Iterator`, which should provide more flexibility when feeding the Lexer. However, in our tests we can't actually use that, unfortunately, due to the consuming semantics (and literals cannot be consumed ...). However, it's a nice proof of concept and doesn't hurt.
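The flexibility gain can be illustrated with a minimal constructor sketch (the `Lexer` struct and `count_bytes` helper here are hypothetical, not the crate's real signatures): a function bounded by `IntoIterator` accepts vectors, iterators, and anything else convertible into a byte iterator, without the caller writing `.into_iter()`.

```rust
// Illustrative sketch of an IntoIterator-based constructor.
struct Lexer<I: Iterator<Item = u8>> {
    input: I,
}

impl<I: Iterator<Item = u8>> Lexer<I> {
    // Accept anything that can be turned into a byte iterator.
    fn new<T: IntoIterator<Item = u8, IntoIter = I>>(source: T) -> Self {
        Lexer { input: source.into_iter() }
    }

    // Dummy consumer, standing in for actual lexing.
    fn count_bytes(self) -> usize {
        self.input.count()
    }
}
```

Since every `Iterator` also implements `IntoIterator`, existing call sites keep working; the consuming semantics noted above are visible here too, as `new` takes `source` by value.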
- v0.3.0
- 2 commits contributed to the release.
- 2 commits were understood as conventional.
- 0 issues like '(#ID)' were seen in commit messages
- v0.2.0
- more fun with Token-Iterators
  Attaches constructor utilities to all `Iterator<Item=Token>` to ease using them. It's similar to the (former) `IteratorExt`, which would provide `chain()` and `nth()`, for instance.
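The `IteratorExt`-style pattern mentioned above is a blanket extension trait. A hedged sketch (the `Token` variants and `max_depth` utility are made up for illustration; the crate's actual utilities differ):

```rust
// Illustrative token type, not the crate's real one.
#[derive(Debug)]
enum Token {
    CurlyOpen,
    CurlyClose,
    Number,
}

// Extension trait attaching utilities to every Iterator<Item = Token>.
trait TokenIteratorExt: Iterator<Item = Token> + Sized {
    // Example utility: maximum object nesting depth in the token stream.
    fn max_depth(self) -> usize {
        let (mut depth, mut max) = (0usize, 0usize);
        for t in self {
            match t {
                Token::CurlyOpen => {
                    depth += 1;
                    max = max.max(depth);
                }
                Token::CurlyClose => depth = depth.saturating_sub(1),
                _ => {}
            }
        }
        max
    }
}

// Blanket impl: every matching iterator gets the utilities for free.
impl<I: Iterator<Item = Token> + Sized> TokenIteratorExt for I {}
```

This is the same mechanism `chain()` and `nth()` used before they moved into `Iterator` itself: default methods on a trait plus a blanket impl.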
- 2 commits contributed to the release.
- 2 commits were understood as conventional.
- 0 issues like '(#ID)' were seen in commit messages
- set v0.1.1
- offset-map for target buffer
  Previously we would keep writing the first bytes of our destination buffer, as we wouldn't compute any offset at all. Now we produce slices of exactly the right size, and could verify that this is working.
- 2 commits contributed to the release.
- 2 commits were understood as conventional.
- 0 issues like '(#ID)' were seen in commit messages
- initial commit
- 1 commit contributed to the release.
- 1 commit was understood as conventional.
- 0 issues like '(#ID)' were seen in commit messages
- Uncategorized
    - Initial commit (f3b4120)
- lexer
- null-filter initial implementation (97adcb85)
- token-reader
- optimize buffer usage
    - only push characters when we have a buffer and are dealing with strings or numbers
    - added some performance tests to show the difference. We are not quite back at 280MB/s, but down to 220MB/s for the optimal/Span case. The Bytes case goes down to 137MB/s.
- use enum for state
  Instead of using many different variables for handling the state, we use an enumeration. That way, we don't unnecessarily initialize memory that will never be used, and lower our requirements for stack space.
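The state-enum idea can be sketched like this (illustrative types, not the crate's actual ones): instead of separate `in_string`/`in_number` flags each with their own buffer, a single enum carries only the data the current state needs.

```rust
// Illustrative lexer state machine, not the crate's real implementation.
#[derive(Debug, PartialEq)]
enum Mode {
    Idle,
    InString(Vec<u8>),
    InNumber(Vec<u8>),
}

// Advance the state machine by one input byte.
fn step(mode: Mode, byte: u8) -> Mode {
    match (mode, byte) {
        (Mode::Idle, b'"') => Mode::InString(Vec::new()),
        (Mode::Idle, b'0'..=b'9') => Mode::InNumber(vec![byte]),
        (Mode::InString(_), b'"') => Mode::Idle, // string closed
        (Mode::InString(mut buf), b) => {
            buf.push(b);
            Mode::InString(buf)
        }
        (Mode::InNumber(mut buf), b @ b'0'..=b'9') => {
            buf.push(b);
            Mode::InNumber(buf)
        }
        (Mode::InNumber(_), _) => Mode::Idle, // number terminated
        (m, _) => m, // everything else ignored in this sketch
    }
}
```

Only the active variant's buffer exists at any time, which is the memory and stack-space point the entry makes.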
- separate lexer and filters
  `FilterNull` now has its own module; `Lexer` and friends have their own module.
- string value fast-path
  Technically it's not a fast-path, but it makes the code more uniform and easier to understand. It will be the default template for all other lexing we do, for instance when implementing booleans.
- update to match current state
  We also anticipate pretty-printing, which technically isn't there yet.
- high-speed serialize tests
  Producers are optimized for performance to show exactly how fast a span token producer is compared to a bytes token producer. That way, performance improvements can be quantified exactly.
- added benchmarks
  Under optimal conditions (source string is known) we remove null values at 144MB/s; fully streamed we do 104MB/s. An acceptable result, considering the unoptimized buffer handling; using a deque might improve this a lot. However, we manage to retrieve invalid tokens, which we have to handle somehow, and also don't expect here.
- make it general
    - What was formerly known as NullFilter can now filter out all key-value pairs with a given value type. That way, null can be filtered, as well as numbers, for example.
    - Made certain dual-branch matches an if-clause, which moved everything further to the left again, while making the code easier to read.
- operate on u8 instead of char
  This improved throughput from 230MB/s to 280MB/s, which is well worth it. In the case of JSON, only the values within strings are potentially unicode; everything else is not.
- added benchmark
  Also renamed test-related files to match their purpose a bit better.
- benchmark and string_value tests
    - verify escaping in string values works
    - added a more complex benchmark, lexing at 415MB/s
    - tested unclosed string value
- machine serialization works
  In a first version, we show that serialization without any space works as expected according to very first tests. More tests have to be conducted to be sure.
- infrastructure setup
    - added `TokenReader` type with basic API
    - improve existing filter tests to verify TokenReader operation
- support for `Buffer` enum
  It cuts our speed in half, currently, but allows choosing between high-speed Span mode and half-speed Buffer mode. That way, all applications I see can be catered to with best performance.
- initial implementation
  It is known not to work in all cases, i.e. it can only take one key-value pair at a time (no consecutive ones), but besides that it is a pretty optimal implementation (even though it ain't a pretty one). However, the test still fails; we match nothing for some reason. Must be evaluated later.
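The Span-vs-Bytes trade-off behind the `Buffer` enum two entries above can be sketched as follows (illustrative types; the crate's real enum and field names differ): Span mode stores only offsets into the original source (zero-copy, fast), while Bytes mode owns a copy of the token's bytes so it works even when the source is gone.

```rust
// Illustrative buffer representation, not the crate's real API.
#[derive(Debug, PartialEq)]
enum Buffer {
    Span { first: usize, end: usize }, // zero-copy: offsets into the source
    Bytes(Vec<u8>),                    // owned copy of the token's bytes
}

// Resolve a token's text; Span mode needs the original source at hand.
fn token_text<'a>(buf: &'a Buffer, source: &'a [u8]) -> &'a [u8] {
    match buf {
        Buffer::Span { first, end } => &source[*first..*end],
        Buffer::Bytes(b) => b,
    }
}
```

Copying into `Bytes` is what "cuts our speed in half": every token pays an allocation and a memcpy that Span mode avoids entirely.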
- number lexing
  Even though our performance dropped from 332MB/s to 274MB/s, we are happy, as the implementation cleaned up our span handling considerably and thus made the code even more maintainable. Cleaning up the span handling also made it faster, i.e. the slowest number parsing was at 232MB/s.
- filtering iterator infrastructure
    - FilterNull frame would allow implementing lexical token pattern matching to remove null values.
- true and false value lexing
    - including tests for both
    - refactored the null value test case to easily test booleans as well
- null value lexing
    - including tests for normal null values and invalid ones
- string value parsing
  Including test.
- datastructures and basic tests
  The tests still fail, as the actual lexer implementation is still to be done.
- added remaining documentation
  I believe it's best not to add redundant information into the library docs, but instead refer to the tests and benchmarks.
- state why numbers won't be lexed
  Also it's not required to solve our actual problem.
- usage added
- clog config + changelog
- also run benchmarks
- set GH_TOKEN ... instead of TOKEN
- with doc-upload
  Never worked for me, but let's try it one more time.
- no nightly please
  As we didn't set the package unstable.
- added secret
  Minor format adjustment.
- lexer operate on u8 instead of char (d5a694d1)
- null-filter make it general (431f051d)
- README update to match current state (75181ff6)
- null-filter
- proper comma handling (321fa592)
- handle consecutive null values (96e20e65)
- minor fix to make it work (e489bffa)
- minor fix to make it work
  It's still far from perfect, but a good proof of concept.
- handle consecutive null values
  With this in place, we handle null value filtering pretty well, as the tests indicate too. However, we may still leave a trailing comma in non-null values, which could be a problem and thus shouldn't be done!
- proper comma handling
    - Added support for leading `,` characters, which have to be removed conditionally.
    - Added tests to verify this works in valid streams, and even invalid ones.
- removed possible overflow
  Previously it was possible to over-allocate memory by feeding us lots of `,` characters. This was alleviated by allowing a one-token look-ahead (implemented through put-back).
- handle whitespace at end of source
  Previously we would consider such whitespace invalid. Now we explicitly set the invalid state, which is ... explicit = better :)! We added a test to show this actually works.
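The put-back mechanism behind the one-token look-ahead in the "removed possible overflow" entry can be sketched as a tiny iterator wrapper (names are illustrative; a similar adapter exists in the itertools crate). Because the slot holds at most one item, the unbounded buffering that allowed the over-allocation cannot occur.

```rust
// Minimal put-back adapter: at most ONE item of look-ahead.
struct PutBack<I: Iterator> {
    iter: I,
    slot: Option<I::Item>,
}

impl<I: Iterator> PutBack<I> {
    fn new(iter: I) -> Self {
        PutBack { iter, slot: None }
    }

    // Return a consumed item to the front of the stream.
    fn put_back(&mut self, item: I::Item) {
        debug_assert!(self.slot.is_none(), "only one item of look-ahead");
        self.slot = Some(item);
    }
}

impl<I: Iterator> Iterator for PutBack<I> {
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        // Drain the put-back slot first, then fall through to the inner iterator.
        self.slot.take().or_else(|| self.iter.next())
    }
}
```

A filter can peek at the next token by calling `next()` and, if the token shouldn't be consumed yet, hand it back with `put_back`.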
- 35 commits contributed to the release over the course of 2 calendar days.
- 2 days passed between releases.
- 35 commits were understood as conventional.
- 0 issues like '(#ID)' were seen in commit messages
- Uncategorized
    - Added remaining documentation (077fe41)
    - Clog config + changelog (ffb3c71)
    - Update to match current state (75181ff)
    - Handle whitespace at end of source (1d57bc9)
    - High-speed serialize tests (a5e3c3d)
    - Added benchmarks (8c5e9f2)
    - Machine serialization works (458928d)
    - Infrastructure setup (96dac09)
    - Make it general (431f051)
    - Optimize buffer usage (08ad49b)
    - Support for `Buffer` enum (a3e72b5)
    - Operate on u8 instead of char (d5a694d)
    - Removed possible overflow (50c9f81)
    - Proper comma handling (321fa59)
    - Handle consecutive null values (96e20e6)
    - Added benchmark (43a1119)
    - Minor fix to make it work (e489bff)
    - Initial implementation (97adcb8)
    - Number lexing (f952f08)
    - Use enum for state (e924f03)
    - Separate lexer and filters (0a7e5c7)
    - Filtering iterator infrastructure (fb94ea9)
    - State why numbers won't be lexed (270f57c)
    - True and false value lexing (97ae908)
    - Also run benchmarks (32cd37b)
    - Set GH_TOKEN (b5ab5a1)
    - With doc-upload (5287268)
    - No nightly please (dc439b6)
    - String value fast-path (dd40f6e)
    - Null value lexing (dc2f9a2)
    - Benchmark and string_value tests (d4782c8)
    - String value parsing (e9b6072)
    - Usage added (e9cc19e)
    - Added secret (dbf42ec)
    - Datastructures and basic tests (f66ea5f)