v1.1.3 (2024-07-25)

Bug Fixes

  • last number not identified in an isolated value pair

    I am using the Lexer to parse JSON chunks that arrive on stdin one line at a time. I appreciate that my use case isn't the intended one for this library, as all the tests deal with fully formed JSON objects and nowhere does it state that the library works with chunks. But since it works perfectly for my use case, I thought others might benefit from this fix; feel free to ignore it otherwise.

    This PR fixes a problem where the last number value would not be identified as a number, because numbers don't have a terminator the way strings do. Normally the next token serves as the terminator (a comma, a closing brace, and so on), but when the number line is the last one in a chunk, the input simply ends, the object's closing brace only arrives on the next line, and the number gets ignored. The sketch below illustrates the case.

    This fix checks whether the last processed token was a Number and returns the token type accordingly.

    Happy to add any further improvements or additional tests that would be beneficial.
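
Below is a minimal sketch of the fixed behaviour. It assumes the `Lexer::new(bytes, BufferType)` constructor and the `Token { kind: TokenType, .. }` shape shown in the crate's README; treat the exact field and variant names as assumptions rather than a verified API.

```rust
// Sketch only: feeds one isolated stdin "chunk" whose last value is a number
// with no trailing comma or closing brace in the same chunk.
use json_tools::{BufferType, Lexer, TokenType};

fn main() {
    let chunk = r#""answer": 42"#;

    let last = Lexer::new(chunk.bytes(), BufferType::Span)
        .last()
        .expect("the chunk contains tokens");

    // With the fix, the trailing 42 is reported as a Number token even though
    // no terminator follows it in this chunk; previously it was not identified.
    assert_eq!(last.kind, TokenType::Number);
}
```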

Commit Statistics

  • 8 commits contributed to the release over the course of 1635 calendar days.
  • 1635 days passed between releases.
  • 1 commit was understood as conventional.
  • 0 issues like '(#ID)' were seen in commit messages

Commit Details

view details
  • Uncategorized
    • Avoid using the 'alpha' suffix (1b9c1d7)
    • Last number not identified in an isolated value pair (927c987)
    • Isolated value pairs fix (2b47444)
    • Optimize includes (3fe2df2)
    • Add new badges (3c9fb9e)
    • Add clippy and cargo-fmt lints (6129e80)
    • Create rust.yml (ee18ffa)
    • (cargo-release) start next development iteration 1.1.3-alpha.0 (0668d4a)

v1.1.2 (2020-02-01)

Commit Statistics

  • 2 commits contributed to the release.
  • 0 commits were understood as conventional.
  • 0 issues like '(#ID)' were seen in commit messages

Commit Details

view details
  • Uncategorized
    • Add readme, too (88f67ea)
    • (cargo-release) start next development iteration 1.1.2-alpha.0 (bfd6675)

v1.1.1 (2020-02-01)

Chore

  • v1.0.1

Documentation

  • README misses BufferType argument to Lexer
  • change description

Bug Fixes

  • handle more string escapes
  • handle numbers with exponents
  • syntax error
  • only run benches on nightly

Commit Statistics

  • 16 commits contributed to the release over the course of 1124 calendar days.
  • 1124 days passed between releases.
  • 7 commits were understood as conventional.
  • 0 issues like '(#ID)' were seen in commit messages

Commit Details

view details
  • Uncategorized

v1.1.0 (2019-02-08)

This adds support for Number tokens with exponents and String tokens with the full range of escapes that are allowed.
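
A small sketch of what the new coverage means in practice, under the same assumptions about the crate's API as above (`Lexer::new`, `BufferType::Span`, a `kind: TokenType` field); exact names may differ.

```rust
use json_tools::{BufferType, Lexer, TokenType};

fn main() {
    // An exponent number and a string containing several escapes.
    let src = r#"{"tiny": 1.5e-3, "text": "line\nbreak \u0041"}"#;

    // The exponent form should lex as a single Number token.
    let numbers = Lexer::new(src.bytes(), BufferType::Span)
        .filter(|t| matches!(t.kind, TokenType::Number))
        .count();
    assert_eq!(numbers, 1);
}
```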

v1.0.1 (2017-01-02)

Changed the headline of the crate to be more descriptive.

v1.0.0 (2017-01-02)

Improvements

  • travis: more rust versions and no travis-cargo (fd4cd367)

Other

  • more rust versions and no travis-cargo. The latter is only useful for doc uploading, which we no longer need now that docs.rs exists.

Chore

  • format everything

  • remove do-not-edit note, a left-over of the original.

    [skip ci]

Bug Fixes

  • remove old code. It didn't compile anymore, and probably wasn't really testing anything anyway.

Commit Statistics

  • 8 commits contributed to the release over the course of 559 calendar days.
  • 604 days passed between releases.
  • 4 commits were understood as conventional.
  • 0 issues like '(#ID)' were seen in commit messages

Commit Details

view details
  • Uncategorized
    • Format everything (fb0758e)
    • Remove old code (f969498)
    • More rust versions and no travis-cargo (fd4cd36)
    • Merge pull request #10 from cmr/master (da0d2c8)
    • Relicense to dual MIT/Apache-2.0 (853e77a)
    • Merge pull request #8 from jnicholls/master (9303648)
    • Rust 1.x stable compatibility fix. (d2f6a43)
    • Remove do-not-edit note (030f580)

v0.3.0 (2015-05-09)

Features

  • iterator-ext: more fun with Token-Iterators (15dc5c5f, closes #3)

Refactor

  • use IntoIterator. We use IntoIterator in place of Iterator, which should provide more flexibility when feeding the Lexer. However, in our tests we can't actually take advantage of that, unfortunately, due to the consuming semantics (and literals cannot be consumed).

    However, it's a nice proof of concept and doesn't hurt.
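
A generic illustration of the idea, independent of the crate's actual types: bounding the input on `IntoIterator` lets callers pass anything that can be turned into a byte iterator, at the cost of consuming it.

```rust
// Hypothetical helper; a real lexer would tokenize the bytes instead of counting.
fn count_bytes<I>(input: I) -> usize
where
    I: IntoIterator<Item = u8>,
{
    input.into_iter().count()
}

fn main() {
    // Anything convertible into an Iterator<Item = u8> can be fed directly ...
    assert_eq!(count_bytes("{}".bytes()), 2);
    assert_eq!(count_bytes(vec![b'[', b']']), 2);
    // ... but the input is consumed, which is the limitation noted above for
    // borrowed literals in the tests.
}
```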

Chore

  • v0.3.0

Commit Statistics

  • 2 commits contributed to the release.
  • 2 commits were understood as conventional.
  • 0 issues like '(#ID)' were seen in commit messages

Commit Details

view details

v0.2.0 (2015-05-09)

Chore

  • v0.2.0

New Features

  • more fun with Token-Iterators
    • attaches constructor utilities to all Iterator<Item=Token> to make them easier to use. It's similar to the (former) IteratorExt, which provided chain() and nth(), for instance.
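
A generic sketch of the pattern described above: an extension trait with a blanket impl attaches helpers to every `Iterator` with a matching `Item` type. The `Token` enum and `count_nulls` helper here are illustrative stand-ins, not the crate's real names.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Null,
    Comma,
}

trait TokenIterExt: Iterator<Item = Token> + Sized {
    /// Example helper: count how many Null tokens the stream contains.
    fn count_nulls(self) -> usize {
        self.filter(|t| *t == Token::Null).count()
    }
}

// Blanket impl: any Iterator<Item = Token> gets the helper for free.
impl<I: Iterator<Item = Token>> TokenIterExt for I {}

fn main() {
    let tokens = vec![Token::Null, Token::Comma, Token::Null];
    assert_eq!(tokens.into_iter().count_nulls(), 2);
}
```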

Commit Statistics

  • 2 commits contributed to the release.
  • 2 commits were understood as conventional.
  • 0 issues like '(#ID)' were seen in commit messages

Commit Details

view details
  • Uncategorized

v0.1.1 (2015-05-08)

Chore

  • set v0.1.1

Bug Fixes

  • token-reader: offset-map for target buffer (57768da1, closes #6)
  • offset-map for target buffer. Previously we kept writing to the first bytes of our destination buffer, as we didn't compute any offset at all. Now we produce slices of exactly the right size, and could verify that this is working.
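
A self-contained illustration of the offset bug and its fix (not the crate's actual code): every copy into the destination buffer has to target a slice at the running offset, not the start of the buffer.

```rust
fn copy_with_offsets(chunks: &[&[u8]], dst: &mut [u8]) -> usize {
    let mut offset = 0;
    for chunk in chunks {
        // Before the fix, the equivalent of `dst[..chunk.len()]` was written,
        // so every chunk overwrote the first bytes of the buffer.
        dst[offset..offset + chunk.len()].copy_from_slice(chunk);
        offset += chunk.len();
    }
    offset
}

fn main() {
    let chunks: [&[u8]; 3] = [b"null", b",", b"42"];
    let mut buf = [0u8; 16];
    let written = copy_with_offsets(&chunks, &mut buf);
    assert_eq!(&buf[..written], b"null,42");
}
```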

Commit Statistics

  • 2 commits contributed to the release.
  • 2 commits were understood as conventional.
  • 0 issues like '(#ID)' were seen in commit messages

Commit Details

view details
  • Uncategorized

v0.0.1 (2015-05-06)

Chore

  • initial commit

Commit Statistics

  • 1 commit contributed to the release.
  • 1 commit was understood as conventional.
  • 0 issues like '(#ID)' were seen in commit messages

Commit Details

view details
  • Uncategorized

v0.1.0 (2015-05-08)

Features

Refactor

  • optimize buffer usage
    • only push characters when we have a buffer and are dealing with strings or numbers.
    • added some performance tests to show the difference. We are not quite back at 280MB/s, but down to 220MB/s for the optimal/Span case. The Bytes case goes down to 137MB/s.
  • use enum for state. Instead of using many different variables to handle the state, we use an enumeration. That way we don't unnecessarily initialize memory that will never be used, and we lower our stack space requirements. A rough sketch of the idea follows this list.
  • separate lexer and filters
    • FilterNull now has its own module
    • Lexer and friends have their own module
  • string value fast-path. Technically it's not a fast-path, but it makes the code more uniform and easier to understand. It will be the default template for all other lexing we do, for instance when implementing booleans.
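
A rough, self-contained sketch of the "use enum for state" idea above (illustrative names, not the crate's): one enum replaces a handful of loosely related state variables, so only the data the current state needs is kept around.

```rust
// Tracks where a string or number token started; everything else is Idle.
enum State {
    Idle,
    InString { start: usize },
    InNumber { start: usize },
}

// Returns (start, end) byte spans of string and number tokens.
fn token_spans(src: &[u8]) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    let mut state = State::Idle;
    for (i, &b) in src.iter().enumerate() {
        state = match state {
            State::Idle => match b {
                b'"' => State::InString { start: i },
                b'0'..=b'9' | b'-' => State::InNumber { start: i },
                _ => State::Idle,
            },
            State::InString { start } => {
                if b == b'"' {
                    spans.push((start, i + 1));
                    State::Idle
                } else {
                    State::InString { start }
                }
            }
            State::InNumber { start } => {
                if b.is_ascii_digit() || b == b'.' {
                    State::InNumber { start }
                } else {
                    spans.push((start, i));
                    State::Idle
                }
            }
        };
    }
    spans
}

fn main() {
    let spans = token_spans(br#"{"a": 12, "b": "x"}"#);
    assert_eq!(spans, vec![(1, 4), (6, 8), (10, 13), (15, 18)]);
}
```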

Other

  • update to match current state. We also anticipate pretty-printing, which technically isn't there yet.

  • high-speed serialize tests. Producers are optimized for performance to show exactly how fast a span token producer is compared to a bytes token producer.

    That way, performance improvements can be quantified exactly.

  • added benchmarks. Under optimal conditions (the source string is known) we remove null values at 144MB/s; fully streamed we do 104MB/s.

    An acceptable result, considering the unoptimized buffer handling; using a deque might improve this a lot.

    However, we manage to retrieve invalid tokens, which we have to handle somehow, and also don't expect here.

  • make it general

    • What was formerly known as NullFilter can now filter out all key-value pairs whose value has a given type. That way, null can be filtered, as well as numbers, for example.
    • Made certain dual-branch matches into if-clauses, which moved everything further to the left again, while making the code easier to read.
  • operate on u8 instead of char. This improved throughput from 230MB/s to 280MB/s, which is well worth it. In the case of JSON, only the values within strings are potentially unicode; everything else is plain ASCII. See the sketch after this list.

  • added benchmark. Also renamed test-related files to match their purpose a bit better.

  • benchmark and string_value tests

    • verify escaping in string values works
    • added more complex benchmark, lexing 415MB/s
    • tested unclosed string value
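
A small illustration of why the byte-oriented approach from "operate on u8 instead of char" is safe: all structural JSON characters are single ASCII bytes, and ASCII bytes never occur inside a multi-byte UTF-8 sequence, so only the contents of string values can be multi-byte.

```rust
fn main() {
    let src = r#"{"emoji": "😀", "n": 1}"#;

    // Structural characters can be found by scanning raw bytes, with no
    // UTF-8 decoding at all.
    let colons = src.bytes().filter(|&b| b == b':').count();
    let commas = src.bytes().filter(|&b| b == b',').count();
    assert_eq!((colons, commas), (2, 1));

    // The emoji occupies 4 bytes, but they live entirely inside a string
    // value and can simply be passed through unchanged.
    assert_eq!("😀".len(), 4);
}
```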

New Features

  • machine serialization works. In a first version, we show that serialization without any space works as expected, according to very first tests.

    More tests have to be conducted to be sure.

  • infrastructure setup

    • added TokenReader type with basic API
    • improve existing filter tests to verify TokenReader operation
  • support for Buffer enum. It currently cuts our speed in half, but allows choosing between high-speed Span mode and half-speed buffer mode. That way, all applications I can think of are catered for with the best possible performance. See the sketch after this list.

  • initial implementation. It is known not to work in all cases, i.e. it can only take one key-value pair at a time (no consecutive ones), but besides that it is a pretty optimal implementation (even though it isn't a pretty one).

    However, the test still fails; we match nothing for some reason. This must be evaluated later.

  • number lexing. Even though our performance dropped from 332MB/s to 274MB/s, we are happy, as the implementation cleaned up our span handling considerably and thus made the code even more maintainable. Cleaning up the span handling also made it faster; the slowest number parsing was at 232MB/s.

  • filtering iterator infrastructure

    • FilterNull frame would allow implementing lexical token pattern matching to remove null values.
  • true and false value lexing

    • including test for both
    • refactored the null-value test case so it can easily test booleans as well.
  • null value lexing

    • including tests for normal null values and invalid ones
  • string value parsing, including a test.

  • data structures and basic tests. The tests still fail, as the actual lexer implementation is still to be done.
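
A sketch of the Span-versus-buffer trade-off mentioned under "support for Buffer enum", assuming the `Lexer::new(bytes, BufferType)` constructor from the README with `BufferType::Span` and `BufferType::Bytes(capacity)` variants; the variant names and parameters are assumptions, not a verified API.

```rust
use json_tools::{BufferType, Lexer};

fn main() {
    let src = r#"{"key": null}"#;

    // High-speed mode: tokens carry only spans (byte offsets) into the source,
    // so the source must be kept around to look at token contents.
    let span_tokens = Lexer::new(src.bytes(), BufferType::Span).count();

    // Buffered mode: token bytes are copied out at roughly half the throughput,
    // but the tokens own their data.
    let byte_tokens = Lexer::new(src.bytes(), BufferType::Bytes(64)).count();

    // Both modes see the same token stream.
    assert_eq!(span_tokens, byte_tokens);
}
```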

Documentation

  • added remaining documentation. I believe it's best not to add redundant information to the library docs, but instead to refer to the tests and benchmarks.
  • state why numbers won't be lexed. Also, it's not required to solve our actual problem.
  • usage added

Chore

  • clog config + changelog
  • also run benchmarks
  • set GH_TOKEN ... instead of TOKEN
  • with doc-upload. Never worked for me, but let's try it one more time.
  • no nightly please, as we didn't set the package to unstable.
  • added secret. Minor format adjustment.

Improvements

  • lexer: operate on u8 instead of char (d5a694d1)
  • null-filter: make it general (431f051d)
  • README: update to match current state (75181ff6)

Bug Fixes

  • minor fix to make it work. It's still far from perfect, but a good proof of concept.

  • handle consecutive null values. With this in place, we handle null value filtering pretty well, as the tests indicate too. A rough sketch of the filtering idea appears after this list.

    However, we may still leave a trailing comma in non-null values, which could be a problem and thus shouldn't be done!

  • proper comma handling

    • Added support for leading , characters, which have to be removed conditionally.
    • Added tests to verify this works in valid streams, and even invalid ones.
  • removed possible overflow. Previously it was possible to over-allocate memory by feeding us lots of ',' characters. This was alleviated by allowing a one-token look-ahead (implemented through put-back).

  • handle whitespace at end of source. Previously we would consider such whitespace invalid. Now we explicitly set the invalid state, and explicit is better :)

    We also added a test to show this actually works.
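
A self-contained sketch tying together the filtering items above ("make it general", "handle consecutive null values", "proper comma handling"). It works on a toy token enum rather than the crate's real FilterNull types, but shows the key point: when a key/colon/null group is removed, exactly one adjacent comma has to go with it.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Tok {
    CurlyOpen,
    CurlyClose,
    Comma,
    Colon,
    Key(&'static str),
    Null,
    Number(i64),
}

/// Remove every `Key ':' Null` group together with one adjacent comma,
/// so neither a leading nor a trailing comma is left behind.
fn filter_null_pairs(tokens: &[Tok]) -> Vec<Tok> {
    let mut out: Vec<Tok> = Vec::new();
    let mut i = 0;
    while i < tokens.len() {
        if let (Some(Tok::Key(_)), Some(Tok::Colon), Some(Tok::Null)) =
            (tokens.get(i), tokens.get(i + 1), tokens.get(i + 2))
        {
            i += 3;
            // Swallow the comma separating this pair from its neighbour:
            // prefer the following one, otherwise drop one already emitted.
            if tokens.get(i) == Some(&Tok::Comma) {
                i += 1;
            } else if out.last() == Some(&Tok::Comma) {
                out.pop();
            }
        } else {
            out.push(tokens[i].clone());
            i += 1;
        }
    }
    out
}

fn main() {
    use Tok::*;
    // {"a": null, "b": 1, "c": null}  ->  {"b": 1}
    let input = vec![
        CurlyOpen, Key("a"), Colon, Null, Comma, Key("b"), Colon, Number(1),
        Comma, Key("c"), Colon, Null, CurlyClose,
    ];
    assert_eq!(
        filter_null_pairs(&input),
        vec![CurlyOpen, Key("b"), Colon, Number(1), CurlyClose]
    );
}
```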

Commit Statistics

  • 35 commits contributed to the release over the course of 2 calendar days.
  • 2 days passed between releases.
  • 35 commits were understood as conventional.
  • 0 issues like '(#ID)' were seen in commit messages

Commit Details

view details
  • lexer handle whitespace at end of source (1d57bc92)