use streaming JSON parser (ijson) #42

Open
karlicoss opened this issue Mar 18, 2023 · 5 comments

@karlicoss
Contributor

I guess it's not a super big deal since we use caching, but it does give significant speedups (almost 2x)

Had good success using it for a couple of DALs https://github.com/karlicoss/exporthelpers/blob/804b8afa070d8017ad15710a2a179e71ea60316f/dal_helper.py#L140-L171 (made it an optional dependency for backwards compatibility since ijson involves some binaries which might be unavailable for some platforms)

related: #40

@purarue
Owner

purarue commented Mar 18, 2023

Ah yeah, totally down for adding this, falling back to default behaviour if it fails
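
For reference, a minimal sketch of what the optional dependency plus fallback could look like. This is not the code from dal_helper.py; `iter_records` is a hypothetical name, and it assumes the export is a single top-level JSON array read from a binary file object. The fallback here only covers ijson being missing at import time.

```python
import json
from typing import Any, BinaryIO, Iterator

try:
    import ijson  # optional: relies on compiled backends that some platforms may lack
except ImportError:
    ijson = None


def iter_records(fo: BinaryIO) -> Iterator[Any]:
    """Yield the elements of a top-level JSON array (hypothetical helper)."""
    if ijson is not None:
        # "item" is the ijson prefix for each element of the top-level array;
        # elements are yielded one at a time instead of loading the whole file
        yield from ijson.items(fo, "item")
    else:
        # default behaviour: parse the whole document in memory, then iterate
        yield from json.load(fo)
```

Callers just iterate over `iter_records(f)` with `f` opened in `"rb"` mode. One difference to watch for: ijson returns non-integer numbers as `Decimal` by default, so the two paths are not byte-for-byte equivalent on float-heavy data.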

@purarue
Owner

purarue commented Oct 1, 2023

have been thinking more about this with me adding more formats to browserexport, will probably create a meta-package like you have in exporthelpers that this will have as a dependency

@karlicoss
Contributor Author

Another relevant thing that may be worth extracting from HPI is a library for accessing compressed files: karlicoss/kompress#10
Unfortunately, even after a few years, I don't think anything like it exists

@karlicoss
Contributor Author

started extracting kompress stuff here btw https://github.com/karlicoss/kompress -- will add more docs, think about whether it needs any refactoring, and then move HPI and bleanser to use it

@purarue
Owner

purarue commented Oct 2, 2023

looks good

I think the only thing it doesn't cover for my use case is .gz files (not .tar.gz files)

like here: https://github.com/seanbreckenridge/browserexport/blob/734bc46e9200cc888d8146c31d55e7caa039c4e2/browserexport/parse.py#L73

gzip has the same rb -> rt problem lzma does

will PR that, would be nice to be able to use that in my tools instead of re-implementing it everywhere
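
To illustrate the rb -> rt problem as I understand it: the built-in `open(path, "r")` returns text, but `gzip.open` and `lzma.open` treat `"r"` as binary (their default mode is `"rb"`), so a drop-in replacement has to rewrite the mode. A rough sketch, not what kompress or browserexport actually do; `open_text` and the suffix checks are assumptions for illustration:

```python
import gzip
import lzma
from pathlib import Path
from typing import IO, Union


def open_text(path: Union[str, Path], mode: str = "r", encoding: str = "utf-8") -> IO[str]:
    """Open a plain, .gz or .xz file for reading as text (hypothetical helper)."""
    if "b" in mode:
        raise ValueError("open_text only handles text modes")
    if "t" not in mode:
        # gzip.open/lzma.open read "r" as binary, so make text mode explicit
        mode += "t"
    suffix = Path(path).suffix
    if suffix == ".gz":
        return gzip.open(path, mode, encoding=encoding)
    if suffix == ".xz":
        return lzma.open(path, mode, encoding=encoding)
    return open(path, mode, encoding=encoding)
```

With this, `open_text("history.json.gz").read()` gives a `str` just like a plain file would, which is the behaviour callers expecting `open()` semantics rely on.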
