use streaming html parser #40

Closed
purarue opened this issue Feb 4, 2023 · 3 comments

Comments

Owner

purarue commented Feb 4, 2023

Loading the whole HTML document into memory is pretty expensive memory-wise. Could either use a streaming HTML parser, or maybe split the file before loading it?
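For anyone landing here, a minimal sketch of what "streaming" could look like, assuming Python's standard-library `html.parser` rather than whatever this project ends up using. The file is read in fixed-size chunks and fed to the parser incrementally, so the full document is never held in memory at once. The `LinkCollector` class, the `iter_links` function, and the choice of extracting `href` values are all hypothetical and only there to illustrate the idea.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical handler: records href values of <a> tags as they are parsed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


def iter_links(fileobj, chunk_size=64 * 1024):
    """Yield hrefs from a text-mode file object, reading chunk_size characters at a time."""
    parser = LinkCollector()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        # feed() keeps any incomplete tag at the end of the chunk buffered
        # internally and finishes parsing it once the next chunk arrives.
        parser.feed(chunk)
        # Hand back what was found so far and drop it, so the result list
        # does not grow with the size of the document.
        yield from parser.links
        parser.links.clear()
    parser.close()
    yield from parser.links
```

Usage would be something like `for href in iter_links(open(path, encoding="utf-8")): ...`, where `path` is a placeholder. Memory stays bounded by the chunk size plus whatever partial markup the parser is holding, though a single very long run of text or a huge attribute can still be buffered whole.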

Owner Author

purarue commented Mar 20, 2023

Tried using lxml for this, haven't been able to figure it out yet.

09307da

Owner Author

purarue commented Sep 30, 2023

If anyone else has libraries they'd recommend here, I'm very open to suggestions. None of my experiments so far have gone well.

Owner Author

purarue commented Oct 1, 2023

Ended up just using an HTML tokenizer in Go.

This is all legacy anyway, so I don't know if anyone else is ever going to use this. It's more for my own usage.

purarue closed this as completed Oct 1, 2023