Horrible Result #34

Open
smaragdus opened this issue Jan 17, 2020 · 5 comments
Labels: bug, html-parser (html parser problem / bug)


@smaragdus

Hello,

I tried to save a single page (this article) using the 'Save Page' command, and the result was horrible: I would say that the generated EPUB file is unusable.

Screenshots:

[Screenshot: CoolReader 3 0 56-42 - 2020-01-17 - 001]

[Screenshot: CoolReader 3 0 56-42 - 2020-01-17 - 002]

For me such EPUB files are unreadable.

I am using Save as eBook version 1.3.5 with Cent Browser (Chromium-based) version 4.1.7.182 on Windows 8.

I hope that this extension is still in development and that new releases will fix such issues.

Regards

@alexadam
Owner

I tried to save the article with the latest Firefox and it doesn't work at all: it freezes the extension. This is clearly a bug. But there are some problems with that web page too; you can see the HTML validator log here: https://validator.w3.org/nu/?doc=https%3A%2F%2Fwww.unz.com%2Farticle%2Fagainst-mishima%2F

If you find more pages that don't work as expected, please post the links here. I'll try to create some tests and then do a major release with more updates.

Thank you!

@alexadam self-assigned this Jan 17, 2020
@alexadam added the bug label Jan 17, 2020
@smaragdus
Author

@alexadam

Thanks for your quick response. If I come across pages that cause problems, I will let you know by posting the links here.

Regards

@alexadam added the html-parser (html parser problem / bug) label Feb 24, 2020
@Verfallsdatum

Hi, I'm also having this problem, but with the scientificamerican.com website (article example here).

The output file is not great...

Cosmic String Gravitational Waves Could Solve Antimatter Mystery - Scientific American (3).zip

Thank you!

@alexadam
Owner

alexadam commented Mar 6, 2020

@Verfallsdatum the SA page is full of HTML errors, plus a 'fatal' error that stopped the validator :) https://validator.w3.org/nu/?doc=https%3A%2F%2Fwww.scientificamerican.com%2Farticle%2Fcosmic-string-gravitational-waves-could-solve-antimatter-mystery%2F

I made some updates and I'm preparing a new release. In the next version you won't see that garbage HTML in the output ebook; you'll see an error code/message instead.

I use a simple HTML parser that cannot handle errors, and I'm still working on a way to fix this. The problem is that even if I 'force' the output of the ebook (for example, by copy-pasting the HTML without parsing it), most ebook readers won't open it and throw an error because they cannot read it. So the solution is to find and fix the errors beforehand. Any ideas are welcome!
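To illustrate what "find the errors beforehand" could look like, here is a minimal sketch (not the extension's actual parser) that uses Python's standard `html.parser` module to report unclosed and stray tags before the HTML is packed into an ebook. It assumes a strict, XHTML-style rule that every non-void element must be closed explicitly, which is stricter than HTML5: optional end tags such as `</p>` or `</li>` will be flagged. `TagBalanceChecker` and `find_tag_errors` are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper: records unclosed and stray tags under strict rules."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[tuple[str, int]] = []  # (tag name, line where it opened)
        self.errors: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Void elements (<br>, <img>, ...) never need closing, so don't track them.
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if tag not in (name for name, _ in self.stack):
            # An end tag with no matching open tag anywhere on the stack.
            self.errors.append(f"stray </{tag}> at line {self.getpos()[0]}")
            return
        # Pop back to the matching open tag; anything above it was never closed.
        while self.stack:
            name, line = self.stack.pop()
            if name == tag:
                break
            self.errors.append(f"<{name}> opened at line {line} is never closed")

    def close(self):
        super().close()
        # Whatever is still open at the end of the input was never closed.
        for name, line in self.stack:
            self.errors.append(f"<{name}> opened at line {line} is never closed")
        self.stack.clear()


def find_tag_errors(html: str) -> list[str]:
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.errors
```

For example, `find_tag_errors("<div><span>x</div>")` returns `['<span> opened at line 1 is never closed']`. A report like this could drive a fix-up step (closing the element or dropping the offending subtree) before the content is written to the ebook.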

P.S. I investigated further, and it seems that the error comes from a link in the social share box. A quick fix is to create a custom style that hides it, like this:

url regex:
scientificamerican\.com\/article\/

css:
.article-grid__share {
    display: none;
}
.share-box {
    display: none;
}
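As a minimal sketch of how the URL regex above might select pages, the snippet below assumes the pattern is tested anywhere in the page URL (a search, not a full match); `style_applies` is a hypothetical helper, not part of Save as eBook.

```python
import re

# Pattern from the custom style above. "\/" is just an escaped "/",
# so it matches a literal slash.
URL_REGEX = re.compile(r"scientificamerican\.com\/article\/")


def style_applies(page_url: str) -> bool:
    """Assumption: the custom style is used when the pattern occurs anywhere in the URL."""
    return URL_REGEX.search(page_url) is not None


if __name__ == "__main__":
    article = ("https://www.scientificamerican.com/article/"
               "cosmic-string-gravitational-waves-could-solve-antimatter-mystery/")
    print(style_applies(article))  # True
    print(style_applies("https://www.scientificamerican.com/podcast/"))  # False
```

Scoping the pattern to `/article/` keeps the CSS override from hiding elements on unrelated sections of the site.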

@Verfallsdatum

Ah yes indeed, removing all the elements that contain errors does fix this issue. Thank you!
