Soup not properly parsing HTML from gtoolkit.com #7
Comments
I eventually got it to parse the HTML from gtoolkit.com properly, but only after manually removing duplicate DOM elements that shared the same class name from the source body.
The issue still persists in Pharo 10.
Can you provide an HTML sample that is not working?
view-source:https://gtoolkit.com/
I've run into a similar problem. It's because retrieveContents returns a ByteString for pharo.org and a WideString for gtoolkit.com. It appears that Soup has an issue with WideStrings.
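A quick way to check this in a Playground is sketched below. It assumes the pages are fetched through Zinc's `ZnUrl>>retrieveContents` (the thread mentions `retrieveContents` but not the receiver), and the variable names are only illustrative. The classes printed depend on what each site serves at the time.

```smalltalk
"Sketch only: assumes Zinc HTTP Components (bundled with Pharo) does the fetching."
| pharoHtml gtHtml |
pharoHtml := 'https://pharo.org' asZnUrl retrieveContents.
gtHtml := 'https://gtoolkit.com' asZnUrl retrieveContents.

"Print the concrete String subclass each fetch produced."
Transcript
	show: pharoHtml class name; cr;
	show: gtHtml class name; cr.

"isWideString answers true when the receiver is a WideString,
 i.e. it holds at least one character with a code point above 255."
gtHtml isWideString.
```

If the second result is a WideString, comparing Soup's output for the two strings should help confirm whether the string class is what triggers the parsing problem.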
The Soup code is old, so it should probably be ported to modern Pharo, which handles encodings and the rest well.
Seems to work fine on pharo.org, but not gtoolkit.com