Some CSS query find elements very slowly #1956

Closed
821938089 opened this issue May 16, 2023 · 7 comments

@821938089
Contributor

Example document: https://www.123duw.com/dudu-38/51311/
CSS query: dl > dt:nth-of-type(2) ~ dd > a
In jsoup, it took about 4 minutes.

In Chrome, it took only 50ms to complete.

jsoup version: 1.16.1

@jhy
Owner

jhy commented May 29, 2023

What platform are you on? I tried this on my M1 and it took 7.1 seconds (first run, no HotSpot warm-up, etc.). It would certainly be good to improve that, but I wonder if something else is impacting the speed?

@821938089
Contributor Author

821938089 commented May 29, 2023

I am running on the Android platform, and my phone's chip is a Snapdragon 660.

[Snipped large images]

@821938089
Contributor Author

The number of operations looks like it grows exponentially.

You can also check how long this document takes: https://www.123duw.com/dudu-32/50731/

@jhy
Owner

jhy commented May 29, 2023

OK, I'm surprised that it performs so differently on Android. I guess the combination of the high query count and the memory allocation in nth-of-type is causing the difference.

Yes, I believe the ~ (preceding sibling) combinator is causing the query count to go exponential, similar to repeatedly backtracking to try every variant. If we can rein that in, I would expect a massive performance improvement.

jhy self-assigned this May 29, 2023
jhy closed this as completed in 10ef981 May 29, 2023
jhy added the fixed label May 29, 2023
jhy added this to the 1.16.2 milestone May 29, 2023
@jhy
Owner

jhy commented May 29, 2023

Thanks for reporting this! I have optimized the selector by changing the query evaluation order of the any preceding sibling operator, and by memoizing the results. On my M1 this moves the execution time from ~7 seconds to 0.01 seconds.
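
To illustrate only the memoization half of that change, here is a minimal sketch on a toy model. It is not jsoup's code: `SiblingMemoSketch`, `El`, `isSecondDt`, and the flat sibling list are all hypothetical stand-ins. The left-hand side of `dt:nth-of-type(2) ~ dd` is re-tested for every earlier sibling of every `dd` in the naive version, and tested once per element in the memoized version.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model (not jsoup classes): one flat list of sibling elements.
class SiblingMemoSketch {
    static final class El {
        final String tag;
        final int index; // position among its siblings
        El(String tag, int index) { this.tag = tag; this.index = index; }
    }

    static long tests; // how often the expensive left-hand test has run

    // Stand-in for "dt:nth-of-type(2)": is el the second dt among its siblings?
    static boolean isSecondDt(List<El> siblings, El el) {
        tests++;
        if (!el.tag.equals("dt")) return false;
        int seen = 0;
        for (int i = 0; i <= el.index; i++) {
            if (siblings.get(i).tag.equals("dt")) seen++;
        }
        return seen == 2;
    }

    // Naive "left ~ dd": for every dd, re-test each earlier sibling from scratch.
    static int countNaive(List<El> siblings) {
        int matches = 0;
        for (El el : siblings) {
            if (!el.tag.equals("dd")) continue;
            for (int i = 0; i < el.index; i++) {
                if (isSecondDt(siblings, siblings.get(i))) { matches++; break; }
            }
        }
        return matches;
    }

    // Memoized: remember the left-hand result per element, keyed by identity.
    static int countMemoized(List<El> siblings) {
        Map<El, Boolean> memo = new IdentityHashMap<>();
        int matches = 0;
        for (El el : siblings) {
            if (!el.tag.equals("dd")) continue;
            for (int i = 0; i < el.index; i++) {
                El prev = siblings.get(i);
                Boolean hit = memo.get(prev);
                if (hit == null) {
                    hit = isSecondDt(siblings, prev);
                    memo.put(prev, hit);
                }
                if (hit) { matches++; break; }
            }
        }
        return matches;
    }

    // Builds: dt, n x dd, dt, n x dd. Only the last n dd elements follow the second dt.
    static List<El> build(int n) {
        List<El> out = new ArrayList<>();
        out.add(new El("dt", out.size()));
        for (int i = 0; i < n; i++) out.add(new El("dd", out.size()));
        out.add(new El("dt", out.size()));
        for (int i = 0; i < n; i++) out.add(new El("dd", out.size()));
        return out;
    }

    public static void main(String[] args) {
        List<El> siblings = build(100);
        tests = 0;
        int naiveMatches = countNaive(siblings);
        long naiveTests = tests;
        tests = 0;
        int memoMatches = countMemoized(siblings);
        long memoTests = tests;
        System.out.println("matches " + naiveMatches + "/" + memoMatches
            + ", left-hand tests naive=" + naiveTests + " memoized=" + memoTests);
    }
}
```

In this toy model with `build(100)`, both variants find the same 100 matches, but the naive version runs the left-hand test 15,250 times and the memoized version 102 times. The growth here is polynomial rather than exponential because the model is a single flat sibling list; it does not reproduce the change in evaluation order mentioned above.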

I also improved the memory consumption (by removing ephemeral calls to children(), which allocates a new Elements list on each hit).
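
As a rough illustration of why a list-copying accessor is costly inside a hot matching loop, the sketch below compares two ways of finding an element's position among its element siblings. `ChildrenAllocationSketch`, `Node`, and the `listsAllocated` counter are hypothetical; only the idea that `children()` builds a new list on each call comes from the comment above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy DOM (not jsoup classes) that counts intermediate list allocations.
class ChildrenAllocationSketch {
    static int listsAllocated;

    static final class Node {
        final String tag; // null stands for a text node
        final List<Node> childNodes = new ArrayList<>();
        Node parent;

        Node(String tag) { this.tag = tag; }

        Node add(Node child) {
            child.parent = this;
            childNodes.add(child);
            return this;
        }

        // Convenience accessor: copies the element children into a fresh list on every call.
        List<Node> children() {
            listsAllocated++;
            List<Node> out = new ArrayList<>();
            for (Node n : childNodes) {
                if (n.tag != null) out.add(n);
            }
            return out;
        }
    }

    // Copying variant: builds a whole list just to look up one position.
    static int elementIndexViaCopy(Node el) {
        return el.parent.children().indexOf(el);
    }

    // In-place variant: walks the parent's existing child list, no intermediate list.
    static int elementIndexInPlace(Node el) {
        List<Node> kids = el.parent.childNodes;
        int pos = 0;
        for (int i = 0; i < kids.size(); i++) {
            Node n = kids.get(i);
            if (n == el) return pos;
            if (n.tag != null) pos++; // text nodes do not count towards the element index
        }
        return -1;
    }

    public static void main(String[] args) {
        Node dl = new Node("dl");
        Node dd = new Node("dd");
        dl.add(new Node(null)).add(new Node("dt")).add(new Node(null)).add(dd);
        System.out.println(elementIndexViaCopy(dd) + " " + elementIndexInPlace(dd)
            + " lists allocated: " + listsAllocated);
    }
}
```

Both variants return the same index; the difference is that the copying variant allocates one list per call, which adds up when an evaluator runs once per candidate element.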

So I expect you will see a good improvement; if you could test with a snapshot version and report back that would be great!

Future work would be to improve in what order the And evaluator is executed, so simpler expressions (like a tag test) occur before slower expressions like nth-of-type.
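
One way such an ordering could look, as a sketch only: give each sub-evaluator a rough cost and have the And combinator try the cheapest first, so a failed tag test short-circuits before the slow test runs. The names `AndOrderingSketch`, `Costed`, and `And`, and the cost numbers, are assumptions for illustration and not jsoup's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: an And combinator that can order its parts by estimated cost.
class AndOrderingSketch {
    static final class Costed<T> {
        final int cost;          // rough relative cost, chosen by hand here
        final Predicate<T> test;
        int calls;               // how often this part was actually evaluated

        Costed(int cost, Predicate<T> test) {
            this.cost = cost;
            this.test = test;
        }

        boolean matches(T t) {
            calls++;
            return test.test(t);
        }
    }

    static final class And<T> {
        final List<Costed<T>> parts;

        And(List<Costed<T>> parts, boolean sortByCost) {
            this.parts = new ArrayList<>(parts);
            if (sortByCost) {
                this.parts.sort(Comparator.comparingInt((Costed<T> p) -> p.cost));
            }
        }

        // Short-circuits on the first failing part, so order determines the work done.
        boolean matches(T t) {
            for (Costed<T> p : parts) {
                if (!p.matches(t)) return false;
            }
            return true;
        }
    }

    // Runs "slow AND fast" over some tag names and reports how often the slow part ran.
    static int expensiveCalls(List<String> tags, boolean sortByCost) {
        Costed<String> slow = new Costed<>(10, t -> t.length() % 2 == 1); // stand-in for nth-of-type
        Costed<String> fast = new Costed<>(1, t -> t.equals("a"));        // stand-in for a tag test
        And<String> and = new And<>(Arrays.asList(slow, fast), sortByCost);
        for (String t : tags) and.matches(t);
        return slow.calls;
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("div", "a", "span", "a", "dd");
        System.out.println("slow calls, written order: " + expensiveCalls(tags, false));
        System.out.println("slow calls, cheapest first: " + expensiveCalls(tags, true));
    }
}
```

Because And is commutative, reordering does not change which elements match; it only changes how often the costly part runs. In the written order the slow test runs for every tag, while with cheapest-first it runs only for the tags that pass the tag test.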

@821938089
Contributor Author

Yes, I see a huge improvement. The execution time has been reduced from 4 min 30 s to 125 ms.

[Screenshot: test result]

@jhy
Owner

jhy commented May 30, 2023

I made the memoization a bit more generic, so that it applies to each of the Structural Evaluators, and made it more correct, so that reused Evaluators have a ThreadLocal memo.
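
The reason a per-thread memo matters for a reused evaluator can be sketched as follows. This is a hypothetical wrapper, not the code in the commit below: `ThreadLocalMemoSketch` and its methods are invented names, and the only ideas taken from the comment are memoizing results and holding the memo in a `ThreadLocal`.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: a reusable evaluator whose memo is private to each thread,
// so concurrent queries sharing one instance never touch the same map.
class ThreadLocalMemoSketch<T> {
    private final Predicate<T> inner;
    private final ThreadLocal<Map<T, Boolean>> memo =
        ThreadLocal.withInitial(() -> new IdentityHashMap<T, Boolean>());

    ThreadLocalMemoSketch(Predicate<T> inner) {
        this.inner = inner;
    }

    // Evaluates the wrapped test at most once per element per thread.
    boolean matches(T el) {
        Map<T, Boolean> m = memo.get();
        Boolean hit = m.get(el);
        if (hit == null) {
            hit = inner.test(el);
            m.put(el, hit);
        }
        return hit;
    }

    // Drops this thread's cached results, e.g. once a query has finished.
    void reset() {
        memo.remove();
    }

    int cachedOnThisThread() {
        return memo.get().size();
    }

    public static void main(String[] args) {
        ThreadLocalMemoSketch<String> eval = new ThreadLocalMemoSketch<>(s -> s.startsWith("d"));
        String el = "dd";
        System.out.println(eval.matches(el) + " " + eval.matches(el)
            + " cached: " + eval.cachedOnThisThread());
        eval.reset();
        System.out.println("cached after reset: " + eval.cachedOnThisThread());
    }
}
```

Keying by identity means two distinct elements that happen to be equal do not share a result, and clearing the memo after each query keeps a long-lived evaluator from holding on to elements of documents it has already processed.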

c57e683
