Wait some time (until the webpage has fully loaded) before taking the screenshot #22

Closed
chenyixin-2 opened this issue Jun 12, 2019 · 12 comments

@chenyixin-2

Hi, I am not very familiar with the PhantomJS and Chrome APIs.
So how should I change the source code to take the screenshot after the webpage has fully loaded?

@maaaaz
Owner

maaaaz commented Jun 12, 2019

Hello,

To my understanding, there's no easy way to know whether a page is fully loaded or not. That's why I chose the lazy rendering method, which gives good results.

So try to play with the -t timeout option.

Otherwise:

  • if you use the phantomjs renderer, play with the webscreenshot.js file
  • if you use the chrome renderer, unfortunately there's no way to change how chrome behaves without using a driver like selenium (or another method), which is currently not implemented in webscreenshot

Cheers.
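For the driver-based route mentioned above, the general pattern is to poll a readiness condition with an upper bound; Selenium's `WebDriverWait` works this way. The sketch below is renderer-agnostic and is not part of webscreenshot; `wait_until` is a hypothetical helper. With Selenium, the predicate might check `document.readyState`, but `"complete"` only means the page's load event has fired, so content fetched afterwards by scripts can still be missing.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float, interval: float = 0.1) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Sleep for the polling interval, but never past the deadline.
        time.sleep(min(interval, remaining))


# Hypothetical use with a Selenium `driver` object (not part of webscreenshot):
#   wait_until(lambda: driver.execute_script("return document.readyState") == "complete", 30)
```

The timeout plays the same role as -t: it bounds how long a single page can hold up the run, whether or not the condition is ever met.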

maaaaz closed this as completed on Jun 12, 2019
@0xmilan

0xmilan commented Jan 6, 2020

The -t timeout option has no effect on this, the default value is 30 seconds already.

I've been experimenting with increasing ajaxTimeout and maxTimeout in webscreenshot.js.
Here is an example with the default values of 400 and 800:
[Screenshot: berkeley_empty]

Here is the screenshot after adding 1000 to both values (1400 and 1800):
[Screenshot: berkeley_full]

Can we add an option (-W, --wait) to pass these values to the python script?
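A rough model of what the two values are assumed to control. It assumes webscreenshot.js follows the common PhantomJS pattern: render once no resource request has been pending for `ajaxTimeout` ms, but force the render `maxTimeout` ms after the page is opened. That assumption has not been checked against the script itself, `render_time` is a hypothetical helper, and the resource timings are made up for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# (request_ms, finish_ms); finish_ms=None models a resource that never answers.
Resource = Tuple[int, Optional[int]]


def render_time(resources: Sequence[Resource], ajax_timeout: int, max_timeout: int) -> int:
    """Return when (ms after page open) the screenshot would be taken under the assumed policy."""
    events: List[Tuple[int, int]] = []
    for start, end in resources:
        events.append((start, 1))
        if end is not None:
            events.append((end, -1))
    # At equal timestamps, process new requests before completions.
    events.sort(key=lambda e: (e[0], -e[1]))

    pending = 0
    for i, (t, delta) in enumerate(events):
        pending += delta
        if pending == 0:
            fire_at = t + ajax_timeout
            next_request = events[i + 1][0] if i + 1 < len(events) else None
            # A request arriving before the quiet period ends cancels this render.
            if next_request is None or next_request >= fire_at:
                return min(fire_at, max_timeout)
    # Something is still pending (or nothing was requested): the hard cap decides.
    return max_timeout


if __name__ == "__main__":
    # Hypothetical page: the document, then a late XHR that fills in the content.
    page = [(0, 200), (650, 1200)]
    print(render_time(page, 400, 800))    # 600: shot taken before the XHR even starts
    print(render_time(page, 1400, 1800))  # 1800: shot taken after the XHR finished at 1200
    # An ad server that never answers keeps the page waiting until the cap.
    print(render_time([(0, 200), (300, None)], 1400, 30000))  # 30000
```

Under this model a larger `ajaxTimeout` bridges gaps between bursts of requests, while `maxTimeout` bounds the wait when some resource never finishes.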

@maaaaz
Owner

maaaaz commented Jan 7, 2020

@milangfx, thanks.
When you increased these values, did you encounter more screenshot failures due to the -t timeout value conflicting with the PhantomJS ones (i.e. more screenshots failing to finish because they wait longer)?
Do you think 1400 and 1800 could safely be used as default values?

@0xmilan

0xmilan commented Jan 7, 2020

Good questions. I didn't have any failures; I was not using the -t timeout option, I only changed the values in webscreenshot.js.
I think 1.4 s and 1.8 s wouldn't conflict with the default 30 s timeout.
I'm not even sure how they relate to each other. I assume the main timeout option (default 30 s) is only relevant to the Chrome and Firefox renderers since PhantomJS has its own settings in webscreenshot.js.

If I remember correctly, at one point a page didn't fully load even with 1400 and 1800, so slightly higher values might be needed for consistent results, something like 2400 - 2800 (?)

I've only tested this with individual URLs so far.
I will check the increased timeouts with a huge list of URLs and compare them to the default values.

My only concern is that this could potentially increase the run time a lot if multiple URLs don't load immediately (or before the default 400 - 800).
So I'm not sure yet about using 1400 and 1800 as default values.

@maaaaz
Owner

maaaaz commented Jan 7, 2020

> Good questions. I didn't have any failures; I was not using the -t timeout option, I only changed the values in webscreenshot.js.
> I think 1.4 s and 1.8 s wouldn't conflict with the default 30 s timeout.
> I'm not even sure how they relate to each other. I assume the main timeout option (default 30 s) is only relevant to the Chrome and Firefox renderers since PhantomJS has its own settings in webscreenshot.js.

No, the -t option applies to any renderer: if the renderer reaches that timeout, a SIGKILL is sent to the process.

> If I remember correctly, at one point a page didn't fully load even with 1400 and 1800, so slightly higher values might be needed for consistent results, something like 2400 - 2800 (?)

> I've only tested this with individual URLs so far.
> I will check the increased timeouts with a huge list of URLs and compare them to the default values.

Yes, that would be appreciated: run $ time webscreenshot [options] and don't hesitate to post the execution results.

> My only concern is that this could potentially increase the run time a lot if multiple URLs don't load immediately (or before the default 400 - 800).

I think I already did this kind of test a long time ago. I don't really remember the results, but that overall increase in duration does ring a bell.

> So I'm not sure yet about using 1400 and 1800 as default values.

If the tests show that the global duration is increased, I'll keep the current values but implement an option to handle these parameters and document somewhere that they should be specified in case of partial screenshots.
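A sketch of the -t mechanism described in this comment: each renderer process gets a wall-clock budget and is killed if it overruns, which on POSIX systems means a SIGKILL. It also measures elapsed time, much like `time` does. This is an illustration, not webscreenshot's actual code; `run_with_timeout` is a hypothetical helper and the command below is a stand-in that just sleeps. If webscreenshot.js stops on its own after `maxTimeout`, a PhantomJS run would normally finish well before the -t limit.

```python
import subprocess
import sys
import time
from typing import List, Optional, Tuple


def run_with_timeout(cmd: List[str], timeout_s: float) -> Tuple[Optional[int], float]:
    """Run `cmd`; kill it if it is still running after `timeout_s` seconds.

    Returns (exit_code, elapsed_seconds). exit_code is None if the process was killed.
    """
    start = time.perf_counter()
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    try:
        code: Optional[int] = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX (TerminateProcess on Windows)
        proc.wait()  # reap the killed child
        code = None
    return code, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in for a renderer that hangs longer than the allowed budget.
    slow_renderer = [sys.executable, "-c", "import time; time.sleep(5)"]
    code, elapsed = run_with_timeout(slow_renderer, timeout_s=0.5)
    print(code, round(elapsed, 1))  # code is None; elapsed is roughly 0.5 s
```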

@0xmilan

0xmilan commented Jan 9, 2020

> No, the -t option applies to any renderer: if the renderer reaches that timeout, a SIGKILL is sent to the process.

What I meant is that if PhantomJS already stops at the 800 ms maxTimeout specified in webscreenshot.js, then the main -t timeout won't be relevant.

I ran three tests on 100 URLs: one with the default timeout values, one with 1000 ms added and one with 1500 ms added.

  • ajaxTimeout: 400, maxTimeout: 800 (defaults): 40 pages loaded, 60 didn't load
    python webscreenshot.py -v -i 100URLs  102,07s user 15,79s system 167% cpu 1:10,30 total
  • ajaxTimeout: 1400, maxTimeout: 1800: 97 pages loaded, 3 didn't load
    python webscreenshot.py -v -i 100URLs  105,77s user 15,94s system 124% cpu 1:37,95 total
  • ajaxTimeout: 1900, maxTimeout: 2300: 100 pages loaded
    python webscreenshot.py -v -i 100URLs  105,79s user 16,72s system 117% cpu 1:44 /2m-15,5s

So there's a trade-off between run time and pages actually loading.
Having a higher max timeout doesn't affect the pages that would load quickly anyway, but obviously having to wait more for individual pages does add up and results in an overall duration increase.
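The trade-off can be made concrete by converting the `time` totals to seconds and dividing by the number of pages that actually loaded. This sketch uses only the figures posted above; the third total is taken as 1:44, which is confirmed further down the thread (the rest of that line is garbled). `total_seconds` is a hypothetical helper for the `m:ss,cc` format with a decimal comma seen in these runs.

```python
from typing import Dict, Tuple


def total_seconds(total: str) -> float:
    """Convert a `time` total like '1:10,30' (minutes:seconds, decimal comma) to seconds."""
    minutes, seconds = total.split(":")
    return int(minutes) * 60 + float(seconds.replace(",", "."))


# ajaxTimeout/maxTimeout -> (total wall time, pages loaded out of 100)
RUNS: Dict[str, Tuple[str, int]] = {
    "400/800": ("1:10,30", 40),
    "1400/1800": ("1:37,95", 97),
    "1900/2300": ("1:44", 100),
}

if __name__ == "__main__":
    for label, (total, loaded) in RUNS.items():
        secs = total_seconds(total)
        print(f"{label}: {secs:.1f} s total, {loaded} loaded, {secs / loaded:.2f} s per loaded page")
```

By this measure the default values were the most expensive per successful screenshot, since 60 of the 100 pages didn't load.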

@maaaaz
Owner

maaaaz commented Jan 9, 2020

I'm not sure I'm reading the figures correctly: is the total time 1m10s (70 s) for the first case, 1m37s (97 s) for the second and 1m44s (104 s) for the third one? That's only a +50% increase in duration for more than +100% successful screenshots.

It is worth it: the primary goal of such a tool is to take the maximum number of successful screenshots.

The execution duration is already addressed through multiprocessing, and it cannot (and doesn't need to) be optimized further at the cost of fewer successful results.

So I might use the 1900/2300 values and offer a user option to specify them.

Cheers.
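The relative changes behind that estimate, using the rounded totals (70 s and 104 s) and the loaded-page counts (40 and 100) from the 100-URL test:

```latex
\begin{aligned}
\Delta_{\text{duration}} &= \frac{104\,\mathrm{s} - 70\,\mathrm{s}}{70\,\mathrm{s}} = \frac{34}{70} \approx 0.49 \;(\approx +50\%)\\
\Delta_{\text{success}} &= \frac{100 - 40}{40} = \frac{60}{40} = 1.5 \;(+150\%)
\end{aligned}
```

So "more than +100%" holds: the number of successful screenshots rose by 150%.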

@0xmilan

0xmilan commented Jan 9, 2020

> the total time is 1m10s (70s) for the first case, 1m37s (97s) for the second and 1m44s (104s) for the third one?

Correct.

> It's only +50% duration increase for more than +100% successful screenshots.

Yeah, it depends on how you define successful. In my example above, the blank page technically loaded successfully, but important content was missing: I also wanted the mailing lists to show up, so I had to wait a bit longer.

This is just a test with Google Groups, really. Other pages might behave differently.
For example it might be that you have everything important already loaded with the default 400 - 800 timeouts and increasing that would only load more ads on the page. I don't know.

What's important content will always depend on the user. Maybe the user wants the ads to load and see how they are displayed.

If you want to set a higher default, I would go for around ajaxTimeout: 1400, maxTimeout: 1800. Then either let users know in the README how to change the values manually in webscreenshot.js if they don't see the results they want, or wire the timeout values to a command-line option.

A too high default max timeout can hang the process unnecessarily, e.g. if there's an ad server not responding.

@maaaaz
Owner

maaaaz commented Jan 9, 2020

Got it, that's clear.

@maaaaz
Owner

maaaaz commented Jan 11, 2020

--ajax-max-timeouts option added and default values changed in v2.8
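For reference, a minimal sketch of how a two-value option like this can be parsed and validated with `argparse`. The comma-separated `AJAX,MAX` format in milliseconds is an assumption made for this sketch, as are the helper names `ajax_max_pair` and `build_parser`; check the tool's `--help` output for the actual syntax and the new v2.8 defaults.

```python
import argparse
from typing import Tuple


def ajax_max_pair(value: str) -> Tuple[int, int]:
    """Parse an assumed 'AJAX,MAX' string (milliseconds) into two positive ints."""
    try:
        ajax_s, max_s = value.split(",")
        ajax, max_ = int(ajax_s), int(max_s)
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected AJAX,MAX in ms, got {value!r}") from None
    if ajax <= 0 or max_ <= 0:
        raise argparse.ArgumentTypeError("timeouts must be positive")
    return ajax, max_


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="webscreenshot-sketch")
    parser.add_argument(
        "--ajax-max-timeouts",
        type=ajax_max_pair,
        default=None,  # None: keep whatever values the JS script has built in
        help="ajaxTimeout,maxTimeout in milliseconds (format assumed for this sketch)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--ajax-max-timeouts", "1900,2300"])
    print(args.ajax_max_timeouts)  # (1900, 2300)
```

A malformed value such as `1900` makes `argparse` print a usage error and exit with status 2 instead of passing bad numbers to the renderer.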

@0xmilan

0xmilan commented Jan 12, 2020

Thanks for the quick implementation! Works like a charm.

@maaaaz
Copy link
Owner

maaaaz commented Jan 12, 2020

Thank you for your feedback, @milangfx.
