How to use webscreenshot from inside a python script? #19

Open
own3mall opened this issue Apr 7, 2019 · 11 comments

@own3mall

own3mall commented Apr 7, 2019

The documentation states:

pip install webscreenshot and then directly use webscreenshot

How does one directly use webscreenshot?

My python script contains:

import webscreenshot

Now, how do I call webscreenshot directly from the script? The documentation doesn't provide any examples. It does for calling the script from the commandline and passing arguments, but I want to call it directly from inside my python script.

webscreenshot.take_screenshot(list_of_urls) doesn't seem to work.

@maaaaz
Owner

maaaaz commented May 19, 2019

Hello,

You do indeed need to call that function.
But first you need a proper options variable with the parameters set: launch the tool with the -vv option and you will see the structure of that variable.

Cheers.

@maaaaz
Owner

maaaaz commented Jul 13, 2019

Hello,

Here is a more precise answer:

import argparse
from webscreenshot.webscreenshot import *

# url list to screenshot
url_list = ['http://google.fr', 'http://google.com']

# defining options manually
options = argparse.Namespace(URL=None, cookie=None, header=None, http_password=None, http_username=None, input_file=None, log_level='DEBUG', multiprotocol=False, no_xserver=False, output_directory='/tmp/screenshots', port=None, proxy=None, proxy_auth=None, proxy_type=None, renderer='phantomjs', renderer_binary=None, ssl=False, timeout=30, verbosity=2, window_size='1200,800', workers=4)

# actually launching the function
take_screenshot(url_list, options)

I admit that this use case deserves a better approach.

Cheers

@maaaaz
Owner

maaaaz commented Jan 5, 2020

For reference, I maintain an updated version of the correct code in the FAQ.

@ss2sfcollege

I'm getting this error when using the above code snippet:

[+] 2 URLs to be screenshot
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/site-packages/webscreenshot/webscreenshot.py", line 421, in craft_cmd
    output_format = options.format if options.renderer == 'phantomjs' else 'png'
AttributeError: 'Namespace' object has no attribute 'format'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/aditya/GIT/Web/test.py", line 11, in <module>
    take_screenshot(url_list, options)
  File "/usr/lib/python3.8/site-packages/webscreenshot/webscreenshot.py", line 525, in take_screenshot
    taken_screenshots = [r for r in pool.imap(func=craft_cmd, iterable=izip(url_list, itertools.repeat(options)))]
  File "/usr/lib/python3.8/site-packages/webscreenshot/webscreenshot.py", line 525, in <listcomp>
    taken_screenshots = [r for r in pool.imap(func=craft_cmd, iterable=izip(url_list, itertools.repeat(options)))]
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 865, in next
    raise value
AttributeError: 'Namespace' object has no attribute 'format'

@maaaaz
Owner

maaaaz commented Apr 24, 2020

@ss2sfcollege, have you followed the instructions?
If so, that's odd, as the format option is declared in the code sample.
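
For readers hitting the same AttributeError: the traceback shows that craft_cmd reads options.format, which the Namespace posted earlier in this thread does not define. Below is a minimal sketch of one way to guard against that. It assumes 'png' is an acceptable value (it is the fallback the traceback line uses for non-phantomjs renderers), and the abbreviated Namespace is for illustration only; a real call needs the full set of attributes from the sample.

```python
import argparse

# Abbreviated for illustration; use the full attribute list from the
# sample in this thread when calling take_screenshot for real.
options = argparse.Namespace(renderer='phantomjs', workers=4)

# The traceback shows craft_cmd reading options.format, so make sure it exists.
# Assumption: 'png' is a valid value for this attribute.
if not hasattr(options, 'format'):
    options.format = 'png'

print(options)
```

Other attributes may have been added in newer releases too, so comparing against the structure printed with -vv remains the safer check.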

@poornasandeep

Hello,

I'm getting the following error when I run the above program:

C:\Users\sandeep\PycharmProjects\sparkflow_validation\venv\Scripts\python.exe C:/Users/sandeep/PycharmProjects/sparkflow_validation/take_screenshot.py
[+] 2 URLs to be screenshot
[+] 2 URLs to be screenshot
[+] 2 URLs to be screenshot
[+] 2 URLs to be screenshot
[+] 2 URLs to be screenshot
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\runpy.py", line 262, in run_path
return _run_module_code(code, init_globals, run_name,
File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\runpy.py", line 95, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\sandeep\PycharmProjects\sparkflow_validation\take_screenshot.py", line 11, in <module>
take_screenshot(url_list, options)
File "C:\Users\sandeep\PycharmProjects\sparkflow_validation\venv\lib\site-packages\webscreenshot\webscreenshot.py", line 523, in take_screenshot
pool = multiprocessing.Pool(processes=int(options.workers), initializer=init_worker)
File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\context.py", line 119, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild,
File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\pool.py", line 212, in __init__
self._repopulate_pool()
File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\pool.py", line 303, in _repopulate_pool
return self._repopulate_pool_static(self._ctx, self.Process,
File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\pool.py", line 326, in _repopulate_pool_static
w.start()
File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\context.py", line 326, in _Popen
return Popen(process_obj)
File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

The same error repeats in a loop and the program does not terminate.

@maaaaz
Owner

maaaaz commented May 6, 2020

@poornasandeep, can you paste the code you are using to call webscreenshot here?

@YusufRoshdy

@maaaaz I am getting the same error as @poornasandeep. Here is the code I am using (taken from the FAQ):

import argparse
from webscreenshot.webscreenshot import *

url_list = ['http://google.com']

options = argparse.Namespace(URL=None, cookie=None, header=None, http_password=None, http_username=None, input_file=None, log_level='DEBUG', multiprotocol=False, no_xserver=False, output_directory='./screenshots', port=None, proxy=None, proxy_auth=None, proxy_type=None, renderer='phantomjs', renderer_binary=None, ssl=False, timeout=30, verbosity=2, window_size='1200,800', workers=4)

take_screenshot(url_list, options)

It does not terminate and it keeps printing [+] 1 URLs to be screenshot forever.

I am using Python 3.8.3 on Windows 10 (2004 update), with version 2.92 of the webscreenshot package.

Here is the stack trace:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\Python38\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Program Files\Python38\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Program Files\Python38\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Program Files\Python38\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "C:\Program Files\Python38\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Program Files\Python38\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Program Files\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "d:\upwork\Nikhil Parekh\SMTP\mail with html\utilities.py", line 11, in <module>
    take_screenshot(url_list, options)
  File "C:\Users\yusuf\AppData\Roaming\Python\Python38\site-packages\webscreenshot\webscreenshot.py", line 535, in take_screenshot
    pool = multiprocessing.Pool(processes=int(options.workers), initializer=init_worker)
  File "C:\Program Files\Python38\lib\multiprocessing\context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 212, in __init__
    self._repopulate_pool()
  File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 326, in _repopulate_pool_static
    w.start()
  File "C:\Program Files\Python38\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Program Files\Python38\lib\multiprocessing\context.py", line 326, in _Popen
    return Popen(process_obj)
  File "C:\Program Files\Python38\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Program Files\Python38\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "C:\Program Files\Python38\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

@maaaaz
Owner

maaaaz commented Jul 25, 2020

Thanks for reporting, it seems related to the way Python 3.8 now behaves with multiprocessing.

I think that the pool creation (that line) should be moved to the main() function, as suggested in several similar cases.

In the meantime, try to execute your code with Python 3.7 and not 3.8.

@maaaaz
Owner

maaaaz commented Aug 16, 2020

I confirm the bug. I tried to fix it, but have unfortunately failed so far in the face of this madness.

I do understand the technical reasons, but I regret that users calling webscreenshot from alternate scripts will have to handle multiprocessing by themselves instead of webscreenshot doing it on its own.
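
For callers who want to handle this themselves, the RuntimeError text above points at the usual idiom: keep the call behind an `if __name__ == '__main__':` guard so that child processes re-importing the script do not start a new pool. The following is a rough sketch under assumptions, not a confirmed fix: `build_options` and `main` are hypothetical helper names, `format='png'` is an assumed value, and the `screenshot_func` parameter exists only so the wrapper can be exercised without webscreenshot installed.

```python
import argparse
from multiprocessing import freeze_support


def build_options(output_directory='./screenshots'):
    # Same attributes as the sample in this thread, plus 'format'
    # (assumed value: 'png'), which the earlier traceback shows is read.
    return argparse.Namespace(
        URL=None, cookie=None, header=None, http_password=None,
        http_username=None, input_file=None, log_level='DEBUG',
        multiprotocol=False, no_xserver=False,
        output_directory=output_directory, port=None, proxy=None,
        proxy_auth=None, proxy_type=None, renderer='phantomjs',
        renderer_binary=None, ssl=False, timeout=30, verbosity=2,
        window_size='1200,800', workers=4, format='png')


def main(url_list, screenshot_func=None):
    if screenshot_func is None:
        # Imported here so that nothing runs at module import time.
        from webscreenshot.webscreenshot import take_screenshot as screenshot_func
    return screenshot_func(url_list, build_options())


if __name__ == '__main__':
    # Only the parent process enters this block; spawned workers
    # re-import the file with a different __name__ and skip it.
    freeze_support()  # only matters for frozen Windows executables
    try:
        main(['http://google.com'])
    except ImportError:
        print('webscreenshot is not installed')
```

The point of the design is that the module has no side effects at import time: everything that creates processes is reachable only through the guarded block.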

@Concept211

An alternative would be to run it as a subprocess, which seems to work fine for me on Python 3.8:

import subprocess

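# Passing the arguments as a list also works on POSIX systems, where a
# single command string without shell=True is treated as the program name.
subprocess.run(['webscreenshot', 'google.com', '--window-size', '800,600'])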
