How to process 2 pages in different threads? #623

czcz1024 · 2021-04-16T07:27:18Z

I want to open two pages and create two threads: thread 1 processes page 1, and thread 2 processes page 2.
I tried this code:

from threading import Thread

from playwright.sync_api import sync_playwright


def run1(context):
    page = context.new_page()
    page.goto('https://page1')
    page.wait_for_timeout(5000)
    page.close()

def run2(context):
    page = context.new_page()
    page.goto('https://page2')
    page.wait_for_timeout(1000)
    page.close()

def main():
    with sync_playwright() as playwright:
        browser = playwright.chromium.launch(headless=False)
        context = browser.new_context()
        t=Thread(target=run1,args=(context,))
        t1=Thread(target=run2,args=(context,))
        t.start()
        t1.start()
        t.join()
        t1.join()
        context.close()
        browser.close()

But page 1 opens first, and page 2 opens only 5 seconds later.
The pages open one after another.
How can I process multiple pages in different threads at the same time?

kumaraditya303 · 2021-04-16T11:35:06Z

Playwright isn't thread-safe, so you need to start Playwright separately in each thread, or you can use asyncio:

from playwright.sync_api import sync_playwright
from threading import Thread


def run1():
    with sync_playwright() as playwright:
        browser = playwright.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto("https://google.com")
        page.wait_for_timeout(1000)
        page.close()


def run2():
    with sync_playwright() as playwright:
        browser = playwright.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto("https://google.com")
        page.wait_for_timeout(1000)
        page.close()


def main():
    t = Thread(target=run1)
    t1 = Thread(target=run2)
    t.start()
    t1.start()
    t.join()
    t1.join()


if __name__ == "__main__":
    main()
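
For the asyncio route mentioned above, a minimal sketch could look like the following. It assumes the async API (`playwright.async_api`); the function names `visit` and `main` and the URLs are illustrative and not taken from this thread. Both visits share one browser and run concurrently on a single thread, so no Playwright object crosses a thread boundary.

```python
import asyncio


async def visit(browser, url, wait_ms):
    # Each task gets its own context and page; only the browser is shared,
    # and everything runs on the same event loop thread.
    context = await browser.new_context()
    page = await context.new_page()
    await page.goto(url)
    await page.wait_for_timeout(wait_ms)
    title = await page.title()
    await context.close()
    return title


async def main(urls):
    # Imported here so the module loads even where Playwright is not installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as playwright:
        browser = await playwright.chromium.launch()
        try:
            # gather() runs the visits concurrently instead of one by one.
            return await asyncio.gather(*(visit(browser, url, 1000) for url in urls))
        finally:
            await browser.close()


# Example call (placeholder URLs):
# asyncio.run(main(["https://example.com", "https://example.org"]))
```

With this layout the two waits overlap, so the total run time should be close to the slower visit rather than the sum of both.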

mxschmitt · 2021-04-23T09:44:45Z

Closed as part of triage due to no response.

sla-te · 2021-05-12T14:14:37Z

Do I understand this one correctly, that if I create a new playwright object inside a newly opened thread it will be stable?

mxschmitt · 2021-05-12T14:17:32Z

Exactly, this should work.

sla-te · 2021-05-12T14:28:52Z

Hmm, okay, that confuses me, because according to #470 Playwright is not thread-safe. Am I misinterpreting this issue?

mxschmitt · 2021-05-12T14:34:52Z

Ah, maybe I misinterpreted it. If you create a new Playwright instance in each thread, it works fine. Sharing one instance between different threads does not work or is not stable.

sla-te · 2021-05-12T14:49:45Z

As I understand your response, the reply we got in #470 was incorrect, because, as far as I can tell, the following snippet does exactly that ("create a new playwright instance in each thread"):

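from concurrent.futures import ThreadPoolExecutor

from playwright.sync_api import sync_playwright


def sample_func():
    # Each call starts its own Playwright instance inside the worker thread.
    playwright_instance = sync_playwright().start()
    # Browser type used for the launch and for the screenshot name.
    browser_type = playwright_instance.firefox
    browser = browser_type.launch()
    page = browser.new_page()
    page.goto('http://whatsmyuseragent.org/')
    page.screenshot(path=f'example-{browser_type.name}.png')
    browser.close()
    # Stop this thread's Playwright instance when the task is done.
    playwright_instance.stop()


tpe = ThreadPoolExecutor()
for _ in range(100):
    tpe.submit(sample_func)
tpe.shutdown()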

Is my assumption correct that, contrary to the response in #470 saying Playwright is not thread-safe, it is indeed thread-safe as long as a new Playwright instance is created inside each thread, as is done in the snippet above?

mxschmitt · 2021-05-12T14:58:04Z

Yes, this should work, but I would not recommend spawning that many Playwright instances and doing the scaling at that end. For example, running multiple instances on k8s and syncing them over message queues like RabbitMQ would perform and scale better.

sla-te · 2021-05-12T15:02:23Z

Thank you for the quick responses.

I don't fully understand what you mean by "do the scaling at that end, over multiple instances running on k8". I assume you mean not using threading at all, but writing the logic so that one execution runs one and only one instance, and then using Kubernetes to run it multiple times, like in the threading example?

kumaraditya303 · 2021-05-12T15:05:09Z

FYI @chwba: not being thread-safe means that it is not safe to call methods or access attributes from a different thread. You can always create a Playwright instance in TLS, and it would be safe.

sla-te · 2021-05-12T15:07:54Z

> FYI @chwba: not being thread-safe means that it is not safe to call methods or access attributes from a different thread. You can always create a Playwright instance in TLS, and it would be safe.

Understood. My colleague opened #470 because, when using Playwright as described in the code snippet(s), we were experiencing very "weird" behaviour: web elements that did exist were not being detected, and there were inexplicable exceptions that did not make much sense at first sight. None of this happened when running only one instance.

Now I am looking for the best possible way to run multiple instances of Playwright (roughly 100 simultaneously running instances at most, at the moment) while still keeping Playwright stable. After @mxschmitt's response, I have the feeling that threading would not be the right way to go, and multiprocessing would eat up even more resources and bring additional challenges, especially if we make it async.

kumaraditya303 · 2021-05-12T15:12:37Z

For using Playwright in a multithreaded environment, it is recommended to use TLS so that it is safe and does not cause weird exceptions. You can avoid multithreading by using ProcessPoolExecutor and then using TLS per process to isolate it, which would also be safe.

sla-te · 2021-05-12T15:14:57Z

> For using Playwright in a multithreaded environment, it is recommended to use TLS so that it is safe and does not cause weird exceptions. You can avoid multithreading by using ProcessPoolExecutor and then using TLS per process to isolate it, which would also be safe.

What do you mean by "use TLS"? I interpret it as connecting to the target website via "https" instead of "http". Is this correct, or am I misunderstanding what you mean?

Maybe I should clarify: the application will be running on a single dedicated server (32 cores, 256 GB RAM), and we are not an organization but rather freelancers who create automated testing solutions for the websites we build.

kumaraditya303 · 2021-05-12T15:18:06Z

By TLS I meant thread-local storage, which is like the contextvars Python module but for threads rather than asyncio. It is the recommended way to store connection objects etc. in multithreaded environments, and it can also be used for Playwright.

sla-te · 2021-05-12T15:21:47Z

> By TLS I meant thread-local storage, which is like the contextvars Python module but for threads rather than asyncio. It is the recommended way to store connection objects etc. in multithreaded environments, and it can also be used for Playwright.

I have to admit that I haven't worked with contextvars before. Would it be possible to create a small snippet for us that shows how you would suggest going forward, keeping in mind the ThreadPoolExecutor snippet I posted above?

EDIT: I did some digging regarding TLS and found https://stackoverflow.com/questions/1408171/thread-local-storage-in-python. Do you mean that we should instantiate the Playwright instances inside the threads in such a thread-local namespace?
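
To make the thread-local namespace idea concrete, here is a small sketch that uses only the standard library. `FakeConnection` is a hypothetical stand-in for a per-thread object such as a Playwright instance; the names `get_connection`, `task`, and `run_demo` are also illustrative. The point is only that whatever is stored on a `threading.local()` object is visible solely to the thread that stored it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()


class FakeConnection:
    """Hypothetical stand-in for a per-thread resource."""

    def __init__(self):
        # Remember which thread created this object.
        self.owner = threading.get_ident()


def get_connection():
    # Create the resource on first use in this thread, then reuse it.
    if not hasattr(_local, "connection"):
        _local.connection = FakeConnection()
    return _local.connection


def task(_):
    conn = get_connection()
    # The stored object always belongs to the calling thread.
    return conn.owner == threading.get_ident()


def run_demo(workers=4, jobs=20):
    # More jobs than workers, so pool threads are reused across tasks.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(task, range(jobs)))
```

Every task sees an object created by its own thread, even though all tasks go through the same module-level `_local` name.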

sla-te · 2021-05-12T19:49:19Z

Okay, so this is what we have come up with, hope, we have understood your suggestions correctly:

import random
import threading
from concurrent.futures.thread import ThreadPoolExecutor
from time import sleep

from loguru import logger
from playwright.sync_api import Browser, BrowserContext, Page, Playwright
from playwright.sync_api import sync_playwright


class Tls(threading.local):
    def __init__(self):
        # Placeholders only; the real objects are created inside each thread.
        self.playwright: Playwright = None
        self.browser: Browser = None
        self.context: BrowserContext = None
        self.page: Page = None


class Generator:
    tls = Tls()

    def __init__(self):
        pass

    def run(self, k):
        logger.info("THREAD: %s - ENTER" % k)

        self.tls.playwright = sync_playwright().start()
        self.tls.browser = self.tls.playwright.firefox.launch(headless=True)
        self.tls.context = self.tls.browser.new_context(
            bypass_csp=True,
            ignore_https_errors=True,
            color_scheme=random.choice(["dark", "light", "no-preference"]),
            timezone_id=None,
            geolocation={"longitude": 1, "latitude": 2},
            locale="en-US",
            java_script_enabled=True,
            user_agent=None,
        )
        self.tls.page = self.tls.context.new_page()
        self.tls.page.goto("https://google.com")
        self.tls.page.screenshot(path=f'{random.randint(100, 10000)}.png')

        self.tls.page.close()
        self.tls.context.close()
        self.tls.browser.close()
        self.tls.playwright.stop()

        logger.info("THREAD: %s - EXIT" % k)


if __name__ == "__main__":
    generators = list()
    # The with-block waits for all submitted tasks to finish before exiting,
    # so there is no need to poll the executor's private thread list.
    with ThreadPoolExecutor() as tpe:
        for i in range(1, 11):
            generator = Generator()
            generators.append(generator)
            tpe.submit(generator.run, i)
            sleep(0.1)

Going on from this, I have a few more questions regarding stability:

1. Is it OK to create a new, individual context for each new page we define on top of one playwright->browser instance?
2. How many contexts/pages would you suggest using at most per browser?
3. Which of the following (a, b) would you prefer?
   a. One Playwright, one context, one page per thread
   b. One Playwright, n contexts, n pages per thread [n to be advised from your side (see 2.)]

I'm not sure how Playwright works internally, but I could imagine that beyond a certain number of pages/contexts inside one browser, each of which ultimately results in a new browser tab, it can become unstable.

kumaraditya303 · 2021-05-13T07:23:48Z

> I have to admit that I haven't worked with contextvars before. Would it be possible to create a small snippet for us that shows how you would suggest going forward, keeping in mind the ThreadPoolExecutor snippet I posted above?

@chwba

Here is a snippet that uses TLS, i.e. thread-local storage, and other best practices for a thread-safe Playwright script:

import threading
from concurrent.futures import ThreadPoolExecutor

from playwright.sync_api import sync_playwright


class Tls(threading.local):
    def __init__(self) -> None:
        self.playwright = sync_playwright().start()
        print("Create playwright instance in Thread", threading.current_thread().name)


class Worker:
    tls = Tls()

    def run(self):
        print("Launched worker in ", threading.current_thread().name)
        # self.tls.playwright is the instance owned by the current thread.
        browser = self.tls.playwright.chromium.launch(headless=False)
        context = browser.new_context()
        # Open the page in the context created above.
        page = context.new_page()
        page.goto("http://whatsmyuseragent.org/")
        page.screenshot(path=f"example-{threading.current_thread().name}.png")
        page.close()
        context.close()
        browser.close()
        print("Stopped worker in ", threading.current_thread().name)


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=5) as executor:
        for _ in range(50):
            worker = Worker()
            executor.submit(worker.run)

Here each thread creates its own Playwright object the first time it needs one and reuses it for later tasks. Because the object is stored in TLS, it is never shared between threads, which avoids errors and race conditions. You can change max_workers as per your needs.

Here is the gist URL: https://gist.github.com/kumaraditya303/e6dee949dda298b35d167369955d45c6
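
One thing the thread-local approach leaves open is when each per-thread instance gets stopped. A possible alternative, not taken from this thread, is to give every worker thread an explicit lifecycle: it creates its resource, pulls jobs from a queue, and closes the resource when it is done. The sketch below uses only the standard library; `make_resource`, `close_resource`, and `handle` are hypothetical callbacks supplied by the caller.

```python
import queue
import threading


def worker(jobs, results, make_resource, close_resource, handle):
    # The resource is created, used, and closed on this thread only.
    resource = make_resource()
    try:
        while True:
            job = jobs.get()
            if job is None:  # sentinel: no more work for this thread
                break
            try:
                results.put((job, handle(resource, job)))
            except Exception as exc:
                # Report the failure instead of killing the worker.
                results.put((job, exc))
    finally:
        close_resource(resource)


def run_jobs(job_list, n_threads, make_resource, close_resource, handle):
    jobs, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(
            target=worker,
            args=(jobs, results, make_resource, close_resource, handle),
        )
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for job in job_list:
        jobs.put(job)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # All workers have finished, so the results queue can be drained safely.
    out = []
    while not results.empty():
        out.append(results.get())
    return out
```

With Playwright's sync API, `make_resource` could call `sync_playwright().start()` and `close_resource` could call `.stop()` on the returned object, so that each instance is started and stopped on the thread that uses it. That pairing is an assumption of this sketch rather than something confirmed in the discussion.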

kumaraditya303 · 2021-05-13T07:48:32Z

> Is it ok, to create a new, individual context for each new page we define on top of one playwright->browser instance?

Yes, you should create a new context per thread.

> How many contexts/pages would you suggest to use at max per browser?

If you launch one Playwright instance, it creates two subprocesses: the driver and the browser you want to use. Hence, I would say you can create multiple contexts, around 3 per thread.

> Which of the following (a,b) would you suggest to prefer?
> a. One Playwright, one context, one page per thread
> b. One Playwright, n contexts, n pages per thread [n to be advised from your side, (see 2.) ]

Do not try to create multiple Playwright instances per thread; always use thread-local storage to isolate them from each other.
As stated above, 2~3 contexts per browser should be a good amount.

kumaraditya303 · 2021-05-13T08:11:12Z

> Maybe I should clarify: the application will be running on a single dedicated server (32 cores, 256 GB RAM), and we are not an organization but rather freelancers who create automated testing solutions for the websites we build.

I missed that. If you have 32 cores, then you should use at least 3~4 contexts per thread and have 64 threads. That would be a good start.

sla-te · 2021-05-13T11:51:03Z

Thank you for the thorough replies.

One last question: if we go ahead and use 3-4 contexts per thread, with each context working on different tests, I haven't come up with any approach other than opening another 3-4 threads from inside that very thread (which wouldn't work, because the new threads would no longer have access to the thread-local Playwright). See the code snippet below to clarify:

import random
import threading
from concurrent.futures.thread import ThreadPoolExecutor
from time import sleep

from loguru import logger
from playwright.sync_api import Browser, BrowserContext, Page, Playwright
from playwright.sync_api import sync_playwright


class Tls(threading.local):
    def __init__(self):
        # Placeholders only; the real objects are created inside each thread.
        self.playwright: Playwright = None
        self.browser: Browser = None
        self.context: BrowserContext = None
        self.page: Page = None


class Generator:
    tls = Tls()

    def __init__(self):
        pass

    def run(self, k):
        logger.info("THREAD: %s - ENTER" % k)

        self.tls.playwright = sync_playwright().start()
        self.tls.browser = self.tls.playwright.firefox.launch(headless=True)

        # Create 3 different contexts
        self.tls.context = self.tls.browser.new_context()
        self.tls.second_context = self.tls.browser.new_context()
        self.tls.third_context = self.tls.browser.new_context()

        # Create one page in each context
        self.tls.page = self.tls.context.new_page()
        self.tls.second_page = self.tls.second_context.new_page()
        self.tls.third_page = self.tls.third_context.new_page()

        # Navigate to 3 different websites
        self.tls.page.goto("https://google.com")
        self.tls.second_page.goto("https://web.de")
        self.tls.third_page.goto("https://cnn.com")
        # do separate work on each of the pages
        # how can we achieve concurrency in this scenario?
    
        self.tls.page.close()
        self.tls.second_page.close()
        self.tls.third_page.close()

        self.tls.context.close()
        self.tls.second_context.close()
        self.tls.third_context.close()

        self.tls.browser.close()
        self.tls.playwright.stop()

        logger.info("THREAD: %s - EXIT" % k)


if __name__ == "__main__":
    generators = list()
    # The with-block waits for all submitted tasks to finish before exiting,
    # so there is no need to poll the executor's private thread list.
    with ThreadPoolExecutor() as tpe:
        for i in range(1, 11):
            generator = Generator()
            generators.append(generator)
            tpe.submit(generator.run, i)
            sleep(0.1)

kumaraditya303 · 2021-05-13T11:58:58Z

@chwba First, I would recommend initializing Playwright in the thread-local constructor so that it can be reused, as in my earlier example. By 3~4 contexts I meant that you would process them synchronously. But since, as you said above, you need concurrency there too, you should create one context per thread and parallelize with threads. Then try increasing the thread count from 64 upward for as long as your server handles it correctly; 64 threads means 64 subprocesses, so you may run out of memory.

sla-te · 2021-05-17T06:24:38Z

@kumaraditya303 Thank you again for the thorough replies. I wanted to let you know that we have now finished our solution. We went for 1 Playwright/context/page per thread and were able to run 45-50 simultaneous threads with headless Firefox browsers at an average CPU load of ~85%, with peaks of 100%. RAM is no issue at all: only 30% is used at full load. If we increase to 55 or more threads, though, the CPU load stays at 100%, and it does not feel safe to do that over a long period. Do you have any hints on how we could tweak Playwright a little further to maybe squeeze out another couple of threads?

PS: Regarding the thread-local constructor, we left it at None because we thought it would otherwise also start a Playwright process in the main thread. We only run Playwright inside the worker threads; outside of them we only print statistics and manage everything else.
PS 2: We are using an AMD EPYC 7502P with 128 GB DDR4 ECC RAM.

kumaraditya303 · 2021-05-17T06:38:19Z

@chwba The CPU load is reasonable, since around 50 browser processes are running simultaneously; I don't think you can run more threads at this point. However, if you want to get more out of it, you can combine threading and asyncio, which would give you more throughput at the same CPU load. You would then use multiple contexts, though, and hence more RAM.
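
One possible reading of "combine threading and asyncio" is sketched below: each worker thread owns its own event loop, and several jobs run concurrently inside that loop. The sketch uses only the standard library, and `process` is a hypothetical stand-in for real page work; with Playwright, the assumption would be that each thread creates its own instance through the async API inside its loop and gives each job its own context.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def process(job):
    # Stand-in for real page work (e.g. awaiting page.goto in the async API).
    await asyncio.sleep(0.01)
    return f"done:{job}"


async def run_batch(batch):
    # All jobs of this batch run concurrently on this thread's event loop.
    # With Playwright, the instance and browser would be created here,
    # inside the loop that uses them, with one context per job.
    return await asyncio.gather(*(process(job) for job in batch))


def thread_main(batch):
    # asyncio.run creates a fresh event loop owned by the calling thread.
    return asyncio.run(run_batch(batch))


def run_all(jobs, n_threads):
    # Split the jobs round-robin into one batch per thread.
    batches = [jobs[i::n_threads] for i in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as executor:
        results = list(executor.map(thread_main, batches))
    return [item for batch in results for item in batch]
```

In this layout no object is shared between threads or between event loops; the number of threads and the number of jobs per thread can then be tuned separately against CPU and RAM.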

simplPart · 2021-06-12T05:07:59Z

> combine threading and asyncio

Can you show how to do it correctly with asyncio? And does Firefox have any arguments like https://stackoverflow.com/a/58589026 to reduce the load on the processors? (I have many contexts running: 1 browser per process and many contexts per thread.)

First I start the process -> start the asyncio loop in the process -> start the browser -> create about 5 threads and pass them the browser -> start about 5 contexts per thread. This loads the server heavily (6 cores, 32 GB RAM): the cores are at 100%, and then the browser simply closes with "Target page, context or browser has been closed" ((

Konano · 2021-10-17T11:04:46Z

> combine threading and asyncio
>
> Can you show how to do it correctly with asyncio? And does Firefox have any arguments like https://stackoverflow.com/a/58589026 to reduce the load on the processors? (I have many contexts running: 1 browser per process and many contexts per thread.)
>
> First I start the process -> start the asyncio loop in the process -> start the browser -> create about 5 threads and pass them the browser -> start about 5 contexts per thread. This loads the server heavily (6 cores, 32 GB RAM): the cores are at 100%, and then the browser simply closes with "Target page, context or browser has been closed" ((

Same problem.

kumaraditya303 · 2021-10-17T11:07:44Z

@Konano Create a new issue for asyncio.

Konano · 2021-10-17T11:10:39Z

@kumaraditya303 You mean this problem is related to asyncio, not to Playwright?

kumaraditya303 · 2021-10-17T11:11:36Z

> @kumaraditya303 You mean this problem is related to asyncio, not to Playwright?

No, create a new issue on the playwright-python repo to separate the discussion from this one.

mxschmitt added the triaging label Apr 16, 2021

mxschmitt closed this as completed Apr 23, 2021

Sagar95Chakole mentioned this issue Dec 4, 2021

[Question]: Running Playwright in multiple threads leaves threads running idle after execution #1062

Closed

mxschmitt mentioned this issue Mar 16, 2022

[Question]: Is there a way to run a playwright instance in a background task? #1207

Closed

rwoll mentioned this issue Jul 11, 2022

[BUG] greenlet.error: cannot switch to a different thread #1422

Closed

LeilaSchooley mentioned this issue Nov 1, 2022

[Question]: Creating a playwright instance per thread with the async api #1619

Closed

brandonrobertz mentioned this issue Jan 31, 2023

cannot switch to a different thread llm-workflow-engine/llm-workflow-engine#77

Closed

sanosuke009 mentioned this issue Mar 18, 2023

Handle Multithreading in Python sanosuke009/PoseidonFramework#29

Open

dgtlmoon mentioned this issue Apr 8, 2023

[Question]: Since playwright-python uses a single node via UNIX pipes, can multiple sync_playwright() use a single node process? #1850

Closed

michaeleveringham mentioned this issue Oct 5, 2023

[Question] Is there any reason not to use greenlets to achieve concurrency with a single browser instance in the sync API? #2101

Closed

skhrlx mentioned this issue Feb 20, 2024

[BUG] Vinyzu/Botright#58

Closed
