timeouts making python aiohttp requests in recent WSL2 kernel but not WSL1 or older WSL2 kernel #9565

Closed
1 of 2 tasks
ntextreme3 opened this issue Jan 31, 2023 · 4 comments

Comments

@ntextreme3

ntextreme3 commented Jan 31, 2023

Version

Microsoft Windows [Version 10.0.19043.2364]

WSL Version

  • WSL 2
  • WSL 1

Kernel Version

5.15.79.1-microsoft-standard-WSL2

Distro Version

Ubuntu 20.04 and 22.04

Other Software

Tested with Python 3.8.16, 3.10.6, and 3.11.1
Latest aiohttp (aiohttp==3.8.3)

Repro Steps

steps

I ran into this on Ubuntu-20.04, but just to be sure I:

  • created a new Ubuntu-22.04 distro from scratch via Microsoft Store
  • installed just a couple python versions (py38, 310, 311 via pyenv)
  • set up new virtualenvs with those python versions
  • installed aiohttp==3.8.3 in the various virtualenvs
  • created this script and ran it to test what I was seeing

script

minimal reproducible example

NOTE: I'm behind a corporate proxy, which is why the script uses trust_env: I can run it as https_proxy=blah python issue.py and hit public URLs for this example. The real case that led me here hits internal URLs and does not use a proxy, so the proxy should be irrelevant here.

import asyncio
import traceback

import aiohttp


async def put_links(queue):
    for _ in range(10):
        await queue.put("https://httpbin.org/get")
        print("put link in queue")


async def get_links(i, queue, session):
    print(f"worker {i}: started")
    while True:
        url = await queue.get()
        try:
            resp = await session.head(url)
            print(f"worker {i}: {resp.status}")
        except Exception:
            print(f"worker {i}: failed, {traceback.format_exc()}")
        queue.task_done()


async def start_getters(queue):
    num_workers = 5
    print(f"starting {num_workers} workers")
    async with aiohttp.ClientSession(trust_env=True, timeout=aiohttp.ClientTimeout(total=5)) as session:
        workers = [asyncio.create_task(get_links(i, queue, session)) for i in range(num_workers)]
        await queue.join()
        print("done, cancelling workers")
        for worker in workers:
            worker.cancel()


async def main():
    queue = asyncio.Queue()
    await put_links(queue)
    await start_getters(queue)
    print("done")


if __name__ == "__main__":
    asyncio.run(main())

observations

When I run it under WSL2 with kernel 5.15.79.1-microsoft-standard-WSL2, it hangs, consistently after the 7th result is printed 😵. It always stops after the 7th result because only workers 0 and 1 ever complete a request and keep pulling items from the queue. Workers 2, 3, and 4 hang on their first request and never unblock, so those 3 are the only ones left at the end to time out.
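
The worker arithmetic in that observation can be reproduced without any network access. The sketch below is a simulation, not the original script: `fake_request` and the `stalled` set are hypothetical stand-ins for `session.head()` and for workers 2, 3, and 4 never receiving a response. With 10 items and 5 workers, each worker takes one item up front, the two healthy workers drain the remaining five, and the three stalled workers time out at the end.

```python
import asyncio


async def fake_request(worker_id, stalled):
    # Hypothetical stand-in for session.head(): stalled workers never get a reply.
    if worker_id in stalled:
        await asyncio.sleep(3600)
    await asyncio.sleep(0)  # yield once so the other workers get a turn
    return 200


async def worker(worker_id, queue, results, stalled, timeout):
    while True:
        await queue.get()
        try:
            status = await asyncio.wait_for(fake_request(worker_id, stalled), timeout)
            results.append((worker_id, status))
        except asyncio.TimeoutError:
            results.append((worker_id, "timeout"))
        finally:
            queue.task_done()


async def simulate(num_items=10, num_workers=5, stalled=(2, 3, 4), timeout=0.2):
    queue = asyncio.Queue()
    for _ in range(num_items):
        queue.put_nowait("item")
    results = []
    tasks = [
        asyncio.create_task(worker(i, queue, results, stalled, timeout))
        for i in range(num_workers)
    ]
    await queue.join()  # returns once every item is marked done
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return results


if __name__ == "__main__":
    for entry in asyncio.run(simulate()):
        print(entry)
```

Under these assumptions the simulation should record seven successful results from workers 0 and 1 before the three timeouts, which matches the shape of the failing run and suggests the "7th result" boundary is just a consequence of which workers stall, not something special about the 8th request.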

I then had a friend test this on their [slightly] older WSL2 kernel 5.10.102.1-microsoft-standard-WSL2, and it succeeds there.

As I was starting to wonder if this was a kernel issue, I tried to see what WSL1 would give me. If I take that same brand new Ubuntu-22 and run wsl --set-version Ubuntu-22.04 1 to convert it to WSL1, with kernel 4.4.0-19041-Microsoft, it succeeds perfectly every time; same as on my friend's older kernel.

I can convert that distro back and forth between WSL2 and WSL1, and it consistently fails the same way under WSL2.
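
When switching a distro back and forth like this, it can help to tag each run with the kernel it actually ran on. The following is a rough sketch that guesses the WSL flavor from the kernel release string. The helper name `classify_wsl` is hypothetical, and the matching rules are a heuristic derived only from the three kernel strings quoted in this report, so other kernels may not be classified correctly.

```python
import platform


def classify_wsl(release):
    """Guess the WSL flavor from a kernel release string (heuristic only)."""
    lowered = release.lower()
    # e.g. 5.15.79.1-microsoft-standard-WSL2, 5.10.102.1-microsoft-standard-WSL2
    if "microsoft-standard" in lowered or lowered.endswith("-wsl2"):
        return "WSL2"
    # e.g. 4.4.0-19041-Microsoft
    if lowered.endswith("-microsoft"):
        return "WSL1"
    return "not WSL (or unrecognized)"


if __name__ == "__main__":
    release = platform.release()
    print(f"kernel {release}: {classify_wsl(release)}")
```

Printing this at the top of the repro script would make it harder to mix up results after a `wsl --set-version` conversion.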

I see that the wslconfig spec does include custom kernel paths, but I don't know enough about building a custom kernel to be sure I'd be doing it right, or whether I'd also have to rebuild/reinstall Python, system libs, etc. to be testing the right thing. (tbh, I'm not sure that just using --set-version to switch to WSL1 is an accurate test in that respect either 😅).

Expected Behavior

This script should succeed and not hang or timeout.

expected output
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
starting 5 workers
worker 0: started
worker 1: started
worker 2: started
worker 3: started
worker 4: started
worker 2: 200
worker 3: 200
worker 0: 200
worker 1: 200
worker 0: 200
worker 1: 200
worker 0: 200
worker 2: 200
worker 4: 200
worker 3: 200
done, cancelling workers
done

Actual Behavior

output from a failing run
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
starting 5 workers
worker 0: started
worker 1: started
worker 2: started
worker 3: started
worker 4: started
worker 1: 200
worker 0: 200
worker 0: 200
worker 1: 200
worker 1: 200
worker 1: 200
worker 0: 200
worker 2: failed, Traceback (most recent call last):
  File "issue.py", line 18, in get_links
    resp = await session.head(url)
  File "/home/ntrenchi/temp/wsl-asyncio-issue/env/lib/python3.8/site-packages/aiohttp/client.py", line 637, in _request
    break
  File "/home/ntrenchi/temp/wsl-asyncio-issue/env/lib/python3.8/site-packages/aiohttp/helpers.py", line 720, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError

worker 3: failed, Traceback (most recent call last):
  File "issue.py", line 18, in get_links
    resp = await session.head(url)
  File "/home/ntrenchi/temp/wsl-asyncio-issue/env/lib/python3.8/site-packages/aiohttp/client.py", line 637, in _request
    break
  File "/home/ntrenchi/temp/wsl-asyncio-issue/env/lib/python3.8/site-packages/aiohttp/helpers.py", line 720, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError

worker 4: failed, Traceback (most recent call last):
  File "issue.py", line 18, in get_links
    resp = await session.head(url)
  File "/home/ntrenchi/temp/wsl-asyncio-issue/env/lib/python3.8/site-packages/aiohttp/client.py", line 637, in _request
    break
  File "/home/ntrenchi/temp/wsl-asyncio-issue/env/lib/python3.8/site-packages/aiohttp/helpers.py", line 720, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError

done, cancelling workers
done

If I remove the timeout, let it hang, and then Ctrl+C out of it, I get:

Traceback (most recent call last):
  File "issue.py", line 44, in <module>
    asyncio.run(main())
  File "/home/ntrenchi/.pyenv/versions/3.8.16/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/ntrenchi/.pyenv/versions/3.8.16/lib/python3.8/asyncio/base_events.py", line 603, in run_until_complete
    self.run_forever()
  File "/home/ntrenchi/.pyenv/versions/3.8.16/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
    self._run_once()
  File "/home/ntrenchi/.pyenv/versions/3.8.16/lib/python3.8/asyncio/base_events.py", line 1823, in _run_once
    event_list = self._selector.select(timeout)
  File "/home/ntrenchi/.pyenv/versions/3.8.16/lib/python3.8/selectors.py", line 468, in select
    fd_event_list = self._selector.poll(timeout, max_ev)
KeyboardInterrupt

If I check strace while it's hanging, it's sitting in epoll_wait:

epoll_wait(3, 0x7f2de7448bc0, 6, 14843) = -1 EINTR (Interrupted system call)
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
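
The Ctrl+C traceback and the strace output only show the event loop waiting in the selector, not which coroutine is stuck. As a possible complement, here is a minimal sketch that asks each unfinished task where it is suspended, using `Task.get_stack()` from the standard library. The helper `describe_pending` and the `stuck` coroutine are hypothetical and not part of the original script.

```python
import asyncio


def describe_pending(tasks):
    """Return (task name, coroutine function, line number) for each unfinished task."""
    report = []
    for task in tasks:
        if task.done():
            continue
        frames = task.get_stack()  # a single frame for a suspended coroutine
        if frames:
            frame = frames[-1]
            report.append((task.get_name(), frame.f_code.co_name, frame.f_lineno))
    return report


async def stuck():
    await asyncio.sleep(3600)  # stands in for a request that never completes


async def demo():
    tasks = [asyncio.create_task(stuck(), name=f"worker-{i}") for i in range(3)]
    await asyncio.sleep(0.01)  # let the tasks start and block
    report = describe_pending(tasks)
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return report


if __name__ == "__main__":
    for name, func, lineno in asyncio.run(demo()):
        print(f"{name} is suspended in {func}() at line {lineno}")
```

In the repro script, calling something like `describe_pending(workers)` from a watchdog task while the hang is in progress would be one way to confirm that the stalled workers are all waiting on the `session.head(url)` line.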

Diagnostic Logs

No response

@joeriddles

joeriddles commented Feb 2, 2023

@ntextreme3 I started experiencing similar issues two days ago as well, both using Django runserver and the built-in http.server.

This issue may be related: #5018 (comment)

@joeriddles

joeriddles commented Feb 3, 2023

@ntextreme3 does updating to today's released version of WSL fix your problem?

See this thread: #9508 (comment)

@elsaco

elsaco commented Feb 3, 2023

Sample output using python-3.11.1 and kernel 5.15.83.1-microsoft-standard-WSL2 on Ubuntu-22.04:

(pyTest) elsaco@RIPPER:~/foo$ python issue.py
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
starting 5 workers
worker 0: started
worker 1: started
worker 2: started
worker 3: started
worker 4: started
worker 1: 200
worker 0: 200
worker 3: 200
worker 2: 200
worker 0: 200
worker 3: 200
worker 2: 200
worker 1: 200
worker 4: 200
worker 0: 200
done, cancelling workers
done
(pyTest) elsaco@RIPPER:~/foo$ python issue.py
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
put link in queue
starting 5 workers
worker 0: started
worker 1: started
worker 2: started
worker 3: started
worker 4: started
worker 1: 200
worker 3: 200
worker 4: 200
worker 2: 200
worker 1: 200
worker 3: 200
worker 4: 200
worker 2: 200
worker 0: 200
worker 1: 200
done, cancelling workers
done

Didn't try other Python versions. WSL used for testing:

WSL version: 1.1.2.0
Kernel version: 5.15.83.1
WSLg version: 1.0.49
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19045.2546

@ntextreme3
Author

ntextreme3 commented Feb 3, 2023

Edit (in case this helps anyone)

tl;dr - updating MTU fixed this for me and no longer times out.

It was late when I originally resolved this (see footnote 1). I updated Windows and didn't realize I hadn't turned my VPN back on after restarting. Once I noticed that the VPN was the trigger, it was easier to find a fix, as there are plenty of similar issues full of people reporting that changing the MTU resolved their problem.

Note: It still works fine for my co-worker on the older kernel even with VPN connected.
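
For anyone wanting to check their own setup, here is a small sketch of inspecting the MTU from inside the distro and building the command that would change it. Everything specific in it is an assumption rather than something stated in this thread: the interface name `eth0`, the example value `1400`, and the helper names are placeholders, and the right MTU depends on the VPN in use.

```python
from pathlib import Path


def parse_mtu(text):
    """Parse the contents of a sysfs mtu file, e.g. '1500\\n'."""
    return int(text.strip())


def read_mtu(interface="eth0", sysfs_root="/sys/class/net"):
    """Read the current MTU for an interface from Linux sysfs."""
    return parse_mtu((Path(sysfs_root) / interface / "mtu").read_text())


def mtu_command(interface, mtu):
    """Build the iproute2 command that sets the MTU (running it needs root)."""
    if mtu <= 0:
        raise ValueError("MTU must be a positive integer")
    return ["ip", "link", "set", "dev", interface, "mtu", str(mtu)]


if __name__ == "__main__":
    # Assumed interface name and example value; adjust both for your environment.
    print("current MTU:", read_mtu("eth0"))
    print("to change it, run: sudo " + " ".join(mtu_command("eth0", 1400)))
```

The sketch only prints the command instead of running it, since changing the MTU requires root and the appropriate value has to be determined for the VPN in question.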

I did another wsl --update today, but the issue still exists. I'm leaving this closed though, since I believe it's covered by the tons of other VPN-related tickets.

Current wsl --version (2023-03-03):

WSL version: 1.1.3.0
Kernel version: 5.15.90.1
WSLg version: 1.0.49
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19045.2604

Footnotes

  1. My original, no longer relevant comment when closing this ticket

    I noticed your Windows version above was different. Nothing was showing for me in Windows Update, so I used the Windows Update Assistant to update from 21H1 to 22H2.

    Before:

    WSL version: 1.0.3.0
    Kernel version: 5.15.79.1
    WSLg version: 1.0.47
    MSRDC version: 1.2.3575
    Direct3D version: 1.606.4
    DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
    Windows version: 10.0.19043.2364
    

    After:

    WSL version: 1.0.3.0
    Kernel version: 5.15.79.1
    WSLg version: 1.0.47
    MSRDC version: 1.2.3575
    Direct3D version: 1.606.4
    DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
    Windows version: 10.0.19045.2486
    

    Works now with just new Windows version 🤷‍♂️ ... Thanks!!!
