Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race conditions at scale leading to security issues. #186

Open
parikls opened this issue Aug 3, 2024 · 9 comments · May be fixed by #189
Open

Race conditions at scale leading to security issues. #186

parikls opened this issue Aug 3, 2024 · 9 comments · May be fixed by #189
Assignees
Labels
bug Something isn't working security Security-related issues and vulnerabilities.

Comments

@parikls
Copy link

parikls commented Aug 3, 2024

My use case:

Users are logging in via an SSO, and after that I'm using the access token of the logged-in user to obtain additional information from the provider.
In my view I'm getting the access token using the provider property, e.g. provider.access_token. But when I'm running the code on a big scale (we have hundreds of thousands users) - sometimes one user can obtain the access token of the other user, which leads to serious security issues. This happens rarely, and only if multiple users are logging-in at the same time (microseconds differences).
After investigating the source codes of the fastapi-sso, I've figured out that the problem is actually with the provider implementation. SSOBase class is stateful and a process_login coro has multiple awaits after fetching the token, so when there are multiple simultaneous users are logging in via the sso - this leads to a race condition, and the first user will obtain the token of the last user.

My code which I'm currently using.

lifespan.py


def create_my_sso_provider(app: FastAPI):
    discovery = {
        "authorization_endpoint": '...',
        "token_endpoint": '...',
        "userinfo_endpoint": '...',
    }
    provider = create_provider(
        name='provider,
        discovery_document=discovery,
        response_convertor=openid_from_response
    )

    app.teachable_sso = provider(
        client_id=...,
        client_secret=...,
        redirect_uri=...,
        scope=...,
    )

api.py

async def my_view(...):
    sso_user: OpenID = await app.teachable_sso.verify_and_process(request, redirect_uri=...)
    access_token, refresh_token = teachable_sso.access_token, teachable_sso.refresh_token
    await do_request_to_sso_provider_to_fetch_more_data(access_token)

I've wrote a very quick and dirty test case for the fastapi-sso to prove that this is actually an issue:

import asyncio
from unittest.mock import Mock, patch

from starlette.datastructures import URL

from fastapi_sso import create_provider


async def test__race_condition():
    discovery = {
        "authorization_endpoint": "https://authorization_endpoint",
        "token_endpoint": "https://token_endpoint",
        "userinfo_endpoint": "https://userinfo_endpoint",
    }
    factory = create_provider(name='provider', discovery_document=discovery)

    provider = factory(client_id='client_id', client_secret='client_secret')

    # mock for the response. will return a token which we've set
    class Response:

        def __init__(self, token: str):
            self.token = token

        def json(self):
            return {
                'refresh_token': self.token,
                'access_token': self.token,
                'id_token': self.token,
            }

    # mock of the httpx client
    class AsyncClient:

        post_responses = []  # list of the responses which a client will return for the `POST` requests

        def __init__(self):
            self.headers = {}

        async def __aenter__(self):
            return self

        async def __aexit__(self, exc_type, exc_val, exc_tb):
            pass

        async def post(self, *args, **kwargs):
            await asyncio.sleep(0)  # simulate a loop switch cos it'll happen during a real HTTP request
            return self.post_responses.pop()

        # we actually don't care what GET will return for this particular test,
        # but this method is required to fully run the `process_login` code
        async def get(self, *args, **kwargs):
            await asyncio.sleep(0)
            return Response(token='')

    with patch('fastapi_sso.sso.base.httpx') as httpx:
        httpx.AsyncClient = AsyncClient

        first_response = Response(token='first_token')
        second_response = Response(token='second_token')

        AsyncClient.post_responses = [second_response, first_response]  # reversed order because of `pop`

        async def process_login():
            # this coro will be executed concurrently.
            # completely not caring about the params
            request = Mock()
            request.url = URL('https://url.com?state=state&code=code')
            await provider.process_login(
                code='code',
                request=request,
                params=dict(state='state'),
                convert_response=False
            )
            return provider.access_token

        # process login concurrently twice
        tasks = [
            process_login(),
            process_login()
        ]
        results = await asyncio.gather(*tasks)

        # we would want to get the first and second tokens,
        # but we see that the first request actually obtained the second token as well
        assert results == [first_response.token, second_response.token]
@tomasvotava tomasvotava self-assigned this Aug 3, 2024
@tomasvotava tomasvotava added bug Something isn't working security Security-related issues and vulnerabilities. labels Aug 3, 2024
@tomasvotava
Copy link
Owner

Hi @parikls, thank you so much for reporting this, I was foolishly convinced this was resolved by introducing the context manager (you should be using with provider: in your case), but I never honestly thought about requests happening asynchronously like this. I am currently AFK, but tomorrow I'll fix it and finally make the sso class state-less, which was my will all along. Thanks!

@parikls
Copy link
Author

parikls commented Aug 5, 2024

@tomasvotava the context manager won't help in this case unfortunately. Currently the workaround which works - is to create a new provider for each new request. Luckily I don't see a big difference in a performance/mem usage of our API containers after introducing this approach (it's live for 2 days already). But definitely would be better to solve this properly, instead of creating a new instances of provider, oauthlib classes, etc.

def create_my_provider():
    provider = create_provider(
        name=...,
        discovery_document=...,
        response_convertor=...
    )

    return provider(
        client_id=...,
        client_secret=...,
        redirect_uri=...,
        scope=...,
    )

async def my_view(sso_provider: Depends(create_my_provider)):
    ...

Thanks in advance!

P.S. I think I can confirm that the above approach solved the issue completely. At least we had the same amount of requests during the weekend, and 0 errors in a sentry.

@parikls
Copy link
Author

parikls commented Aug 7, 2024

@tomasvotava UPD: faced the same issue yesterday even with the provider creation per request. Probably some more shared state exist somewhere? Maybe on a class level? I haven't yet investigated this deeply, but want to keep you updated

@tomasvotava
Copy link
Owner

@parikls thanks a lot for keeping me up to date, I am afraid I cannot post a fix without changing the behavior of how access_token is retrieved (it cannot be on the instance if the instance is to be reusable), and so I will have to make some changes.
If you are using the generic sso provider, it should always create a whole new class when you call create_provider, it seems weird to me that there should be any shared state and if there is, this bug actually goes even deeper 😨
I double-checked the static attributes set on BaseSSO and the oauth client really gets recreated for each instance, how exactly do you catch this bug in a sentry?

@parikls
Copy link
Author

parikls commented Aug 7, 2024

@tomasvotava my SSO provider has a /me endpoint which allows me to get the user info, e.g. email, for the provided access token. So I'm using the access token to fetch the user email, and comparing that email with the OpenID.email (returned from verify_and_process). Simplified code looks like this:

sso_user: OpenID = await verify_and_process(
        request,
        redirect_uri=...
    )
access_token, refresh_token = my_sso.access_token, my_sso.refresh_token

me = await custom_provider_api.get_me(access_token=access_token)

if me['email'] != sso_user.email:
    logger.error('... error goes here ...')

@tomasvotava
Copy link
Owner

Ok, thanks for clarifying that, I was really hoping there was an error on your side, to be honest 😄 I know security isn't laughable thing, I assure you I take it seriously, I am just trying to gather as much intel as I can before moving on to solution, so that I can be sure that I actually resolve it properly this time. You are being very, very helpful, thanks!

@parikls
Copy link
Author

parikls commented Aug 7, 2024

@tomasvotava np =)
Right now I'm going to utilize a simple asyncio lock, so we'll process an SSO callbacks on each container one by one, e.g.

from asyncio import Lock

LOGIN_LOCK = Lock()

async def my_view(...):
    async with LOGIN_LOCK:
        sso_user: OpenID = await verify_and_process(request, redirect_uri=...)
        access_token, refresh_token = teachable_sso.access_token, teachable_sso.refresh_token
    ...

And will post an update in a day or two

@parikls
Copy link
Author

parikls commented Aug 13, 2024

@tomasvotava FYI after adding a lock - issue is fully resolved

@tomasvotava
Copy link
Owner

@parikls thanks for letting me know, my planned solution does basically the same, but it will require everyone to change from using with provider: to async with provider:, I hope it's going to be alright for everyone, but I'd say security is more important than convenience.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working security Security-related issues and vulnerabilities.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants