
Swagger cache fills itself up without boundaries #141

Open

jchorin opened this issue Aug 6, 2020 · 3 comments
Labels
bug Something isn't working

Comments


jchorin commented Aug 6, 2020

  • etcd3-py version: 0.1.6
  • Python version: 3.6.9
  • Operating System: Ubuntu 18.04

Description

My setup:

I am using an API with aiohttp. For every request received, an AioClient is created by an aiohttp middleware. The client is closed after the request has been handled.

My issue:

If too many requests are sent to the API, the memory footprint of the API process increases continuously, until my machine breaks and resets.

What I Did

Here is a minimal example. It connects to an etcd database running locally, with ~20 elements present at the prefix "/".

import asyncio
from etcd3 import AioClient


async def read_db():
    while True:
        client = AioClient()
        try:
            resp = await client.range("/")
        finally:
            await client.close()


async def all_run(concurrent=10):
    """Run many reads concurrently
    """
    await asyncio.gather(
        *(read_db() for i in range(concurrent)),
        return_exceptions=False,
    )


def main():
    loop = asyncio.get_event_loop()
    try:
        result = loop.run_until_complete(all_run())
    except asyncio.CancelledError:
        pass
    finally:
        loop.close()


main()

While running, this script uses more than 1 GB of memory after only 5 minutes.

Workaround

I narrowed down the issue to the caches of SwaggerNode and SwaggerSpec.
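The failure mode is the classic unbounded-memoization pattern: a decorator stores every result in a dict keyed by its arguments, and nothing ever evicts entries. A minimal illustration of the mechanism (the `memoize` and `resolve` names are hypothetical stand-ins, not etcd3-py's actual code):

```python
def memoize(func):
    """Cache every result forever, keyed by the call arguments.
    Nothing ever evicts entries, so the dict grows without bound."""
    cache = {}

    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    wrapper.cache = cache  # exposed so callers can inspect (or clear) it
    return wrapper


@memoize
def resolve(path):
    # Stand-in for resolving a node of the swagger spec.
    return {"path": path}


# Every distinct key adds a cache entry that is never freed.
for i in range(1000):
    resolve(f"/node/{i}")

print(len(resolve.cache))  # 1000 entries retained
```

Because the caches here hang off class attributes, they outlive each individual client, so creating and closing a client per request still accumulates entries.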

By changing the function read_db in the above example like the following:

import asyncio

from etcd3 import AioClient
from etcd3.swagger_helper import SwaggerSpec, SwaggerNode

counter = 0


async def read_db():
    global counter
    while True:
        counter += 1
        client = AioClient()
        try:
            resp = await client.range("/")
        finally:
            await client.close()

        if counter % 20 == 0:
            # Empty the different caches every 20 reads
            SwaggerNode._node_cache = {}
            SwaggerNode.__getattr__.__wrapped__.cache = {}
            SwaggerSpec._ref.__wrapped__.cache = {}
            SwaggerSpec.getPath.__wrapped__.cache = {}
            SwaggerSpec.getSchema.__wrapped__.cache = {}

the memory footprint stays around 120 MB, even after 20 minutes.
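Clearing the caches by hand works, but a fix on the library side could simply bound them; `functools.lru_cache` gives exactly that behavior. A minimal sketch of the pattern (the `resolve` function is a hypothetical stand-in, not etcd3-py's API):

```python
from functools import lru_cache


@lru_cache(maxsize=256)
def resolve(path):
    # Stand-in for a swagger-spec lookup; once 256 distinct keys are
    # cached, new entries evict the least recently used one instead of
    # accumulating forever.
    return {"path": path}


for i in range(10_000):
    resolve(f"/node/{i}")

print(resolve.cache_info().currsize)  # capped at 256
```

With a bound in place, the per-request client pattern would no longer grow memory linearly with the number of distinct lookups.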

@issue-label-bot

Issue-Label Bot is automatically applying the label bug to this issue, with a confidence of 0.90. Please mark this comment with 👍 or 👎 to give our bot feedback!


@issue-label-bot issue-label-bot bot added the bug Something isn't working label Aug 6, 2020

jchorin commented Aug 14, 2020

If I modify the previous example to reuse a single client everywhere, the memory footprint stays bounded.

See the following modifications:

import asyncio
from etcd3 import AioClient


async def read_db(client):
    while True:
        resp = await client.range("/")


async def all_run(concurrent=10):
    """Run many reads concurrently
    """
    client = None
    try:
        client = AioClient()

        await asyncio.gather(
            *(read_db(client) for i in range(concurrent)),
            return_exceptions=False,
        )
    finally:
        if client:
            await client.close()


def main():
    loop = asyncio.get_event_loop()
    try:
        result = loop.run_until_complete(all_run())
    except asyncio.CancelledError:
        pass
    finally:
        loop.close()


main()

However, is it best practice to share a single AioClient across all concurrent requests of a web server?
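Sharing one client across concurrent requests is generally safe as long as the client manages its own connection pool; in an aiohttp application the usual place to create and close such a shared resource is a startup/cleanup hook. A runnable sketch of the lifecycle with a stand-in client (`FakeClient` is hypothetical, standing in for AioClient so the example needs no running etcd):

```python
import asyncio


class FakeClient:
    """Stand-in for AioClient: counts how many instances get created."""
    instances = 0

    def __init__(self):
        FakeClient.instances += 1

    async def range(self, prefix):
        await asyncio.sleep(0)  # pretend to do network I/O
        return []

    async def close(self):
        pass


async def handle_request(client):
    # Every "request" reuses the shared client instead of creating its own.
    await client.range("/")


async def main():
    client = FakeClient()  # created once, at application startup
    try:
        await asyncio.gather(*(handle_request(client) for _ in range(100)))
    finally:
        await client.close()  # closed once, at application shutdown


asyncio.run(main())
print(FakeClient.instances)  # 1: a single client served all 100 requests
```

The same shape maps onto aiohttp's `cleanup_ctx` / `on_startup` hooks: construct the client before serving, stash it on the app, and close it on shutdown.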

mgoerens pushed a commit to rak-n-rok/Krake that referenced this issue Aug 14, 2020
See issue #370

Previously, the Krake API created one etcd client for each request
received. Because of the etcd client used, a memory leak appeared:
the client put too many elements in its caches, see
Revolution1/etcd3-py#141

To work around the issue, a single client is now used for the whole
Krake API. As this client manages a pool of connections, it makes
sense to use only one of them. With this change, the API no longer
shows any sign of a memory leak.

Signed-off-by: Jean Chorin <[email protected]>
Revolution1 (Owner) commented

That might be the solution.

But I'll have to dig into it to find the cause.

Actually, the "swagger cache" was just a temporary solution. My goal is to auto-generate the data model classes from the swagger spec, instead of generating the model classes at runtime. That way, the cache won't be a problem.
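Generating the model classes up front means each schema is turned into a class exactly once, so no per-lookup cache is needed at runtime. A rough sketch of the idea with a toy spec (the schema fragment and field names here are made up for illustration):

```python
from dataclasses import make_dataclass

# A tiny stand-in for a swagger/OpenAPI schema fragment.
spec = {
    "RangeResponse": {"kvs": list, "count": int},
    "KeyValue": {"key": bytes, "value": bytes},
}

# Generate one class per schema up front (at import time, or as a
# separate codegen step), instead of building node objects lazily and
# caching every lookup at runtime.
models = {
    name: make_dataclass(name, fields.items())
    for name, fields in spec.items()
}

kv = models["KeyValue"](key=b"/foo", value=b"bar")
print(type(kv).__name__, kv.key)  # KeyValue b'/foo'
```

The memory cost then scales with the number of schemas in the spec, which is fixed, rather than with the number of lookups performed.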

mgoerens pushed a commit to rak-n-rok/Krake that referenced this issue Oct 22, 2020
mgoerens pushed a commit to rak-n-rok/Krake that referenced this issue Mar 31, 2021