
Swagger cache fills itself up without boundaries #141

Open

jchorin opened this issue Aug 6, 2020 · 3 comments
Labels
bug Something isn't working

Comments


jchorin commented Aug 6, 2020

  • etcd3-py version: 0.1.6
  • Python version: 3.6.9
  • Operating System: Ubuntu 18.04

Description

My setup:

I am using an API with aiohttp. For every request received, an AioClient is created by an aiohttp middleware. The client is closed after the request has been handled.

My issue:

If too many requests are sent to the API, the memory footprint of the API process increases continuously, until my machine breaks and resets.

What I Did

Here is a minimal example. It connects to an etcd database running locally, with ~20 elements present at the prefix "/".

import asyncio
from etcd3 import AioClient


async def read_db():
    while True:
        client = AioClient()
        try:
            resp = await client.range("/")
        finally:
            await client.close()


async def all_run(concurrent=10):
    """Run many reads concurrently
    """
    await asyncio.gather(
        *(read_db() for i in range(concurrent)),
        return_exceptions=False,
    )


def main():
    loop = asyncio.get_event_loop()
    try:
        result = loop.run_until_complete(all_run())
    except asyncio.CancelledError:
        pass
    finally:
        loop.close()


main()

While running, this script uses more than 1 GB of memory after only 5 minutes.

Workaround

I narrowed down the issue to the caches of SwaggerNode and SwaggerSpec.
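The failure mode is the classic unbounded-memoization pattern: a decorator stores every result in a dict keyed by its arguments, and nothing ever evicts entries. A minimal illustration of the mechanism (the `memoize` and `resolve` names are hypothetical stand-ins, not etcd3-py's actual code):

```python
def memoize(func):
    """Cache every result forever, keyed by the call arguments.
    Nothing ever evicts entries, so the dict grows without bound."""
    cache = {}

    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    wrapper.cache = cache  # exposed so callers can inspect (or clear) it
    return wrapper


@memoize
def resolve(path):
    # Stand-in for resolving a node of the swagger spec.
    return {"path": path}


# Every distinct key adds a cache entry that is never freed.
for i in range(1000):
    resolve(f"/node/{i}")

print(len(resolve.cache))  # 1000 entries retained
```

Because the caches here hang off class attributes, they outlive each individual client, so creating and closing a client per request still accumulates entries.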

By changing the function read_db in the above example like the following:

import asyncio

from etcd3 import AioClient
from etcd3.swagger_helper import SwaggerSpec, SwaggerNode

counter = 0


async def read_db():
    global counter
    while True:
        counter += 1
        client = AioClient()
        try:
            resp = await client.range("/")
        finally:
            await client.close()

        if counter % 20 == 0:
            # Empty the different caches every 20 reads
            SwaggerNode._node_cache = {}
            SwaggerNode.__getattr__.__wrapped__.cache = {}
            SwaggerSpec._ref.__wrapped__.cache = {}
            SwaggerSpec.getPath.__wrapped__.cache = {}
            SwaggerSpec.getSchema.__wrapped__.cache = {}

the memory footprint stays around 120 MB, even after 20 minutes.
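Clearing the caches by hand works, but a fix on the library side could simply bound them; `functools.lru_cache` gives exactly that behavior. A minimal sketch of the pattern (the `resolve` function is a hypothetical stand-in, not etcd3-py's API):

```python
from functools import lru_cache


@lru_cache(maxsize=256)
def resolve(path):
    # Stand-in for a swagger-spec lookup; once 256 distinct keys are
    # cached, new entries evict the least recently used one instead of
    # accumulating forever.
    return {"path": path}


for i in range(10_000):
    resolve(f"/node/{i}")

print(resolve.cache_info().currsize)  # capped at 256
```

With a bound in place, the per-request client pattern would no longer grow memory linearly with the number of distinct lookups.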

@issue-label-bot

Issue-Label Bot is automatically applying the label bug to this issue, with a confidence of 0.90. Please mark this comment with 👍 or 👎 to give our bot feedback!


@issue-label-bot issue-label-bot bot added the bug Something isn't working label Aug 6, 2020

jchorin commented Aug 14, 2020

If I modify the previous example to reuse a single client everywhere, the memory footprint stays bounded.

See the following modifications:

import asyncio
from etcd3 import AioClient


async def read_db(client):
    while True:
        resp = await client.range("/")


async def all_run(concurrent=10):
    """Run many reads concurrently
    """
    client = None
    try:
        client = AioClient()

        await asyncio.gather(
            *(read_db(client) for i in range(concurrent)),
            return_exceptions=False,
        )
    finally:
        if client:
            await client.close()


def main():
    loop = asyncio.get_event_loop()
    try:
        result = loop.run_until_complete(all_run())
    except asyncio.CancelledError:
        pass
    finally:
        loop.close()


main()

However, is it best practice to share a single AioClient across all concurrent requests of a web server?
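Sharing one client across concurrent requests is generally safe as long as the client manages its own connection pool; in an aiohttp application the usual place to create and close such a shared resource is a startup/cleanup hook. A runnable sketch of the lifecycle with a stand-in client (`FakeClient` is hypothetical, standing in for AioClient so the example needs no running etcd):

```python
import asyncio


class FakeClient:
    """Stand-in for AioClient: counts how many instances get created."""
    instances = 0

    def __init__(self):
        FakeClient.instances += 1

    async def range(self, prefix):
        await asyncio.sleep(0)  # pretend to do network I/O
        return []

    async def close(self):
        pass


async def handle_request(client):
    # Every "request" reuses the shared client instead of creating its own.
    await client.range("/")


async def main():
    client = FakeClient()  # created once, at application startup
    try:
        await asyncio.gather(*(handle_request(client) for _ in range(100)))
    finally:
        await client.close()  # closed once, at application shutdown


asyncio.run(main())
print(FakeClient.instances)  # 1: a single client served all 100 requests
```

The same shape maps onto aiohttp's `cleanup_ctx` / `on_startup` hooks: construct the client before serving, stash it on the app, and close it on shutdown.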

mgoerens pushed a commit to rak-n-rok/Krake that referenced this issue Aug 14, 2020
See issue #370

Previously, the Krake API created one etcd client for each request
received. Because of the etcd client used, a memory leak appeared:
the client put too many elements in its caches, see
Revolution1/etcd3-py#141

To work around the issue, a single client is now used for the whole
Krake API. As this client manages a pool of connections, it makes
sense to use only one of them. With this change, the API no longer
shows any sign of a memory leak.

Signed-off-by: Jean Chorin <[email protected]>
Revolution1 (Owner) commented

That might be the solution.

But I'll have to dig into it to find the cause.

Actually, the "swagger cache" was just a temporary solution. My goal is to auto-generate the data model classes from the swagger spec, instead of generating the model classes at runtime. That way, the cache won't be a problem.
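Generating the model classes up front means each schema is turned into a class exactly once, so no per-lookup cache is needed at runtime. A rough sketch of the idea with a toy spec (the schema fragment and field names here are made up for illustration):

```python
from dataclasses import make_dataclass

# A tiny stand-in for a swagger/OpenAPI schema fragment.
spec = {
    "RangeResponse": {"kvs": list, "count": int},
    "KeyValue": {"key": bytes, "value": bytes},
}

# Generate one class per schema up front (at import time, or as a
# separate codegen step), instead of building node objects lazily and
# caching every lookup at runtime.
models = {
    name: make_dataclass(name, fields.items())
    for name, fields in spec.items()
}

kv = models["KeyValue"](key=b"/foo", value=b"bar")
print(type(kv).__name__, kv.key)  # KeyValue b'/foo'
```

The memory cost then scales with the number of schemas in the spec, which is fixed, rather than with the number of lookups performed.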

mgoerens pushed a commit to rak-n-rok/Krake that referenced this issue Oct 22, 2020
mgoerens pushed a commit to rak-n-rok/Krake that referenced this issue Mar 31, 2021