Fil missing massive chunks of memory usage in a Python program that creates mostly just Python objects #37
Comments
Allocating a list of integers is successfully tracked by Fil on the Python 3.6 I have installed.
The test was done on Debian 10 with Python 3.6 installed via pyenv. The Django ORM is also involved here, and the whole code is wrapped in a database transaction. I think the untracked memory is already being allocated before the database is queried, but I am not 100% sure. I will try to come up with a minimal reproducible example that doesn't involve database access.
Can't reproduce when creating a list of regular Python class instances; Fil happily tracks that too.
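For reference, a minimal sketch of the kind of reproducer tried here: only plain Python objects (a large list of integers and a list of simple class instances), so it can be run under Fil (e.g. with `fil-profile run make_objects.py`). The file name and the sizes are just illustrative:

```python
# make_objects.py -- hypothetical reproducer: only plain Python objects,
# no C extensions, so a tracker hooked into Python's allocator should see it all.

class Item:
    """A plain Python class; each instance gets its own __dict__ allocation."""
    def __init__(self, value):
        self.value = value

def main():
    # Offset past CPython's small-int cache so each int is a real allocation.
    ints = [i + 1000 for i in range(10_000_000)]
    items = [Item(i) for i in range(1_000_000)]
    print(len(ints), len(items))

if __name__ == "__main__":
    main()
```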
@rfleschenberg if you run
Probably not caused by #35, then, since libc precedes ld-linux. |
@rfleschenberg if you can't find the time for a reproducer, another useful thing would be just the list of Python packages that get imported (or even just installed) for the problem code, because I can then audit them for which allocation APIs they use.
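A minimal sketch of one way to produce such a list at the end of a run, by dumping the top-level names in sys.modules; the two stand-in imports are only there to make the snippet self-contained:

```python
import sys

def report_imported_top_level_packages():
    # Collapse "package.sub.module" entries down to their top-level package.
    top_level = sorted({name.split(".")[0] for name in sys.modules
                        if not name.startswith("_")})
    for name in top_level:
        print(name)

if __name__ == "__main__":
    import json, sqlite3  # stand-ins for the real project's imports
    report_imported_top_level_packages()
```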
Another theory: in scenarios where lots of small objects are allocated, Fil currently has a huge amount of memory overhead. Memory usage can be 10× due to Fil overhead. This was not what Fil was designed for, to be fair, but lots of small objects seems like a legitimate use case. Playing around with solutions in the branch for #43.
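A back-of-the-envelope sketch of why per-allocation overhead hits small objects hardest; the 400-byte bookkeeping cost is an assumed, illustrative figure, not Fil's actual number:

```python
def apparent_usage_ratio(payload_bytes, bookkeeping_bytes=400):
    # A fixed per-allocation bookkeeping cost dwarfs a tiny payload
    # but is negligible next to a large buffer.
    return (payload_bytes + bookkeeping_bytes) / payload_bytes

print(apparent_usage_ratio(32))          # tiny object: ~13.5x apparent usage
print(apparent_usage_ratio(1_000_000))   # 1 MB buffer: ~1.0004x
```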
The code was allocating > 4 GB in production, where Fil was not involved, so I think that's probably not the issue.
I still hope I can find the time to come up with an MRE, but due to my current workload I can't promise to do this by a certain date, sorry :-/ The issue occurred in a somewhat complex project, unfortunately. There also are bindings to some custom C++ code. However, as far as I can tell, most of the project's complexity / libraries should not be involved in the code path in question. I am attaching the full output of

```python
import logging
import re
import time

from django.conf import settings
from django.core.exceptions import ValidationError
from django.db.models.functions import Lower
from django.utils.functional import cached_property

from product.constants import TARGET_GROUPS_CHOICES
# These are Django model classes.
from product.models import (BasicColor, Brand, ProductCategoryMixin, ProductSize, ProductVariant,
                            ProductVariantTranslation, Store, StoreToProduct)
# The next two lines are pretty trivial helper functions.
from product.validators import is_valid_mpn as _is_valid_mpn
from product.validators import validate_eu, validate_uk, validate_us
```
When you ran Fil and got a number that was much lower (1.2 GB), did you run all the way to the end on the full data, or just on a subset of the data?
I ran against the full set. The OOM killer killed the process at some point. I wonder whether that corrupted Fil's management data; in that case the "wrong" number could maybe just be a side effect of the OOM kill?

Afterwards I ran it against a subset. It ran to completion then. The numbers still seemed off by a similar factor (Fil showing ~1.4 GB vs htop showing > 4 GB). I cannot completely vouch for this second case, however; it's certainly possible that I made some stupid mistake.

Thank you for all of your work on this! I will try my best to provide more useful debugging information, and ideally a reproducible example. Unfortunately I can't commit to a deadline :-/ What I can promise is to keep it on my radar and do it as soon as possible. If you prefer to close the issue until it's reproducible, please do so!
"Fil showing ~1.4 GB vs htop showing > 4 GB" could very well be a symptom of #45. Fil has a lot of overhead. To rule this out, you could run the following experiment:
If the issue is just Fil overhead, Fil will report the same maximum memory usage as step 2 above; the actual usage will be higher due to overhead, but the report will match step 2.
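A minimal sketch of one way to get the non-Fil peak-memory number for a comparison like this, using the process's peak RSS on Linux; the workload line is just a stand-in for the real code:

```python
import resource

def peak_rss_mib():
    # On Linux, ru_maxrss is reported in kilobytes.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

if __name__ == "__main__":
    data = [i + 1000 for i in range(5_000_000)]  # stand-in for the real workload
    print(f"peak RSS: {peak_rss_mib():.0f} MiB")
```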
I just released v0.9.0.
If you could rerun your code with this version, it may be that you can actually run the script to the end in a reasonable amount of time without running out of memory, which would allow for a more accurate assessment of whether there's a bug or it was just the overhead I just shrunk.
Hoping for a reproducer, but may have to make my own.

What could cause this?

- PYTHONMALLOC env variable doesn't work completely.
- PYTHONMALLOC env variable only works for some Python APIs.
- posix_memalign. Due to #35 (Make sure via test that we're not leaking memory due to using dynamic linker's malloc()). See the sketch after this list.
- free() isn't being called, or being called wrongly somehow.
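To illustrate the posix_memalign theory above, a minimal sketch (assuming a glibc system, hence libc.so.6) of an allocation that goes straight to libc and would be invisible to a tracker that only intercepts malloc and Python's allocator APIs:

```python
import ctypes

libc = ctypes.CDLL("libc.so.6", use_errno=True)
libc.posix_memalign.argtypes = [ctypes.POINTER(ctypes.c_void_p),
                                ctypes.c_size_t, ctypes.c_size_t]
libc.posix_memalign.restype = ctypes.c_int
libc.free.argtypes = [ctypes.c_void_p]

buf = ctypes.c_void_p()
# Ask for 256 MiB aligned to 64 bytes, bypassing Python's allocator entirely.
rc = libc.posix_memalign(ctypes.byref(buf), 64, 256 * 1024 * 1024)
assert rc == 0
libc.free(buf)
```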