Fil missing massive chunks of memory usage in a Python program that creates mostly just Python objects #37
Comments
Allocating a list of integers is successfully tracked by Fil on the Python 3.6 I have installed.
The test was done on Debian 10 with Python 3.6 installed via pyenv. The Django ORM is also involved here, and the whole code is wrapped in a database transaction. I think the untracked memory is already being allocated before the database is queried, but I am not 100% sure. I will try to come up with a minimal reproducible example that doesn't involve database access.
Can't reproduce when creating a list of regular Python class instances; Fil happily tracks that too.
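For reference, a minimal sketch of the kind of reproducer tried here: only plain Python objects (a large list of integers and a list of simple class instances), so it can be run under Fil (e.g. with `fil-profile run make_objects.py`). The file name and the sizes are just illustrative:

```python
# make_objects.py -- hypothetical reproducer: only plain Python objects,
# no C extensions, so a tracker hooked into Python's allocator should see it all.

class Item:
    """A plain Python class; each instance gets its own __dict__ allocation."""
    def __init__(self, value):
        self.value = value

def main():
    # Offset past CPython's small-int cache so each int is a real allocation.
    ints = [i + 1000 for i in range(10_000_000)]
    items = [Item(i) for i in range(1_000_000)]
    print(len(ints), len(items))

if __name__ == "__main__":
    main()
```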
@rfleschenberg if you run
Probably not caused by #35, then, since libc precedes ld-linux. |
@rfleschenberg if you can't find the time for a reproducer, another useful thing would be just the list of Python packages that get imported (or even just installed) for the problem code, because I can then audit them for which allocation APIs they use.
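A minimal sketch of one way to produce such a list at the end of a run, by dumping the top-level names in sys.modules; the two stand-in imports are only there to make the snippet self-contained:

```python
import sys

def report_imported_top_level_packages():
    # Collapse "package.sub.module" entries down to their top-level package.
    top_level = sorted({name.split(".")[0] for name in sys.modules
                        if not name.startswith("_")})
    for name in top_level:
        print(name)

if __name__ == "__main__":
    import json, sqlite3  # stand-ins for the real project's imports
    report_imported_top_level_packages()
```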
Another theory: in scenarios where lots of small objects are allocated, Fil currently has a huge amount of memory overhead. Memory usage can be 10× due to Fil overhead. This was not what Fil was designed for, to be fair, but lots of small objects seems like a legitimate use case. Playing around with solutions in the branch for #43.
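A back-of-the-envelope sketch of why per-allocation overhead hits small objects hardest; the 400-byte bookkeeping cost is an assumed, illustrative figure, not Fil's actual number:

```python
def apparent_usage_ratio(payload_bytes, bookkeeping_bytes=400):
    # A fixed per-allocation bookkeeping cost dwarfs a tiny payload
    # but is negligible next to a large buffer.
    return (payload_bytes + bookkeeping_bytes) / payload_bytes

print(apparent_usage_ratio(32))          # tiny object: ~13.5x apparent usage
print(apparent_usage_ratio(1_000_000))   # 1 MB buffer: ~1.0004x
```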
The code was allocating > 4 GB in production, where Fil was not involved, so I think that's probably not the issue.
I still hope I can find the time to come up with an MRE, but due to my current workload I can't promise to do this by a certain date, sorry :-/ The issue occurred in a somewhat complex project, unfortunately. There also are bindings to some custom C++ code. However, as far as I can tell, most of the project's complexity / libraries should not be involved in the code path in question. I am attaching the full output of

```python
import logging
import re
import time

from django.conf import settings
from django.core.exceptions import ValidationError
from django.db.models.functions import Lower
from django.utils.functional import cached_property

from product.constants import TARGET_GROUPS_CHOICES
# These are Django model classes.
from product.models import (BasicColor, Brand, ProductCategoryMixin, ProductSize, ProductVariant,
                            ProductVariantTranslation, Store, StoreToProduct)
# The next two lines are pretty trivial helper functions.
from product.validators import is_valid_mpn as _is_valid_mpn
from product.validators import validate_eu, validate_uk, validate_us
```
When you ran Fil and got a number that was much lower (1.2 GB), did you run all the way to the end on the full data, or just on a subset of the data?
I ran against the full set. The OOM killer killed the process at some point. I wonder whether that corrupted Fil's management data; in that case the "wrong" number could maybe just be a side effect of the OOM kill?

Afterwards I ran it against a subset. It ran to completion then. The numbers still seemed off by a similar factor (Fil showing ~1.4 GB vs htop showing > 4 GB). I cannot completely vouch for this second case, however; it's certainly possible that I made some stupid mistake.

Thank you for all of your work on this! I will try my best to provide more useful debugging information, and ideally a reproducible example. Unfortunately I can't commit to a deadline :-/ What I can promise is to keep it on my radar and do it as soon as possible. If you prefer to close the issue until it's reproducible, please do so!
"Fil showing ~1.4 GB vs htop showing > 4 GB" could very well be a symptom of #45. Fil has a lot of overhead. To rule this out, you could run the following experiment:
If the issue is just Fil overhead, Fil will report the same maximum memory usage as step 2 above; the actual usage will be higher due to overhead, but the report will match step 2.
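A minimal sketch of one way to get the non-Fil peak-memory number for a comparison like this, using the process's peak RSS on Linux; the workload line is just a stand-in for the real code:

```python
import resource

def peak_rss_mib():
    # On Linux, ru_maxrss is reported in kilobytes.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

if __name__ == "__main__":
    data = [i + 1000 for i in range(5_000_000)]  # stand-in for the real workload
    print(f"peak RSS: {peak_rss_mib():.0f} MiB")
```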
I just released v0.9.0.
If you could rerun your code with this version, it may be that you can actually run the script to the end in a reasonable amount of time without running out of memory, which would allow for a more accurate assessment of whether there's a bug or it was just the overhead I just shrunk.
Hoping for a reproducer, but may have to make my own.

What could cause this?

- PYTHONMALLOC env variable doesn't work completely.
- PYTHONMALLOC env variable only works for some Python APIs.
- posix_memalign. Due to #35 (Make sure via test that we're not leaking memory due to using dynamic linker's malloc()). See the sketch after this list.
- free() isn't being called, or being called wrongly somehow.
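To illustrate the posix_memalign theory above, a minimal sketch (assuming a glibc system, hence libc.so.6) of an allocation that goes straight to libc and would be invisible to a tracker that only intercepts malloc and Python's allocator APIs:

```python
import ctypes

libc = ctypes.CDLL("libc.so.6", use_errno=True)
libc.posix_memalign.argtypes = [ctypes.POINTER(ctypes.c_void_p),
                                ctypes.c_size_t, ctypes.c_size_t]
libc.posix_memalign.restype = ctypes.c_int
libc.free.argtypes = [ctypes.c_void_p]

buf = ctypes.c_void_p()
# Ask for 256 MiB aligned to 64 bytes, bypassing Python's allocator entirely.
rc = libc.posix_memalign(ctypes.byref(buf), 64, 256 * 1024 * 1024)
assert rc == 0
libc.free(buf)
```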