Memory Leak Detected #962

Closed
EdGaere opened this issue Jan 28, 2023 · 2 comments · Fixed by #967

EdGaere commented Jan 28, 2023

Overview Description

There appears to be a memory leak generated by repeated calls to dates.parse_pattern(dt, format, locale) if the function is called with a wide variety of different formats and locales.

This is because each time a new DateTimePattern is created for a new (format, locale), the object is stored in _pattern_cache (a dict), which grows without bound.

babel/dates.py:1598 (Babel 2.9.1)

def parse_pattern(pattern):
    """Parse date, time, and datetime format patterns."""
    ...

    # here is the problem
    _pattern_cache[pattern] = pat = DateTimePattern(pattern, u''.join(result))

Perhaps a better design would be to simply lru_cache the dates.parse_pattern() function?

from functools import lru_cache

@lru_cache(maxsize=1000)
def parse_pattern(pattern):
    """Parse date, time, and datetime format patterns."""
    ...
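
As a rough check of how the suggestion above would behave, the sketch below applies `functools.lru_cache` to a stand-in function and inspects the cache afterwards. The function `parse_bounded` and its body are hypothetical; only `lru_cache`, `cache_info()`, and `cache_clear()` are real standard-library APIs.

```python
from functools import lru_cache


@lru_cache(maxsize=1000)
def parse_bounded(pattern):
    """Hypothetical stand-in for an expensive parse; not Babel's parse_pattern."""
    return tuple(pattern.split())


# Call with far more distinct keys than the cache can hold.
for i in range(10_000):
    parse_bounded(f"pattern {i}")

# cache_info() reports hits, misses, maxsize and the current entry count.
info = parse_bounded.cache_info()
print(info)

# The cache can also be emptied explicitly if memory needs to be reclaimed.
parse_bounded.cache_clear()
```

The least recently used entries are evicted once `maxsize` is reached, so memory stays bounded at the cost of re-parsing patterns that have fallen out of the cache.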

Steps to Reproduce

from datetime import datetime

from babel.localedata import locale_identifiers
from babel.dates import format_datetime

from pympler import tracker  # tracks memory usage (https://github.com/pympler/pympler)

# show initial memory usage
tr = tracker.SummaryTracker()
tr.print_diff()

# create some random datetime
d = datetime(2007, 4, 1, 13, 27, 53)

# create some datetime formats
custom_formats = [
    r"M/d/yy, h:mm a",                       # short
    r"MMM d, y, h:mm:ss a",                  # medium
    r"MMMM d, y 'at' h:mm:ss a z",           # long
    r"EEEE, MMMM d, y 'at' h:mm:ss a zzzz",  # full

    r"EEEE, MMMM d, y 'at' hh:mm:ss zzz",    # shorter timezone
    r"EEEE, MMMM d, y 'at' hh:mm:ss zzzz",   # full timezone; hh is a zero-padded 12-hour field (HH would be 24hr)

    r"EEEE, MMMM d, y 'at' hh:mm:ss",
    r"EEEE, MMMM d, y 'at' h:mm:ss a",

    r"EEEE, d MMM y hh:mm:ss",
    r"EEEE, d MMM y h:mm:ss a",

    r"d MMM y hh:mm:ss",
    r"d MMM y h:mm:ss a",
]

# call format_datetime for all locale/format combinations, about 9.4k combinations
for locale_name in locale_identifiers():
    for custom_format in custom_formats:
        s = format_datetime(d, locale=locale_name, format=custom_format)

# show difference in memory usage since start
tr.print_diff()




Actual Results

Initial Memory Snapshot

  types |   # objects |   total size
 ====== | =========== | ============
   list |        3750 |    318.95 KB
    str |        3747 |    260.45 KB
    int |         817 |     22.34 KB

Final Memory Snapshot

                        types |   # objects |   total size
============================= | =========== | ============
                         dict |      272282 |    113.17 MB
                          str |       21809 |      1.51 MB
                         list |       12416 |      1.12 MB
  babel.dates.DateTimePattern |        9668 |    453.19 KB
                        tuple |        6829 |    385.02 KB
  babel.numbers.NumberPattern |        7550 |    353.91 KB

Expected Results

Reproducibility

Additional Information

akx (Member) commented Jan 28, 2023

I wonder if this is a problem in any real-world application... :) But sure, using lru_cache would probably work just as well!

EdGaere (Author) commented Feb 7, 2023

Thanks for the fix :-)
