Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing of mixed-cased locale ID. #361

Merged
merged 5 commits into from
Mar 3, 2016
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 18 additions & 3 deletions babel/localedata.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
import os
import threading
from collections import MutableMapping
from itertools import chain

from babel._compat import pickle

Expand All @@ -24,15 +25,29 @@
_dirname = os.path.join(os.path.dirname(__file__), 'locale-data')


def normalize_locale(name):
"""Normalize a locale ID by stripping spaces and apply proper casing.

Returns the normalized locale ID string or `None` if the ID is not
recognized.
"""
name = name.strip().lower()
for locale_id in chain.from_iterable([_cache, locale_identifiers()]):
if name == locale_id.lower():
return locale_id


def exists(name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, greedy method.
I'm really looking forward to #305 !

(just an observation, please ignore :)

"""Check whether locale data is available for the given locale. Ther
return value is `True` if it exists, `False` otherwise.
"""Check whether locale data is available for the given locale.

Returns `True` if it exists, `False` otherwise.

:param name: the locale identifier string
"""
if name in _cache:
return True
return os.path.exists(os.path.join(_dirname, '%s.dat' % name))
file_found = os.path.exists(os.path.join(_dirname, '%s.dat' % name))
return True if file_found else bool(normalize_locale(name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would iteration over in-mem objects would be faster than disk access? If so, please consider performing the in-memory check first. :)

(super duper minor optimization)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd complicate the whole deal unnecessarily, imho. It'd become something like

  • (1) name in cache?
  • (2) normalized name in cache?
  • (3) name in file system?
  • (4) normalized name in file system?

which would kinda interleave normalization with this "simple" existence checking...



def locale_identifiers():
Expand Down
19 changes: 19 additions & 0 deletions tests/test_localedata.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,8 @@

import doctest
import unittest
import random
from operator import methodcaller

from babel import localedata

Expand Down Expand Up @@ -73,6 +75,23 @@ def test_merge():
localedata.merge(d, {1: 'Foo', 2: 'Bar'})
assert d == {1: 'Foo', 2: 'Bar', 3: 'baz'}


def test_locale_identification():
for l in localedata.locale_identifiers():
assert localedata.exists(l)


def test_unique_ids():
# Check all locale IDs are uniques.
all_ids = localedata.locale_identifiers()
assert len(all_ids) == len(set(all_ids))
# Check locale IDs don't collide after lower-case normalization.
lower_case_ids = list(map(methodcaller('lower'), all_ids))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO

lower_case_ids = [id.lower() for id in all_ids] is more readable, but don't care much.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it would be, and more Pythonic, but that's kinda minor, and can be trivially corrected later if need be.

assert len(lower_case_ids) == len(set(lower_case_ids))


def test_mixedcased_locale():
for l in localedata.locale_identifiers():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to iterate through all ids? Would it be sufficient to spot check some deliberately selected hardcoded values? Eg. 'Fi', 'FI_fi', 'fi_FI', 'fI_fI', 'ZH_hanT_HK'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's another optimization that could be done (via pytest.parametrize), but I don't think our test suite takes too long to run as it is :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True that. I just get the heebie jeebies when I see random.choice in unit tests, ha.

locale_id = ''.join([
methodcaller(random.choice(['lower', 'upper']))(c) for c in l])
assert localedata.exists(locale_id)