Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use unicode-slugify to generate Version.slug #5186

Merged
merged 3 commits into from
Feb 19, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
48 changes: 40 additions & 8 deletions readthedocs/builds/version_slug.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@

from django.db import models
from django.utils.encoding import force_text
from slugify import slugify as unicode_slugify


def get_fields_with_model(cls):
Expand Down Expand Up @@ -53,13 +54,15 @@ def get_fields_with_model(cls):

class VersionSlugField(models.CharField):

"""Inspired by ``django_extensions.db.fields.AutoSlugField``."""
"""
Inspired by ``django_extensions.db.fields.AutoSlugField``.

invalid_chars_re = re.compile('[^-._a-z0-9]')
leading_punctuation_re = re.compile('^[-._]+')
placeholder = '-'
fallback_slug = 'unknown'
Uses ``unicode-slugify`` to generate the slug.
"""

ok_chars = '-._' # dash, dot, underscore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the rfc https://tools.ietf.org/html/rfc1035

The following syntax will result in fewer problems with many

applications that use domain names (e.g., mail, TELNET).

<domain> ::= <subdomain> | " "

<subdomain> ::= <label> | <subdomain> "." <label>

<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]

<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>

<let-dig-hyp> ::= <let-dig> | "-"

<let-dig> ::= <letter> | <digit>

_ underscore isn't allowed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it should start with a letter

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I didn't change this behavior. We could consider.

Our current regex was working like that:

# Regex breakdown:
#   [a-z0-9] -- start with alphanumeric value
#   [-._a-z0-9] -- allow dash, dot, underscore, digit, lowercase ascii
#   *? -- allow multiple of those, but be not greedy about the matching
#   (?: ... ) -- wrap everything so that the pattern cannot escape when used in
#                regexes.
VERSION_SLUG_REGEX = '(?:[a-z0-9A-Z][-._a-z0-9A-Z]*?)'

Basically, it does not allow ., - and _ as starting character, but it does allow a letter (lower and upper case) and a number.

If we want to be more strict and follow those directions, I suppose we should not allow the . (dot) either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, this only affects the version's slug, not the project's slug. We are safe here, sorry for the noise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we use the same logic for project slugs if it's not explicitly defined by the user?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mean removing the _ and .. I mean, "why we don't use the all same logic for generating the slug for a Version than the slug for a Project?"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because Project's are used in DNS, but Versions aren't. They are different things, so we don't need the exact same restrictions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I understand that.

I'm suggesting to use the same logic (unicode-slugify to generate the slug) but apply different restrictions.

Anyway, if so, it's for a different PR.

test_pattern = re.compile('^{pattern}$'.format(pattern=VERSION_SLUG_REGEX))
fallback_slug = 'unknown'

def __init__(self, *args, **kwargs):
kwargs.setdefault('db_index', True)
Expand All @@ -78,13 +81,42 @@ def get_queryset(self, model_cls, slug_field):
return model._default_manager.all()
return model_cls._default_manager.all()

def _normalize(self, content):
"""
Normalize some invalid characters (/, %, !, ?) to become a dash (``-``).

.. note::

We replace these characters to a dash to keep compatibility with the
old behavior and also because it makes this more readable.

For example, ``release/1.0`` will become ``release-1.0``.
"""
return re.sub('[/%!?]', '-', content)

def slugify(self, content):
"""
Make ``content`` a valid slug.

It uses ``unicode-slugify`` behind the scenes which works properly with
Unicode characters.
"""
if not content:
return ''

slugified = content.lower()
slugified = self.invalid_chars_re.sub(self.placeholder, slugified)
slugified = self.leading_punctuation_re.sub('', slugified)
normalized = self._normalize(content)
slugified = unicode_slugify(
normalized,
only_ascii=True,
spaces=False,
lower=True,
ok=self.ok_chars,
space_replacement='-',
)

# Remove first character wile it's an invalid character for the
# beginning of the slug
slugified = slugified.lstrip(self.ok_chars)

if not slugified:
return self.fallback_slug
Expand Down
12 changes: 12 additions & 0 deletions readthedocs/rtd_tests/tests/test_version_slug.py
Original file line number Diff line number Diff line change
Expand Up @@ -105,3 +105,15 @@ def test_uniquifying_suffix(self):
self.assertEqual(field.uniquifying_suffix(25), '_z')
self.assertEqual(field.uniquifying_suffix(26), '_ba')
self.assertEqual(field.uniquifying_suffix(52), '_ca')

def test_unicode(self):
version = Version.objects.create(
verbose_name='camión',
project=self.pip,
)
self.assertEqual(version.slug, 'camion')
version = Version.objects.create(
verbose_name='ŭñíč°də-branch',
project=self.pip,
)
self.assertEqual(version.slug, 'unicd-branch')
3 changes: 3 additions & 0 deletions requirements/pip.txt
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,9 @@ django-kombu==0.9.4
mock==2.0.0
stripe==2.19.0

# unicode-slugify==0.1.5 is not released on PyPI yet
git+https://github.com/mozilla/unicode-slugify@b696c37#egg=unicode-slugify
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙁


django-formtools==2.1

# docker is pinned to 3.1.3 because we found some strange behavior
Expand Down