Add a State Machine Matcher for the routing
This new State Machine Matcher is intended to replace the existing
Table Matcher. It is an improvement in that it is quicker at matching
routes, with significant speedups for more complex routes and,
crucially, routing tables (up to 5x faster in local testing).

The previous Table Matcher worked by checking each route individually,
going through the table until a match is found. The State Machine
Matcher instead considers each part of the path in turn (parts being
delimited by `/`). Hence the Table Matcher is O(N), where N is the
number of routes, whereas the State Machine Matcher is O(M), where M is
the number of parts in the target path. Note, though, that the State
Machine Matcher is still O(N) for a route table consisting of routes
that use non-part-isolating converters.
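To make the complexity argument concrete, here is a minimal sketch that contrasts the two strategies using only static routes. The names `table_match`, `Node`, `build_tree` and `tree_match` are hypothetical and are not Werkzeug's API. Dynamic parts and converters are deliberately left out.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical toy matchers; NOT Werkzeug's TableMatcher/StateMachineMatcher.


def table_match(table: List[Tuple[str, str]], path: str) -> Optional[str]:
    """Table style: try every (regex, endpoint) pair in order, O(N) in rules."""
    for pattern, endpoint in table:
        if re.fullmatch(pattern, path):
            return endpoint
    return None


class Node:
    """One state per path part; transitions are keyed by the literal part."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.endpoint: Optional[str] = None


def build_tree(rules: Dict[str, str]) -> Node:
    root = Node()
    for rule, endpoint in rules.items():
        node = root
        for part in rule.split("/"):
            node = node.children.setdefault(part, Node())
        node.endpoint = endpoint
    return root


def tree_match(root: Node, path: str) -> Optional[str]:
    """State machine style: one dict lookup per path part, O(M) in parts."""
    node = root
    for part in path.split("/"):
        nxt = node.children.get(part)
        if nxt is None:
            return None
        node = nxt
    return node.endpoint


# Usage: 1000 static routes.
rules = {f"/item{i}/detail": f"endpoint{i}" for i in range(1000)}
table = [(re.escape(rule), endpoint) for rule, endpoint in rules.items()]
root = build_tree(rules)
print(table_match(table, "/item999/detail"))  # endpoint999
print(tree_match(root, "/item999/detail"))  # endpoint999
```

With 1000 rules, `table_match` may try up to 1000 regexes. `tree_match` performs one dictionary lookup per path part (three here: `""`, `item999` and `detail`), however many rules are registered.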
pgjones committed Jul 2, 2022
1 parent c741436 commit 5a7c0de
Showing 8 changed files with 400 additions and 170 deletions.
2 changes: 2 additions & 0 deletions CHANGES.rst
@@ -8,6 +8,8 @@ Version 2.2.0
   debug console. :pr:`2439`
 - Fix compatibility with Python 3.11 by ensuring that ``end_lineno``
   and ``end_col_offset`` are present on AST nodes. :issue:`2425`
+- Add a new faster matching router based on a state
+  machine. :pr:`2433`
 
 
 Version 2.1.2
47 changes: 47 additions & 0 deletions docs/routing.rst
@@ -105,6 +105,10 @@ converters can be overridden or extended through :attr:`Map.converters`.
 
 .. autoclass:: UUIDConverter
 
+If a custom converter can match a forward slash, ``/``, it should have
+the attribute ``part_isolating`` set to ``False``. This will ensure
+that rules using the custom converter are correctly matched.
+
 
 Maps, Rules and Adapters
 ========================
@@ -126,6 +130,15 @@ Maps, Rules and Adapters
 .. autoclass:: Rule
    :members: empty
 
+Matchers
+========
+
+.. autoclass:: StateMachineMatcher
+   :members:
+
+.. autoclass:: TableMatcher
+   :members:
+
 
 Rule Factories
 ==============
@@ -261,3 +274,37 @@ scheme and host, ``force_external=True`` is implied.
     url = adapter.build("comm")
     assert url == "ws://example.org/ws"
+
+
+State Machine Matching
+======================
+
+The default matching algorithm uses a state machine that transitions
+between parts of the request path to find a match. To understand how
+this works, consider this rule::
+
+    /resource/<id>
+
+First, this rule is decomposed into two ``RulePart`` instances. The
+first is a static part whose content is ``resource``; the second is
+dynamic and requires a regex match to ``[^/]+``.
+
+A state machine is then created with an initial state that represents
+the rule's first ``/``. This initial state has a single, static
+transition to the next state, which represents the rule's second
+``/``. This second state has a single dynamic transition to the final
+state, which includes the rule.
+
+To match a path, the matcher starts at the initial state and follows
+the transitions that match. A trial path of ``/resource/2`` has the
+parts ``""``, ``resource`` and ``2``, which match these transitions,
+so the rule matches. By contrast, ``/other/2`` does not match, as
+there is no transition for the ``other`` part from the initial state.
+
+The only exception is when a ``RulePart`` is not part-isolating,
+i.e. it can match ``/``. In this case the ``RulePart`` is considered
+final and represents a transition that must consume all the
+remaining parts of the trial path.
+
+To use the pre-2.2 ``TableMatcher`` instead, set ``map._matcher`` to
+an instance of ``TableMatcher`` before adding rules to the map.
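The matching procedure described in the new documentation can be sketched as follows. This is a simplified illustration, not Werkzeug's implementation. `State`, `add_rule` and `match` are hypothetical names. Rule parts are written `<name>` (part-isolating, `[^/]+`) or `<path:name>` (non-part-isolating, `[^/].*?`). The leading empty part is dropped so that the initial state stands for the first `/`.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Simplified sketch of state machine matching; names are hypothetical and
# do not mirror Werkzeug internals. "<name>" is a part-isolating dynamic
# part ([^/]+); "<path:name>" is non-part-isolating ([^/].*?).
Match = Tuple[str, Dict[str, str]]


@dataclass
class State:
    static: Dict[str, "State"] = field(default_factory=dict)
    dynamic: List[Tuple["re.Pattern[str]", str, "State"]] = field(default_factory=list)
    # Final transitions consume all remaining parts, slashes included.
    final: List[Tuple["re.Pattern[str]", str, str]] = field(default_factory=list)
    rule: Optional[str] = None


def add_rule(root: State, rule: str) -> None:
    state = root
    # Drop the leading "" so the root state stands for the rule's first "/".
    for part in rule.split("/")[1:]:
        if part.startswith("<path:"):
            state.final.append((re.compile(r"[^/].*?"), part[6:-1], rule))
            return
        if part.startswith("<"):
            nxt = State()
            state.dynamic.append((re.compile(r"[^/]+"), part[1:-1], nxt))
            state = nxt
        else:
            state = state.static.setdefault(part, State())
    state.rule = rule


def _match(state: State, parts: List[str], values: Dict[str, str]) -> Optional[Match]:
    if not parts:
        return (state.rule, values) if state.rule is not None else None
    part, rest = parts[0], parts[1:]
    if part in state.static:  # static transitions are tried first
        rv = _match(state.static[part], rest, values)
        if rv is not None:
            return rv
    for regex, name, nxt in state.dynamic:  # then dynamic, one part each
        if regex.fullmatch(part):
            rv = _match(nxt, rest, {**values, name: part})
            if rv is not None:
                return rv
    remainder = "/".join(parts)
    for regex, name, rule in state.final:  # scanned linearly
        if regex.fullmatch(remainder):
            return rule, {**values, name: remainder}
    return None


def match(root: State, path: str) -> Optional[Match]:
    return _match(root, path.split("/")[1:], {})


root = State()
add_rule(root, "/resource/<id>")
add_rule(root, "/files/<path:name>")
print(match(root, "/resource/2"))  # ('/resource/<id>', {'id': '2'})
print(match(root, "/other/2"))  # None
```

In this sketch a non-part-isolating transition is only tried against the remaining parts re-joined with `/`. These transitions are scanned linearly, which mirrors the O(N) caveat for such converters.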
2 changes: 2 additions & 0 deletions src/werkzeug/routing/__init__.py
@@ -117,6 +117,8 @@
 from .exceptions import WebsocketMismatch
 from .map import Map
 from .map import MapAdapter
+from .matcher import StateMachineMatcher
+from .matcher import TableMatcher
 from .rules import EndpointPrefix
 from .rules import parse_converter_args
 from .rules import Rule
10 changes: 10 additions & 0 deletions src/werkzeug/routing/converters.py
@@ -19,6 +19,7 @@ class BaseConverter:
 
     regex = "[^/]+"
     weight = 100
+    part_isolating = True
 
     def __init__(self, map: "Map", *args: t.Any, **kwargs: t.Any) -> None:
         self.map = map
@@ -50,6 +51,8 @@ class UnicodeConverter(BaseConverter):
     :param length: the exact length of the string.
     """
 
+    part_isolating = True
+
     def __init__(
         self,
         map: "Map",
@@ -80,6 +83,8 @@ class AnyConverter(BaseConverter):
     arguments.
     """
 
+    part_isolating = True
+
     def __init__(self, map: "Map", *items: str) -> None:
         super().__init__(map)
         self.regex = f"(?:{'|'.join([re.escape(x) for x in items])})"
@@ -97,6 +102,7 @@ class PathConverter(BaseConverter):
 
     regex = "[^/].*?"
     weight = 200
+    part_isolating = False
 
 
 class NumberConverter(BaseConverter):
@@ -107,6 +113,7 @@ class NumberConverter(BaseConverter):
 
     weight = 50
     num_convert: t.Callable = int
+    part_isolating = True
 
     def __init__(
         self,
@@ -168,6 +175,7 @@ class IntegerConverter(NumberConverter):
     """
 
     regex = r"\d+"
+    part_isolating = True
 
 
 class FloatConverter(NumberConverter):
@@ -191,6 +199,7 @@ class FloatConverter(NumberConverter):
 
     regex = r"\d+\.\d+"
     num_convert = float
+    part_isolating = True
 
     def __init__(
         self,
@@ -216,6 +225,7 @@ class UUIDConverter(BaseConverter):
         r"[A-Fa-f0-9]{8}-[A-Fa-f0-9]{4}-"
         r"[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{12}"
     )
+    part_isolating = True
 
     def to_python(self, value: str) -> uuid.UUID:
         return uuid.UUID(value)
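A small, self-contained sketch shows why the new `part_isolating` attribute matters. Here `BaseConverter` is a stand-in carrying only the `regex` and `part_isolating` attributes from the diff; it is not imported from Werkzeug. `SlugPathConverter`, `Broken` and `converter_matches` are hypothetical names.

```python
import re
from typing import List, Type


# Stand-in for werkzeug.routing.BaseConverter, reduced to the two class
# attributes relevant here (values taken from the diff).
class BaseConverter:
    regex = "[^/]+"
    part_isolating = True


class SlugPathConverter(BaseConverter):
    """Hypothetical custom converter for slash-separated slugs, e.g. "a/b-c"."""

    regex = r"[a-z0-9-]+(?:/[a-z0-9-]+)*"
    part_isolating = False  # its regex can match "/"


class Broken(SlugPathConverter):
    """Same regex, but (wrongly) left as part-isolating."""

    part_isolating = True


def converter_matches(conv: Type[BaseConverter], parts: List[str]) -> bool:
    """Hypothetical helper: match a converter against the remaining path parts."""
    if conv.part_isolating:
        # A part-isolating converter only ever sees a single part.
        return len(parts) == 1 and re.fullmatch(conv.regex, parts[0]) is not None
    # Otherwise it is matched against all remaining parts re-joined with "/".
    return re.fullmatch(conv.regex, "/".join(parts)) is not None


print(converter_matches(SlugPathConverter, ["docs", "intro"]))  # True
print(converter_matches(Broken, ["docs", "intro"]))  # False
```

A converter whose regex can match `/` but keeps `part_isolating = True`, like `Broken` above, never sees more than one part. It therefore silently fails to match multi-part paths.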
11 changes: 6 additions & 5 deletions src/werkzeug/routing/map.py
@@ -25,7 +25,7 @@
 from .exceptions import RequestPath
 from .exceptions import RequestRedirect
 from .exceptions import WebsocketMismatch
-from .matcher import TableMatcher
+from .matcher import StateMachineMatcher
 from .rules import _simple_rule_re
 from .rules import Rule
 
@@ -105,8 +105,7 @@ def __init__(
         encoding_errors: str = "replace",
         host_matching: bool = False,
     ) -> None:
-        self._rules: t.List[Rule] = []
-        self._matcher = TableMatcher()
+        self._matcher = StateMachineMatcher(merge_slashes)
         self._rules_by_endpoint: t.Dict[str, t.List[Rule]] = {}
         self._remap = True
         self._remap_lock = self.lock_class()
@@ -149,6 +148,10 @@ def is_endpoint_expecting(self, endpoint: str, *arguments: str) -> bool:
                 return True
         return False
 
+    @property
+    def _rules(self) -> t.List[Rule]:
+        return [rule for rules in self._rules_by_endpoint.values() for rule in rules]
+
     def iter_rules(self, endpoint: t.Optional[str] = None) -> t.Iterator[Rule]:
         """Iterate over all rules or the rules of an endpoint.
@@ -171,7 +174,6 @@ def add(self, rulefactory: "RuleFactory") -> None:
             rule.bind(self)
             if not rule.build_only:
                 self._matcher.add(rule)
-            self._rules.append(rule)
             self._rules_by_endpoint.setdefault(rule.endpoint, []).append(rule)
         self._remap = True
 
@@ -362,7 +364,6 @@ def update(self) -> None:
                 return
 
             self._matcher.update()
-            self._rules.sort(key=lambda x: x.match_compare_key())
             for rules in self._rules_by_endpoint.values():
                 rules.sort(key=lambda x: x.build_compare_key())
             self._remap = False
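The map.py change replaces the separately maintained `_rules` list with a property derived from `_rules_by_endpoint`. A toy stand-in shows the effect. The class `ToyMap` is hypothetical and stores plain strings rather than `Rule` objects.

```python
from typing import Dict, List


class ToyMap:
    """Hypothetical stand-in for the bookkeeping in Map after this change."""

    def __init__(self) -> None:
        self._rules_by_endpoint: Dict[str, List[str]] = {}

    def add(self, endpoint: str, rule: str) -> None:
        # Only the per-endpoint lists are stored.
        self._rules_by_endpoint.setdefault(endpoint, []).append(rule)

    @property
    def _rules(self) -> List[str]:
        # Flattened view rebuilt on every access, so it cannot go stale.
        return [rule for rules in self._rules_by_endpoint.values() for rule in rules]


m = ToyMap()
m.add("a", "/x")
m.add("b", "/y")
m.add("a", "/z")
print(m._rules)  # ['/x', '/z', '/y']
```

Because the view is rebuilt from the per-endpoint lists, there is no second list to append to in `add` or to sort in `update`. The flattened order is grouped by endpoint rather than following overall insertion order.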