perf: Fix issues with Chart|LayerChart.encode, 1.32x speedup to infer_encoding_types #3444

Merged
merged 13 commits into from
Jun 28, 2024

Member

@dangotbanned dangotbanned commented Jun 20, 2024

Background

I stumbled across this issue in the docs during 80a0812 and went down a bit of a rabbit hole

Before

image

After

image

Fixes

  • Sphinx warning on Chart.encode, which was also incorrectly listed under the Attributes section
  • Preserve the static typing previously found in _encode_signature but lost after _EncodingMixin.encode
    • Re-running mypy then reported 'Found 63 errors in 47 files (checked 360 source files)' (tests/examples)

Perf

  • This was a response to the TODO left at the top of infer_encoding_types

altair/altair/utils/core.py

Lines 785 to 795 in 76a9ce1

    All args and kwargs in a single dict, with keys and types
    based on the channels mapping.
    """
    # Construct a dictionary of channel type to encoding name
    # TODO: cache this somehow?
    channel_objs = (getattr(channels, name) for name in dir(channels))
    channel_objs = (
        c for c in channel_objs if isinstance(c, type) and issubclass(c, SchemaBase)
    )
    channel_to_name: Dict[Type[SchemaBase], str] = {
        c: c._encoding_name for c in channel_objs
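
One way to address the `TODO` is to build the mapping once and reuse it on later calls. The sketch below only illustrates that idea and is not necessarily how this PR implements it: `SchemaBase`, `X`, `Y`, and the `channels` module are stand-ins defined here, and `channel_to_name` is a hypothetical helper.

```python
import functools
import types


class SchemaBase:
    """Stand-in for altair's SchemaBase (assumption for this sketch)."""


class X(SchemaBase):
    _encoding_name = "x"


class Y(SchemaBase):
    _encoding_name = "y"


# Stand-in for the real channels module
channels = types.ModuleType("channels")
channels.X = X
channels.Y = Y
channels.not_a_channel = 42  # non-class attributes are filtered out


@functools.lru_cache(maxsize=None)
def channel_to_name(module: types.ModuleType) -> dict:
    """Build the channel-type -> encoding-name mapping once per module."""
    objs = (getattr(module, name) for name in dir(module))
    return {
        c: c._encoding_name
        for c in objs
        if isinstance(c, type) and issubclass(c, SchemaBase)
    }


first = channel_to_name(channels)   # computed by scanning dir(channels)
second = channel_to_name(channels)  # served from the cache
```

Note that the cached dict is shared between callers, so a sketch like this assumes nobody mutates the returned mapping.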

Benchmark

Code block
import altair as alt
from altair import Undefined
from altair.utils.core import infer_encoding_types
import random

def args_kwds():
    # Generate 40 mock channels
    tps = [
        alt.Angle,
        alt.Color,
        alt.Column,
        alt.Description,
        alt.Detail,
        alt.Facet,
        alt.Fill,
        alt.FillOpacity,
        alt.Href,
        alt.Key,
        alt.Latitude,
        alt.Latitude2,
        alt.Longitude,
        alt.Longitude2,
        alt.Opacity,
        alt.Order,
        alt.Radius,
        alt.Radius2,
        alt.Row,
        alt.Shape,
        alt.Size,
        alt.Stroke,
        alt.StrokeDash,
        alt.StrokeOpacity,
        alt.StrokeWidth,
        alt.Text,
        alt.Theta,
        alt.Theta2,
        alt.Tooltip,
        alt.Url,
        alt.X,
        alt.X2,
        alt.XError,
        alt.XError2,
        alt.XOffset,
        alt.Y,
        alt.Y2,
        alt.YError,
        alt.YError2,
        alt.YOffset,
    ]
    random.shuffle(tps)
    split = random.randint(0, 40)
    # When only keyword args are passed, a slightly faster branch can be used
    pos_only = tps[:split]
    kwd_only = tps[split:]
    args = [arg("field_name") for arg in pos_only]
    kwargs = {kwd._encoding_name: kwd("field_name") for kwd in kwd_only}
    return args, kwargs


def _infer_new(
    *args,
    # `self` is here just to include the time it takes to pop;
    # it would be the implicit `self` of `Chart.encode`
    self=None,
    angle=Undefined,
    color=Undefined,
    column=Undefined,
    description=Undefined,
    detail=Undefined,
    facet=Undefined,
    fill=Undefined,
    fillOpacity=Undefined,
    href=Undefined,
    key=Undefined,
    latitude=Undefined,
    latitude2=Undefined,
    longitude=Undefined,
    longitude2=Undefined,
    opacity=Undefined,
    order=Undefined,
    radius=Undefined,
    radius2=Undefined,
    row=Undefined,
    shape=Undefined,
    size=Undefined,
    stroke=Undefined,
    strokeDash=Undefined,
    strokeOpacity=Undefined,
    strokeWidth=Undefined,
    text=Undefined,
    theta=Undefined,
    theta2=Undefined,
    tooltip=Undefined,
    url=Undefined,
    x=Undefined,
    x2=Undefined,
    xError=Undefined,
    xError2=Undefined,
    xOffset=Undefined,
    y=Undefined,
    y2=Undefined,
    yError=Undefined,
    yError2=Undefined,
    yOffset=Undefined,
):
    kwargs = locals()
    kwargs.pop("self")
    args = kwargs.pop("args")
    if args:
        # Now required to remove any collisions,
        # as all parameters have a default
        kwargs = {k: v for k, v in kwargs.items() if v is not Undefined}
    return infer_encoding_types(args, kwargs)


def test_infer():
    args, kwds = args_kwds()
    return _infer_new(*args, **kwds)

def test_infer_old():
    # This still benefits from all non-cache changes to `infer_encoding_types`
    args, kwds = args_kwds()
    return infer_encoding_types(args, kwds, alt.channels)
# %%timeit -n 100000
>>> test_infer()
154 µs ± 165 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

# %%timeit -n 100000
>>> test_infer_old()
204 µs ± 256 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

1.32x speedup
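
As a quick check on the headline figure, the ratio of the two mean timings reported above can be worked out as follows (the variable names are only illustrative):

```python
# Mean timings reported by %%timeit above, in microseconds
old_us = 204.0  # test_infer_old
new_us = 154.0  # test_infer

speedup = old_us / new_us         # how many times faster the new path is
time_saved = 1 - new_us / old_us  # fraction of time saved per call

print(f"{speedup:.2f}x speedup, {time_saved:.1%} less time per call")
```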

…dup to `infer_encoding_types`

Fixes:
- [Sphinx warning](https://altair-viz.github.io/user_guide/generated/toplevel/altair.Chart.html#altair.Chart) on `Chart.encode`. Also incorrectly under `Attributes` section
- Preserve static typing previously found in `_encode_signature` but lost after `_EncodingMixin.encode`
  - Re-running `mypy` output 'Found 63 errors in 47 files (checked 360 source files)', tests/examples

Perf:
- This was a response to the `TODO` left at the top of `infer_encoding_types`
- Will be adding the benchmark to the PR description
Incompatible types in assignment (expression has type "Chart", variable has type "DataFrame")
`Color` -> `Fill` when passed to `fill` channel
… revealed

'error: Argument "color" to "encode" of "_EncodingMixin" has incompatible type "dict[Any, Any] | SchemaBase"; expected "str | Color | dict[Any, Any] | ColorDatum | ColorValue | UndefinedType"  [arg-type]'
- New implementation does not use `**kwargs`, which eliminates an entire class of tests based on `.encode(invalidChannel=...)` as these now trigger a runtime error
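
The behaviour change described in the last commit message can be illustrated with a minimal sketch. Both functions below are simplified stand-ins rather than altair's actual `encode`, and `Undefined` here is a local sentinel assumed for the example.

```python
Undefined = object()  # stand-in sentinel for altair's Undefined


def encode_with_kwargs(*args, **kwargs):
    """Old style: any keyword is accepted and must be validated later."""
    return kwargs


def encode_explicit(*args, x=Undefined, y=Undefined, color=Undefined):
    """New style: only the declared channel names are accepted."""
    given = {"x": x, "y": y, "color": color}
    return {k: v for k, v in given.items() if v is not Undefined}


# The catch-all version accepts the unknown channel name
accepted = encode_with_kwargs(invalidChannel="field_name")

# The explicit version rejects it as soon as the call is made
try:
    encode_explicit(invalidChannel="field_name")
except TypeError as exc:
    rejected = str(exc)
```

With explicit keyword parameters, Python itself raises the `TypeError`, so tests that relied on later validation of an unknown channel name no longer reach that code.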
@dangotbanned dangotbanned changed the title fix, doc, perf: Fix issues with Chart|LayerChart.encode, 1.32x speedup to infer_encoding_types perf: Fix issues with Chart|LayerChart.encode, 1.32x speedup to infer_encoding_types Jun 20, 2024
return encoding
raise NotImplementedError(f"positional of type {type(tp).__name__!r}")

def _wrap_in_channel(self, obj: Any, encoding: str, /):
Contributor

This is the first time I have seen a forward slash, /, as an argument within a function. Can you explain what it does?

Member Author

@dangotbanned dangotbanned Jun 27, 2024

Sure @mattijn

/ is a marker for positional-only parameters. Any parameter to the left of / is positional-only, and may not be used by name.
In this case, calling self._wrap_in_channel(encoding="enc", obj=[1,2,3]) raises a TypeError. The python tutorial docs may be helpful for an overview as well.
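
A minimal sketch of the behaviour described above, using a made-up `Wrapper` class (only the method name and parameter order mirror the PR; the body is an assumption for illustration):

```python
class Wrapper:
    def _wrap_in_channel(self, obj, encoding, /):
        # Everything left of `/` (including self) is positional-only
        return {encoding: obj}


w = Wrapper()

# Passing the arguments by position works
ok = w._wrap_in_channel([1, 2, 3], "enc")

# Passing them by name is rejected with a TypeError
raised = False
try:
    w._wrap_in_channel(encoding="enc", obj=[1, 2, 3])
except TypeError:
    raised = True
```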

Personally, I like to use / in cases like:

  • The function/method is not part of the public API, to leave more flexibility in renaming parameters in the future, without introducing a breaking change
  • There are 1-3 parameters
  • The function is currently used in one specific way, and the function name and parameter order are clear
    • In this case, _wrap_in_channel could logically be thought of as having parameters _wrap_in_channel(wrappee, wrapper).

Looking back at PEP 570, I see that this was introduced in Python 3.8, which may explain the feature's absence in altair until now.

Hope all of that was helpful

@mattijn
Contributor

mattijn commented Jun 26, 2024

Much appreciated for this PR @dangotbanned! I cannot really oversee the implications of this PR, but if it solves a todo within the code, resolves the docs issue and introduces a nice performance bump, then I'm really happy you went into that rabbit hole... and came back out again :)

All tests are happy already, but if you have a suggestion for how I could review this better than by looking carefully at the code diff, that would be much appreciated.

Thanks again for this PR!

@dangotbanned
Member Author

Thanks @mattijn

I've marked up some comments now with some additional info that may be helpful.
Other than that, if you go through the commits sequentially, you can see specific mypy fixes, which help explain some changes that may have appeared unrelated.

Contributor

@binste binste left a comment

Nice catch! Wasn't aware that the signature for encode does not work and the new implementation is definitely more robust with the explicit declaration of the kwargs. I'm also glad you went into that rabbit hole ;)

I added some smaller comments. Afterwards, I think this is ready to be merged.

Contributor

@binste binste left a comment

Looks good to me, thanks again! I'll leave it to @mattijn to merge in case he has any additional comments.

@dangotbanned
Member Author

dangotbanned commented Jun 27, 2024

Thanks @binste appreciate the review

I'll work on the conflicts today

@dangotbanned
Member Author

dangotbanned commented Jun 27, 2024

I've pushed this with the mypy errors unfixed, as I wanted a second opinion on how to solve them.

It seems core.LookupData is exported to __all__, but api.LookupData is not.
I think the latter is the intended export, but these errors seem to have revealed it was only accessible as the verbose altair.vegalite.v5.api.LookupData.

This has gone unnoticed due to .encode previously being untyped (see #3444 (comment)).

@binste am I understanding this correctly?

This example should be enough to repro locally:

lookup_data = alt.LookupData(
    airports, key="iata", fields=["state", "latitude", "longitude"]
)
background = alt.Chart(states).mark_geoshape(
    fill="lightgray",
    stroke="white"
).properties(
    width=750,
    height=500
).project("albersUsa")
connections = alt.Chart(flights_airport).mark_rule(opacity=0.35).encode(
    latitude="latitude:Q",
    longitude="longitude:Q",
    latitude2="lat2:Q",
    longitude2="lon2:Q"
).transform_lookup(
    lookup="origin",
    from_=lookup_data
).transform_lookup(
    lookup="destination",
    from_=lookup_data,
    as_=["state", "lat2", "lon2"]
).transform_filter(
    select_city
)

…t assumes that altair.LookupData comes from core.py instead of api.py
@binste
Contributor

binste commented Jun 27, 2024

api.LookupData is already the one which you get from top-level altair which is correct:

image

But indeed, mypy does get confused by this and thinks it's core.LookupData. We already had issues with all the from ... import * statements in the package, which are used to expose almost all objects on the top-level. Ideally, we could tell mypy that it's api.LookupData on the top-level. Unfortunately, I could not convince mypy with a simple type hint after the import statements. I've now pushed a commit which excludes LookupData from being exported. We already do this for two other classes which clash with classes defined in other modules. The downside is that LookupData is no longer defined on altair.vegalite.v5.schema, but I think that's ok. I don't expect people to import it that way.
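
The name clash can be reproduced in miniature. The sketch below uses two throwaway modules (`demo_core` and `demo_api`, both made up for this illustration) that each export a `LookupData`. It shows runtime behaviour only and does not model how mypy resolves the name.

```python
import sys
import types


def make_module(name: str, **attrs) -> types.ModuleType:
    """Register a throwaway module that exports ``attrs`` via ``__all__``."""
    mod = types.ModuleType(name)
    mod.__dict__.update(attrs)
    mod.__all__ = list(attrs)
    sys.modules[name] = mod
    return mod


def star_import(*module_names: str) -> dict:
    """Run ``from <module> import *`` for each module, in order."""
    ns: dict = {}
    for module_name in module_names:
        exec(f"from {module_name} import *", ns)
    ns.pop("__builtins__", None)  # added by exec, not part of the imports
    return ns


class CoreLookupData: ...


class ApiLookupData: ...


core = make_module("demo_core", LookupData=CoreLookupData)
api = make_module("demo_api", LookupData=ApiLookupData)

# Both modules export the name: the later star-import wins at runtime
clashing = star_import("demo_core", "demo_api")

# Excluding the name from the first module's __all__ removes the clash
core.__all__ = []
from_core_only = star_import("demo_core")
unambiguous = star_import("demo_core", "demo_api")
```

After the exclusion, only one module contributes `LookupData`, which is the same idea as leaving the clashing class out of the schema exports.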

@dangotbanned
Member Author

dangotbanned commented Jun 27, 2024

#3444 (comment)

Thanks @binste for the explanation and quick fix

Member Author

@dangotbanned dangotbanned left a comment

Very minor tweak, but I think it would be more maintainable

@dangotbanned
Member Author

dangotbanned commented Jun 28, 2024

Nice catch! Wasn't aware that the signature for encode does not work

@binste just to explain this a bit further, the issue (I think) stems from @utils.use_signature:

  • The assumption that Obj is always a class, whereas channels._encode_signature was a function.
  • It appears to be acting like @functools.wraps, but the below is quite different to functools.update_wrapper.

altair/altair/utils/core.py

Lines 689 to 696 in c9106f0

def use_signature(Obj: Callable[P, Any]):  # -> Callable[..., Callable[P, V]]:
    """Apply call signature and documentation of Obj to the decorated method"""

    def decorate(f: Callable[..., V]) -> Callable[P, V]:
        # call-signature of f is exposed via __wrapped__.
        # we want it to mimic Obj.__init__
        f.__wrapped__ = Obj.__init__  # type: ignore
        f._uses_signature = Obj  # type: ignore

Comparing to @deprecation.deprecated, which does use @functools.wraps:

from __future__ import annotations

import sys
from typing import Callable, TypeVar, TYPE_CHECKING
import warnings
import functools

if sys.version_info >= (3, 10):
    from typing import ParamSpec
else:
    from typing_extensions import ParamSpec

if TYPE_CHECKING:
    from functools import _Wrapped

T = TypeVar("T")
P = ParamSpec("P")
R = TypeVar("R")


class AltairDeprecationWarning(UserWarning):
    pass


def deprecated(
    message: str | None = None,
) -> Callable[..., type[T] | _Wrapped[P, R, P, R]]:
    """Decorator to deprecate a function or class.

    Parameters
    ----------
    message : string (optional)
        The deprecation message
    """

    def wrapper(obj: type[T] | Callable[P, R]) -> type[T] | _Wrapped[P, R, P, R]:
        return _deprecate(obj, message=message)

    return wrapper


def _deprecate(
    obj: type[T] | Callable[P, R], name: str | None = None, message: str | None = None
) -> type[T] | _Wrapped[P, R, P, R]:
    """Return a version of a class or function that raises a deprecation warning.

    Parameters
    ----------
    obj : class or function
        The object to create a deprecated version of.
    name : string (optional)
        The name of the deprecated object
    message : string (optional)
        The deprecation message

    Returns
    -------
    deprecated_obj :
        The deprecated version of obj

    Examples
    --------
    >>> class Foo: pass
    >>> OldFoo = _deprecate(Foo, "OldFoo")
    >>> f = OldFoo()  # doctest: +SKIP
    AltairDeprecationWarning: alt.OldFoo is deprecated. Use alt.Foo instead.
    """
    if message is None:
        message = f"alt.{name} is deprecated. Use alt.{obj.__name__} instead." ""
    if isinstance(obj, type):
        if name is None:
            msg = f"Requires name, but got: {name=}"
            raise TypeError(msg)
        else:
            return type(
                name,
                (obj,),
                {
                    "__doc__": obj.__doc__,
                    "__init__": _deprecate(obj.__init__, "__init__", message),
                },
            )
    elif callable(obj):

        @functools.wraps(obj)
        def new_obj(*args: P.args, **kwargs: P.kwargs) -> R:
            warnings.warn(message, AltairDeprecationWarning, stacklevel=1)
            return obj(*args, **kwargs)

        new_obj._deprecated = True  # type: ignore[attr-defined]
        return new_obj
    else:
        msg = f"Cannot deprecate object of type {type(obj)}"
        raise ValueError(msg)

A future PR may be helpful, to ensure @utils.use_signature works in a predictable manner
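
The difference between the two approaches can be sketched with a simplified decorator. `mimic_init` below is a hypothetical reduction of the `__wrapped__ = Obj.__init__` pattern, not altair's actual `use_signature`, and `encode_signature` is a stand-in for a plain function whose signature should be reused.

```python
import functools
import inspect


def encode_signature(x=None, y=None):
    """Stand-in for a plain function whose signature should be reused."""


def mimic_init(obj):
    """Simplified pattern from above: assumes ``obj`` is a class."""

    def decorate(f):
        # For a plain function, ``obj.__init__`` is the generic
        # initializer of the function object, not the function's own signature
        f.__wrapped__ = obj.__init__
        return f

    return decorate


@mimic_init(encode_signature)
def encode_a(*args, **kwargs): ...


@functools.wraps(encode_signature)
def encode_b(*args, **kwargs): ...


# inspect.signature follows __wrapped__ in both cases
sig_a = inspect.signature(encode_a)  # does not mention x or y
sig_b = inspect.signature(encode_b)  # reports the x and y parameters
```

When the decorated target is a function rather than a class, only the `functools.wraps` variant surfaces the intended parameters to tools that inspect the signature.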

@binste binste requested a review from mattijn June 28, 2024 12:07
@mattijn mattijn merged commit c82f8c2 into vega:main Jun 28, 2024
11 checks passed
@mattijn
Contributor

mattijn commented Jun 28, 2024

Thanks for the PR and review, LGTM!

@dangotbanned dangotbanned deleted the encode-improve branch June 28, 2024 13:06
dangotbanned added a commit to dangotbanned/altair that referenced this pull request Jun 29, 2024
Related to vega#3444 (comment)

*Placeholder for screenshot(s) documenting the bug*
dangotbanned added a commit to dangotbanned/altair that referenced this pull request Jul 22, 2024
I should have updated in vega#3444 but the problem didn't become apparent until running through `ruff`
dangotbanned added a commit to dangotbanned/altair that referenced this pull request Sep 1, 2024
Was kept, but only needed for tests since vega#3444.
As `infer_encoding_types` is not public API - this is a safe remove, no need for deprecation
binste pushed a commit that referenced this pull request Sep 4, 2024
* test: Monkeypatch channels global

Removes the dependency in `test_infer_encoding_types`

* refactor: Remove `channels` parameter in `infer_encoding_types`

Was kept, but only needed for tests since #3444.
As `infer_encoding_types` is not public API - this is a safe remove, no need for deprecation