fix: Handle Date32 columns in Arrow tables and Polars DataFrames #3377

jonmmease · 2024-03-22T12:50:31Z

This includes:

- Support sanitizing Date32 columns in pyarrow Tables.
- For objects that follow the DataFrame interchange protocol, check for direct Arrow conversion methods by name ("arrow", "to_arrow", and "to_arrow_table") and, if one is found, use it instead of pyarrow's from_dataframe function (which doesn't support Date32 yet).
- To make mypy happy, I updated the DataFrameLike protocol to be @runtime_checkable (which adds support for isinstance checks), and switched our if hasattr(data, "__dataframe__"): calls to if isinstance(data, DataFrameLike): calls. This way mypy knows that data is a DataFrameLike inside the branch and is happy for it to be passed to the arrow_table_from_dfi_dataframe function.
- Update the VegaFusion tests for VegaFusion 1.6.6 and raise the minimum VegaFusion version to 1.6.6.
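
For readers unfamiliar with the runtime-checkable protocol pattern mentioned above, here is a minimal sketch of the idea. It is not the Altair source: `FakeFrame` and `describe` are hypothetical names used only for illustration, and the protocol body is an assumption about what a `DataFrameLike` protocol could look like.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class DataFrameLike(Protocol):
    """Anything exposing the DataFrame interchange protocol entry point."""

    def __dataframe__(self, *args: Any, **kwargs: Any) -> Any: ...


class FakeFrame:
    """Hypothetical stand-in for a real DataFrame object."""

    def __dataframe__(self, *args: Any, **kwargs: Any) -> Any:
        return self


def describe(data: object) -> str:
    # isinstance() against a runtime-checkable protocol checks that the
    # member exists, and lets a type checker treat `data` as
    # DataFrameLike inside the branch.
    if isinstance(data, DataFrameLike):
        return "dataframe-like"
    return "other"
```

Note that a runtime-checkable isinstance check only verifies that the method is present, not its signature, so it behaves much like the hasattr check it replaces while giving the type checker something to narrow on.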

jonmmease · 2024-03-22T12:52:53Z

altair/vegalite/v5/api.py

@@ -56,7 +56,7 @@ def _dataset_name(values: Union[dict, list, core.InlineDataset]) -> str:
        values = values.to_dict()
    if values == [{}]:
        return "empty"
-    values_json = json.dumps(values, sort_keys=True)
+    values_json = json.dumps(values, sort_keys=True, default=str)


For computing the values hash, I think it's fine to fall back to the string representation for types that json.dumps doesn't support (datetime.date in this case).
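
A small sketch of what the `default=str` fallback does, assuming a row that contains a `datetime.date`. The helper names `serialize_for_hash` and `dataset_hash`, and the choice of SHA-256, are illustrative assumptions and not taken from the Altair code.

```python
import datetime
import hashlib
import json


def serialize_for_hash(values):
    # default=str is only called for objects that json.dumps cannot
    # encode natively, such as datetime.date.
    return json.dumps(values, sort_keys=True, default=str)


def dataset_hash(values):
    # Hypothetical hashing step; the digest algorithm is an assumption.
    return hashlib.sha256(serialize_for_hash(values).encode()).hexdigest()


rows = [{"when": datetime.date(2024, 3, 22), "value": 1}]
print(serialize_for_hash(rows))  # [{"value": 1, "when": "2024-03-22"}]
```

Without `default=str`, the same call raises `TypeError` because `datetime.date` is not JSON serializable. The string form is only used to derive a name, so losing the original type here does not affect the data that is actually rendered.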

jonmmease · 2024-03-23T12:51:00Z

tests/utils/test_mimebundle.py

@@ -241,7 +241,7 @@ def check_pre_transformed_vega_spec(vega_spec):

    # Check that the bin transform has been applied
    row0 = data_0["values"][0]
-    assert row0 == {"a": "A", "b": 28, "b_end": 28.0, "b_start": 0.0}
+    assert row0 == {"a": "A", "b_end": 28.0, "b_start": 0.0}


VegaFusion 1.6.6 strips out the "description" encoding field (which isn't used in the canvas renderer), so it's able to do a better job dropping unused columns.

mattijn

Thanks @jonmmease, really nice PR! Literature-level code 👍.
I added one more commit that uses isinstance(data, DataFrameLike) instead of hasattr(data, "__dataframe__") in a few more places.

I'm surprised that pandas has no to_arrow() method or something alike.

mattijn · 2024-03-24T15:15:32Z

altair/utils/data.py

+    for convert_method_name in ("arrow", "to_arrow", "to_arrow_table"):
+        convert_method = getattr(dfi_df, convert_method_name, None)
+        if callable(convert_method):
+            result = convert_method()
+            if isinstance(result, pa.Table):
+                return result


It is a joy to read this type of code diff. Really nice approach!
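
The probing pattern in the diff above can be sketched in a library-independent way. This is an illustration only: `convert_directly`, `convert`, `FakeTable`, and the two stand-in classes are hypothetical, and the expected type is passed in as a parameter so the sketch runs without pyarrow (in the PR the check is against `pa.Table`).

```python
from typing import Any, Callable, Optional, Sequence, Type, TypeVar

T = TypeVar("T")


def convert_directly(
    obj: Any,
    expected_type: Type[T],
    method_names: Sequence[str] = ("arrow", "to_arrow", "to_arrow_table"),
) -> Optional[T]:
    """Return the first direct conversion result of the expected type."""
    for name in method_names:
        method = getattr(obj, name, None)
        if callable(method):
            result = method()
            # Only trust the method if it returns the expected type.
            if isinstance(result, expected_type):
                return result
    return None


def convert(obj: Any, expected_type: Type[T], fallback: Callable[[Any], T]) -> T:
    # Prefer a direct conversion; otherwise use the generic fallback.
    direct = convert_directly(obj, expected_type)
    return direct if direct is not None else fallback(obj)


class FakeTable:
    """Hypothetical stand-in for an Arrow table."""


class WithToArrow:
    def to_arrow(self) -> FakeTable:
        return FakeTable()


class WrongReturnType:
    def arrow(self) -> str:
        return "not a table"
```

The isinstance check on the result is what keeps the name-based lookup safe: an object that happens to have a method called `arrow` returning something else falls through to the fallback path instead of being accepted.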

mattijn · 2024-03-24T15:28:08Z

All tests pass. Merging! Thanks again @jonmmease!

Handle arrow table with date32 columns

afd8caa

jonmmease added 3 commits March 22, 2024 08:53

Handle all date types

49536d4

Add changelog entry

9c6454d

Use direct arrow conversion methods if available

f225b55

jonmmease changed the title ~~fix: Handle Date32 columns in arrow tables~~ fix: Handle Date32 columns in Arrow tables and Polars DataFrames Mar 23, 2024

jonmmease added 3 commits March 23, 2024 08:16

Make mypy happy by making DataFrameLike protocol runtime checkable and using isinstance

1432353

Update changelog

d843a1a

Fix vegafusion test and update VegaFusion constraint

7feefd5

This was referenced Mar 23, 2024

Polars Dataframe with "date" dtype would not be rendered by Altair. Two workarounds (with and without pandas) are proposed. #3280

Closed

feature: Add browser renderer to open charts in external browser and update chart.show() to display chart #3379

Merged

joelostblom added the bug label Mar 23, 2024

jonmmease mentioned this pull request Mar 23, 2024

docs: Remove release notes and fully capture them in GitHub Releases #3380

Merged

check for instance DataFrameLike instead of __dataframe__ attribute

c119d1e

mattijn approved these changes Mar 24, 2024

mattijn merged commit c7c4149 into main Mar 24, 2024
20 checks passed

mattijn mentioned this pull request Mar 30, 2024

Don't error when hashing data that can't be serialised to JSON #3161

Closed

mattijn mentioned this pull request Jun 26, 2024

Remove PyArrow dependency for Polars support #3445

Closed
