
Incompatibility with pandas ^2.0.0 when querying datetime64 columns #168

Open
AchazRyus opened this issue Sep 10, 2024 · 0 comments
I am using pandas 2.0.0 (this also happens with higher versions of pandas) and running SQL queries with %%sparksql that return datetime64 data. I encounter this error even with a basic query such as

%%sparksql select * from db.table

The error:

Casting to unit-less dtype 'datetime64' is not supported. Pass e.g. 'datetime64[ns]' instead.
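For reference, the error can be reproduced with pandas >= 2.0 alone, outside Spark, since unit-less 'datetime64' casts were removed in that release (a minimal sketch, not part of the original report):

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2024-01-01", "2024-02-02"]))

# pandas >= 2.0 rejects a unit-less dtype string...
try:
    s.astype("datetime64")
except TypeError as e:
    print(e)  # Casting to unit-less dtype 'datetime64' is not supported...

# ...while an explicit unit still works.
print(s.astype("datetime64[ns]").dtype)  # datetime64[ns]
```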

The workaround is to downgrade pandas. The issue could be solved by having the converter handle datetime series explicitly:

```python
def correct_dtype(pser: pd.Series) -> pd.Series:
    if pd.api.types.is_datetime64_any_dtype(pser):
        return pser.astype('datetime64[ns]', copy=False)
    return pser.astype(pandas_type, copy=False)
```

(`pandas_type` comes from the enclosing `_create_converter_to_pandas` closure, as seen in the stack trace below.)
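As a self-contained illustration of this idea (the real `correct_dtype` lives inside `_create_converter_to_pandas`, where `pandas_type` comes from the enclosing closure; here it is passed as a hypothetical parameter instead):

```python
import pandas as pd

def correct_dtype(pser: pd.Series, pandas_type: str) -> pd.Series:
    # Datetime series: force an explicit nanosecond unit so the cast
    # also works on pandas >= 2.0, which rejects unit-less 'datetime64'.
    if pd.api.types.is_datetime64_any_dtype(pser):
        return pser.astype("datetime64[ns]", copy=False)
    # Everything else: keep the original behaviour.
    return pser.astype(pandas_type, copy=False)

ts = pd.Series(pd.to_datetime(["2024-01-01", "2024-02-02"]))
print(correct_dtype(ts, "datetime64").dtype)  # datetime64[ns], no TypeError
```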

If this is not a behavior you want to support, consider adding an upper bound on the pandas version in the pyproject.toml:

"pandas<2.0.0"
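For example, if the project declares its dependencies Poetry-style in pyproject.toml (the table name depends on the build backend, so treat this as a sketch):

```toml
[tool.poetry.dependencies]
pandas = "<2.0.0"
```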

Complete stack trace:

File [/opt/conda/lib/python3.11/site-packages/IPython/core/interactiveshell.py:2541](lib/python3.11/site-packages/IPython/core/interactiveshell.py#line=2540), in InteractiveShell.run_cell_magic(self, magic_name, line, cell)
   2539 with self.builtin_trap:
   2540     args = (magic_arg_s, cell)
-> 2541     result = fn(*args, **kwargs)
   2543 # The code below prevents the output from being displayed
   2544 # when using magics with decorator @output_can_be_silenced
   2545 # when the last Python token in the expression is a ';'.
   2546 if getattr(fn, magic.MAGIC_OUTPUT_CAN_BE_SILENCED, False):

File [/opt/conda/lib/python3.11/site-packages/jupyterlab_sql_editor/ipython_magic/sparksql/sparksql.py:263](opt/conda/lib/python3.11/site-packages/jupyterlab_sql_editor/ipython_magic/sparksql/sparksql.py#line=262), in SparkSql.sparksql(self, line, cell, local_ns)
    261 end = time()
    262 print(f"Execution time: {end - start:.2f} seconds")
--> 263 pdf = to_pandas(results, self.spark._jconf) if not use_cache else self.cached_pdf
    264 if len(pdf) > limit:
    265     print(f"Only showing top {limit} {'row' if limit == 1 else 'rows'}")

File [/opt/conda/lib/python3.11/site-packages/jupyterlab_sql_editor/outputters/util.py:632](opt/conda/lib/python3.11/site-packages/jupyterlab_sql_editor/outputters/util.py#line=631), in to_pandas(df, jconf)
    628     # struct_in_pandas = jconf.pandasStructHandlingMode()
    629     struct_in_pandas = "dict"
    631     return pd.concat(
--> 632         [
    633             _create_converter_to_pandas(
    634                 field.dataType,
    635                 field.nullable,
    636                 timezone=timezone,
    637                 struct_in_pandas=("row" if struct_in_pandas == "legacy" else struct_in_pandas),
    638                 error_on_duplicated_field_names=False,
    639                 timestamp_utc_localized=False,
    640             )(pser)
    641             for (_, pser), field in zip(pdf.items(), df.schema.fields)
    642         ],
    643         axis="columns",
    644     )
    645 else:
    646     return pdf

File [/opt/conda/lib/python3.11/site-packages/jupyterlab_sql_editor/outputters/util.py:633](opt/conda/lib/python3.11/site-packages/jupyterlab_sql_editor/outputters/util.py#line=632), in <listcomp>(.0)
    628     # struct_in_pandas = jconf.pandasStructHandlingMode()
    629     struct_in_pandas = "dict"
    631     return pd.concat(
    632         [
--> 633             _create_converter_to_pandas(
    634                 field.dataType,
    635                 field.nullable,
    636                 timezone=timezone,
    637                 struct_in_pandas=("row" if struct_in_pandas == "legacy" else struct_in_pandas),
    638                 error_on_duplicated_field_names=False,
    639                 timestamp_utc_localized=False,
    640             )(pser)
    641             for (_, pser), field in zip(pdf.items(), df.schema.fields)
    642         ],
    643         axis="columns",
    644     )
    645 else:
    646     return pdf

File [/opt/conda/lib/python3.11/site-packages/jupyterlab_sql_editor/outputters/util.py:358](opt/conda/lib/python3.11/site-packages/jupyterlab_sql_editor/outputters/util.py#line=357), in _create_converter_to_pandas.<locals>.correct_dtype(pser)
    357 def correct_dtype(pser: pd.Series) -> pd.Series:
--> 358     return pser.astype(pandas_type, copy=False)

File [/opt/conda/lib/python3.11/site-packages/pandas/core/generic.py:6324](opt/conda/lib/python3.11/site-packages/pandas/core/generic.py#line=6323), in NDFrame.astype(self, dtype, copy, errors)
   6317     results = [
   6318         self.iloc[:, i].astype(dtype, copy=copy)
   6319         for i in range(len(self.columns))
   6320     ]
   6322 else:
   6323     # else, only a single dtype is given
-> 6324     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   6325     return self._constructor(new_data).__finalize__(self, method="astype")
   6327 # GH 33113: handle empty frame or series

File [/opt/conda/lib/python3.11/site-packages/pandas/core/internals/managers.py:451](opt/conda/lib/python3.11/site-packages/pandas/core/internals/managers.py#line=450), in BaseBlockManager.astype(self, dtype, copy, errors)
    448 elif using_copy_on_write():
    449     copy = False
--> 451 return self.apply(
    452     "astype",
    453     dtype=dtype,
    454     copy=copy,
    455     errors=errors,
    456     using_cow=using_copy_on_write(),
    457 )

File [/opt/conda/lib/python3.11/site-packages/pandas/core/internals/managers.py:352](opt/conda/lib/python3.11/site-packages/pandas/core/internals/managers.py#line=351), in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    350         applied = b.apply(f, **kwargs)
    351     else:
--> 352         applied = getattr(b, f)(**kwargs)
    353     result_blocks = extend_blocks(applied, result_blocks)
    355 out = type(self).from_blocks(result_blocks, self.axes)

File [/opt/conda/lib/python3.11/site-packages/pandas/core/internals/blocks.py:511](opt/conda/lib/python3.11/site-packages/pandas/core/internals/blocks.py#line=510), in Block.astype(self, dtype, copy, errors, using_cow)
    491 """
    492 Coerce to the new dtype.
    493 
   (...)
    507 Block
    508 """
    509 values = self.values
--> 511 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    513 new_values = maybe_coerce_values(new_values)
    515 refs = None

File [/opt/conda/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:242](opt/conda/lib/python3.11/site-packages/pandas/core/dtypes/astype.py#line=241), in astype_array_safe(values, dtype, copy, errors)
    239     dtype = dtype.numpy_dtype
    241 try:
--> 242     new_values = astype_array(values, dtype, copy=copy)
    243 except (ValueError, TypeError):
    244     # e.g. _astype_nansafe can fail on object-dtype of strings
    245     #  trying to convert to float
    246     if errors == "ignore":

File [/opt/conda/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:184](opt/conda/lib/python3.11/site-packages/pandas/core/dtypes/astype.py#line=183), in astype_array(values, dtype, copy)
    180     return values
    182 if not isinstance(values, np.ndarray):
    183     # i.e. ExtensionArray
--> 184     values = values.astype(dtype, copy=copy)
    186 else:
    187     values = _astype_nansafe(values, dtype, copy=copy)

File [/opt/conda/lib/python3.11/site-packages/pandas/core/arrays/datetimes.py:694](opt/conda/lib/python3.11/site-packages/pandas/core/arrays/datetimes.py#line=693), in DatetimeArray.astype(self, dtype, copy)
    682     raise TypeError(
    683         "Cannot use .astype to convert from timezone-aware dtype to "
    684         "timezone-naive dtype. Use obj.tz_localize(None) or "
    685         "obj.tz_convert('UTC').tz_localize(None) instead."
    686     )
    688 elif (
    689     self.tz is None
    690     and is_datetime64_dtype(dtype)
    691     and dtype != self.dtype
    692     and is_unitless(dtype)
    693 ):
--> 694     raise TypeError(
    695         "Casting to unit-less dtype 'datetime64' is not supported. "
    696         "Pass e.g. 'datetime64[ns]' instead."
    697     )
    699 elif is_period_dtype(dtype):
    700     return self.to_period(freq=dtype.freq)

TypeError: Casting to unit-less dtype 'datetime64' is not supported. Pass e.g. 'datetime64[ns]' instead.