Pyarrow engine incorrectly serializes timestamp with Z. #2384

Closed
thomasfrederikhoeck opened this issue Apr 4, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@thomasfrederikhoeck
Contributor

thomasfrederikhoeck commented Apr 4, 2024

Environment

Delta-rs version: main

Binding: python

Environment:

  • Cloud provider:
  • OS: windows
  • Other:

Bug

What happened:
The pyarrow engine incorrectly serializes timestamp values with a trailing Z, in contrast to timestampNtz values, which are correctly serialized without a Z.

image

What you expected to happen:
Both should be serialized without a Z.
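For reference, here is a minimal sketch of what a Z-less serialization could look like. It assumes the values in question are partition values and that the target is the space-separated layout `{year}-{month}-{day} {hour}:{minute}:{second}.{microsecond}` from the Delta protocol's partition value rules. The helper name `to_partition_string` is hypothetical and is not part of delta-rs.

```python
from datetime import datetime, timezone


def to_partition_string(ts: datetime) -> str:
    """Format a timestamp without a trailing Z (hypothetical helper).

    Assumption: timezone-aware values are first normalized to UTC,
    naive values are formatted as they are.
    """
    if ts.tzinfo is not None:
        ts = ts.astimezone(timezone.utc)
    # %f always emits six zero-padded microsecond digits.
    return ts.strftime("%Y-%m-%d %H:%M:%S.%f")


naive = datetime(2021, 1, 1, 3, 4, 6, 3)
aware = datetime(2021, 1, 1, 3, 4, 6, 3, tzinfo=timezone.utc)

# Both forms produce the same string, with no Z suffix.
print(to_partition_string(naive))
print(to_partition_string(aware))
```

Under these assumptions a timestamp and a timestampNtz column holding the same wall-clock UTC value end up with identical strings, which is the behaviour expected here.
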
How to reproduce it:

from datetime import datetime

import pandas as pd
import pyarrow as pa
import pytz
from deltalake import DeltaTable, write_deltalake

tz = "UTC"


def get_data(with_tz):
    # Build a small frame whose "time" column is tz-aware or naive.
    tzinfo = pytz.timezone(tz) if with_tz else None
    dates = pd.date_range(
        datetime(2021, 1, 1, 3, 4, 6, 3, tzinfo=tzinfo),
        datetime(2021, 1, 3, 3, 4, 6, tzinfo=tzinfo),
    )
    return pd.DataFrame({"time": dates, "a": list(range(len(dates)))})


# Case 1: naive timestamps (Delta type timestampNtz), partitioned by "time".
schema = pa.schema(
    [
        ("time", pa.timestamp("us")),
        ("a", pa.int64()),
    ]
)
dt = DeltaTable.create(
    "mytable_timestampNtz", schema=schema, partition_by=["time"]
)

write_deltalake("mytable_timestampNtz", get_data(with_tz=False), partition_by="time", mode="append")
print(dt.schema())

# Case 2: tz-aware timestamps (Delta type timestamp), partitioned by "time".
schema = pa.schema(
    [
        ("time", pa.timestamp("us", tz)),
        ("a", pa.int64()),
    ]
)
dt = DeltaTable.create(
    "mytable_timestamp", schema=schema, partition_by=["time"]
)

write_deltalake("mytable_timestamp", get_data(with_tz=True), partition_by="time", mode="append")
print(dt.schema())

>Schema([Field(time, PrimitiveType("timestampNtz"), nullable=True), Field(a, PrimitiveType("long"), nullable=True)])
>Schema([Field(time, PrimitiveType("timestamp"), nullable=True), Field(a, PrimitiveType("long"), nullable=True)])
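The schema output alone does not show how the values were serialized. One way to check, sketched here under assumptions, is to read the `add` actions in the table's `_delta_log` commit files and look at their `partitionValues` entries. The helper name `logged_partition_values` is hypothetical, and the sketch only reads JSON commit files (it ignores checkpoints).

```python
import json
from pathlib import Path


def logged_partition_values(table_path, column):
    """Collect the serialized partition values recorded for `column`.

    Hypothetical helper: scans the JSON commit files in _delta_log and
    returns the partitionValues entry of every `add` action, in commit order.
    """
    values = []
    log_dir = Path(table_path) / "_delta_log"
    for commit in sorted(log_dir.glob("*.json")):
        with commit.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                # Each non-empty line of a commit file is one JSON action.
                action = json.loads(line)
                add = action.get("add")
                if add is not None:
                    values.append(add.get("partitionValues", {}).get(column))
    return values
```

For the two tables created above, `logged_partition_values("mytable_timestamp", "time")` and `logged_partition_values("mytable_timestampNtz", "time")` can then be compared directly, for example by checking which strings end in `Z`.
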

More details:

@ion-elgreco
Collaborator

Gonna close this one since the pyarrow engine is deprecated now.

@ion-elgreco ion-elgreco closed this as not planned Aug 19, 2024