Environment details
python --version: 3.11.6
pip --version: 24.1.2
google-cloud-bigquery version (pip show google-cloud-bigquery): 3.25.0
Steps to reproduce
Create a table like so:
CREATE OR REPLACE TABLE `my_table`
AS SELECT JSON '[{"foo":"A"},{"foo":5}]' AS DATA
Query it from Python:
from google.cloud.bigquery import Client

client = Client()
df = client.query("SELECT * FROM `my_table`").to_dataframe()
print(df["DATA"][0])
Expected behavior
We should see the following printed:
[{'foo':'A'},{'foo':5}]
Stack trace
Traceback (most recent call last):
File "/home/hugo/test/test.py", line 9, in <module>
).to_dataframe()
^^^^^^^^^^^^^^
File "/home/hugo/test/venv/lib/python3.11/site-packages/google/cloud/bigquery/job/query.py", line 2053, in to_dataframe
return query_result.to_dataframe(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hugo/test/venv/lib/python3.11/site-packages/google/cloud/bigquery/table.py", line 2379, in to_dataframe
record_batch = self.to_arrow(
^^^^^^^^^^^^^^
File "/home/hugo/test/venv/lib/python3.11/site-packages/google/cloud/bigquery/table.py", line 1946, in to_arrow
for record_batch in self.to_arrow_iterable(
File "/home/hugo/test/venv/lib/python3.11/site-packages/google/cloud/bigquery/table.py", line 1809, in _to_page_iterable
yield from result_pages
File "/home/hugo/test/venv/lib/python3.11/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 731, in download_arrow_row_iterator
yield _row_iterator_page_to_arrow(page, column_names, arrow_types)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hugo/test/venv/lib/python3.11/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 704, in _row_iterator_page_to_arrow
arrays.append(pyarrow.array(page._columns[column_index], type=arrow_type))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/array.pxi", line 368, in pyarrow.lib.array
File "pyarrow/array.pxi", line 42, in pyarrow.lib._sequence_to_array
File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not convert 'A' with type str: tried to convert to int64
The error suggests that PyArrow is attempting to convert all values for the "foo" key to int64, based on the presence of the integer value 5. However, it fails when encountering the string value "A" for the same key.
The solution would be to make the type inference during JSON data ingestion handle heterogeneous data types for the same key.
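In the meantime, a possible workaround is to serialize the JSON column to a STRING in SQL, so the client never has to infer an Arrow type for it. A minimal sketch, assuming the table created above (TO_JSON_STRING is standard BigQuery SQL; the rest mirrors the repro code):

from google.cloud.bigquery import Client

client = Client()
# Workaround sketch: serialize the JSON column server-side so pyarrow never
# tries to unify the types of individual keys like "foo".
df = client.query("SELECT TO_JSON_STRING(DATA) AS DATA FROM `my_table`").to_dataframe()
print(df["DATA"][0])  # raw JSON text, e.g. '[{"foo":"A"},{"foo":5}]'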
One thing of note is that the JSON type is not yet explicitly handled in _pandas_helpers.py: printing arrow_types inside download_arrow_row_iterator during the execution of the above code yields [None], as no mapping exists from the BigQuery JSON type to any pyarrow type.
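For illustration, mapping the BigQuery JSON type to pyarrow's string type would sidestep the inference entirely. A minimal sketch of that direction (my assumption of one possible fix, not the library's actual code):

import pyarrow as pa

# If JSON values reached pyarrow in their raw string form, no per-key type
# inference would happen, so heterogeneous values could not conflict.
values = ['{"foo": "A"}', '{"foo": 5}']
arr = pa.array(values, type=pa.string())  # succeeds; no ArrowInvalid
print(arr)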