Autodetect for BQ happening automatically even with schema defined. #847
Comments
To determine this, it would help to see what is being sent in the job insert request. Would it be possible for you to add a breakpoint / print out the contents of the job resource here? python-bigquery/google/cloud/bigquery/client.py Line 2414 in 7016f69
|
Could you please elaborate? What error do you get when the request "fails"? Or is the error that type of the |
@tswast I'll go ahead and get you those job_resource values here shortly! I'll post back when I do. @plamut It is the latter. The type of
|
@tswast here is an output of
the value is under the |
Thanks, @brummetj That means the library is sending the schema. Double-checking with https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad I don't see any typos with regards to This must be a backend issue / issue with the table creation. |
@tswast Sounds good to me! ... Any steps you want me to take to help with the backend issue? Should I be calling |
That could be a good workaround. If a table already exists, the backend API should use that.
I don't have as much access to internal job logs and such. Filing a support request if you have it would get you to the right folks. If not, the backend team does monitor the public issue tracker here: https://issuetracker.google.com/issues/new?component=187149&template=1162659 |
Oh, I have another thought! @brummetj Are those "long float number (ex: 75257229615211513551680283362634662820), int number ( ex: 4 )" values actual float / int values in Python? If so, it's likely that the Would it be possible to convert these to strings before passing them to I could see the library adding logic to convert values based on the BigQuery schema we have, but this would likely come at a pretty severe performance penalty. Alternatively, you could file that issue against the backend to see if they can convert those to strings for you at load time. |
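A minimal sketch of the string conversion suggested above, assuming the rows are plain Python dicts and the affected column is a top-level key called value (the field name comes from the issue text; the helper name stringify_field is hypothetical). It uses only the standard library and is not part of google-cloud-bigquery.

```python
from typing import Any, Dict, Iterable, List


def stringify_field(rows: Iterable[Dict[str, Any]], field: str) -> List[Dict[str, Any]]:
    """Return copies of the rows with `field` coerced to str (None is left alone)."""
    converted = []
    for row in rows:
        new_row = dict(row)  # shallow copy so the caller's rows are not mutated
        if field in new_row and new_row[field] is not None:
            new_row[field] = str(new_row[field])
        converted.append(new_row)
    return converted


# Example rows mirroring the value shapes described in the issue.
rows = [
    {"value": 4},
    {"value": "http://some-string-url"},
    {"value": 75257229615211513551680283362634662820},
]
string_rows = stringify_field(rows, "value")
```

Note that this only preserves every digit if the large number is still a Python int (or already a string) at this point; a Python float cannot represent that many significant digits, so the conversion would need to happen before the value is ever parsed as a float.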
@tswast they are not actual float/int values. They are wrapped in quotes and Python is interpreting them as strings, i.e. they are already strings :D I'm probably just going to go ahead and file an issue with the link you shared |
Thanks. If you could comment here with a link to the issue you file, that'd be great. Then I can track it as well as anyone else who hits this issue. |
@tswast here you go! https://issuetracker.google.com/issues/195444399 |
This item went to the BQ backend team and was closed as |
Issue
I'm currently using the Python SDK to transform and load data into BQ. One of the fields I have is supposed to be a string, but the data will come in as either a long float number (ex: 75257229615211513551680283362634662820), an int number (ex: 4), or a string (ex: http://some-string-url).
So the data representation looks like
Since BQ does auto detection, these values can be recognized as either a FLOAT, INT, or STRING. I followed the docs and Stack Overflow recommendations to assign a hard schema to the config, which looks like
The schema is read in from a JSON file that looks like
and parsed into the SchemaField() object.
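As a rough sketch of that parsing step: the JSON below is a hypothetical stand-in (the actual schema file is not shown in this thread), using the field-list layout that the BigQuery REST API uses for schemas. The helper names leaf_types and to_schema_fields are made up for illustration; SchemaField.from_api_repr is the library call that turns one such dict, including nested fields, into a SchemaField. Its import is deferred so the inspection part runs without the library installed.

```python
import json

# Hypothetical schema file contents; only the "value" field name comes from the issue.
SCHEMA_JSON = """
[
  {
    "name": "items",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      {"name": "value", "type": "STRING", "mode": "NULLABLE"}
    ]
  }
]
"""


def leaf_types(fields, prefix=""):
    """Flatten a schema into {dotted.path: type} to check what each leaf is declared as."""
    out = {}
    for field in fields:
        path = prefix + field["name"]
        if field.get("fields"):
            out.update(leaf_types(field["fields"], path + "."))
        else:
            out[path] = field["type"]
    return out


def to_schema_fields(fields):
    """Convert the parsed dicts into SchemaField objects (needs google-cloud-bigquery)."""
    from google.cloud import bigquery  # deferred import, see lead-in

    return [bigquery.SchemaField.from_api_repr(field) for field in fields]


schema_dicts = json.loads(SCHEMA_JSON)
types_by_path = leaf_types(schema_dicts)
```

Printing types_by_path before building the job config is a cheap way to confirm that the nested value field really is declared as STRING in the file being read.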
Just to be clear, I do believe that this is a part of a subschema and assigned to the fields as it's an array of objects.
Then I submit the data through the SDK via load_table_from_json, which looks like
All fairly straightforward, and follows closely with what is documented and shared by others. BUT for whatever reason, the API request will fail and continue to try to parse value as either a STRING, FLOAT or INT even with the schema defined.
So a couple of questions here... is this a bug with BQ, or the SDK not accepting the schema? Am I doing something wrong here? Does the autodetect feature always do its thing even when turned off and a schema is defined?
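For readers following along, here is a hedged sketch of a load with an explicit schema and autodetect turned off. It is not the code from this report (that code is not shown here); the function names and parameters are hypothetical, and schema is assumed to be a list of SchemaField objects. The first helper builds the REST-style configuration.load fragment one would expect to see in the job resource, which can be compared against what the client actually sends.

```python
def expected_load_configuration(schema):
    """REST-style fragment expected for a JSON load with an explicit schema."""
    return {
        "load": {
            "sourceFormat": "NEWLINE_DELIMITED_JSON",
            "autodetect": False,
            "schema": {"fields": [field.to_api_repr() for field in schema]},
        }
    }


def load_with_explicit_schema(client, rows, table_id, schema):
    """Load JSON rows with a fixed schema (needs google-cloud-bigquery and credentials)."""
    from google.cloud import bigquery  # deferred so the helper above runs without it

    job_config = bigquery.LoadJobConfig(
        schema=schema,
        autodetect=False,
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    )
    job = client.load_table_from_json(rows, table_id, job_config=job_config)
    return job.result()  # blocks until the load job finishes or raises
```

If the job resource sent by the client matches the expected fragment and the load still infers types, that points away from the SDK and toward the backend or the existing table definition.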
Any communication here would be greatly appreciated.
Cheers
info
sdk version: google-cloud-bigquery 2.20.0
os: macOS
environment: Cloud Function
python version: 3.9