Infer 'REQUIRED' mode with a flag for consistently filled in values #28
Thanks. I made a small change to implement this during the schema output (i.e. in [...]). I also had to restrict --infer_mode to CSV files only. It doesn't really make sense for JSON files, since JSON fields are often completely missing from some records but present in others.
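For readers following along, the idea can be sketched roughly as below. This is only an illustration, not the code that was merged: the function name `infer_modes` and its `infer_mode` parameter are hypothetical, and it assumes a CSV file with a header row where an empty string counts as a missing value.

```python
import csv
import io


def infer_modes(csv_text, infer_mode=False):
    """Return {column_name: mode} for CSV text with a header row.

    Hypothetical helper for illustration; not part of
    bigquery-schema-generator.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    # Assume every column is always filled until a row proves otherwise.
    always_filled = {name: True for name in columns}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in columns:
            value = row.get(name)  # None when the row is shorter than the header
            if value is None or value.strip() == "":
                always_filled[name] = False
    modes = {}
    for name in columns:
        # Without the flag, everything stays NULLABLE (backward compatible).
        if infer_mode and row_count > 0 and always_filled[name]:
            modes[name] = "REQUIRED"
        else:
            modes[name] = "NULLABLE"
    return modes
```

With the flag off, every column stays NULLABLE, so existing behaviour does not change. The same approach would be misleading for JSON, where a key can simply be absent from a record.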
@bxparks nice, I've run it locally and it seems to be what I wanted, written in a shorter way :) You may want to consider adding tests :) Could you please tell me when it would normally become available on [...]?
Cool. I'll merge into develop, then let it sit there for a day or two to give me a chance to clean up small loose ends (there's always something). Then I'll push to PyPI in a few days.
@bxparks hm, yes, it seems there is :P I've just run it over a different file and it does not seem to support [...]
BigQuery (more precisely 'bq load') supports only 6 decimal places for the 'second' component. You should write a filter to normalize the data. You might say that this belongs in bigquery-schema-generator but I would push back a little on that. To me, this is a data-cleansing operation that belongs in the 'T' part of the ETL pipeline.
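A minimal sketch of such a filter, assuming timestamps in an ISO-like `HH:MM:SS.ffffff...` form and assuming that truncating (rather than rounding) the extra digits is acceptable. The function name `truncate_fractional_seconds` is hypothetical and is not part of any library mentioned here.

```python
import re

# A seconds field (":SS") followed by more than six fractional digits.
# Group 1 keeps the first six digits; the remaining digits are dropped.
_EXTRA_FRACTION = re.compile(r"(:\d{2}\.\d{6})\d+")


def truncate_fractional_seconds(value):
    """Truncate fractional seconds in `value` to at most 6 decimal places."""
    return _EXTRA_FRACTION.sub(r"\1", value)
```

Values that already have six or fewer fractional digits pass through unchanged, so the filter can be run over every record before loading.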
@bxparks ah, I see, it is not a [...]
Hello Brian!
Sorry for making you wait.
Here is what I wanted to have.
I've been rereading the code and I understand your confusion.
You use `bq load`, and it expects one to loosen `mode` to either `NULLABLE` or `REPEATED`. The thing is that I do not use `bq load`. :) I first analyze a few given files, decide on a schema, create a table, and then start loading a lot of data continuously (I wouldn't like to discuss it too deeply). If we allow users to decide on the mode only when a given flag is set, it should remain backward compatible; however, please do check that I haven't broken anything :)
To me it seems like general-purpose functionality, so I would prefer it to be open source, and I really hope you won't mind having it.
If you agree to this, feel free to commit directly to my branch or branch out, unless you strongly prefer to rewrite it :) Also, even just a code review would be helpful, as I am less experienced with Python.
Have a nice weekend.