Add support for existing BigQuery Schema (replace #40) #57

Closed
wants to merge 39 commits into from
39 commits
41f24cc
Add support for starting from a known schema_map
bozzzzo Mar 22, 2020
99c5168
A SCHEMA tag can not be followed by DATA, ERROR or SCHEMA tag
bozzzzo Mar 24, 2020
783e1f1
Convert tests to pytest.
bozzzzo Mar 22, 2020
bbbb35c
Add support for starting from a known BQ schema
bozzzzo Mar 24, 2020
040d801
Add support for tox to test across python versions
bozzzzo Mar 24, 2020
ea5107e
BQ schema is case insensitive, track keys by their lowercase value
bozzzzo Mar 24, 2020
35d5cec
lowercase field names when generating schema map from schema
bozzzzo Apr 7, 2020
444a5b9
Keep schema map index case insensitive, but preserve original field case
bozzzzo Apr 23, 2020
265b3c3
case sensitivity fix - preserve key name from old/original schema, if…
matevz-digiverse Jun 16, 2020
ee47847
add an optional callback to SchemaGenerator - allow the caller to pro…
matevz-digiverse Aug 19, 2020
8fbe52d
Modified data_reader class to read in an existing schema
abroglesc Nov 6, 2020
e15bfe9
Fixed bug related to sanitization within CSV datasets
abroglesc Nov 6, 2020
1e286f1
Migrated off of pytest and back to unittest. Converted fixtures into f…
abroglesc Nov 6, 2020
7afdf41
Adding command-line flag for starting from existing bigquery schema
abroglesc Nov 6, 2020
96ca4ae
Added default NULLABLE mode when bigquery does not provide one
abroglesc Nov 9, 2020
fff6f5b
Removing errors informed from tests. Adding additional test cases inc…
abroglesc Nov 9, 2020
c5a57de
Removing type_mismatch_callback as this was untested
abroglesc Nov 9, 2020
5ef1427
Merge branch 'develop' into abroglesc-existing-schema
abroglesc Nov 9, 2020
098dbba
Fixing tests post merging from develop
abroglesc Nov 9, 2020
608b3fd
Removing tox from gitignore
abroglesc Nov 9, 2020
dd0db5f
Updating README to include details on existing_schema_path
abroglesc Nov 9, 2020
de27694
Fixing Flake8 errors
abroglesc Nov 9, 2020
7d1b4ce
Actually fully fixing flake8 tests
abroglesc Nov 9, 2020
ed60a7f
Removing old generator.run function call
abroglesc Nov 16, 2020
bb35faf
Removing unused test function
abroglesc Nov 16, 2020
23f8c40
Keeping case sensitivity rather than converting everything to lowerca…
abroglesc Nov 17, 2020
39f8222
Fixing error logging bug related to base_path not being passed to get…
abroglesc Nov 18, 2020
efcb2fa
Adding additional json_full_path error locations
abroglesc Nov 18, 2020
865e270
Fixing flake8 error
abroglesc Nov 18, 2020
bb5745c
Allow infer_schema to control relaxing mode when using existing_schem…
abroglesc Dec 1, 2020
411f402
Renaming line --> line_number
abroglesc Dec 1, 2020
9829bb0
Updating make flake8 task to also scan tests/ folder since CI/CD does…
abroglesc Dec 1, 2020
b556e0b
Fix flake8 on tests/
abroglesc Dec 1, 2020
be3a37a
Revert read_errors_section logic to original
abroglesc Dec 1, 2020
fae9581
Convert .format into f strings
abroglesc Dec 1, 2020
d41556e
Revert generator for testcases to original loop method
abroglesc Dec 1, 2020
38523e2
Added a test for standard sql types to legacy type conversion FLOAT64…
abroglesc Dec 1, 2020
989c2f8
Added additional 2 standard to legacy type conversions to test
abroglesc Dec 1, 2020
82593a4
Fixed bug where we used infer_mode to set a field as REQUIRED for a j…
abroglesc Dec 1, 2020
6 changes: 4 additions & 2 deletions bigquery_schema_generator/generate_schema.py
@@ -238,9 +238,11 @@ def deduce_schema_for_line(self, json_object, schema_map, base_path=None):
# that we can aggregate keys with slightly different casing together
sanitized_key = self.sanitize_name(key).lower()
Owner

Can you rename sanitized_key in this method to something like canonical_key, to avoid confusion with sanitized_key in get_schema_entry()? Also, update the comment to say something like: "The canonical key is the lower-cased version of the sanitized key so that the case of the field name is preserved when generating the schema."
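
A minimal sketch of the naming suggested above, assuming a simplified stand-in for sanitize_name() (the project's real method may differ) and a hypothetical add_field() helper that is not part of the PR. It illustrates the idea only: the map is indexed by a lower-cased canonical key, while each entry keeps the field name in the case first seen.

```python
import re


def sanitize_name(name):
    # Hypothetical stand-in for the generator's sanitize_name():
    # replace any character outside [a-zA-Z0-9_] with an underscore.
    return re.sub(r'[^a-zA-Z0-9_]', '_', name)


def canonical_key(name):
    # The canonical key is the lower-cased sanitized key, used only
    # as the index into the schema map.
    return sanitize_name(name).lower()


def add_field(schema_map, key, field_type):
    # Hypothetical helper: store a new entry under the canonical key,
    # keeping the sanitized name in its original case. A later key
    # that differs only by case maps to the same entry.
    ckey = canonical_key(key)
    if ckey not in schema_map:
        schema_map[ckey] = {'name': sanitize_name(key), 'type': field_type}
    return schema_map


schema_map = {}
add_field(schema_map, 'UserId', 'INTEGER')
add_field(schema_map, 'userid', 'INTEGER')
print(schema_map)
```

Here both 'UserId' and 'userid' land on the single index 'userid', and the stored name stays 'UserId', which is the behaviour the comment asks the code comment to describe.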

schema_entry = schema_map.get(sanitized_key)
- new_schema_entry = self.get_schema_entry(key, value)
+ new_schema_entry = self.get_schema_entry(key,
+ value,
+ base_path=base_path)
Owner

Weird, this is fixed on the develop branch... In fact, you fixed it with 0c2b3a0. It looks like the problem is that you took Bozo's #40 but didn't rebase it off develop, so now, this PR has diverged.

schema_map[sanitized_key] = self.merge_schema_entry(
- schema_entry, new_schema_entry)
+ schema_entry, new_schema_entry, base_path=base_path)

def sanitize_name(self, value):
''' Sanitizes a column name within the schema.