Add support for existing BigQuery Schema (replace #40) #57
Closed
Commits
39 commits
41f24cc  Add support for starting from a known schema_map (bozzzzo)
99c5168  A SCHEMA tag can not be followed by DATA, ERROR or SCHEMA tag (bozzzzo)
783e1f1  Convert tests to pytest. (bozzzzo)
bbbb35c  Add support for starting from a known BQ schema (bozzzzo)
040d801  Add support for tox to test across python versions (bozzzzo)
ea5107e  BQ schema is case insensitive, track keys by their lowercase value (bozzzzo)
35d5cec  lowercase field names when generating schema map from schema (bozzzzo)
444a5b9  Keep schema map index case insensitive, but preserve original field case (bozzzzo)
265b3c3  case sensitivity fix - preserve key name from old/original schema, if… (matevz-digiverse)
ee47847  add an optional callback to SchemaGenerator - allow the caller to pro… (matevz-digiverse)
8fbe52d  Modified data_reader class to read in an existing schema (abroglesc)
e15bfe9  Fixed bug related to sanitization within CSV datasets (abroglesc)
1e286f1  Migrated off of pytest and back to unittest. Coverted fixtures into f… (abroglesc)
7afdf41  Adding command-line flag for starting from existing bigquery schema (abroglesc)
96ca4ae  Added default NULLABLE mode when bigquery does not provide one (abroglesc)
fff6f5b  Removing errors informed from tests. Adding additional test cases inc… (abroglesc)
c5a57de  Removing type_mismatch_callback as this was untested (abroglesc)
5ef1427  Merge branch 'develop' into abroglesc-existing-schema (abroglesc)
098dbba  Fixing tests post merging from develop (abroglesc)
608b3fd  Removing tox from gitignore (abroglesc)
dd0db5f  Updating README to include details on existing_schema_path (abroglesc)
de27694  Fixing Flake8 errors (abroglesc)
7d1b4ce  Actually fully fixing flake8 tests (abroglesc)
ed60a7f  Removing old generator.run function call (abroglesc)
bb35faf  Removing unused test function (abroglesc)
23f8c40  Keeping case sensitivity rather than converting everything to lowerca… (abroglesc)
39f8222  Fixing error logging bug related to base_path not being passed to get… (abroglesc)
efcb2fa  Adding additional json_full_path error locations (abroglesc)
865e270  Fixing flake8 error (abroglesc)
bb5745c  Allow infer_schema to control relaxing mode when using existing_schem… (abroglesc)
411f402  Renaming line --> line_number (abroglesc)
9829bb0  Updating make flake8 task to also scan tests/ folder since CI/CD does… (abroglesc)
b556e0b  Fix flake8 on tests/ (abroglesc)
be3a37a  Revert read_errors_section logic to original (abroglesc)
fae9581  Convert .format into f strings (abroglesc)
d41556e  Revert generator for testcases to original loop method (abroglesc)
38523e2  Added a test for standard sql types to legacy type conversion FLOAT64… (abroglesc)
989c2f8  Added additional 2 standard to legacy type conversions to test (abroglesc)
82593a4  Fixed bug where we used infer_mode to set a field as REQUIRED for a j… (abroglesc)
@@ -238,9 +238,11 @@ def deduce_schema_for_line(self, json_object, schema_map, base_path=None):
             # that we can aggregate keys with slightly different casing together
             sanitized_key = self.sanitize_name(key).lower()
             schema_entry = schema_map.get(sanitized_key)
-            new_schema_entry = self.get_schema_entry(key, value)
+            new_schema_entry = self.get_schema_entry(key,
+                                                     value,
+                                                     base_path=base_path)
             schema_map[sanitized_key] = self.merge_schema_entry(
-                schema_entry, new_schema_entry)
+                schema_entry, new_schema_entry, base_path=base_path)
 
     def sanitize_name(self, value):
         ''' Sanitizes a column name within the schema.
Can you rename `sanitized_key` in this method to something like `canonical_key`, to avoid confusion with `sanitized_key` in `get_schema_entry()`? Also, update the comment to say something like: "The canonical key is the lower-cased version of the sanitized key so that the case of the field name is preserved when generating the schema."
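The rename suggested above could look roughly like the sketch below. It is not the project's code: `sanitize_name` here is a simplified stand-in, and `add_key` and the shape of the entry dict are hypothetical names made up for illustration. The point is only that the map is indexed by the lower-cased `canonical_key`, while the entry keeps the first-seen spelling of the field name.

```python
import re


def sanitize_name(name):
    """Hypothetical stand-in: replace characters outside [a-zA-Z0-9_] with '_'."""
    return re.sub(r'[^a-zA-Z0-9_]', '_', name)


def add_key(schema_map, key):
    """Record `key` in `schema_map` (hypothetical helper, not from the PR)."""
    sanitized_key = sanitize_name(key)
    # The canonical key is the lower-cased version of the sanitized key, so
    # keys that differ only in case share one entry while the entry itself
    # preserves the case of the field name that was seen first.
    canonical_key = sanitized_key.lower()
    entry = schema_map.setdefault(
        canonical_key, {'name': sanitized_key, 'count': 0})
    entry['count'] += 1
    return entry


schema_map = {}
for key in ['UserId', 'userid', 'user-name']:
    add_key(schema_map, key)
```

With this input, `UserId` and `userid` collapse into a single entry indexed by `userid`, and that entry still reports the name as `UserId`.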