source-S3: Support JSON format (#14213)
* json format support added
* json format support added
* code formatted
* format convertion changed
* format naming convertion changed
* test cased issue fixed
* test case issued resolved
* sample file and config added for integration tests
* Json doc added

  Json doc added
* update
* sample file and config added for integration tests
* sample file and config added for integration tests
* update jsonl files
* review 1
* review 1
* review 1
* pyarrow version upgrade
* clean integration test folder architecture
* add timestamp record to simple_test.jsonl
* fixed integration test and parser review change
* simplify table read
* doc update
* fix specs
* user sample files
* fix sample files
* add newlines at end of files
* rename json parser
* rename jsonfile to jsonlfile
* schema inference added
* patch review fix
* Update docs/integrations/sources/s3.md

  doc update

  Co-authored-by: George Claireaux <[email protected]>
* changing the version
* changing the title to sync with other type
* fix expected csv records
* fix expected records for avro and parquet
* review fix
* fixed master schema handling
* remove sample configs
* fix expected records
* json doc update added more details on json parser
* fixed api name
* bump version
* auto-bump connector version [ci skip]

Co-authored-by: alafanechere <[email protected]>
Co-authored-by: George Claireaux <[email protected]>
Co-authored-by: George Claireaux <[email protected]>
Co-authored-by: Octavia Squidington III <[email protected]>
1 parent c5a98f3, commit 3d49955
Showing 38 changed files with 556 additions and 29 deletions.
16 changes: 16 additions & 0 deletions
airbyte-integrations/connectors/source-s3/integration_tests/config_minio.json
@@ -0,0 +1,16 @@
{
  "dataset": "test",
  "provider": {
    "storage": "S3",
    "bucket": "test-bucket",
    "aws_access_key_id": "123456",
    "aws_secret_access_key": "123456key",
    "path_prefix": "",
    "endpoint": "http://10.0.3.185:9000"
  },
  "format": {
    "filetype": "csv"
  },
  "path_pattern": "*.csv",
  "schema": "{}"
}
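To illustrate how a config of this shape might be consumed, here is a minimal Python sketch. It is not the connector's code: `load_config` and `select_keys` are hypothetical helpers, and the standard-library `fnmatch` is used only as a stand-in for whatever glob matching the connector actually applies to `path_pattern`. Note that `schema` is a JSON document stored as a string, so it needs a second decoding step.

```python
import json
from fnmatch import fnmatch

# Same content as config_minio.json above.
CONFIG_TEXT = """
{
  "dataset": "test",
  "provider": {
    "storage": "S3",
    "bucket": "test-bucket",
    "aws_access_key_id": "123456",
    "aws_secret_access_key": "123456key",
    "path_prefix": "",
    "endpoint": "http://10.0.3.185:9000"
  },
  "format": {"filetype": "csv"},
  "path_pattern": "*.csv",
  "schema": "{}"
}
"""


def load_config(text):
    """Hypothetical helper: parse the config and decode the nested schema string."""
    config = json.loads(text)
    # "schema" holds JSON encoded as a string, so decode it separately.
    config["schema"] = json.loads(config["schema"])
    return config


def select_keys(config, keys):
    """Hypothetical helper: keep the object keys that match path_pattern.

    Assumption: fnmatch only approximates the connector's real glob semantics.
    """
    return [key for key in keys if fnmatch(key, config["path_pattern"])]


config = load_config(CONFIG_TEXT)
keys = ["simple_test.csv", "simple_test.jsonl", "sample_test.parquet"]
print(select_keys(config, keys))
```

With `"path_pattern": "*.csv"`, only the CSV sample file would be selected from the three keys listed.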
File renamed without changes.
File renamed without changes.
15 changes: 15 additions & 0 deletions
airbyte-integrations/connectors/source-s3/integration_tests/configured_catalogs/jsonl.json
@@ -0,0 +1,15 @@
{
  "streams": [
    {
      "stream": {
        "name": "test",
        "json_schema": {},
        "supported_sync_modes": ["full_refresh", "incremental"],
        "source_defined_cursor": true,
        "default_cursor_field": ["_ab_source_file_last_modified"]
      },
      "sync_mode": "incremental",
      "destination_sync_mode": "append"
    }
  ]
}
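The catalog above configures an incremental sync whose cursor is `_ab_source_file_last_modified`. The following Python sketch shows one way such a cursor could drive file selection. It is an assumption-laden illustration, not the connector's implementation: `files_to_sync` and `advance_state` are hypothetical helpers, the strict "newer than the cursor" comparison is assumed, and the timestamp format is taken from the expected records in this commit.

```python
from datetime import datetime

CURSOR_FIELD = "_ab_source_file_last_modified"
# Format of the timestamps seen in the expected records, e.g. 2022-07-15T08:31:02+0000
TS_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def parse_ts(value):
    """Parse a last-modified timestamp into a timezone-aware datetime."""
    return datetime.strptime(value, TS_FORMAT)


def files_to_sync(files, state):
    """Hypothetical helper: return files modified after the saved cursor, oldest first.

    Assumption: a file is re-read only if it is strictly newer than the cursor.
    """
    cursor = state.get(CURSOR_FIELD)
    pending = [
        f for f in files
        if cursor is None or parse_ts(f["last_modified"]) > parse_ts(cursor)
    ]
    return sorted(pending, key=lambda f: parse_ts(f["last_modified"]))


def advance_state(state, synced):
    """Hypothetical helper: move the cursor to the newest file that was synced."""
    if not synced:
        return dict(state)
    newest = max(synced, key=lambda f: parse_ts(f["last_modified"]))
    return {CURSOR_FIELD: newest["last_modified"]}


FILES = [
    {"url": "simple_test_newlines.jsonl", "last_modified": "2022-07-15T10:07:00+0000"},
    {"url": "simple_test.jsonl", "last_modified": "2022-07-15T08:31:02+0000"},
]

first_run = files_to_sync(FILES, {})
state = advance_state({}, first_run[:1])   # pretend only the oldest file was synced
second_run = files_to_sync(FILES, state)
print([f["url"] for f in first_run], state, [f["url"] for f in second_run])
```

On the first run both files are pending; once the cursor has advanced past `simple_test.jsonl`, only `simple_test_newlines.jsonl` remains.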
15 changes: 15 additions & 0 deletions
airbyte-integrations/connectors/source-s3/integration_tests/configured_catalogs/parquet.json
@@ -0,0 +1,15 @@
{
  "streams": [
    {
      "stream": {
        "name": "test",
        "json_schema": {},
        "supported_sync_modes": ["full_refresh", "incremental"],
        "source_defined_cursor": true,
        "default_cursor_field": ["_ab_source_file_last_modified"]
      },
      "sync_mode": "incremental",
      "destination_sync_mode": "append"
    }
  ]
}
2 changes: 1 addition & 1 deletion
...s3/integration_tests/expected_records.txt → ...ntegration_tests/expected_records/csv.txt
File renamed without changes.
2 changes: 2 additions & 0 deletions
airbyte-integrations/connectors/source-s3/integration_tests/expected_records/jsonl.txt
@@ -0,0 +1,2 @@
{"stream": "test", "data": {"id": 1, "name": "PVdhmjb1", "valid": false,"value": 1.2, "event_date": "2022-01-01T00:00:00Z", "_ab_additional_properties": {}, "_ab_source_file_last_modified": "2022-07-15T08:31:02+0000", "_ab_source_file_url": "simple_test.jsonl"}, "emitted_at": 162727468000}
{"stream": "test", "data": {"id": 2, "name": "ABCDEF", "valid": true,"value": 1.0, "event_date": "2023-01-01T00:00:00Z", "_ab_additional_properties": {}, "_ab_source_file_last_modified": "2022-07-15T08:31:02+0000", "_ab_source_file_url": "simple_test.jsonl"}, "emitted_at": 162727468000}
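The expected records wrap each source row in a `stream`/`data`/`emitted_at` envelope and append three `_ab_*` fields. The sketch below reproduces that shape in Python. `build_record` is a hypothetical helper, and the `float_columns` argument is an assumption made to mirror how `"value": 1` in the sample file appears as `1.0` in the expected output, presumably because the column is typed as a floating-point number.

```python
import json


def build_record(stream, row, file_url, last_modified, emitted_at, float_columns=()):
    """Hypothetical helper: wrap one parsed row in the envelope seen in the expected records."""
    data = dict(row)
    for column in float_columns:
        # Assumption: columns typed as "number" are emitted as floats (1 -> 1.0).
        data[column] = float(data[column])
    data["_ab_additional_properties"] = {}
    data["_ab_source_file_last_modified"] = last_modified
    data["_ab_source_file_url"] = file_url
    return {"stream": stream, "data": data, "emitted_at": emitted_at}


# Second row of simple_test.jsonl as it appears in this commit.
row = json.loads(
    '{"id":2,"name":"ABCDEF","valid":true, "value": 1, "event_date": "2023-01-01T00:00:00Z"}'
)
record = build_record(
    stream="test",
    row=row,
    file_url="simple_test.jsonl",
    last_modified="2022-07-15T08:31:02+0000",
    emitted_at=162727468000,
    float_columns=("value",),
)
print(json.dumps(record))
```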
2 changes: 2 additions & 0 deletions
...e-integrations/connectors/source-s3/integration_tests/expected_records/jsonl_newlines.txt
@@ -0,0 +1,2 @@
{"stream": "test", "data": {"id": 1, "name": "PVdhmjb1", "valid": false,"value": 1.2, "event_date": "2022-01-01T00:00:00Z", "_ab_additional_properties": {}, "_ab_source_file_last_modified": "2022-07-15T10:07:00+0000", "_ab_source_file_url": "simple_test_newlines.jsonl"}, "emitted_at": 162727468000}
{"stream": "test", "data": {"id": 2, "name": "ABCDEF", "valid": true,"value": 1.0, "event_date": "2023-01-01T00:00:00Z", "_ab_additional_properties": {}, "_ab_source_file_last_modified": "2022-07-15T10:07:00+0000", "_ab_source_file_url": "simple_test_newlines.jsonl"}, "emitted_at": 162727468000}
File renamed without changes.
Binary file added
BIN
+256 Bytes
...ntegrations/connectors/source-s3/integration_tests/sample_files/avrofile/test_sample.avro
Binary file not shown.
9 changes: 9 additions & 0 deletions
...-integrations/connectors/source-s3/integration_tests/sample_files/csvfile/simple_test.csv
@@ -0,0 +1,9 @@
id,name,valid
1,PVdhmjb1,False
2,j4DyXTS7,True
3,v0w8fTME,False
4,1q6jD8Np,False
5,77h4aiMP,True
6,Le35Wyic,True
7,xZhh1Kyl,False
8,M2t286iJ,False
2 changes: 2 additions & 0 deletions
...egrations/connectors/source-s3/integration_tests/sample_files/jsonlfile/simple_test.jsonl
@@ -0,0 +1,2 @@
{"id":1,"name":"PVdhmjb1","valid":false, "value": 1.2, "event_date": "2022-01-01T00:00:00Z"}
{"id":2,"name":"ABCDEF","valid":true, "value": 1, "event_date": "2023-01-01T00:00:00Z"}
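The commit message mentions that schema inference was added for this format. As a rough illustration of the idea, the Python sketch below reads the two sample lines with the standard-library `json` module and derives a JSON-schema-style type per column. This is an assumption, not the connector's parser (the commit message only hints that the real one relies on pyarrow); `json_type` and `infer_schema` are hypothetical helpers, and the rule of widening `integer` plus `number` to `number` is a design choice made for this example.

```python
import json

# Contents of simple_test.jsonl as added in this commit.
SAMPLE = """\
{"id":1,"name":"PVdhmjb1","valid":false, "value": 1.2, "event_date": "2022-01-01T00:00:00Z"}
{"id":2,"name":"ABCDEF","valid":true, "value": 1, "event_date": "2023-01-01T00:00:00Z"}
"""


def json_type(value):
    """Map a decoded Python value to a JSON schema type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_schema(records):
    """Hypothetical helper: merge per-record types into one type per column."""
    schema = {}
    for record in records:
        for key, value in record.items():
            seen, current = schema.get(key), json_type(value)
            if seen is None or seen == current:
                schema[key] = current
            elif {seen, current} == {"integer", "number"}:
                schema[key] = "number"  # widen mixed int/float columns
            else:
                schema[key] = "string"  # assumption: fall back to string on conflict
    return schema


# One JSON document per non-empty line.
records = [json.loads(line) for line in SAMPLE.splitlines() if line.strip()]
print(infer_schema(records))
```

Under these rules the `value` column, which holds `1.2` and `1`, is widened to `number`, which is consistent with the `1.0` seen in the expected records.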
15 changes: 15 additions & 0 deletions
.../connectors/source-s3/integration_tests/sample_files/jsonlfile/simple_test_newlines.jsonl
@@ -0,0 +1,15 @@
{
  "id":1,
  "name":"PVdhmjb1",
  "valid":false,
  "value": 1.2,
  "event_date": "2022-01-01T00:00:00Z"
}
{
  "id":2,
  "name":"ABCDEF",
  "valid":true,
  "value": 1,
  "event_date":
  "2023-01-01T00:00:00Z"
}
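This sample spreads each object over several lines, so a line-by-line `json.loads` would fail on it. One way to read such a file, sketched here with the standard library only, is to decode consecutive objects with `json.JSONDecoder.raw_decode`. This is an illustration of the general technique rather than the connector's parser, and `iter_json_objects` is a hypothetical helper.

```python
import json

# Contents of simple_test_newlines.jsonl: two objects, each spanning several lines.
SAMPLE = """\
{
  "id":1,
  "name":"PVdhmjb1",
  "valid":false,
  "value": 1.2,
  "event_date": "2022-01-01T00:00:00Z"
}
{
  "id":2,
  "name":"ABCDEF",
  "valid":true,
  "value": 1,
  "event_date":
  "2023-01-01T00:00:00Z"
}
"""


def iter_json_objects(text):
    """Hypothetical helper: yield each top-level JSON value found in text, in order."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the decoded value and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


objects = list(iter_json_objects(SAMPLE))
print(len(objects), [obj["name"] for obj in objects])
```

Because the decoder tracks its own position, line breaks inside an object, including the one between `"event_date":` and its value, do not affect the result.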
Binary file added
BIN
+1.21 KB
...tions/connectors/source-s3/integration_tests/sample_files/parquetfile/sample_test.parquet
Binary file not shown.
8 changes: 8 additions & 0 deletions
airbyte-integrations/connectors/source-s3/integration_tests/sample_files/simple_test.jsonl
@@ -0,0 +1,8 @@
{"id":1,"name":"PVdhmjb1","valid":false, "value": 1.2}
{"id":2,"name":"j4DyXTS7","valid":true, "value": 1.3}
{"id":3,"name":"v0w8fTME","valid":false, "value": 1.4}
{"id":4,"name":"1q6jD8Np","valid":false, "value": 1.5}
{"id":5,"name":"77h4aiMP","valid":true, "value": 1.6}
{"id":6,"name":"Le35Wyic","valid":true, "value": 1.7}
{"id":7,"name":"xZhh1Kyl","valid":false, "value": 1.8}
{"id":8,"name":"M2t286iJ","valid":false, "value": 1.9}