bug: _parse_timestamps_in_record throws exception for key not present in schema #1836

Closed
1 task
menzenski opened this issue Jul 13, 2023 · 1 comment · Fixed by #1844
Labels: kind/Bug (Something isn't working), valuestream/SDK

Comments

@menzenski
Contributor

Singer SDK Version

0.19.0

Is this a regression?

  • Yes

Python Version

NA

Bug scope

Targets (data type handling, batching, SQL object generation, etc.)

Operating System

macOS, Linux

Description

We use the meltanolabs variant of the target-postgres loader with the ets variant of the tap-spreadsheets-anywhere extractor.

Tap-spreadsheets-anywhere supports dynamic schema inference from randomly selected sample files. This is very useful for us because we load many CSV and JSONL files from AWS S3.

Our Meltano runs frequently fail with a KeyError raised at this line in _parse_timestamps_in_record:

datelike_type = get_datelike_property_type(schema["properties"][key])

The dynamically generated schema may not include every field actually present in the data (especially for JSON documents). A record key that is missing from the schema raises the KeyError, and that brings the whole pipeline to a halt.

It would be great if this function could catch the KeyError and only raise an exception if truly necessary - in the scenarios I've seen, I believe the KeyError could have been swallowed with no impact.
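
The failure mode described above can be reproduced outside the SDK with a small sketch. Everything here is an assumption for illustration: datelike_keys_strict is a hypothetical helper that only mimics the direct schema["properties"][key] lookup from the quoted line, and the schema and record are made-up sample data (only the field name org_id comes from the traceback).

```python
def datelike_keys_strict(record: dict, schema: dict) -> list:
    """Return record keys whose schema declares a date-like format.

    Hypothetical helper, not SDK code: it indexes the schema directly,
    the same way the quoted line does.
    """
    datelike = []
    for key in record:
        # Direct indexing raises KeyError for any record key
        # that the schema does not declare.
        property_schema = schema["properties"][key]
        if property_schema.get("format") in {"date-time", "date", "time"}:
            datelike.append(key)
    return datelike


# Made-up schema, as if inferred from a sample file that lacked "org_id".
schema = {
    "properties": {
        "id": {"type": "integer"},
        "created_at": {"type": "string", "format": "date-time"},
    }
}

# A later record carries a field the sampled schema never saw.
record = {"id": 1, "created_at": "2023-07-13T17:38:00+00:00", "org_id": "abc"}

try:
    datelike_keys_strict(record, schema)
except KeyError as exc:
    print(f"KeyError: {exc}")
```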

Code

meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": false, "producer": true, "string_id": "tap-spreadsheets-anywhere-s3", "cmd_type": "elb", "stdio": "stderr", "name": "tap-spreadsheets-anywhere-s3", "event": "INFO Syncing file \"sub/dir/ectory/2022/04/01/filename.CSV\".", "level": "info", "timestamp": "2023-07-13T17:38:00.017169Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "Traceback (most recent call last):", "level": "info", "timestamp": "2023-07-13T17:38:00.160258Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "  File \"/project/.meltano/loaders/target-postgres/venv/bin/target-postgres\", line 8, in <module>", "level": "info", "timestamp": "2023-07-13T17:38:00.160499Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "    sys.exit(TargetPostgres.cli())", "level": "info", "timestamp": "2023-07-13T17:38:00.160721Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "             ^^^^^^^^^^^^^^^^^^^^", "level": "info", "timestamp": "2023-07-13T17:38:00.160867Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "  File \"/project/.meltano/loaders/target-postgres/venv/lib/python3.11/site-packages/click/core.py\", line 1157, in __call__", "level": "info", "timestamp": "2023-07-13T17:38:00.161008Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "    return self.main(*args, **kwargs)", "level": "info", "timestamp": "2023-07-13T17:38:00.161272Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "           ^^^^^^^^^^^^^^^^^^^^^^^^^^", "level": "info", "timestamp": "2023-07-13T17:38:00.161540Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "  File \"/project/.meltano/loaders/target-postgres/venv/lib/python3.11/site-packages/click/core.py\", line 1078, in main", "level": "info", "timestamp": "2023-07-13T17:38:00.161684Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "    rv = self.invoke(ctx)", "level": "info", "timestamp": "2023-07-13T17:38:00.161824Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "         ^^^^^^^^^^^^^^^^", "level": "info", "timestamp": "2023-07-13T17:38:00.161932Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "  File \"/project/.meltano/loaders/target-postgres/venv/lib/python3.11/site-packages/click/core.py\", line 1434, in invoke", "level": "info", "timestamp": "2023-07-13T17:38:00.162056Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "    return ctx.invoke(self.callback, **ctx.params)", "level": "info", "timestamp": "2023-07-13T17:38:00.162183Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^", "level": "info", "timestamp": "2023-07-13T17:38:00.162344Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "  File \"/project/.meltano/loaders/target-postgres/venv/lib/python3.11/site-packages/click/core.py\", line 783, in invoke", "level": "info", "timestamp": "2023-07-13T17:38:00.162460Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": false, "producer": true, "string_id": "tap-spreadsheets-anywhere-s3", "cmd_type": "elb", "stdio": "stderr", "name": "tap-spreadsheets-anywhere-s3", "event": "INFO Syncing file \"sub/dir/ectory/2022/04/02/filename.CSV\".", "level": "info", "timestamp": "2023-07-13T17:38:00.162621Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "    return __callback(*args, **kwargs)", "level": "info", "timestamp": "2023-07-13T17:38:00.162787Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "           ^^^^^^^^^^^^^^^^^^^^^^^^^^^", "level": "info", "timestamp": "2023-07-13T17:38:00.162935Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "  File \"/project/.meltano/loaders/target-postgres/venv/lib/python3.11/site-packages/singer_sdk/target_base.py\", line 572, in cli", "level": "info", "timestamp": "2023-07-13T17:38:00.163056Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "    target.listen(file_input)", "level": "info", "timestamp": "2023-07-13T17:38:00.163189Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "  File \"/project/.meltano/loaders/target-postgres/venv/lib/python3.11/site-packages/singer_sdk/io_base.py\", line 34, in listen", "level": "info", "timestamp": "2023-07-13T17:38:00.163321Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "    self._process_lines(file_input)", "level": "info", "timestamp": "2023-07-13T17:38:00.163438Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "  File \"/project/.meltano/loaders/target-postgres/venv/lib/python3.11/site-packages/singer_sdk/target_base.py\", line 272, in _process_lines", "level": "info", "timestamp": "2023-07-13T17:38:00.163559Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "    counter = super()._process_lines(file_input)", "level": "info", "timestamp": "2023-07-13T17:38:00.163678Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^", "level": "info", "timestamp": "2023-07-13T17:38:00.163816Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "  File \"/project/.meltano/loaders/target-postgres/venv/lib/python3.11/site-packages/singer_sdk/io_base.py\", line 81, in _process_lines", "level": "info", "timestamp": "2023-07-13T17:38:00.163938Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "    self._process_record_message(line_dict)", "level": "info", "timestamp": "2023-07-13T17:38:00.164044Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "  File \"/project/.meltano/loaders/target-postgres/venv/lib/python3.11/site-packages/target_postgres/target.py\", line 278, in _process_record_message", "level": "info", "timestamp": "2023-07-13T17:38:00.164168Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "    super()._process_record_message(message_dict)", "level": "info", "timestamp": "2023-07-13T17:38:00.164272Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "  File \"/project/.meltano/loaders/target-postgres/venv/lib/python3.11/site-packages/singer_sdk/target_base.py\", line 315, in _process_record_message", "level": "info", "timestamp": "2023-07-13T17:38:00.164383Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "    sink._validate_and_parse(transformed_record)", "level": "info", "timestamp": "2023-07-13T17:38:00.164550Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "  File \"/project/.meltano/loaders/target-postgres/venv/lib/python3.11/site-packages/singer_sdk/sinks/core.py\", line 303, in _validate_and_parse", "level": "info", "timestamp": "2023-07-13T17:38:00.164711Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "    self._parse_timestamps_in_record(", "level": "info", "timestamp": "2023-07-13T17:38:00.164830Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "  File \"/project/.meltano/loaders/target-postgres/venv/lib/python3.11/site-packages/singer_sdk/sinks/core.py\", line 323, in _parse_timestamps_in_record", "level": "info", "timestamp": "2023-07-13T17:38:00.164953Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "    datelike_type = get_datelike_property_type(schema[\"properties\"][key])", "level": "info", "timestamp": "2023-07-13T17:38:00.165083Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "                                               ~~~~~~~~~~~~~~~~~~~~^^^^^", "level": "info", "timestamp": "2023-07-13T17:38:00.165194Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"consumer": true, "producer": false, "string_id": "target-postgres-staging", "cmd_type": "elb", "stdio": "stderr", "name": "target-postgres-staging", "event": "KeyError: 'org_id'", "level": "info", "timestamp": "2023-07-13T17:38:00.165297Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"event": "Loader failed", "level": "error", "timestamp": "2023-07-13T17:38:00.338310Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: {"set_number": 0, "block_type": "ExtractLoadBlocks", "success": false, "err": "RunnerError('Loader failed')", "exit_codes": {"loaders": 1}, "event": "Block run completed.", "level": "error", "timestamp": "2023-07-13T17:38:00.338683Z"}
meltano-workflow-h7c8z-extract-and-load-2678913587: Need help fixing this problem? Visit http://melta.no/ for troubleshooting steps, or to
meltano-workflow-h7c8z-extract-and-load-2678913587: join our friendly Slack community.
menzenski added the kind/Bug and valuestream/SDK labels on Jul 13, 2023
@edgarrmondragon
Collaborator

Yeah, it seems rather safe to skip but log the field if it's not in the schema
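
A minimal sketch of the "skip but log" idea, under stated assumptions: this is not the SDK's implementation or the change merged in the linked fix. The function parse_timestamps_in_record, the logger name, and the simplified format check are all hypothetical stand-ins, and only the date-time format is handled.

```python
import logging
from datetime import datetime

logger = logging.getLogger("sink_sketch")  # hypothetical logger name


def parse_timestamps_in_record(record: dict, schema: dict) -> list:
    """Parse date-time strings in place and return the keys that were skipped.

    Hypothetical stand-in for the SDK method: keys absent from the schema
    are logged and left untouched instead of raising KeyError.
    """
    properties = schema.get("properties", {})
    skipped = []
    for key, value in record.items():
        if key not in properties:
            # Field is in the data but not in the schema: log it and move on.
            logger.warning(
                "No schema for record field '%s'; skipping timestamp parsing",
                key,
            )
            skipped.append(key)
            continue
        # Simplified check (an assumption): only the date-time format is parsed.
        if value is not None and properties[key].get("format") == "date-time":
            record[key] = datetime.fromisoformat(value)
    return skipped
```

With a schema that declares created_at as date-time and a record that also carries org_id, this returns ["org_id"], converts created_at to a datetime, and leaves org_id unchanged. Whether an unknown field should be dropped, passed through, or rejected later is a separate decision that this sketch does not make.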
