Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix db error reporting #1

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

peterwilton-ttam
Copy link
Owner

@peterwilton-ttam peterwilton-ttam commented Jan 5, 2021

From what I can tell, this will fix the obfuscation of the error message contained in the DBResponse.body.

Some context, from Tim:


I'm trying to take a look at the artifacts associated with one of our SFN executions but when trying to call:

Step(...).task.data

We run into the following error:

ServiceException: Metadata request (/flows/{Flow}/runs/{Run}/steps/start/tasks/{Task}/artifacts) failed (code 500): 500 Internal Server Error

Looking at the logs from our metadata service, I see the following error:

Traceback (most recent call last):
    File "/opt/latest/lib/python3.7/site-packages/aiohttp/web_protocol.py", line 418, in start
        resp = await task
    File "/opt/latest/lib/python3.7/site-packages/aiohttp/web_app.py", line 458, in _handle
        resp = await handler(request)
    File "/opt/latest/lib/python3.7/site-packages/services/metadata_service/api/artifact.py", line 140, in get_artifacts_by_task
        artifacts.body)
    File "/opt/latest/lib/python3.7/site-packages/services/metadata_service/api/artifact.py", line 355, in _filter_artifacts_by_attempt_id
        attempt_id = ArtificatsApi._get_latest_attempt_id(artifacts)
    File "/opt/latest/lib/python3.7/site-packages/services/metadata_service/api/artifact.py", line 349, in _get_latest_attempt_id
        if artifact['attempt_id'] > attempt_id:
TypeError: string indices must be integers

From what I can tell, this can only occur when the result of this call: https://github.com/Netflix/metaflow-service/blob/b656d42cfb946b5ec64ef15b4f5e86a1505a4e90/services/data/postgres_async_db.py#L112-L156 returns a DBResponse object with a body of type string. And it looks like this would only happen in the case of error handling. Unfortunately it looks like the underlying error message is obfuscated by this error.


@@ -137,7 +137,7 @@ async def get_artifacts_by_task(self, request):
)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self._async_table.get_artifact_in_task(...) returns a DBResponse with .body attribute of type string and a .response_code of 500, indicating an error. The string-type body is set in db_utils.aiopg_exception_handling.

ArtifactsApi._filter_artifacts_by_attempt_id(...) expects artifacts.body to be a list, which is what get_artifact_in_task returns when no error has occurred (and fetch_single==False, which is the case in get_artifact_in_task). In the event of no error, .response_code == 200.

@nmcquay
Copy link

nmcquay commented Jan 5, 2021

If this works for us, will we create the same pull request for the main project?

@peterwilton-ttam
Copy link
Owner Author

If this works for us, will we create the same pull request for the main project?

Yeah, I think that's the plan. Tim has fixed the underlying problem that caused an error to be raised in the first place, so I think that fix and this will go into a single PR for Netflix's repo. Tim's fix changed the db schema slightly (adding an index appropriate for the SFN runs), so he said he would need to define a migration as well.

@nmcquay
Copy link

nmcquay commented Jan 22, 2021

Sorry I thought I approved this with my earlier comment.

@peterwilton-ttam
Copy link
Owner Author

Oops, this was on the backburner. I'm just going to put in a PR with metaflow-service. Thanks for reviewing it.

@peterwilton-ttam
Copy link
Owner Author

Here is the PR: Netflix#93

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants