[datalabeling] debug failing tests #3759

Closed
tmatsuo opened this issue May 14, 2020 · 7 comments
Assignees
Labels
api: datalabeling Issues related to the AI Platform Data Labeling Service API. priority: p2 Moderately-important priority. Fix may not be included in next release. 🚨 This issue needs some love. testing type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@tmatsuo
Contributor

tmatsuo commented May 14, 2020

We will disable label_text_test.py::test_label_text and manage_dataset_test.py::test_list_dataset in the datalabeling directory.

Failed build

It seems like

  • the dataset is considered empty for label_text_test.py::test_label_text.
  • the test backend is throwing DeadlineExceeded for manage_dataset_test.py::test_list_dataset.

This bug is to debug them and re-enable them if possible.
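
For the DeadlineExceeded case, one common mitigation while debugging is to retry the flaky call with exponential backoff. Below is a minimal stdlib-only sketch of that idea. The `DeadlineExceeded` class here is a local stand-in for `google.api_core.exceptions.DeadlineExceeded`, and `retry_on_deadline` is a hypothetical helper name, not something from the samples repository.

```python
import time


class DeadlineExceeded(Exception):
    """Local stand-in for google.api_core.exceptions.DeadlineExceeded."""


def retry_on_deadline(func, max_tries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff on DeadlineExceeded.

    Re-raises the last DeadlineExceeded once max_tries is exhausted.
    """
    for attempt in range(max_tries):
        try:
            return func()
        except DeadlineExceeded:
            if attempt == max_tries - 1:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * (2 ** attempt))
```

In a real test, `func` would wrap the dataset listing call. Passing `sleep` as a parameter keeps the helper easy to exercise without actually waiting.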

@tmatsuo tmatsuo added testing priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. api: datalabeling Issues related to the AI Platform Data Labeling Service API. labels May 14, 2020
@tmatsuo tmatsuo assigned tmatsuo and unassigned busunkim96 May 14, 2020
@tmatsuo
Contributor Author

tmatsuo commented May 14, 2020

History of the test_label_text:
https://github.com/GoogleCloudPlatform/python-docs-samples/commits/master/datalabeling/label_text_test.py

After this commit, we had several successful builds, so the last commit is not the direct culprit.

@tmatsuo tmatsuo changed the title [datalabeling] debug a failing test [datalabeling] debug failing tests May 14, 2020
@tmatsuo
Contributor Author

tmatsuo commented May 14, 2020

I tried to reproduce it, but manage_dataset_test.py::test_list_dataset is now passing consistently.

@tmatsuo
Contributor Author

tmatsuo commented May 14, 2020

For label_text_test.py::test_label_text, it might be a permission issue with the datalabeling agent:
service-1012616486416@gcp-sa-test-datalabeling.iam.gserviceaccount.com

Update:
I don't think it's a permission issue. The data files are all public.

@tmatsuo
Contributor Author

tmatsuo commented May 15, 2020

I tracked down the issue to this point.

For that particular test, we create a dataset and import a CSV file to it. The import job succeeds, but for some reason, the import job doesn't import anything (both total_count and import_count in the result are 0).

The same code for importing just works on the production endpoint.

Here is the repro code. You can put it in the datalabeling directory and run it with the required environment variables.

import.py

import os

import testing_lib

PROJECT_ID = os.getenv('GCLOUD_PROJECT')
INPUT_GCS_URI = 'gs://cloud-samples-data/datalabeling/text/input.csv'


# Create a temporary dataset.
dataset = testing_lib.create_dataset(PROJECT_ID)

# Import the CSV file into the dataset.
result = testing_lib.import_data(dataset.name, 'TEXT', INPUT_GCS_URI)

# Both counts come back as 0 on the testing server.
print("Total count: {}".format(result.total_count))
print("Import count: {}".format(result.import_count))

If you run it against the production endpoint, it successfully imports the data, but on the testing server, no data is imported.
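
Since the import job reports success even when nothing was imported, a test could check the counts explicitly and fail with a clear message instead of tripping later on an empty dataset. The following is only a sketch: `check_import_result` is a hypothetical helper, and it assumes the result object exposes the `total_count` and `import_count` fields printed by the repro script. A `SimpleNamespace` stands in for the real result so the snippet runs without any credentials.

```python
from types import SimpleNamespace


def check_import_result(result, expected_min=1):
    """Raise a descriptive error if an import job reported too few items.

    `result` only needs `total_count` and `import_count` attributes.
    """
    if result.import_count < expected_min:
        raise RuntimeError(
            "Import job finished but imported {} of {} items "
            "(expected at least {}).".format(
                result.import_count, result.total_count, expected_min))
    return result


# Stand-in for what the testing server returned: a "successful" job, no data.
empty_result = SimpleNamespace(total_count=0, import_count=0)
try:
    check_import_result(empty_result)
except RuntimeError as exc:
    print(exc)
```

Calling this right after the import step would surface the zero-count condition at the point where it happens.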

@yoshi-automation yoshi-automation added 🚨 This issue needs some love. and removed 🚨 This issue needs some love. labels Aug 13, 2020
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Nov 10, 2020
@tmatsuo
Contributor Author

tmatsuo commented Nov 16, 2020

@busunkim96 Can you transfer this issue to googleapis/python-datalabeling?

@busunkim96
Contributor

It's unfortunately not possible to transfer issues between orgs. I opened a new issue on that repo.

@munkhuushmgl
Contributor

munkhuushmgl commented Nov 16, 2020

@busunkim96 Can you close this since you duplicated it? I will start the work from there.

@tmatsuo tmatsuo closed this as completed Nov 16, 2020
4 participants