put_attributions_and_associations hang forever if big array size #118

Open
drsagitn opened this issue Jun 2, 2021 · 6 comments

Comments


drsagitn commented Jun 2, 2021

Hi,

In Python mlmd, when I call store.put_attributions_and_associations(attribution_arr, association_arr) with a large attribution array (about 1500 entries), the function hangs forever. Splitting the input into chunks of 100 works, but it is very slow. Is there a performance issue in that function?
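
For reference, the chunking workaround described above could be sketched roughly as follows. This is only an illustration, not the reporter's actual code: `chunked` and `put_in_chunks` are hypothetical helper names, and the sketch assumes the store method accepts an empty sequence for either argument.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def put_in_chunks(
    put_fn: Callable[[Sequence, Sequence], None],
    attributions: Sequence,
    associations: Sequence,
    chunk_size: int = 100,
) -> int:
    """Call `put_fn` once per chunk and return the number of calls made.

    `put_fn` is expected to have the same shape as
    store.put_attributions_and_associations(attributions, associations).
    Assumption: passing an empty list for one of the two arguments is accepted.
    """
    calls = 0
    for batch in chunked(attributions, chunk_size):
        put_fn(batch, [])
        calls += 1
    for batch in chunked(associations, chunk_size):
        put_fn([], batch)
        calls += 1
    return calls


# Hypothetical usage with the names from this report:
# put_in_chunks(store.put_attributions_and_associations,
#               attribution_arr, association_arr, chunk_size=100)
```

Passing the store method in as a callable keeps the helper independent of the client library, so the chunk size can be tuned or the helper tested without a database.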

Contributor

hughmiao commented Jun 2, 2021

@drsagitn thanks for letting us know about the issue. The call is not optimized for large arrays; currently it validates and inserts each element one at a time. Let's keep the issue open and fix it in the next release.

A couple of questions about the usage: what are the backend and deployment settings? And how many artifacts, executions, and contexts are in your instance? Curious to know the use case here too! :)

Author

drsagitn commented Jun 2, 2021

That was the very first artifact insertion into the db. I tried to insert 1500 artifacts describing the image metadata of a dataset. Only 1 or 2 executions and contexts had been inserted before.

Contributor

hughmiao commented Jun 2, 2021

Interesting, previously I thought this was a big shared instance. What are the physical db and deployment settings here (a shared MySQL with a KFP server, a SQLite file on NFS)? Could this be caused by the deployment?

About the use case, a quick question: have you considered having a single dataset as an artifact, with that artifact having 1500 properties, each describing an image?

Author

drsagitn commented Jun 3, 2021

The database is MySQL 8.0 hosted on GCP. It is a newly set up db and is used only for mlmd.

For the use case, I do have a dataset artifact as well. A dataset has versions (the contexts). Each version manages a list of committed images.

Image metadata includes annotations and other properties and is pretty long, so it would not fit into a string_value property of the dataset, which is limited to 65535 bytes.
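
As a rough illustration of the size constraint mentioned above, the check below tests whether one serialized metadata record would fit in a 65535-byte string value. It is a sketch under assumptions: `fits_in_string_value` is a hypothetical helper, JSON is just one possible serialization, and the limit is the figure quoted in this comment rather than something looked up in the library.

```python
import json

# Limit quoted in this thread for a string_value property (bytes).
STRING_VALUE_LIMIT_BYTES = 65535


def fits_in_string_value(metadata: dict, limit: int = STRING_VALUE_LIMIT_BYTES) -> bool:
    """Return True if `metadata`, serialized as UTF-8 JSON, is at most `limit` bytes."""
    encoded = json.dumps(metadata, ensure_ascii=False).encode("utf-8")
    return len(encoded) <= limit
```

Measuring the encoded bytes rather than the character count matters when annotations contain non-ASCII text, since one character can take several bytes in UTF-8.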

Contributor

hughmiao commented Jun 3, 2021

Got it, thanks for the info. One thing to check when tuning the deployment is whether native SQL insertion into that db has a latency issue. Let's also keep this issue open to optimize the call for large arrays.

/cc @BrianSong

Author

drsagitn commented Jun 3, 2021

I tested native SQL insertion: it takes only 3s to insert 1500 records.
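
A timing check of this kind could look roughly like the sketch below. It is not the test that was actually run: it uses an in-memory SQLite database from the Python standard library as a stand-in for the MySQL instance, and the `image_metadata` table, its columns, and `time_bulk_insert` are hypothetical names chosen for illustration.

```python
import sqlite3
import time


def time_bulk_insert(num_rows: int = 1500):
    """Insert `num_rows` rows in one transaction; return (elapsed_seconds, row_count)."""
    conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL connection
    try:
        conn.execute(
            "CREATE TABLE image_metadata ("
            "id INTEGER PRIMARY KEY, uri TEXT, annotation TEXT)"
        )
        rows = [(i, f"images/{i}.jpg", "{}") for i in range(num_rows)]

        start = time.perf_counter()
        with conn:  # commits once after all rows are inserted
            conn.executemany(
                "INSERT INTO image_metadata (id, uri, annotation) VALUES (?, ?, ?)",
                rows,
            )
        elapsed = time.perf_counter() - start

        count = conn.execute("SELECT COUNT(*) FROM image_metadata").fetchone()[0]
    finally:
        conn.close()
    return elapsed, count
```

Against a remote database the same pattern applies with that database's driver; comparing the elapsed time with the time taken by the mlmd call helps separate network or deployment latency from overhead inside the library.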

2 participants