Issues with data construction #4

Open
HaotianZhangAI4Science opened this issue Apr 23, 2023 · 2 comments

@HaotianZhangAI4Science

HaotianZhangAI4Science commented Apr 23, 2023

Nice work! I want to retrain this on my own new datasets. In dataset_generation, you mention that each script in Step 6 takes approximately 3 days to complete. Does that mean you ran Step 6, which contains 15 iterations, for about 45 days on 24 CPU cores?

@keiradams
Owner

Yes, that sounds right! Note that this compute cost was specific to my dataset of ~1M molecules. Depending on the types of molecules you're interested in (number of atoms, number of rotatable bonds, etc.), you may need far fewer molecules to train the network, in which case the cost of data generation would scale down linearly.

It's also completely possible that I did not need all 1M molecules to train my published model; I did not analyze the model's performance degradation with decreasing dataset size.
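
As a rough illustration of that linear scaling, here is a minimal sketch of the arithmetic. It assumes the figures quoted in this thread (15 scripts at about 3 days each, so about 45 days for ~1M molecules on 24 CPU cores), and it assumes the scripts run one after another on the same hardware. The function name `estimate_generation_days` is hypothetical and is not part of the repository.

```python
# Back-of-the-envelope estimate only; not part of the SQUID codebase.
# Reference figures are taken from this thread: 15 scripts x ~3 days each
# for a dataset of ~1M molecules on 24 CPU cores.
REF_MOLECULES = 1_000_000
REF_DAYS = 15 * 3.0  # assumes the 15 scripts run sequentially


def estimate_generation_days(n_molecules: int) -> float:
    """Scale the reference cost linearly with the number of molecules."""
    if n_molecules < 0:
        raise ValueError("n_molecules must be non-negative")
    return REF_DAYS * n_molecules / REF_MOLECULES


if __name__ == "__main__":
    for n in (1_000_000, 250_000, 100_000):
        days = estimate_generation_days(n)
        print(f"{n:>9,} molecules: ~{days:.1f} days")
```

Actual runtimes will also depend on molecule size and flexibility, so treat these numbers as an order-of-magnitude guide rather than a prediction.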

@HaotianZhangAI4Science
Author

Thanks for your kind response. I will try retraining SQUID on a smaller dataset.
