Collaborate #2

Amelie-Schreiber · 2023-09-01T00:44:07Z

I'm very interested in replicating your work. I would like to train a diffusion model that generates protein binding partners, similar to what RFDiffusion accomplishes, but using ESM-2 models as you have done. If you are open to collaborating and have the time, feel free to reach out. Also, would you be able to create a tutorial similar to this?

pengzhangzhi · 2023-09-01T04:43:34Z

Hi there! I am open to collaborating on interesting work. Would you like to discuss your ideas and implementation details with me?

best,
zhangzhi

Amelie-Schreiber · 2023-09-01T07:24:55Z

Hi, I am relatively new to training diffusion models; so far I have only fine-tuned ESM-2 models for sequence classification and token classification. Are you using `EsmForProteinFolding` as the backbone of your diffusion model? If so, I don't think my GPUs are large enough to train it, unless a smaller ESM-2 model can be used instead. I am also having trouble understanding your code, and was hoping we might work on writing a notebook similar to this one: https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb

Thanks for responding!
Amelie

pengzhangzhi · 2023-09-01T14:37:40Z

Hi, training is fairly cheap: I can fit the model on a GPU with 10 GB of memory. Regarding documentation, please follow the README to install the packages and train the model, and let me know which parts confuse you.

best,
Zhangzhi

Amelie-Schreiber · 2023-09-01T22:40:45Z

Could you find me on Discord?
Also, could I use Hugging Face's Accelerate for data parallelism, to split training across two 8 GB GPUs? If so, that might work...

EDIT: I've tried training on a P100 GPU (using a Colab instance) and it doesn't seem to work. My training script must not be set up correctly.

pengzhangzhi · 2023-09-03T21:45:14Z

Hi,

I don't have Discord, sorry.
I have not tested the code on 8 GB GPUs. Reducing the batch size should lower memory consumption enough to fit in the 8 GB you have.
I use Accelerate a lot; it is simple and easy to use. Data parallelism may work in that case.
You can call Python scripts and functions from a notebook.
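
The batch-size advice above can be made concrete with a little arithmetic. The sketch below is only an illustration: `split_batch` is a hypothetical helper that is not part of this repository, and the numbers (an effective batch of 32, a cap of 6 samples per GPU) are assumptions chosen for the example. It picks a per-GPU batch size and a number of gradient-accumulation steps so that the effective batch size stays the same when training is split across smaller GPUs.

```python
from typing import Tuple


def split_batch(effective_batch_size: int, num_gpus: int, max_per_gpu: int) -> Tuple[int, int]:
    """Return (per_gpu_batch_size, accumulation_steps).

    The result satisfies:
        effective_batch_size == per_gpu_batch_size * num_gpus * accumulation_steps

    Hypothetical helper for illustration; not part of the repository.
    """
    if effective_batch_size % num_gpus != 0:
        raise ValueError("effective batch size must be divisible by the number of GPUs")
    per_gpu_total = effective_batch_size // num_gpus
    # Take the largest per-GPU batch that respects the memory cap and
    # divides each GPU's share evenly.
    for per_gpu in range(min(max_per_gpu, per_gpu_total), 0, -1):
        if per_gpu_total % per_gpu == 0:
            return per_gpu, per_gpu_total // per_gpu
    raise ValueError("max_per_gpu must be at least 1")


if __name__ == "__main__":
    # Assumed numbers: target batch of 32, two GPUs, at most 6 samples fit per GPU.
    per_gpu, accum = split_batch(effective_batch_size=32, num_gpus=2, max_per_gpu=6)
    print(per_gpu, accum)
```

With PyTorch Lightning, the accumulation steps would typically be passed to the `Trainer` through its `accumulate_grad_batches` argument. Whether the training script here exposes a batch-size option is not stated in this thread, so check the script before relying on it.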

Amelie-Schreiber · 2023-09-06T04:07:32Z

Hi! I tried following the install instructions and am running into some issues. First, there seems to be a mistake in the instructions themselves. I believe you need

cd protein-sequence-diffusion-model

instead of

cd denoising_diffusion_protein_sequence

Also, once everything is installed, I get the following error:

(esm2d) C:\Users\OWO\Desktop\amelie_vscode\esmd\protein-sequence-diffusion-model\denoising_diffusion_pytorch>python pl_train.py --max_epochs 1 --fas_dpath seq_data/fas
C:\Users\OWO\anaconda3\envs\esm2d\lib\site-packages\Bio\pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
  warnings.warn(
C:\Users\OWO\anaconda3\envs\esm2d\lib\site-packages\torchaudio\backend\utils.py:74: UserWarning: No audio backend is available.
  warnings.warn("No audio backend is available.")
seq_data/fas\seqs.a3m already exists.
Traceback (most recent call last):
  File "C:\Users\OWO\Desktop\amelie_vscode\esmd\protein-sequence-diffusion-model\denoising_diffusion_pytorch\pl_train.py", line 205, in <module>
    train(args)
  File "C:\Users\OWO\Desktop\amelie_vscode\esmd\protein-sequence-diffusion-model\denoising_diffusion_pytorch\pl_train.py", line 187, in train
    trainer = pl.Trainer(
  File "C:\Users\OWO\anaconda3\envs\esm2d\lib\site-packages\pytorch_lightning\utilities\argparse.py", line 70, in insert_env_defaults
    return fn(self, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'gpus'

pengzhangzhi · 2023-09-06T14:34:15Z

I suspect the error occurs because PyTorch Lightning was updated and `Trainer` no longer accepts `gpus` as an argument.
Please set `accelerator="auto"` instead:
https://lightning.ai/docs/pytorch/stable/common/trainer.html

For example, use `trainer = pl.Trainer(max_epochs=20, accelerator="auto")`.
Ref:
https://stackoverflow.com/a/76193000
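
For anyone hitting the same `TypeError`, here is a minimal sketch of the migration, assuming the script builds its `Trainer` arguments as a dictionary. `migrate_trainer_kwargs` is a hypothetical helper, not something in this repository or in PyTorch Lightning. It follows the replacement named in Lightning's deprecation notice (`gpus=k` becomes `accelerator="gpu", devices=k`) and falls back to `accelerator="auto"`, as suggested above, when no GPU count was given.

```python
from typing import Any, Dict


def migrate_trainer_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Translate the legacy `gpus` argument into `accelerator` / `devices`.

    Hypothetical helper for illustration; not part of the repository.
    """
    new_kwargs = dict(kwargs)  # copy so the caller's dict is left untouched
    gpus = new_kwargs.pop("gpus", None)
    if gpus:
        # e.g. gpus=1 or gpus=[0, 1]: keep the same device selection on GPU.
        new_kwargs.setdefault("accelerator", "gpu")
        new_kwargs.setdefault("devices", gpus)
    else:
        # gpus missing, None, or 0: let Lightning choose the hardware.
        new_kwargs.setdefault("accelerator", "auto")
    return new_kwargs


if __name__ == "__main__":
    print(migrate_trainer_kwargs({"max_epochs": 1, "gpus": 1}))
    # Intended use (needs pytorch_lightning installed, so it is commented out):
    # trainer = pl.Trainer(**migrate_trainer_kwargs({"max_epochs": 1, "gpus": 1}))
```

Editing the `pl.Trainer(...)` call in `pl_train.py` directly, as described above, achieves the same result; the helper only shows the mapping between the old and new arguments.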
