fine-tuning with more protein sequences #4

avilella · 2024-08-13T10:47:12Z

Hi, I have a corpus of about 500,000 protein sequences and would like to use them with existing models like this one to predict the evolution of monoclonal antibody binding to an epitope.
How could I add my sequences to the models referred to in this repo and then use the modified model for such a task? Thanks.

idmjky · 2024-08-13T13:55:43Z

Hi, this very much depends on whether you have functional binding data for these 500k sequences. If you do, you can format them in a CSV file containing each sequence and its measured binding value, and use that as your first-round data for the model. If you only have the sequences without any functional binding data, then you can only fine-tune the base PLM (ESM2 in this case). For that, please refer to the ESM2 fine-tuning guidance on their official GitHub and follow the advice there. After you complete the fine-tuning, you can use your model as the base-layer PLM to generate embeddings for EVOLVEpro.
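For the case where binding data is available, here is a minimal sketch of how the sequence/measurement pairs could be written to a CSV file. The column names `sequence` and `binding`, the helper `build_round_csv`, and the example sequences and values are all hypothetical placeholders, not the format EVOLVEpro requires; check the repo's example data for the actual expected headers.

```python
import csv
import io

# The 20 standard amino acid one-letter codes.
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")


def build_round_csv(records, out):
    """Write (sequence, binding) pairs as CSV rows; return the number of rows written.

    `records` is an iterable of (sequence, binding) pairs and `out` is any
    writable text file object. Rows with empty or non-standard sequences
    are skipped.
    """
    writer = csv.writer(out)
    # Hypothetical column names (assumption, not confirmed by the repo).
    writer.writerow(["sequence", "binding"])
    written = 0
    for seq, binding in records:
        seq = seq.strip().upper()
        if not seq or set(seq) - VALID_AA:
            continue  # skip empty sequences and ones with non-standard residues
        writer.writerow([seq, float(binding)])
        written += 1
    return written


# Made-up example records, for illustration only.
records = [
    ("EVQLVESGG", 0.82),
    ("evqlvesga", "0.41"),  # lower case and string values are normalised
    ("EVQLXESGG", 0.10),    # contains 'X', so it is skipped
]

buffer = io.StringIO()
n_rows = build_round_csv(records, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="")` as `out`.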
