
fine-tuning with more protein sequences #4

Open
avilella opened this issue Aug 13, 2024 · 1 comment


@avilella

Hi, I have a corpus of about 500,000 protein sequences and would like to use it with existing models like this one to predict the evolution of monoclonal antibody binding to an epitope.
How can I add my sequences to the models referred to in this repo and then use the modified model for that task? Thanks.

@idmjky
Owner

idmjky commented Aug 13, 2024

Hi, this very much depends on whether you have functional binding data for these 500k sequences.

If you do, you can format them in a CSV file with each sequence and its measured binding value, and use that as your first-round data for the model.

If you only have the sequences without any functional binding data, you can only fine-tune the base PLM (ESM2 in this case). Please refer to the ESM2 fine-tuning guidance on their official GitHub and follow the advice there. After you complete the fine-tuning, you can use your model as the base-layer PLM to generate embeddings for EVOLVEpro.
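As a minimal sketch, assuming the corpus is available as `(id, sequence, binding)` tuples with `None` (or NaN) where no measurement exists, the two cases could be separated in Python like this. The function name `split_corpus`, the CSV column names `sequence` and `binding`, and the choice to skip sequences containing non-standard residues are all hypothetical; check the EVOLVEpro README for the exact CSV layout it expects, and the ESM2 docs for the input format their fine-tuning recipe uses.

```python
import csv
import math
from pathlib import Path
from typing import Iterable, Optional, Tuple

# The 20 standard amino acids. Skipping anything else (e.g. X, B, Z) is an
# assumption made for this sketch; your pipeline may accept ambiguity codes.
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")


def split_corpus(
    records: Iterable[Tuple[str, str, Optional[float]]],
    csv_path: Path,
    fasta_path: Path,
) -> Tuple[int, int]:
    """Route (id, sequence, binding) records into two files.

    Records with a measured binding value go to a CSV (hypothetical columns
    `sequence`, `binding`) that could serve as first-round data. Records
    without a measurement go to a FASTA file for fine-tuning the base PLM.
    Returns (n_labeled, n_unlabeled).
    """
    n_labeled = n_unlabeled = 0
    with open(csv_path, "w", newline="") as csv_f, open(fasta_path, "w") as fa_f:
        writer = csv.writer(csv_f)
        writer.writerow(["sequence", "binding"])
        for seq_id, seq, binding in records:
            seq = seq.strip().upper()
            if not seq or set(seq) - VALID_AA:
                continue  # drop empty sequences or ones with non-standard residues
            if binding is not None and not math.isnan(binding):
                writer.writerow([seq, binding])  # labeled: first-round data
                n_labeled += 1
            else:
                fa_f.write(f">{seq_id}\n{seq}\n")  # unlabeled: PLM fine-tuning
                n_unlabeled += 1
    return n_labeled, n_unlabeled
```

For a corpus of 500k sequences, `records` can be a generator that streams from disk, so the whole set never has to sit in memory; only the labeled subset would feed EVOLVEpro directly, while the FASTA output would go into whatever ESM2 fine-tuning setup you follow.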
