Folks, I am trying to convert the BioBERT model to PyTorch. Here is what I have done so far:
1. For the vocab: I tried to convert the vocab using the solution from #69:
tokenizer = BartTokenizer.from_pretrained('/content/biobert_v1.1_pubmed/vocab.txt')
I get:
OSError: Model name '/content/biobert_v1.1_pubmed' was not found in tokenizers model name list (bart-large, bart-large-mnli, bart-large-cnn, bart-large-xsum). We assumed '/content/biobert_v1.1_pubmed' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url.
I don't have a vocab.json, so how do I convert the vocab for the tokenizer?
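For context, a BERT-style vocab.txt is a WordPiece vocabulary (one token per line, where the line index is the token id), which is why BartTokenizer's BPE loader rejects it: BART expects vocab.json and merges.txt instead. A minimal sketch of how such a file maps tokens to ids (load_vocab here is a hypothetical helper written for illustration, not a transformers API):

```python
# Hypothetical minimal reader for a BERT-style vocab.txt: one token per
# line, and the line index is the token id (a sketch, not the real loader).
def load_vocab(lines):
    return {token: idx for idx, token in enumerate(line.strip() for line in lines)}

# Toy vocabulary standing in for the first lines of a real vocab.txt
vocab = load_vocab(["[PAD]\n", "[UNK]\n", "the\n", "##ing\n"])
print(vocab["##ing"])  # -> 3
```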
2. For the model: Since the out-of-the-box pytorch_pretrained_bert.convert_tf_checkpoint_to_pytorch did not work, I customized it per #2 by adding:
excluded = ['BERTAdam', '_power', 'global_step']
init_vars = [v for v in init_vars if all(e not in v[0] for e in excluded)]
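To illustrate what that filter does, here it is applied to a few dummy checkpoint variable names (the names and shapes below are made up for the sketch; a real checkpoint has many more entries):

```python
# Dummy (name, shape) pairs standing in for the TF checkpoint variable list;
# the names here are illustrative, not taken from a real BioBERT checkpoint.
excluded = ['BERTAdam', '_power', 'global_step']
init_vars = [
    ('bert/encoder/layer_0/output/dense/kernel', [768, 768]),
    ('global_step', []),                  # dropped: optimizer bookkeeping
    ('beta1_power', []),                  # dropped: matches '_power'
    ('bert/embeddings/word_embeddings', [28996, 768]),
]

# Keep only variables whose name contains none of the excluded substrings
kept = [v for v in init_vars if all(e not in v[0] for e in excluded)]
print([name for name, _ in kept])
# -> ['bert/encoder/layer_0/output/dense/kernel', 'bert/embeddings/word_embeddings']
```

The point of the filter is that optimizer state (Adam moments, power accumulators, the step counter) has no counterpart in the PyTorch model, so those variables must be skipped during conversion.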
With this, the model seems to convert fine. But when I load it using:
model = BartForConditionalGeneration.from_pretrained('path/to/model/biobert_v1.1_pubmed_pytorch.model')
I still get:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
Can you please help me understand what is going on here?
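On the UnicodeDecodeError itself: legacy PyTorch checkpoints are pickle files, and a protocol-2 pickle starts with byte 0x80, so this traceback suggests the binary weights file is being opened and decoded as UTF-8 text (for example, if the given path is read as if it were a JSON config or a vocab file). A minimal reproduction of just the decode step:

```python
# First bytes of a protocol-2 pickle (the format legacy torch checkpoints
# use); 0x80 is not a valid UTF-8 start byte, so text decoding fails.
data = b"\x80\x02"
try:
    data.decode("utf-8")
    raised = False
except UnicodeDecodeError as e:
    raised = True
    print(e)  # 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

print(raised)  # -> True
```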