What model should be passed to AutoTokenizer? Is it BERT? #2
Hi, the argument you are passing to `BertTokenizer.from_pretrained(model_name_or_path)` is wrong. `model_name_or_path` must be either `bert-base-uncased` or a directory containing a file named `vocab.txt`.
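The two valid argument forms described above can be sketched as follows. To keep the sketch runnable offline, it builds a tiny made-up `vocab.txt` in a temporary directory; a real `bert-base-uncased` directory would contain the full 30k-token vocabulary (and `config.json`):

```python
import os
import tempfile

from transformers import BertTokenizer

# Minimal stand-in vocab so the example runs without downloading a model.
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world"]
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "vocab.txt"), "w") as f:
    f.write("\n".join(vocab))

# (1) A hub model id would also work: BertTokenizer.from_pretrained("bert-base-uncased")
# (2) A local directory containing vocab.txt:
tokenizer = BertTokenizer.from_pretrained(tmpdir)
print(tokenizer.tokenize("hello world"))  # ['hello', 'world']
```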
If that argument were set incorrectly, the code should fail on that line rather than somewhere else. Following your suggestion, I still get an error:

```python
tokenizer = BertTokenizer.from_pretrained('/home/lsk/python/code/bert-base-uncased/bert-base-uncased-vocab.txt')
batch = tokenizer(batch_sentences, padding=True, truncation=True, return_tensors="pt")
print(batch)
```
Hi, I tested it; the problem is still with how the tokenizer is loaded. With the loading method below, it does error. The error message is:

```
KeyError: 'Invalid key. Only three types of key are available: (1) string, (2) integers for backend Encoding, and (3) slices for data subsetting.'
```

Note that the directory `/data10T/zhangyice/2023/pretrained_models/bert-base-uncased/` must contain `config.json` and `vocab.txt`; replacing that path with `/data10T/zhangyice/2023/pretrained_models/bert-base-uncased/vocab.txt` raises the same error. However, the loading method below does not error.
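This `KeyError` matches what happens when a `BatchEncoding` produced by a *slow* (pure-Python) tokenizer is indexed with an integer: `batch[i]` only returns a per-sample `Encoding` when the tokenizer is a fast (Rust-backed) one. A minimal sketch, using a tiny made-up vocab so it runs without downloading anything:

```python
import os
import tempfile

from transformers import BertTokenizer, BertTokenizerFast

# Tiny stand-in vocab so the sketch runs offline.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "vocab.txt"), "w") as f:
    f.write("\n".join(["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world"]))

sentences = ["hello world", "hello"]

# Slow tokenizer: integer indexing raises the KeyError from this issue.
slow = BertTokenizer.from_pretrained(tmpdir)
batch = slow(sentences, padding=True)
try:
    batch[0]
except KeyError as e:
    print("slow tokenizer:", e)

# Fast tokenizer: integer indexing returns a backend Encoding object.
fast = BertTokenizerFast.from_pretrained(tmpdir)
batch = fast(sentences, padding=True)
print("fast tokenizer:", type(batch[0]).__name__)  # Encoding
```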
Since I cannot connect to Hugging Face, I downloaded the BERT model locally and modified the code as follows:

```python
try:
    self.tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
except:
    self.tokenizer = BertTokenizer.from_pretrained('/home/lsk/python/code/bert-base-uncased/bert-base-uncased-vocab.txt', use_fast=True)
```

and

```python
class ASTE(pl.LightningModule):
    def __init__(self, hparams, data_module):
        super().__init__()
        self.save_hyperparameters(hparams)
        self.data_module = data_module
```
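One thing to note about the fallback above: `BertTokenizer` is always the slow tokenizer class, so `use_fast=True` has no effect there. A sketch of loading a fast tokenizer from a local directory instead, using a minimal stand-in directory (made-up vocab plus a bare `config.json`) so it runs offline:

```python
import json
import os
import tempfile

from transformers import AutoTokenizer

# Minimal local directory standing in for a downloaded bert-base-uncased:
# AutoTokenizer needs config.json (for the model type) and vocab.txt.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "config.json"), "w") as f:
    json.dump({"model_type": "bert"}, f)
with open(os.path.join(tmpdir, "vocab.txt"), "w") as f:
    f.write("\n".join(["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world"]))

tokenizer = AutoTokenizer.from_pretrained(tmpdir, use_fast=True)
print(tokenizer.is_fast)  # True

batch = tokenizer(["hello world", "hello"], padding=True, truncation=True)
print(type(batch[0]).__name__)  # integer indexing now works: Encoding
```

With a fast tokenizer loaded this way, the `batch_encodings[i]` lookup in `tokenizer_function` should no longer raise `KeyError`.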
Why does the following error occur?

```
Original Traceback (most recent call last):
  File "/home/lsk/miniconda3/envs/de/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/lsk/miniconda3/envs/de/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/home/lsk/BDTF/code/utils/aste_datamodule.py", line 82, in __call__
    batch = self.tokenizer_function(examples)
  File "/home/lsk/BDTF/code/utils/aste_datamodule.py", line 154, in tokenizer_function
    encoding = batch_encodings[i]
  File "/home/lsk/miniconda3/envs/de/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 247, in __getitem__
    raise KeyError(
KeyError: 'Invalid key. Only three types of key are available: (1) string, (2) integers for backend Encoding, and (3) slices for data subsetting.'
```
What model should AutoTokenizer and AutoConfig load? Could you please clarify? Thanks.