You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
How can I use pytorch's dataset to fine-tune llama3.1.
When I try to use pytorch's dataset, I keep getting the following errors related to collator:
File ~/anaconda3/envs/llama/lib/python3.10/site-packages/transformers/data/data_collator.py:589, in
...
#labels = [feature[label_name] for feature in features] if label_name in features[0].keys() else None
# reconvert list[None] to None if necessary
# this might occur when we pass {..., "labels": None}
AttributeError: 'str' object has no attribute 'keys'
The reason is that I want to add noise to the word (data-augmentation) and the dataset is dynamic as below.
def __getitem__(self, idx):
# only add noise to input text
# tmp = self.data[idx]
true_qry = [self.data['true_qry'][idx]](url)
if random.random() < self.noise_prob:
sample_edit_distance = random.randint(1, self.max_edit_distance)
input_qry = self.add_noise(true_qry, sample_edit_distance)
else:
input_qry = true_qry
And then I follow the fine-tune scipt and use chatml template
@danielhanchen Really appreciate for your reply. Supposing we do not do converserter, is possible just FT llama3.1 with SFTTrainer using: 1) pytorch dataset using data augamentation; 2) chatml format;
I tried several methods, but seem that SFTTrainer do not tokenize my chatml input and throw "'str' object has no attribute 'keys'" error.
How can I use pytorch's dataset to fine-tune llama3.1.
When I try to use pytorch's dataset, I keep getting the following errors related to collator:
The reason is that I want to add noise to the word (data-augmentation) and the dataset is dynamic as below.
And then I follow the fine-tune scipt and use chatml template
The trainer is as below:
The text was updated successfully, but these errors were encountered: