[🐛BUG] ValueError: I/O operation on closed file. #2058

Open · zw81929 opened this issue Jun 12, 2024 · 1 comment
Labels: bug (Something isn't working)

zw81929 commented Jun 12, 2024

Describe the bug

feat[field].fillna(value=feat[field].mean(), inplace=True)
12 Jun 10:59 INFO Saving filtered dataset into [saved/bert4recbole-SequentialDataset.pth]
12 Jun 10:59 INFO bert4recbole
The number of users: 93328
Average actions of users: 2185.097806636879
The number of items: 93329
Average actions of items: 2185.1212202387333
The number of inters: 203928623
The sparsity of the dataset: 97.65874016271815%
Remain Fields: ['user_id', 'item_id', 'timestamp', 'area_id']
12 Jun 11:45 INFO Saving split dataloaders into: [saved/bert4recbole-for-BERT4Rec-dataloader.pth]
Traceback (most recent call last):
File "/data1/bert4rec/bert4rec-main/venv/lib/python3.10/site-packages/torch/serialization.py", line 632, in save
_legacy_save(obj, opened_file, pickle_module, pickle_protocol)
File "/data1/bert4rec/bert4rec-main/venv/lib/python3.10/site-packages/torch/serialization.py", line 776, in _legacy_save
storage._write_file(f, _should_read_directly(f), True, torch._utils._element_size(dtype))
MemoryError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/data1/bert4rec/bert4rec-main/scripts/bole/run.py", line 7, in
run_recbole(model='BERT4Rec', dataset=r'bert4recbole',
File "/data1/bert4rec/bert4rec-main/venv/lib/python3.10/site-packages/recbole/quick_start/quick_start.py", line 133, in run_recbole
train_data, valid_data, test_data = data_preparation(config, dataset)
File "/data1/bert4rec/bert4rec-main/venv/lib/python3.10/site-packages/recbole/data/utils.py", line 194, in data_preparation
save_split_dataloaders(
File "/data1/bert4rec/bert4rec-main/venv/lib/python3.10/site-packages/recbole/data/utils.py", line 99, in save_split_dataloaders
pickle.dump(Serialization_dataloaders, f)
File "/data1/bert4rec/bert4rec-main/venv/lib/python3.10/site-packages/torch/storage.py", line 951, in reduce
torch.save(self, b, _use_new_zipfile_serialization=False)
File "/data1/bert4rec/bert4rec-main/venv/lib/python3.10/site-packages/torch/serialization.py", line 631, in save
with _open_file_like(f, 'wb') as opened_file:
File "/data1/bert4rec/bert4rec-main/venv/lib/python3.10/site-packages/torch/serialization.py", line 439, in exit
self.file_like.flush()
ValueError: I/O operation on closed file.
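
The root failure above is the MemoryError raised while pickling the dataloaders; the ValueError only surfaces because the surrounding with block's __exit__ then tries to flush a file object that is already closed. A minimal, self-contained sketch of that masking pattern (hypothetical ClosingWriter class, unrelated to RecBole):

# Hypothetical illustration of exception masking: an exception inside a `with`
# body followed by a failing __exit__ produces the same "During handling of the
# above exception, another exception occurred" chain as the traceback above.
import io

class ClosingWriter:
    """Context manager whose cleanup flushes an already-closed buffer."""
    def __init__(self):
        self.buffer = io.BytesIO()

    def __enter__(self):
        return self.buffer

    def __exit__(self, exc_type, exc_value, traceback):
        self.buffer.close()   # the buffer ends up closed during error handling
        self.buffer.flush()   # raises ValueError: I/O operation on closed file.

with ClosingWriter() as buf:
    raise MemoryError("simulated out-of-memory while serializing")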

How to reproduce
Steps to reproduce the bug:
YAML file
gpu_id: '0,1,2,3'
worker: 0

model config

n_layers: 2 # (int) The number of transformer layers in transformer encoder.
n_heads: 2 # (int) The number of attention heads for multi-head attention layer.
hidden_size: 64 # (int) The number of features in the hidden state.
inner_size: 256 # (int) The inner hidden size in feed-forward layer.
hidden_dropout_prob: 0.2 # (float) The probability of an element to be zeroed.
attn_dropout_prob: 0.2 # (float) The probability of an attention score to be zeroed.
hidden_act: 'gelu' # (str) The activation function in feed-forward layer.
layer_norm_eps: 1e-12 # (float) A value added to the denominator for numerical stability.
initializer_range: 0.02 # (float) The standard deviation for normal initialization.
mask_ratio: 0.2 # (float) The probability for an item to be replaced by the MASK token.
loss_type: 'CE' # (str) The type of loss function.
transform: mask_itemseq # (str) The transform operation for batch data process.
ft_ratio: 0.5 # (float) The probability of generating fine-tuning samples

dataset config

field_separator: ","            # field separator in the dataset files
seq_separator: " "              # separator inside token_seq or float_seq fields
USER_ID_FIELD: user_id          # user id field
ITEM_ID_FIELD: item_id          # item id field
TIME_FIELD: timestamp           # timestamp field
MAX_ITEM_LIST_LENGTH: 50        # maximum sequence length
save_dataset: True              # whether to save the processed dataset to disk
save_dataloaders: True          # whether to save the dataloaders to disk

# Specify which columns to load from which file; e.g. here user_id, item_id,
# rating, timestamp would be read from ml-1m.inter, and the other files follow
# the same pattern
load_col:
  inter: [user_id, item_id, timestamp]
  item: [item_id, area_id]

training settings

epochs: 500                     # maximum number of training epochs
train_batch_size: 128           # training batch size
learner: adam                   # built-in PyTorch optimizer to use
learning_rate: 0.001            # learning rate
training_neg_sample_num: 0      # number of negative samples
eval_step: 1                    # run evaluation after every training epoch
stopping_step: 10               # early-stopping patience: stop if the chosen metric does not improve within this many evaluations

evaluation settings

eval_setting: TO_LS,full        # sort by time, leave-one-out split, full ranking over all items
metrics: ["Recall", "MRR", "NDCG", "Hit", "Precision"]   # evaluation metrics
valid_metric: MRR@10            # metric used for early stopping
eval_batch_size: 8              # evaluation batch size

show_progress: True
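
Presumably (based on the traceback, which shows scripts/bole/run.py calling run_recbole), the settings above are saved to a YAML file and passed through config_file_list; a minimal sketch of such a run script, with the file name bert4recbole.yaml assumed:

# Assumed reproduction script, reconstructed from the traceback above.
# run_recbole is RecBole's quick-start entry point; config_file_list points
# to the YAML configuration shown in the reproduction steps.
from recbole.quick_start import run_recbole

if __name__ == "__main__":
    run_recbole(
        model="BERT4Rec",
        dataset="bert4recbole",
        config_file_list=["bert4recbole.yaml"],  # assumed path to the config above
    )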

Expected behavior
An error about an I/O operation on a closed file occurs.

Environment (please complete the following information):

  • OS: Ubuntu
  • RecBole version: 1.2.0
  • Python version: 3.10.14
  • PyTorch version: 2.3.0
  • cudatoolkit: CUDA Version 12.3, Driver Version 545.23.06

zhengbw0324 self-assigned this Jun 24, 2024

zhengbw0324 (Collaborator) commented:

@zw81929
It is not recommended to set both save_dataset and save_dataloaders to True.
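
One way to apply this suggestion (a sketch only, reusing the assumed run script from the reproduction steps) is to override save_dataloaders through config_dict, so that pickle.dump is never called on the very large dataloaders:

# Sketch of the suggested workaround, not an official fix: config_dict
# overrides values from the YAML file, so save_dataloaders is forced off
# while save_dataset can stay enabled if desired.
from recbole.quick_start import run_recbole

if __name__ == "__main__":
    run_recbole(
        model="BERT4Rec",
        dataset="bert4recbole",
        config_file_list=["bert4recbole.yaml"],   # assumed config file name
        config_dict={"save_dataloaders": False},  # equivalent to editing the YAML
    )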
