
Error encountered during training #4

Open
DHHWILL opened this issue Sep 27, 2024 · 11 comments

Comments


DHHWILL commented Sep 27, 2024

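Hello, I followed the steps you provided and ran into the following error when using the sample data:
[W C:\cb\pytorch_1000000000000\work\torch\csrc\CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
However, my machine has a single GPU and I have not used any parallel training method. Could strategy = 'ddp_find_unused_parameters_false' be the cause? I have not found a good fix for it.
How should I resolve this? Looking forward to your reply. Thanks!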

Collaborator

Xcf-xcf commented Oct 4, 2024

Could you provide more detailed information, such as the command you ran and the configuration file?

Author

DHHWILL commented Oct 4, 2024

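Sorry, the information I gave was incomplete. I tested with the images under the /example/ directory. Running
python TeethDreamer.py -b configs/TeethDreamer.yaml
and the related steps generated the multi-view image 1832_lower_cond_000_000_000_000.png as expected. In step 6, however, when running the NeuS reconstruction with
python run.py --img E:\mycode\TeethDreamer-main\output\1832_lower_cond_000_000_000_000.png --cpu 4 --dir E:\mycode\TeethDreamer-main\output\out --normal --rembg
I got ZeroDivisionError: division by zero. I then went back and used step 5 to generate 0.jpg, and the error was still the same. I have not modified the configuration files or the model files.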

Author

DHHWILL commented Oct 5, 2024

File "E:\TeethDreamer-main\instant-nsr-pl\systems\neus.py", line 129, in training_step
train_num_rays = int(self.train_num_rays * (self.train_num_samples / out['num_samples_full'].sum().item()))
ZeroDivisionError: division by zero

Collaborator

Xcf-xcf commented Oct 5, 2024

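Did you use step 5 to manually segment the foreground of the 16 novel-view images? If every foreground is complete, this problem should not occur in principle. It usually happens when the images contain only background, or when the camera poses do not match the views, so that every sampled point falls on the background.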
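One quick way to test the "images contain only background" explanation is to measure how much of a segmented image is actually foreground. The sketch below is only an illustration: it assumes the segmented image exposes a per-pixel alpha or binary mask value (the thread does not confirm the file layout), and `foreground_fraction` is a hypothetical helper, not part of TeethDreamer or instant-nsr-pl. Loading the pixel values from disk is left out on purpose.

```python
from typing import Iterable


def foreground_fraction(alpha_values: Iterable[int], threshold: int = 0) -> float:
    """Return the fraction of pixels whose alpha/mask value exceeds `threshold`.

    Hypothetical helper: the caller is assumed to supply the flattened
    alpha (or mask) values of one segmented view.
    """
    total = 0
    foreground = 0
    for value in alpha_values:
        total += 1
        if value > threshold:
            foreground += 1
    if total == 0:
        raise ValueError("mask has no pixels")
    return foreground / total


# Made-up 4x4 masks, flattened row by row: 0 = background, 255 = foreground.
empty_mask = [0] * 16
tooth_mask = [0] * 12 + [255] * 4

print(foreground_fraction(empty_mask))  # 0.0
print(foreground_fraction(tooth_mask))  # 0.25
```

A fraction of 0.0 for every view would be consistent with the all-background case described above; a non-zero fraction would point toward the camera-pose mismatch instead.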

Author

DHHWILL commented Oct 5, 2024

Yes, I used step 5 to cut out the foreground of the novel views and generate a new image, but the error is the same either way.
[image: 0]
The command was:
python TeethDreamer.py -b configs/TeethDreamer.yaml --gpus 0 --test ckpt/TeethDreamer.ckpt --output E:\TeethDreamer-main\output data.params.test_dir=E:\TeethDreamer-main\output\segment
After generating the multi-view image, I segmented it manually to produce 0.png:
python seg_foreground.py --img E:\TeethDreamer-main\output/oral_lower_cond_000_000_000_000.png --seg E:\TeethDreamer-main\output\seg/0.png
[image: 241005141108]

python run.py --img E:\TeethDreamer-main\output\seg/0.png --cpu 4 --dir E:\TeethDreamer-main\output\reconstruction --normal --rembg
This produces:
File "E:\TeethDreamer-main\instant-nsr-pl\systems\neus.py", line 130, in training_step
train_num_rays = int(self.train_num_rays * (self.train_num_samples / out['num_samples_full'].sum().item()))
ZeroDivisionError: division by zero

Collaborator

Xcf-xcf commented Oct 5, 2024

If you have already segmented the foreground manually, you do not need to add the --rembg option on the command line.

Author

DHHWILL commented Oct 5, 2024

I tried removing --rembg and still got the same error. I then passed the oral_lower_cond_000_000_000_000.png generated in step 4 to the command, this time with the --rembg option:
python run.py --img E:\TeethDreamer-main\output\seg/oral_lower_cond_000_000_000_000.png --cpu 4 --dir E:\TeethDreamer-main\output\reconstruction --normal --rembg
The error was the same:
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\overrides\base.py", line 98, in forward
output = self._forward_module.training_step(*inputs, **kwargs)
File "E:\TeethDreamer-main\instant-nsr-pl\systems\neus.py", line 130, in training_step
train_num_rays = int(self.train_num_rays * (self.train_num_samples / out['num_samples_full'].sum().item()))
ZeroDivisionError: division by zero
I tested both the upper and the lower images, and the error is exactly the same.

Author

DHHWILL commented Oct 5, 2024

Below are the files under /reconstruction/0/:
[images: train, test, val]

The foreground appears to have been segmented. In the code:
    def training_step(self, batch, batch_idx):
        out = self(batch)
        loss = 0.
        # update train_num_rays
        if self.config.model.dynamic_ray_sampling:
            train_num_rays = int(self.train_num_rays * (self.train_num_samples / out['num_samples_full'].sum().item()))
Here out['num_samples_full'] is 0, and I do not know where that value comes from.
I hope you can help. Thanks!
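The failing expression divides by the total of out['num_samples_full'], so the exception fires whenever that total is zero. The sketch below reproduces only that arithmetic with a guard in front of it. `rescale_train_num_rays` and the numbers passed to it are hypothetical, and the guard is not part of instant-nsr-pl; it merely turns the bare ZeroDivisionError into a message that points at the likely cause the maintainer described (no sampled point on the foreground).

```python
def rescale_train_num_rays(train_num_rays: int,
                           train_num_samples: int,
                           num_samples_full: int) -> int:
    """Same arithmetic as the quoted line in neus.py, with a zero guard.

    Hypothetical helper: `num_samples_full` stands for
    out['num_samples_full'].sum().item() in the quoted code.
    """
    if num_samples_full <= 0:
        raise ValueError(
            "no samples were produced for this batch; check that the input "
            "images contain foreground and that the camera poses match"
        )
    return int(train_num_rays * (train_num_samples / num_samples_full))


# Made-up values: twice as many target samples as were actually produced,
# so the ray count doubles.
print(rescale_train_num_rays(256, 1024, 512))  # 512

try:
    rescale_train_num_rays(256, 1024, 0)
except ValueError as err:
    print("guard triggered:", err)
```

A guard like this does not fix the underlying data problem; it only makes the failure easier to diagnose.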

Collaborator

Xcf-xcf commented Oct 7, 2024

Sorry for the late reply; I was attending MICCAI 2024. In your case, the filename of the manually segmented image from step 5 must contain 'lower' or 'upper', because that substring determines the camera poses.
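The naming rule can be checked before launching the reconstruction. The following is a minimal sketch under stated assumptions: `jaw_from_filename` is a hypothetical helper, and the thread does not show how run.py actually matches the substring, so the case-insensitive check and the rejection of ambiguous names are illustrative choices only.

```python
from pathlib import PureWindowsPath


def jaw_from_filename(path: str) -> str:
    """Return 'lower' or 'upper' based on the image filename.

    Hypothetical helper. PureWindowsPath accepts both '\\' and '/' as
    separators, so mixed paths like those in this thread are handled.
    """
    name = PureWindowsPath(path).name.lower()
    has_lower = "lower" in name
    has_upper = "upper" in name
    if has_lower == has_upper:  # neither substring, or both
        raise ValueError(
            f"filename {name!r} must contain exactly one of 'lower' or 'upper'"
        )
    return "lower" if has_lower else "upper"


print(jaw_from_filename(r"E:\TeethDreamer-main\output\1832_lower_cond_000_000_000_000.png"))  # lower

try:
    jaw_from_filename(r"E:\TeethDreamer-main\output\seg/0.png")
except ValueError as err:
    print("rejected:", err)
```

Under this rule a file named 0.png, as used earlier in the thread, would be rejected, while 1832_lower_cond_000_000_000_000.png would pass.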

Author

DHHWILL commented Oct 8, 2024

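Sorry to bother you repeatedly. I renamed the image as you suggested, but the same problem persists:
python run.py --img E:\TeethDreamer-main\output\seg/lower.png --cpu 4 --dir E:\TeethDreamer-main\output\reconstruction --normal
I then tried running the whole pipeline from scratch, without step 5 and without renaming the file:
python run.py --img E:\TeethDreamer-main\output\1832_lower_cond_000_000_000_000.png --cpu 4 --dir E:\TeethDreamer-main\output\reconstruction --normal --rembg
The problem is still the same.
I also tried changing the file extension to .jpg, .webp, and so on, which did not solve it either.
Here is the full log: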
(TeethDreamer) E:\TeethDreamer-main\instant-nsr-pl>python run.py --img E:\TeethDreamer-main\output\seg/lower.png --cpu 4 --dir E:\TeethDreamer-main\output\reconstruction --normal
Traceback (most recent call last):
File "tools.py", line 155, in <module>
image_size = prepare_masked_img(args.input, os.path.join(args.output, 'train'), args.rembg, args.normal, args.real)
File "tools.py", line 106, in prepare_masked_img
data=imread(img_path)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\skimage\io\_io.py", line 53, in imread
img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\skimage\io\manage_plugins.py", line 205, in call_plugin
return func(*args, **kwargs)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\skimage\io\_plugins\imageio_plugin.py", line 11, in imread
out = np.asarray(imageio_imread(*args, **kwargs))
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\imageio\v3.py", line 53, in imread
with imopen(uri, "r", **plugin_kwargs) as img_file:
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\imageio\core\imopen.py", line 113, in imopen
request = Request(uri, io_mode, format_hint=format_hint, extension=extension)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\imageio\core\request.py", line 247, in __init__
self._parse_uri(uri)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\imageio\core\request.py", line 407, in _parse_uri
raise FileNotFoundError("No such file: '%s'" % fn)
FileNotFoundError: No such file: 'E:\TeethDreamer-main\output\seg\lower.png'
Global seed set to 42
Using 16bit None Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Trainer(limit_train_batches=1.0) was configured so 100% of the batches per epoch will be used..
Trainer(limit_val_batches=1) was configured so 1 batch will be used.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
fatal: not a git repository (or any of the parent directories): .git
E:\TeethDreamer-main\instant-nsr-pl\utils\callbacks.py:76: UserWarning: Code snapshot is not saved. Please make sure you have git installed and are in a git repository.
rank_zero_warn("Code snapshot is not saved. Please make sure you have git installed and are in a git repository.")

  | Name  | Type             | Params
--------------------------------------
0 | cos   | CosineSimilarity | 0
1 | model | NeuSModel        | 14.0 M
--------------------------------------
14.0 M Trainable params
0 Non-trainable params
14.0 M Total params
27.955 Total estimated model params size (MB)
E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\lightning_fabric\loggers\csv_logs.py:183: UserWarning: Experiment logs directory E:\TeethDreamer-main\output\reconstruction\lower\neus\csv_logs exists and is not empty. Previous log files in this directory will be deleted when the new ones are saved!
rank_zero_warn(
Traceback (most recent call last):
File "launch.py", line 129, in <module>
main()
File "launch.py", line 118, in main
trainer.fit(system, datamodule=dm)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 608, in fit
call._call_and_handle_interrupt(
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\trainer\call.py", line 38, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 650, in _fit_impl
self._run(model, ckpt_path=self.ckpt_path)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1103, in _run
results = self._run_stage()
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1182, in _run_stage
self._run_train()
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1205, in _run_train
self.fit_loop.run()
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
self.advance(*args, **kwargs)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 267, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
self.advance(*args, **kwargs)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 213, in advance
batch_output = self.batch_loop.run(kwargs)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
self.advance(*args, **kwargs)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\batch\training_batch_loop.py", line 88, in advance
outputs = self.optimizer_loop.run(optimizers, kwargs)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
self.advance(*args, **kwargs)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 202, in advance
result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 249, in _run_optimization
self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 370, in _optimizer_step
self.trainer._call_lightning_module_hook(
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1347, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\core\module.py", line 1744, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\core\optimizer.py", line 169, in step
step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\strategies\strategy.py", line 234, in optimizer_step
return self.precision_plugin.optimizer_step(
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\plugins\precision\native_amp.py", line 75, in optimizer_step
closure_result = closure()
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 149, in __call__
self._result = self.closure(*args, **kwargs)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 135, in closure
step_output = self._step_fn()
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 419, in _training_step
training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1485, in _call_strategy_hook
output = fn(*args, **kwargs)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\strategies\dp.py", line 134, in training_step
return self.model(*args, **kwargs)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\torch\nn\parallel\data_parallel.py", line 169, in forward
return self.module(*inputs[0], **kwargs[0])
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\overrides\data_parallel.py", line 77, in forward
output = super().forward(*inputs, **kwargs)
File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\overrides\base.py", line 98, in forward
output = self._forward_module.training_step(*inputs, **kwargs)
File "E:\TeethDreamer-main\instant-nsr-pl\systems\neus.py", line 129, in training_step
train_num_rays = int(self.train_num_rays * (self.train_num_samples / out['num_samples_full'].sum().item()))
ZeroDivisionError: division by zero
Epoch 0: : 0it [00:49, ?it/s]
[W C:\cb\pytorch_1000000000000\work\torch\csrc\CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

I hope you can help. Thanks!
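The log above contains two tracebacks: a FileNotFoundError raised from tools.py for E:\TeethDreamer-main\output\seg\lower.png, followed by the ZeroDivisionError during training. That ordering suggests the preprocessing step failed and training still started, possibly on leftover or empty data in the output directory. The sketch below shows a fail-fast check that could be run before the pipeline is launched; `require_input_image` is a hypothetical helper and is not part of run.py or tools.py.

```python
import os
import tempfile


def require_input_image(path: str) -> str:
    """Return `path` unchanged if it is an existing file, otherwise raise.

    Hypothetical pre-flight check, meant to stop the pipeline before
    any later stage runs on stale data.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"input image not found: {path!r}")
    return path


# Demonstration in a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "lower.png")
    with open(existing, "wb") as handle:
        handle.write(b"")  # empty placeholder file
    print(require_input_image(existing) == existing)  # True

    try:
        require_input_image(os.path.join(tmp, "missing.png"))
    except FileNotFoundError as err:
        print("stopped early:", err)
```

If the check fails for the path passed to --img, the first traceback is the one to resolve; the division by zero may simply be a downstream symptom.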

@yangyang117

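Hello, has this problem been resolved?
I am also trying to reproduce the results with the example data, and I found two problems:
1. Using the samples in example, the upper and lower teeth generated by TeethDreamer look almost identical in the images and do not match the intraoral photos. In the photos the upper teeth have obvious defects, and the incisors in the upper region are larger than those in the lower region, but none of these differences are visible in the generated images.
2. When generating the 3D model with instant-nsr-pl, the images produced after 20000 iterations are very poor, and the isosurface extraction in the subsequent geometry step cannot proceed: the generated mesh has dimension 0.

Did you run into these problems when reproducing the results? Could we discuss them?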
