Torch Problem! #6

Closed
bassemmohammed opened this issue Apr 16, 2024 · 3 comments

@bassemmohammed

I am getting this error. What is a potential solution?

`04-16 18:50:06, INFO Start preparing subvolumes!
04-16 18:50:14, INFO Done preparing subvolumes!
04-16 18:50:14, INFO Start training!
04-16 18:50:16, INFO Port number: 55179
learning rate 0.0003
['isonet_maps/J527_004_volume_map_half_A_data', 'isonet_maps/J527_004_volume_map_half_B_data']
0%| | 0/125 [00:00<?, ?batch/s][rank3]:[2024-04-16 18:50:31,316] [0/0] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[rank0]:[2024-04-16 18:50:31,318] [0/0] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[rank1]:[2024-04-16 18:50:31,319] [0/0] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[rank2]:[2024-04-16 18:50:31,336] [0/0] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
/tmp/tmpsmozudmb/main.c: In function ‘list_to_cuuint64_array’:
/tmp/tmpsmozudmb/main.c:354:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmpsmozudmb/main.c:354:3: note: use option -std=c99 or -std=gnu99 to compile your code
/tmp/tmpsmozudmb/main.c: In function ‘list_to_cuuint32_array’:
/tmp/tmpsmozudmb/main.c:365:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmpp2i9rprr/main.c: In function ‘list_to_cuuint64_array’:
/tmp/tmpp2i9rprr/main.c:354:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmpp2i9rprr/main.c:354:3: note: use option -std=c99 or -std=gnu99 to compile your code
/tmp/tmpp2i9rprr/main.c: In function ‘list_to_cuuint32_array’:
/tmp/tmpp2i9rprr/main.c:365:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmpry1_48xe/main.c: In function ‘list_to_cuuint64_array’:
/tmp/tmpry1_48xe/main.c:354:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmpry1_48xe/main.c:354:3: note: use option -std=c99 or -std=gnu99 to compile your code
/tmp/tmpry1_48xe/main.c: In function ‘list_to_cuuint32_array’:
/tmp/tmpry1_48xe/main.c:365:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmpjvtra_tg/main.c: In function ‘list_to_cuuint64_array’:
/tmp/tmpjvtra_tg/main.c:354:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmpjvtra_tg/main.c:354:3: note: use option -std=c99 or -std=gnu99 to compile your code
/tmp/tmpjvtra_tg/main.c: In function ‘list_to_cuuint32_array’:
/tmp/tmpjvtra_tg/main.c:365:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmpmmftkvn2/main.c: In function ‘list_to_cuuint64_array’:
/tmp/tmpmmftkvn2/main.c:354:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmpmmftkvn2/main.c:354:3: note: use option -std=c99 or -std=gnu99 to compile your code
/tmp/tmpmmftkvn2/main.c: In function ‘list_to_cuuint32_array’:
/tmp/tmpmmftkvn2/main.c:365:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmpbi1c9pol/main.c: In function ‘list_to_cuuint64_array’:
/tmp/tmpbi1c9pol/main.c:354:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmpbi1c9pol/main.c:354:3: note: use option -std=c99 or -std=gnu99 to compile your code
/tmp/tmpbi1c9pol/main.c: In function ‘list_to_cuuint32_array’:
/tmp/tmpbi1c9pol/main.c:365:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmpnhv369rn/main.c: In function ‘list_to_cuuint64_array’:
/tmp/tmpnhv369rn/main.c:354:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmpnhv369rn/main.c:354:3: note: use option -std=c99 or -std=gnu99 to compile your code
/tmp/tmpnhv369rn/main.c: In function ‘list_to_cuuint32_array’:
/tmp/tmpnhv369rn/main.c:365:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmpk6tpnhg2/main.c: In function ‘list_to_cuuint64_array’:
/tmp/tmpk6tpnhg2/main.c:354:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmpk6tpnhg2/main.c:354:3: note: use option -std=c99 or -std=gnu99 to compile your code
/tmp/tmpk6tpnhg2/main.c: In function ‘list_to_cuuint32_array’:
/tmp/tmpk6tpnhg2/main.c:365:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmplgr2pn1x/main.c: In function ‘list_to_cuuint64_array’:
/tmp/tmplgr2pn1x/main.c:354:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmplgr2pn1x/main.c:354:3: note: use option -std=c99 or -std=gnu99 to compile your code
/tmp/tmplgr2pn1x/main.c: In function ‘list_to_cuuint32_array’:
/tmp/tmplgr2pn1x/main.c:365:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmpfso28o05/main.c: In function ‘list_to_cuuint64_array’:
/tmp/tmpfso28o05/main.c:354:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmpfso28o05/main.c:354:3: note: use option -std=c99 or -std=gnu99 to compile your code
/tmp/tmpfso28o05/main.c: In function ‘list_to_cuuint32_array’:
/tmp/tmpfso28o05/main.c:365:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmpau0cv_zl/main.c: In function ‘list_to_cuuint64_array’:
/tmp/tmpau0cv_zl/main.c:354:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmpau0cv_zl/main.c:354:3: note: use option -std=c99 or -std=gnu99 to compile your code
/tmp/tmpau0cv_zl/main.c: In function ‘list_to_cuuint32_array’:
/tmp/tmpau0cv_zl/main.c:365:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmp2hxvb5sf/main.c: In function ‘list_to_cuuint64_array’:
/tmp/tmp2hxvb5sf/main.c:354:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmp2hxvb5sf/main.c:354:3: note: use option -std=c99 or -std=gnu99 to compile your code
/tmp/tmp2hxvb5sf/main.c: In function ‘list_to_cuuint32_array’:
/tmp/tmp2hxvb5sf/main.c:365:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
0%| | 0/125 [00:12<?, ?batch/s]
Traceback (most recent call last):
File "/home/bassem.mohammed/.conda/envs/spisonet/bin/spisonet.py", line 8, in <module>
sys.exit(main())
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/spIsoNet/bin/spisonet.py", line 549, in main
fire.Fire(ISONET)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/spIsoNet/bin/spisonet.py", line 182, in reconstruct
map_refine_n2n(halfmap1,halfmap2, mask_vol, fsc3d, alpha = alpha,beta=beta, voxel_size=voxel_size, output_dir=output_dir,
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/spIsoNet/bin/map_refine.py", line 145, in map_refine_n2n
network.train([data_dir_1,data_dir_2], output_dir, alpha=alpha,beta=beta, output_base=output_base0, batch_size=batch_size, epochs = epochs, steps_per_epoch = 1000,
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/spIsoNet/models/network_n2n.py", line 265, in train
mp.spawn(ddp_train, args=(self.world_size, self.port_number, self.model,alpha,beta,
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 241, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
while not context.join():
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 158, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 3 terminated with the following error:
Traceback (most recent call last):
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 68, in _wrap
fn(i, *args)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/spIsoNet/models/network_n2n.py", line 116, in ddp_train
preds = model(x1)# + noise.cuda())
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
return fn(*args, **kwargs)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
return fn(*args, **kwargs)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward
else self._run_ddp_forward(*inputs, **kwargs)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
return self.module(*inputs, **kwargs) # type: ignore[index]
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/spIsoNet/models/unet.py", line 97, in forward
x, down_sampling_features = self.encoder(x)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/spIsoNet/models/unet.py", line 98, in resume_in_forward
x = self.decoder(x, down_sampling_features)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 652, in catch_errors
return hijacked_callback(frame, cache_entry, hooks, frame_state)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 727, in _convert_frame
result = inner_convert(frame, cache_entry, hooks, frame_state)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 383, in _convert_frame_assert
compiled_product = _compile(
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 646, in _compile
guarded_code = compile_inner(code, one_graph, hooks, transform)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 244, in time_wrapper
r = func(*args, **kwargs)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 562, in compile_inner
out_code = transform_code_object(code, transform)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1033, in transform_code_object
transformations(instructions, code_options)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 151, in _fn
return fn(*args, **kwargs)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 527, in transform
tracer.run()
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2128, in run
super().run()
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 818, in run
and self.step()
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 781, in step
getattr(self, inst.opname)(inst)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2243, in RETURN_VALUE
self.output.compile_subgraph(
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 919, in compile_subgraph
self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1087, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 244, in time_wrapper
r = func(*args, **kwargs)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1159, in call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1140, in call_user_compiler
compiled_fn = compiler_fn(gm, self.example_inputs())
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/backends/distributed.py", line 312, in compile_fn
return self.backend_compile_fn(gm, example_inputs)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/repro/after_dynamo.py", line 117, in debug_wrapper
compiled_gm = compiler_fn(gm, example_inputs)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/__init__.py", line 1668, in __call__
return compile_fx(model_, inputs_, config_patches=self.config)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1168, in compile_fx
return aot_autograd(
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/backends/common.py", line 55, in compiler_fn
cg = aot_module_simplified(gm, example_inputs, **kwargs)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 887, in aot_module_simplified
compiled_fn = create_aot_dispatcher_function(
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 244, in time_wrapper
r = func(*args, **kwargs)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 600, in create_aot_dispatcher_function
compiled_fn = compiler_fn(flat_fn, fake_flat_args, aot_config, fw_metadata=fw_metadata)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 425, in aot_wrapper_dedupe
return compiler_fn(flat_fn, leaf_flat_args, aot_config, fw_metadata=fw_metadata)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 630, in aot_wrapper_synthetic_base
return compiler_fn(flat_fn, flat_args, aot_config, fw_metadata=fw_metadata)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 295, in aot_dispatch_autograd
compiled_fw_func = aot_config.fw_compiler(fw_module, adjusted_flat_args)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 244, in time_wrapper
r = func(*args, **kwargs)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1100, in fw_compiler_base
return inner_compile(
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 83, in debug_wrapper
inner_compiled_fn = compiler_fn(gm, example_inputs)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/debug.py", line 305, in inner
return fn(*args, **kwargs)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 320, in compile_fx_inner
compiled_graph = fx_codegen_and_compile(
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 550, in fx_codegen_and_compile
compiled_fn = graph.compile_to_fn()
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1116, in compile_to_fn
return self.compile_to_module().call
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 244, in time_wrapper
r = func(*args, **kwargs)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1070, in compile_to_module
mod = PyCodeCache.load_by_key_path(
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 1892, in load_by_key_path
exec(code, mod.__dict__, mod.__dict__)
File "/tmp/torchinductor_bassem.mohammed/3s/c3sj43rjwf7es4dcg6diqwdbggn4vewhnvf5urwrpeue3m2txik5.py", line 67, in <module>
async_compile.wait(globals())
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2486, in wait
scope[key] = result.result()
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2330, in result
kernel = self.kernel = _load_kernel(self.kernel_name, self.source_code)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2306, in _load_kernel
kernel.precompile()
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/triton_heuristics.py", line 188, in precompile
compiled_binary, launcher = self._precompile_config(
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/triton_heuristics.py", line 308, in _precompile_config
binary._init_handles()
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/triton/compiler/compiler.py", line 670, in _init_handles
bin_path = {driver.HIP: "hsaco_path", driver.CUDA: "cubin"}[driver.backend]
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/triton/runtime/driver.py", line 157, in __getattr__
self._initialize_obj()
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/triton/runtime/driver.py", line 154, in _initialize_obj
self._obj = self._init_fn()
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/triton/runtime/driver.py", line 187, in initialize_driver
return CudaDriver()
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/triton/runtime/driver.py", line 77, in __init__
self.utils = CudaUtils()
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/triton/runtime/driver.py", line 47, in __init__
so = _build("cuda_utils", src_path, tmpdir)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/triton/common/build.py", line 106, in _build
ret = subprocess.check_call(cc_cmd)
File "/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/subprocess.py", line 369, in check_call
raise CalledProcessError(retcode, cmd)
torch._dynamo.exc.BackendCompilerFailed: backend='compile_fn' raised:
CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpau0cv_zl/main.c', '-O3', '-I/home/bassem.mohammed/.conda/envs/spisonet/lib/python3.10/site-packages/triton/common/../third_party/cuda/include', '-I/home/bassem.mohammed/.conda/envs/spisonet/include/python3.10', '-I/tmp/tmpau0cv_zl', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmpau0cv_zl/cuda_utils.cpython-310-x86_64-linux-gnu.so', '-L/lib64', '-L/lib', '-L/lib64', '-L/lib']' returned non-zero exit status 1.

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True`

@DanGonite57

What version of gcc do you have? (`$ gcc -v`)
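Some background for why the version matters: the errors above come from Triton compiling a small C helper (`cuda_utils`) with `/usr/bin/gcc` and no `-std` flag. GCC 5 changed the default C dialect from gnu89 to gnu11, so an older gcc such as 4.8.5 rejects the `for (Py_ssize_t i = 0; ...)` loops in that helper. A minimal sketch for checking this from Python, assuming `gcc -dumpversion` prints either a dotted version (`4.8.5`) or just the major number (as newer releases do); the function names are hypothetical:

```python
import shutil
import subprocess


def gcc_major_version(version: str) -> int:
    """Parse the major number from `gcc -dumpversion` output, e.g. '4.8.5' -> 4."""
    return int(version.strip().split(".")[0])


def default_c_allows_loop_decls(version: str) -> bool:
    """GCC >= 5 defaults to gnu11; older releases default to gnu89 and reject
    `for (int i = 0; ...)` unless -std=c99 or -std=gnu99 is passed."""
    return gcc_major_version(version) >= 5


def installed_gcc_version(cc: str = "gcc"):
    """Return the `-dumpversion` string of `cc`, or None if it is not on PATH."""
    path = shutil.which(cc)
    if path is None:
        return None
    out = subprocess.run([path, "-dumpversion"], capture_output=True, text=True, check=True)
    return out.stdout.strip()


if __name__ == "__main__":
    version = installed_gcc_version()
    if version is None:
        print("gcc not found on PATH")
    else:
        ok = default_c_allows_loop_decls(version)
        print(f"gcc {version}: default dialect accepts loop declarations = {ok}")
```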

I had the same issue and appear to have solved it by switching from gcc 4.8.5 to gcc 7.x.
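If the system gcc cannot be replaced, one option, sketched under assumptions: Triton's build helper (`triton/common/build.py` in the traceback) appears to use the `CC` environment variable when it is set and otherwise falls back to `gcc` found on `PATH`, which would explain `/usr/bin/gcc` in the failing command. The sketch below probes candidate compilers with the same kind of loop that failed and builds an environment with `CC` pointing at the first one that works. The candidate names and helper functions are hypothetical, and the `spisonet.py` arguments are left elided:

```python
import os
import shutil
import subprocess
import tempfile

# Same construct that failed in the log: a loop variable declared in the for-header.
PROBE_SRC = "int main(void) { int s = 0; for (int i = 0; i < 3; i++) s += i; return s == 3 ? 0 : 1; }\n"


def compiles_c99_loops(cc: str) -> bool:
    """True if `cc` compiles the probe with its default dialect (no -std flag)."""
    path = shutil.which(cc)
    if path is None:
        return False
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        with open(src, "w") as f:
            f.write(PROBE_SRC)
        # Compile only; like the failing Triton command, pass no -std option.
        result = subprocess.run(
            [path, "-c", src, "-o", os.path.join(tmp, "probe.o")],
            capture_output=True,
            text=True,
        )
        return result.returncode == 0


def pick_compiler(candidates):
    """Return the absolute path of the first candidate that passes the probe, else None."""
    for cc in candidates:
        if compiles_c99_loops(cc):
            return shutil.which(cc)
    return None


def env_with_compiler(cc_path, base=None):
    """Copy of the environment (os.environ by default) with CC set to `cc_path`."""
    env = dict(os.environ if base is None else base)
    env["CC"] = cc_path
    return env


if __name__ == "__main__":
    # Hypothetical candidate names; use whatever your cluster or conda env provides.
    cc = pick_compiler(["gcc-7", "gcc", "clang"])
    print("usable compiler:", cc)
    # e.g. subprocess.run(["spisonet.py", "reconstruct", ...], env=env_with_compiler(cc))
```

Because `mp.spawn` workers inherit the parent's environment, setting `CC` for the top-level `spisonet.py` process should reach the workers too, provided Triton reads that variable in your installed version.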

@bassemmohammed
Author

I still got
[rank0]:[2024-04-17 09:37:25,757] [0/1] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored

But it proceeded to training and worked. Thank you!

@procyontao
Collaborator

The "[rank0]:[2024-04-17 09:37:25,757] [0/1] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored" warning appears on every run but does not affect execution.
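Since the warning is harmless, hiding it is purely cosmetic. If you want to anyway, here is a minimal sketch using the standard `logging` module. It assumes, as the `torch._dynamo.variables.torch:` prefix in the log suggests, that the message is emitted through a logger with that name. Because spIsoNet trains in `mp.spawn` worker processes, the filter would have to be installed inside each worker, not only in the launching process. The class and function names are hypothetical:

```python
import logging


class DropProfilerWarning(logging.Filter):
    """Discard the 'Profiler function ... will be ignored' records; pass everything else."""

    def filter(self, record: logging.LogRecord) -> bool:
        msg = record.getMessage()
        # Returning False drops the record before any handler sees it.
        return not (msg.startswith("Profiler function") and msg.endswith("will be ignored"))


def install_filter(logger_name: str = "torch._dynamo.variables.torch") -> logging.Filter:
    """Attach the filter to the named logger and return it (so it can be removed later)."""
    f = DropProfilerWarning()
    logging.getLogger(logger_name).addFilter(f)
    return f
```

A logger-level filter only applies to records logged directly on that logger, which is why the sketch targets the exact name from the log prefix rather than the parent `torch` logger.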
