MACE : Lammps with GPU support error #238
First I trained using the develop branch.
It’s hard for us to help you from limited information. What is your LAMMPS script? How are you launching LAMMPS?
It looks like maybe you are using 4 MPI processes. Are you also using no_domain_decomposition? Those would be incompatible.
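For the multi-GPU case the maintainer is asking about, the Kokkos package takes the GPU count on the command line. A minimal sketch of a launch command, assuming a Kokkos/CUDA build of LAMMPS, 4 GPUs on the node, and an input file named in.lammps (both placeholders, not taken from this thread); the command is echoed rather than executed:

```shell
# Sketch: one MPI rank per GPU (assumptions: Kokkos/CUDA build,
# 4 GPUs per node, input file named in.lammps).
NPROCS=4
NGPUS=4
# -k on g N enables Kokkos with N GPUs; -sf kk switches pair/fix styles
# to their Kokkos variants.
echo "mpirun -np ${NPROCS} lmp -k on g ${NGPUS} -sf kk -in in.lammps"
```

With mismatched NPROCS and NGPUS (e.g. 4 ranks on a 1-GPU node), ranks can request GPU indices that do not exist, which is one common cause of "invalid device ordinal".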
On Fri, 24 Nov 2023 at 09:58, Amitcuhp wrote:
First I trained using the develop branch.
After that I converted the model with the create-lammps.py script.
Then I am not able to use it.
Oh sorry sir, got it. It is working fine.
Hi @Amitcuhp: Can you please guide me through the steps you followed to install LAMMPS for MACE?
(Here you have to pick the libtorch-cxx version for your machine; for my machine, cu116 worked.)
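The libtorch step above can be sketched as follows. The path is a placeholder, and the CMake options shown are the standard LAMMPS Kokkos/CUDA ones (CMAKE_PREFIX_PATH, PKG_KOKKOS, Kokkos_ENABLE_CUDA), not a verified recipe for this particular fork; the configure line is echoed rather than run:

```shell
# Sketch: point the LAMMPS CMake build at a standalone libtorch
# (assumption: the cu116 libtorch archive is unpacked at $LIBTORCH;
# pick the CUDA build that matches your machine).
LIBTORCH="${HOME}/libtorch-gpu"   # placeholder path
CONFIGURE="cmake -D CMAKE_PREFIX_PATH=${LIBTORCH} -D PKG_KOKKOS=ON -D Kokkos_ENABLE_CUDA=ON ../cmake"
echo "${CONFIGURE}"
```

CMAKE_PREFIX_PATH is what lets the linker resolve libtorch_cuda.so from the standalone libtorch instead of a torch bundled with a Python installation.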
Thanks for the prompt reply @Amitcuhp. I am still facing an error with this installation. I followed the steps you listed, but I run into issues during the make -j 12 command: /usr/bin/ld: /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cuda.so: undefined reference to ... Thanks again.
Hi @Amitcuhp, I was able to compile using your instructions, thanks a lot.
I installed the GPU version of LAMMPS successfully. When running LAMMPS with GPU support, this error came up:
CUDA found, setting device type to torch::kCUDA.
Loading MACE model from "MACE_model_swa.model-lammps.pt" ...
Loading MACE model from "MACE_model_swa.model-lammps.pt" ...
Loading MACE model from "MACE_model_swa.model-lammps.pt" ...
Loading MACE model from "MACE_model_swa.model-lammps.pt" ...
Exception: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:31 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fdd75152457 in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fdd7511c3ec in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(std::string const&, std::string const&, int, bool) + 0xb4 (0x7fdd75041c64 in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libc10_cuda.so)
frame #3: + 0x222cc (0x7fdd7501d2cc in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libc10_cuda.so)
frame #4: + 0x2f15bd4 (0x7fdd0218dbd4 in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cuda_cu.so)
frame #5: + 0x2d26936 (0x7fdd01f9e936 in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cuda_cu.so)
frame #6: + 0x2d26a70 (0x7fdd01f9ea70 in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cuda_cu.so)
frame #7: at::_ops::empty_strided::redispatch(c10::DispatchKeySet, c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) + 0xf3 (0x7fdd2d9dc713 in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #8: + 0x29f6efe (0x7fdd2dceeefe in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #9: at::_ops::empty_strided::call(c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) + 0x1bb (0x7fdd2da1f9cb in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #10: + 0x1c5eaa7 (0x7fdd2cf56aa7 in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #11: at::native::_to_copy(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) + 0x1823 (0x7fdd2d2bddc3 in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #12: + 0x2bb171b (0x7fdd2dea971b in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #13: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) + 0xf5 (0x7fdd2d6f7005 in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #14: + 0x29f6c73 (0x7fdd2dceec73 in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #15: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) + 0xf5 (0x7fdd2d6f7005 in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #16: + 0x3db99bb (0x7fdd2f0b19bb in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #17: + 0x3db9e2e (0x7fdd2f0b1e2e in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #18: at::_ops::_to_copy::call(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) + 0x1f9 (0x7fdd2d777569 in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #19: at::native::to(at::Tensor const&, c10::Device, c10::ScalarType, bool, bool, c10::optional<c10::MemoryFormat>) + 0xc7 (0x7fdd2d2b60f7 in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #20: + 0x2d6fb59 (0x7fdd2e067b59 in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #21: at::_ops::to_device::call(at::Tensor const&, c10::Device, c10::ScalarType, bool, bool, c10::optional<c10::MemoryFormat>) + 0x1ba (0x7fdd2d8dea6a in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #22: torch::jit::Unpickler::readInstruction() + 0x1af0 (0x7fdd300b42a0 in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #23: torch::jit::Unpickler::run() + 0x90 (0x7fdd300b5080 in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #24: torch::jit::Unpickler::parse_ivalue() + 0x18 (0x7fdd300b51d8 in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #25: torch::jit::readArchiveAndTensors(std::string const&, std::string const&, std::string const&, c10::optional<std::function<c10::StrongTypePtr (c10::QualifiedName const&)> >, c10::optional<std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue)> >, c10::optional<c10::Device>, caffe2::serialize::PyTorchStreamReader&, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::string const&), std::shared_ptr<torch::jit::DeserializationStorageContext>) + 0x45a (0x7fdd300726fa in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #26: + 0x4d65297 (0x7fdd3005d297 in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #27: + 0x4d67fdb (0x7fdd3005ffdb in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #28: torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::string const&, c10::optional<c10::Device>, std::unordered_map<std::string, std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::string> > >&) + 0x3a2 (0x7fdd30061a12 in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #29: torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::string const&, c10::optional<c10::Device>) + 0x7b (0x7fdd3006212b in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #30: torch::jit::load(std::string const&, c10::optional<c10::Device>) + 0xa5 (0x7fdd30062205 in /home/Raman/lammps-mace-gpu/libtorch-gpu/lib/libtorch_cpu.so)
frame #31: LAMMPS_NS::PairMACE::coeff(int, char**) + 0x115 (0x7fdd6ecfc625 in /home/Raman/lammps-mace-gpu/lammps/build-kokkos-cuda/liblammps.so.0)
frame #32: LAMMPS_NS::Input::pair_coeff() + 0x1e3 (0x7fdd6eaf4b73 in /home/Raman/lammps-mace-gpu/lammps/build-kokkos-cuda/liblammps.so.0)
frame #33: LAMMPS_NS::Input::execute_command() + 0x76e (0x7fdd6eafa3de in /home/Raman/lammps-mace-gpu/lammps/build-kokkos-cuda/liblammps.so.0)
frame #34: LAMMPS_NS::Input::file() + 0x155 (0x7fdd6eafacb5 in /home/Raman/lammps-mace-gpu/lammps/build-kokkos-cuda/liblammps.so.0)
frame #35: /home/Raman/lammps-mace-gpu/lammps/build-kokkos-cuda/lmp() [0x404528]
frame #36: __libc_start_main + 0xf5 (0x7fdd6d4ef555 in /lib64/libc.so.6)
frame #37: /home/Raman/lammps-mace-gpu/lammps/build-kokkos-cuda/lmp() [0x40467e]
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2
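The error above ("invalid device ordinal", raised while each of the four ranks loads the model) typically means a rank requested a GPU index that does not exist on its node. The usual fix is to map each node-local rank onto the GPUs actually present. A minimal sketch, assuming OpenMPI (which exports OMPI_COMM_WORLD_LOCAL_RANK); the GPU count is a placeholder:

```shell
# Map each MPI rank to an existing GPU on its node.
local_rank="${OMPI_COMM_WORLD_LOCAL_RANK:-0}"  # 0 when run outside mpirun
ngpus=1                                        # GPUs actually present (placeholder)
device=$(( local_rank % ngpus ))               # wrap ranks onto existing devices
echo "rank ${local_rank} -> cuda:${device}"
```

Alternatively, export CUDA_VISIBLE_DEVICES per rank, or reduce the rank count to match the GPU count so no rank addresses a nonexistent device.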