Hi all, I am running TVM from an Ubuntu 16.04 machine and I have the tracker running on the same machine.
An aarch64 machine is connected to the tracker.
When running from the master branch, the following Python code:
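
    import numpy as np
    import tvm
    from tvm import autotvm

    # device_key, device_tracker and device_port are the tracker settings for the aarch64 board
    remote = autotvm.measure.request_remote(device_key, device_tracker, device_port, timeout=10000)
    ctx = remote.cpu()
    a = tvm.nd.array(np.ones((5041, 720)).astype('float32'), ctx)
    b = tvm.nd.array(np.ones((720, 192)).astype('float32'), ctx)
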
Produces the following error on the server:

    free(): invalid next size (normal)

On the host side, I get this error instead:

    Traceback (most recent call last):
      File "tvm/python/tvm/runtime/ndarray.py", line 503, in array
        return empty(arr.shape, arr.dtype, ctx).copyfrom(arr)
      File "tvm/python/tvm/runtime/ndarray.py", line 145, in copyfrom
        check_call(_LIB.TVMArrayCopyFromBytes(self.handle, data, nbytes))
      File "tvm/python/tvm/_ffi/base.py", line 330, in check_call
        raise get_last_ffi_error()
    tvm._ffi.base.TVMError: Traceback (most recent call last):
      [bt] (7) tvm/build/libtvm.so(TVMArrayCopyFromBytes+0xa) [0x7f808df5397a]
      [bt] (6) tvm/build/libtvm.so(tvm::runtime::ArrayCopyFromBytes(DLTensor*, void const*, unsigned long)+0x7c4) [0x7f808df537c4]
      [bt] (5) tvm/build/libtvm.so(tvm::runtime::RPCDeviceAPI::CopyDataFromTo(void const*, unsigned long, void*, unsigned long, unsigned long, DLContext, DLContext, DLDataType, void*)+0x42f) [0x7f808df97e7f]
      [bt] (4) tvm/build/libtvm.so(tvm::runtime::RPCSession::CopyToRemote(void*, unsigned long, void*, unsigned long, unsigned long, DLContext, DLDataType)+0x28f) [0x7f808df8400f]
      [bt] (3) tvm/build/libtvm.so(tvm::runtime::RPCSession::HandleUntilReturnEvent(tvm::runtime::TVMRetValue*, bool, tvm::runtime::PackedFunc const*)+0x13f) [0x7f808df835ef]
      [bt] (2) tvm/build/libtvm.so(+0xd3824c) [0x7f808df9024c]
      [bt] (1) tvm/build/libtvm.so(tvm::support::Socket::Error(char const*)+0x90) [0x7f808df85220]
      [bt] (0) tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x32) [0x7f808d63dff2]
      File "/workspace/src/runtime/rpc/../../support/socket.h", line 362
    TVMError: Socket SockChannel::Recv Error:Connection reset by peer

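For a sense of scale, the raw data size of each array in the reproduction follows from its shape and dtype (float32 is 4 bytes per element). The helper name `payload_bytes` is made up for this illustration, and the figure ignores any RPC protocol overhead.

```python
import numpy as np


def payload_bytes(shape, dtype="float32"):
    """Raw data size of a dense array (ignores any RPC protocol overhead)."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize


# Shapes and dtype taken from the reproduction in this issue.
for shape in [(5041, 720), (720, 192)]:
    nbytes = payload_bytes(shape)
    print(shape, nbytes, "bytes", f"({nbytes / 2**20:.2f} MiB)")
```

The first array is 14,518,080 bytes (about 13.85 MiB), roughly 26 times the second one (552,960 bytes).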
I investigated the issue and found that it was introduced by commit afcf939; the commit before it (9a8ed5b) works fine.
Any thoughts on what could be causing the issue?
I am cc'ing @jmorrill, the author of the corresponding PR.
Thanks, Giuseppe
P.S. I also started a discuss post here: https://discuss.tvm.ai/t/rpc-error-for-large-arrays/6591
Looking at the diff you pointed out, the most relevant change is probably the one to the ring buffer.
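`free(): invalid next size` is glibc reporting corrupted heap metadata, which commonly results from writing past the end of an allocation. As a hypothetical illustration only (this is not TVM's ring buffer, and it does not reproduce the change in afcf939 or the fix in #5516), the Python sketch below shows the invariant a growable byte ring buffer has to maintain: reserve enough capacity before copying a write, and keep unread bytes in order when reallocating. All class and method names are made up for the example.

```python
class RingBuffer:
    """Minimal growable byte ring buffer (illustrative sketch, not TVM code)."""

    def __init__(self, capacity: int = 8) -> None:
        self._buf = bytearray(max(capacity, 1))
        self._head = 0  # index of the first unread byte
        self._size = 0  # number of unread bytes

    @property
    def capacity(self) -> int:
        return len(self._buf)

    def bytes_available(self) -> int:
        return self._size

    def reserve(self, n: int) -> None:
        """Grow to at least n bytes, moving unread data to the front in order."""
        if n <= len(self._buf):
            return
        data = self.peek(self._size)
        self._buf = bytearray(max(n, 2 * len(self._buf)))
        self._buf[: len(data)] = data
        self._head = 0

    def peek(self, n: int) -> bytes:
        """Return up to n unread bytes without consuming them."""
        n = min(n, self._size)
        end = self._head + n
        if end <= len(self._buf):
            return bytes(self._buf[self._head:end])
        # Unread region wraps: tail part of the buffer, then its beginning.
        return bytes(self._buf[self._head:]) + bytes(self._buf[: end - len(self._buf)])

    def write(self, data: bytes) -> None:
        # Grow BEFORE copying so that size + len(data) <= capacity always holds;
        # skipping this check is what would overrun the allocation in C/C++.
        self.reserve(self._size + len(data))
        tail = (self._head + self._size) % len(self._buf)
        first = min(len(data), len(self._buf) - tail)
        self._buf[tail:tail + first] = data[:first]     # up to the physical end
        self._buf[: len(data) - first] = data[first:]   # wrapped remainder, if any
        self._size += len(data)

    def read(self, n: int) -> bytes:
        """Consume and return up to n unread bytes."""
        out = self.peek(n)
        self._head = (self._head + len(out)) % len(self._buf)
        self._size -= len(out)
        return out
```

In a Python `bytearray` an overrun would silently resize the buffer rather than corrupt memory, so the sketch only demonstrates the bookkeeping; in C++ the same missing capacity check would write into the next heap chunk.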
Please check whether #5516 fixes the problem.
Hi @tqchen, thanks for the prompt fix! It is now working fine (it was also nice to dig around the RPC part of the codebase a bit).
I will close the issue now.