TVM RPC will fail when allocating large arrays on an Android phone #7758

Closed
Maximilianxu opened this issue Mar 27, 2021 · 4 comments

@Maximilianxu
Contributor

Hi,
I want to deploy the BERT-base model on an Android phone. One of its parameters has shape (30522, 768) with dtype float32, and the RPC connection is reset every time I allocate this array:

for pk, pv in params.items():
    print(pv.shape, pv.dtype)
    weights[pk] = tvm.nd.array(np.random.uniform(size=pv.shape).astype(pv.dtype), ctx=ctx)

The error message:

Traceback (most recent call last):
  File "tune_network_x86.py", line 483, in <module>
    tune_network()
  File "tune_network_x86.py", line 423, in tune_network
    weights[pk] = tvm.nd.array((np.random.uniform(size=pv.shape)).astype(pv.dtype), ctx=ctx)
  File "/home/zyx/workspaces/python/tvm0.8_v2/python/tvm/runtime/ndarray.py", line 516, in array
    return empty(arr.shape, arr.dtype, ctx).copyfrom(arr)
  File "/home/zyx/workspaces/python/tvm0.8_v2/python/tvm/runtime/ndarray.py", line 154, in copyfrom
    check_call(_LIB.TVMArrayCopyFromBytes(self.handle, data, nbytes))
  File "/home/zyx/workspaces/python/tvm0.8_v2/python/tvm/_ffi/base.py", line 344, in check_call
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (6) /home/zyx/workspaces/python/tvm0.8_v2/build/libtvm.so(TVMArrayCopyFromBytes+0xe) [0x7f097dcf53ae]
  [bt] (5) /home/zyx/workspaces/python/tvm0.8_v2/build/libtvm.so(tvm::runtime::ArrayCopyFromBytes(DLTensor*, void const*, unsigned long)+0x2c9) [0x7f097dcf52e9]
  [bt] (4) /home/zyx/workspaces/python/tvm0.8_v2/build/libtvm.so(tvm::runtime::RPCDeviceAPI::CopyDataFromTo(void const*, unsigned long, void*, unsigned long, unsigned long, DLContext, DLContext, DLDataType, void*)+0x346) [0x7f097dd265b6]
  [bt] (3) /home/zyx/workspaces/python/tvm0.8_v2/build/libtvm.so(tvm::runtime::RPCEndpoint::CopyToRemote(void*, unsigned long, void*, unsigned long, unsigned long, DLContext, DLDataType)+0x75d) [0x7f097dd2a4cd]
  [bt] (2) /home/zyx/workspaces/python/tvm0.8_v2/build/libtvm.so(tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)+0x1a5) [0x7f097dd28955]
  [bt] (1) /home/zyx/workspaces/python/tvm0.8_v2/build/libtvm.so(tvm::runtime::SockChannel::Send(void const*, unsigned long)+0xb8) [0x7f097dd490b8]
  [bt] (0) /home/zyx/workspaces/python/tvm0.8_v2/build/libtvm.so(+0x1bc2838) [0x7f097dd44838]
  File "/home/zyx/workspaces/python/tvm0.8_v2/src/runtime/rpc/../../support/socket.h", line 360
TVMError: Socket SockChannel::Send Error: Connection reset by peer (original locale message: 连接被对方重设)

The BERT model was imported from PyTorch:

    model_class = transformers.BertModel
    tokenizer_class = transformers.BertTokenizer

    # Better to download the files manually:
    #   https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-pytorch_model.bin
    #   https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt
    #   https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-config.json
    # Then rename them to pytorch_model.bin, vocab.txt & config.json
    # weight = 'path to downloaded model dir'
    weight = '/home/zyx/.torch/hub/bert-base-uncased'
    model = model_class.from_pretrained(weight)
    model = ModelWrapper(model)
    model.eval()

    # tokenizer = tokenizer_class.from_pretrained(weight)
    # A = torch.tensor([tokenizer.encode("Here is some text to encode", add_special_tokens=True)])
    # There are 30522 words in bert-base-uncased's vocabulary
    input_shape = [batch_size, 128]
    input_name = 'input_ids'
    input_dtype = 'int64'
    A = torch.randint(30000, input_shape)
    scripted_model = torch.jit.trace(model, [A])
    shape_list = [(input_name, input_shape)]
    mod, params = relay.frontend.from_pytorch(scripted_model, shape_list)
    mod = optimize_bert(mod, params)

The optimize_bert function has the following passes:

    new_mod = FastSoftmax(mod)
    new_mod = ShapeConstDedup(new_mod)
    new_mod = tvm.relay.transform.EliminateCommonSubexpr()(new_mod)
    BindPass = tvm.relay.transform.function_pass(lambda fn, new_mod, ctx:
            tvm.relay.build_module.bind_params_by_name(fn, params), opt_level=1)
    new_mod = BindPass(new_mod)
    new_mod = tvm.relay.transform.FoldConstant()(new_mod)
    new_mod = tvm.relay.transform.CombineParallelBatchMatmul()(new_mod)
    # new_mod = tvm.relay.transform._ffi_api.BatchMatmulWeightTranspose()(new_mod)
    new_mod = tvm.relay.transform.FoldConstant()(new_mod)
    ret_list.append(new_mod)

I also tried the change from #5516 to ring_buffer.h, but it didn't work.

It seems the copy fails once the allocated space exceeds roughly 400 MB.
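For reference, the raw sizes involved can be checked with a small helper (plain NumPy, nothing TVM-specific):

```python
import numpy as np

def array_mb(shape, dtype):
    """Size in MB of an array with the given shape and dtype."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize / 1e6

print(array_mb((30522, 768), "float32"))  # the BERT embedding parameter: ~93.8 MB
```

A single copy of this parameter is well under 400 MB, so if the threshold is real it would presumably be reached cumulatively across allocations rather than by one transfer.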

@Maximilianxu
Contributor Author

Traceback (most recent call last):
  File "tune_network_x86.py", line 492, in <module>
    tune_network()
  File "tune_network_x86.py", line 423, in tune_network
    tmp = tvm.nd.array(np.random.uniform(size=(30522, 7680)).astype(np.float32), ctx)
  File "/home/zyx/workspaces/python/tvm0.8_v3/python/tvm/runtime/ndarray.py", line 513, in array
    return empty(arr.shape, arr.dtype, device).copyfrom(arr)
  File "/home/zyx/workspaces/python/tvm0.8_v3/python/tvm/runtime/ndarray.py", line 152, in copyfrom
    check_call(_LIB.TVMArrayCopyFromBytes(self.handle, data, nbytes))
  File "/home/zyx/workspaces/python/tvm0.8_v3/python/tvm/_ffi/base.py", line 346, in check_call
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  31: 0xffffffffffffffff
  30: _start
  29: __libc_start_main
  28: Py_BytesMain
  27: Py_RunMain
  26: PyRun_SimpleFileExFlags
  25: PyRun_FileExFlags
  24: 0x000000000067d61e
  23: 0x000000000067d5a0
  22: PyEval_EvalCode
  21: _PyEval_EvalCodeWithName
  20: _PyEval_EvalFrameDefault
  19: _PyFunction_Vectorcall
  18: _PyEval_EvalCodeWithName
  17: _PyEval_EvalFrameDefault
  16: _PyFunction_Vectorcall
  15: _PyEval_EvalCodeWithName
  14: _PyEval_EvalFrameDefault
  13: _PyFunction_Vectorcall
  12: _PyEval_EvalFrameDefault
  11: _PyObject_MakeTpCall
  10: 0x00007f19e8e4d7df
  9: _ctypes_callproc
  8: 0x00007f19e9d82409
  7: 0x00007f19e9d82ff4
  6: TVMArrayCopyFromBytes
  5: tvm::runtime::ArrayCopyFromBytes(DLTensor*, void const*, unsigned long)
  4: tvm::runtime::RPCDeviceAPI::CopyDataFromTo(DLTensor*, DLTensor*, void*)
  3: tvm::runtime::RPCEndpoint::CopyToRemote(void*, DLTensor*, unsigned long)
  2: tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)
  1: tvm::runtime::SockChannel::Send(void const*, unsigned long)
  0: tvm::support::Socket::Error(char const*)
  File "/home/zyx/workspaces/python/tvm0.8_v3/src/runtime/rpc/../../support/socket.h", line 360

The traceback above was generated using the latest version of TVM and the following code:

    local_demo = TARGET != "android"
    if local_demo:
        remote = rpc.LocalSession()
    else:
        tracker_host = os.environ.get("TVM_TRACKER_HOST", "192.168.1.103")
        tracker_port = int(os.environ.get("TVM_TRACKER_PORT", 9196))
        key = "huawei"
        tracker = rpc.connect_tracker(tracker_host, tracker_port)
        # When running a heavy model, we should increase the `session_timeout`
        remote = tracker.request(key, priority=0, session_timeout=1000)
    weights = {}
    ctx = remote.cpu(0) if TARGET != "cuda" else remote.gpu(0)
    MB = 0
    from functools import reduce
    # millions of elements in an array of the given shape
    mega_elems = lambda shape: reduce(lambda a, b: a * b, shape) / 1e6
    for _ in range(100):
        tmp = tvm.nd.array(np.random.uniform(size=(30522, 7680)).astype(np.float32), ctx)
        MB += mega_elems((30522, 7680)) * 4  # 4 bytes per float32 element
        print(tmp.shape)
        print("MB:", MB)
    exit()

With the above code, even the first allocation fails.
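To pinpoint the actual threshold, one could binary-search the largest allocation that still succeeds. A sketch (the `try_alloc` callback is hypothetical and would wrap `tvm.nd.array` on the remote context):

```python
def largest_ok_mb(try_alloc, lo=1, hi=2048):
    """Binary-search the largest size in MB for which try_alloc(mb) returns True.
    Assumes try_alloc(lo) succeeds and try_alloc(hi) fails."""
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if try_alloc(mid):
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical remote probe (untested sketch):
# def try_alloc(mb):
#     try:
#         tvm.nd.array(np.zeros(mb * 250_000, dtype=np.float32), ctx)  # mb * 1e6 bytes
#         return True
#     except tvm.TVMError:
#         return False
```

Knowing whether the limit is, say, exactly 400 MB or something device-dependent would help distinguish a TVM-side buffer limit from an OS-imposed one.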

@Maximilianxu
Contributor Author

It seems that this is due to an Android memory limit on the TVM RPC app.
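If the limit turns out to be per-transfer rather than total memory, one way to test that hypothesis is to send a large parameter in smaller row chunks. This is only a diagnostic sketch (the graph usually cannot consume a parameter in pieces); the chunking helper itself is plain NumPy:

```python
import numpy as np

def row_chunks(arr, max_mb=64):
    """Yield contiguous row slices of `arr`, each at most `max_mb` MB."""
    bytes_per_row = arr.itemsize * int(np.prod(arr.shape[1:]))
    rows = max(1, int(max_mb * 1e6) // bytes_per_row)
    for start in range(0, arr.shape[0], rows):
        yield arr[start:start + rows]

# Hypothetical use against a remote context (untested):
# for i, chunk in enumerate(row_chunks(big_param)):
#     weights[f"part{i}"] = tvm.nd.array(chunk, ctx=ctx)
```

If each small chunk uploads fine but a single large array does not, the failure is per-allocation; if the session still dies after enough chunks, the limit is on the app's total memory.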

@jcf94
Contributor

jcf94 commented Mar 30, 2021

I've run into a similar problem. This seems to be a system limitation?
@FrozenGene do you have any suggestion?

@tqchen
Member

tqchen commented Apr 8, 2021

This seems to be due to a system limitation; please feel free to follow up on https://discuss.tvm.apache.org/

@tqchen tqchen closed this as completed Apr 8, 2021