TVM RPC will fail when allocating large arrays on an Android phone #7758

Closed
Maximilianxu opened this issue Mar 27, 2021 · 4 comments

@Maximilianxu
Contributor

Hi,
I want to deploy the BERT-base model on an Android phone. One of its parameters has shape (30522, 768) with dtype float32, and the RPC connection is reset every time I allocate this array:

for pk, pv in params.items():
    print(pv.shape, pv.dtype)
    weights[pk] = tvm.nd.array(np.random.uniform(size=pv.shape).astype(pv.dtype), ctx=ctx)

The error message:

Traceback (most recent call last):
  File "tune_network_x86.py", line 483, in <module>
    tune_network()
  File "tune_network_x86.py", line 423, in tune_network
    weights[pk] = tvm.nd.array((np.random.uniform(size=pv.shape)).astype(pv.dtype), ctx=ctx)
  File "/home/zyx/workspaces/python/tvm0.8_v2/python/tvm/runtime/ndarray.py", line 516, in array
    return empty(arr.shape, arr.dtype, ctx).copyfrom(arr)
  File "/home/zyx/workspaces/python/tvm0.8_v2/python/tvm/runtime/ndarray.py", line 154, in copyfrom
    check_call(_LIB.TVMArrayCopyFromBytes(self.handle, data, nbytes))
  File "/home/zyx/workspaces/python/tvm0.8_v2/python/tvm/_ffi/base.py", line 344, in check_call
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (6) /home/zyx/workspaces/python/tvm0.8_v2/build/libtvm.so(TVMArrayCopyFromBytes+0xe) [0x7f097dcf53ae]
  [bt] (5) /home/zyx/workspaces/python/tvm0.8_v2/build/libtvm.so(tvm::runtime::ArrayCopyFromBytes(DLTensor*, void const*, unsigned long)+0x2c9) [0x7f097dcf52e9]
  [bt] (4) /home/zyx/workspaces/python/tvm0.8_v2/build/libtvm.so(tvm::runtime::RPCDeviceAPI::CopyDataFromTo(void const*, unsigned long, void*, unsigned long, unsigned long, DLContext, DLContext, DLDataType, void*)+0x346) [0x7f097dd265b6]
  [bt] (3) /home/zyx/workspaces/python/tvm0.8_v2/build/libtvm.so(tvm::runtime::RPCEndpoint::CopyToRemote(void*, unsigned long, void*, unsigned long, unsigned long, DLContext, DLDataType)+0x75d) [0x7f097dd2a4cd]
  [bt] (2) /home/zyx/workspaces/python/tvm0.8_v2/build/libtvm.so(tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)+0x1a5) [0x7f097dd28955]
  [bt] (1) /home/zyx/workspaces/python/tvm0.8_v2/build/libtvm.so(tvm::runtime::SockChannel::Send(void const*, unsigned long)+0xb8) [0x7f097dd490b8]
  [bt] (0) /home/zyx/workspaces/python/tvm0.8_v2/build/libtvm.so(+0x1bc2838) [0x7f097dd44838]
  File "/home/zyx/workspaces/python/tvm0.8_v2/src/runtime/rpc/../../support/socket.h", line 360
TVMError: Socket SockChannel::Send Error: Connection reset by peer (original locale message: 连接被对方重设)

The BERT model was imported from PyTorch:

    model_class = transformers.BertModel
    tokenizer_class = transformers.BertTokenizer

    # Better to download the files manually:
    #   https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-pytorch_model.bin
    #   https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt
    #   https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-config.json
    # Then rename them to pytorch_model.bin, vocab.txt & config.json
    # weight = 'path to downloaded model dir'
    weight = '/home/zyx/.torch/hub/bert-base-uncased'
    model = model_class.from_pretrained(weight)
    model = ModelWrapper(model)
    model.eval()

    # tokenizer = tokenizer_class.from_pretrained(weight)
    # A = torch.tensor([tokenizer.encode("Here is some text to encode", add_special_tokens=True)])
    # There are 30522 words in bert-base-uncased's vocabulary
    input_shape = [batch_size, 128]
    input_name = 'input_ids'
    input_dtype = 'int64'
    A = torch.randint(30000, input_shape)
    scripted_model = torch.jit.trace(model, [A])
    shape_list = [(input_name, input_shape)]
    mod, params = relay.frontend.from_pytorch(scripted_model, shape_list)
    mod = optimize_bert(mod, params)

The optimize_bert function has the following passes:

    new_mod = FastSoftmax(mod)
    new_mod = ShapeConstDedup(new_mod)
    new_mod = tvm.relay.transform.EliminateCommonSubexpr()(new_mod)
    BindPass = tvm.relay.transform.function_pass(lambda fn, new_mod, ctx:
            tvm.relay.build_module.bind_params_by_name(fn, params), opt_level=1)
    new_mod = BindPass(new_mod)
    new_mod = tvm.relay.transform.FoldConstant()(new_mod)
    new_mod = tvm.relay.transform.CombineParallelBatchMatmul()(new_mod)
    # new_mod = tvm.relay.transform._ffi_api.BatchMatmulWeightTranspose()(new_mod)
    new_mod = tvm.relay.transform.FoldConstant()(new_mod)
    ret_list.append(new_mod)

I also tried the change from #5516 to ring_buffer.h, but it didn't work.

It seems the copy fails once the allocated space exceeds roughly 400 MB.
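For reference, the raw sizes involved can be checked with a small helper (plain NumPy, nothing TVM-specific):

```python
import numpy as np

def array_mb(shape, dtype):
    """Size in MB of an array with the given shape and dtype."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize / 1e6

print(array_mb((30522, 768), "float32"))  # the BERT embedding parameter: ~93.8 MB
```

A single copy of this parameter is well under 400 MB, so if the threshold is real it would presumably be reached cumulatively across allocations rather than by one transfer.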

@Maximilianxu
Contributor Author

Traceback (most recent call last):
  File "tune_network_x86.py", line 492, in <module>
    tune_network()
  File "tune_network_x86.py", line 423, in tune_network
    tmp = tvm.nd.array(np.random.uniform(size=(30522, 7680)).astype(np.float32), ctx)
  File "/home/zyx/workspaces/python/tvm0.8_v3/python/tvm/runtime/ndarray.py", line 513, in array
    return empty(arr.shape, arr.dtype, device).copyfrom(arr)
  File "/home/zyx/workspaces/python/tvm0.8_v3/python/tvm/runtime/ndarray.py", line 152, in copyfrom
    check_call(_LIB.TVMArrayCopyFromBytes(self.handle, data, nbytes))
  File "/home/zyx/workspaces/python/tvm0.8_v3/python/tvm/_ffi/base.py", line 346, in check_call
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  31: 0xffffffffffffffff
  30: _start
  29: __libc_start_main
  28: Py_BytesMain
  27: Py_RunMain
  26: PyRun_SimpleFileExFlags
  25: PyRun_FileExFlags
  24: 0x000000000067d61e
  23: 0x000000000067d5a0
  22: PyEval_EvalCode
  21: _PyEval_EvalCodeWithName
  20: _PyEval_EvalFrameDefault
  19: _PyFunction_Vectorcall
  18: _PyEval_EvalCodeWithName
  17: _PyEval_EvalFrameDefault
  16: _PyFunction_Vectorcall
  15: _PyEval_EvalCodeWithName
  14: _PyEval_EvalFrameDefault
  13: _PyFunction_Vectorcall
  12: _PyEval_EvalFrameDefault
  11: _PyObject_MakeTpCall
  10: 0x00007f19e8e4d7df
  9: _ctypes_callproc
  8: 0x00007f19e9d82409
  7: 0x00007f19e9d82ff4
  6: TVMArrayCopyFromBytes
  5: tvm::runtime::ArrayCopyFromBytes(DLTensor*, void const*, unsigned long)
  4: tvm::runtime::RPCDeviceAPI::CopyDataFromTo(DLTensor*, DLTensor*, void*)
  3: tvm::runtime::RPCEndpoint::CopyToRemote(void*, DLTensor*, unsigned long)
  2: tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)
  1: tvm::runtime::SockChannel::Send(void const*, unsigned long)
  0: tvm::support::Socket::Error(char const*)
  File "/home/zyx/workspaces/python/tvm0.8_v3/src/runtime/rpc/../../support/socket.h", line 360

The traceback above was generated using the latest version of TVM and the following code:

    local_demo = TARGET != "android"
    if local_demo:
        remote = rpc.LocalSession()
    else:
        tracker_host = os.environ.get("TVM_TRACKER_HOST", "192.168.1.103")
        tracker_port = int(os.environ.get("TVM_TRACKER_PORT", 9196))
        key = "huawei"
        tracker = rpc.connect_tracker(tracker_host, tracker_port)
        # When running a heavy model, we should increase the `session_timeout`
        remote = tracker.request(key, priority=0, session_timeout=1000)
    weights = {}
    ctx = remote.cpu(0) if TARGET != "cuda" else remote.gpu(0)
    MB = 0
    from functools import reduce
    # millions of elements in an array of the given shape
    mega_elems = lambda shape: reduce(lambda a, b: a * b, shape) / 1e6
    for _ in range(100):
        tmp = tvm.nd.array(np.random.uniform(size=(30522, 7680)).astype(np.float32), ctx)
        MB += mega_elems((30522, 7680)) * 4  # 4 bytes per float32 element
        print(tmp.shape)
        print("MB:", MB)
    exit()

With the above code, even the first allocation fails.
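To pinpoint the actual threshold, one could binary-search the largest allocation that still succeeds. A sketch (the `try_alloc` callback is hypothetical and would wrap `tvm.nd.array` on the remote context):

```python
def largest_ok_mb(try_alloc, lo=1, hi=2048):
    """Binary-search the largest size in MB for which try_alloc(mb) returns True.
    Assumes try_alloc(lo) succeeds and try_alloc(hi) fails."""
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if try_alloc(mid):
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical remote probe (untested sketch):
# def try_alloc(mb):
#     try:
#         tvm.nd.array(np.zeros(mb * 250_000, dtype=np.float32), ctx)  # mb * 1e6 bytes
#         return True
#     except tvm.TVMError:
#         return False
```

Knowing whether the limit is, say, exactly 400 MB or something device-dependent would help distinguish a TVM-side buffer limit from an OS-imposed one.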

@Maximilianxu
Contributor Author

It seems that this is due to an Android memory limit on the TVM RPC app.
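If the limit turns out to be per-transfer rather than total memory, one way to test that hypothesis is to send a large parameter in smaller row chunks. This is only a diagnostic sketch (the graph usually cannot consume a parameter in pieces); the chunking helper itself is plain NumPy:

```python
import numpy as np

def row_chunks(arr, max_mb=64):
    """Yield contiguous row slices of `arr`, each at most `max_mb` MB."""
    bytes_per_row = arr.itemsize * int(np.prod(arr.shape[1:]))
    rows = max(1, int(max_mb * 1e6) // bytes_per_row)
    for start in range(0, arr.shape[0], rows):
        yield arr[start:start + rows]

# Hypothetical use against a remote context (untested):
# for i, chunk in enumerate(row_chunks(big_param)):
#     weights[f"part{i}"] = tvm.nd.array(chunk, ctx=ctx)
```

If each small chunk uploads fine but a single large array does not, the failure is per-allocation; if the session still dies after enough chunks, the limit is on the app's total memory.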

@jcf94
Contributor

jcf94 commented Mar 30, 2021

I've run into a similar problem. This seems to be a system limitation?
@FrozenGene do you have any suggestion?

@tqchen
Member

tqchen commented Apr 8, 2021

This seems to be due to a system limitation; please feel free to follow up on https://discuss.tvm.apache.org/

@tqchen tqchen closed this as completed Apr 8, 2021