This repository has been archived by the owner on Aug 11, 2020. It is now read-only.

[WIP] Tensor shape overflow checking in Blas Engine #372

Closed
wants to merge 12 commits

Conversation

@larroy (Contributor) commented Apr 2, 2019

Fixes apache/mxnet#14522 (mx.nd.Custom conflicts with memory management).

With this change, multiplying matrices that are too large for the BLAS engine raises the following exception in the Python code instead of crashing inside BLAS:

Error in CustomOp.forward: Traceback (most recent call last):
  File "/Users/pllarroy/devel/mxnet/python/mxnet/operator.py", line 987, in forward_entry
    aux=tensors[4])
  File "repro.py", line 13, in forward
    c = mx.nd.batch_dot(a, b)
  File "<string>", line 59, in batch_dot
  File "/Users/pllarroy/devel/mxnet/python/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/Users/pllarroy/devel/mxnet/python/mxnet/base.py", line 252, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [15:47:54] /Users/pllarroy/devel/mxnet/include/mshadow/./dot_engine-inl.h:352: Check failed: mult_not_overflow<int>(batch_count, m_n, &b_m_n) Result Tensor shape (100x7000x6000) is too big, will overflow gemm signed 32 bit index

Stack trace returned 10 entries:
[bt] (0) 0   libmxnet.dylib                      0x0000000112bf1b7d dmlc::StackTrace() + 877
[bt] (1) 1   libmxnet.dylib                      0x0000000112bf16d5 dmlc::LogMessageFatal::~LogMessageFatal() + 53
[bt] (2) 2   libmxnet.dylib                      0x0000000112bcdf35 dmlc::LogMessageFatal::~LogMessageFatal() + 21
[bt] (3) 3   libmxnet.dylib                      0x0000000114f2c91b mshadow::expr::BLASEngine<mshadow::cpu, float>::batched_gemm(mshadow::Stream<mshadow::cpu>*, bool, bool, long long, long long, long long, float, float const*, long long, float const*, long long, float, float*, long long, long long, float**) + 2139
[bt] (4) 4   libmxnet.dylib                      0x0000000114f21640 void mshadow::BatchGEMM<false, false, mshadow::cpu, float>(mshadow::Tensor<mshadow::cpu, 3, float>, mshadow::Tensor<mshadow::cpu, 3, float> const&, mshadow::Tensor<mshadow::cpu, 3, float> const&, float, float, mshadow::Tensor<mshadow::cpu, 1, float*>) + 4992
[bt] (5) 5   libmxnet.dylib                      0x0000000114eec939 void mxnet::op::BatchDotForward_<mshadow::cpu>(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&) + 3049
[bt] (6) 6   libmxnet.dylib                      0x0000000113100a55 void std::__1::__invoke_void_return_wrapper<void>::__call<void (*&)(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&), nnvm::NodeAttrs const&, mxnet::OpContext const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&>(void (*&&&)(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&), nnvm::NodeAttrs const&&&, mxnet::OpContext const&&&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&&&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&&&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&&&) + 277
[bt] (7) 7   libmxnet.dylib                      0x0000000113100869 std::__1::__function::__func<void (*)(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&), std::__1::allocator<void (*)(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&)>, void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&)>::operator()(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&) + 121
[bt] (8) 8   libmxnet.dylib                      0x0000000112ebf939 std::__1::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&)>::operator()(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&) const + 217
[bt] (9) 9   libmxnet.dylib                      0x000000011306030f mxnet::imperative::PushFCompute(std::__1::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, std::__1::vector<mxnet::Resource, std::__1::allocator<mxnet::Resource> > const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&, std::__1::vector<unsigned int, std::__1::allocator<unsigned int> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&)::'lambda'(mxnet::RunContext)::operator()(mxnet::RunContext) const + 2639
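
For reference, the failing shape really does overflow a signed 32-bit index: 100 × 7000 × 6000 = 4,200,000,000, which exceeds 2^31 - 1 = 2,147,483,647. As a rough illustration, a guard of this kind inside batched_gemm, where m, n and batch_count are the gemm dimensions, could look as follows (a sketch only; the names mult_not_overflow, m_n and b_m_n are taken from the error message above, and the exact diff may differ):

// Sketch only: verify that the flattened result size fits in the signed
// 32-bit indices used by the BLAS gemm interface before calling into it.
int m_n = 0;
CHECK(mult_not_overflow<int>(m, n, &m_n))
    << "Result Tensor shape is too big, will overflow gemm signed 32 bit index";
int b_m_n = 0;
CHECK(mult_not_overflow<int>(batch_count, m_n, &b_m_n))
    << "Result Tensor shape is too big, will overflow gemm signed 32 bit index";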

pp_C.push_back(C + i * m_n);
}
int m_k = 0;
CHECK(mult_not_overflow(m, k, &m_k));
Contributor commented:

Would a series of calls to this function cause a runtime regression? Could we do this check only in debug mode?

@larroy (Author) replied Apr 2, 2019:

It's going to have some performance cost, which should be measured, but a fixed set of divisions independent of the tensor shapes shouldn't be too bad; it's basically an O(1) performance penalty. Compared to the cost of the gemm call itself I would guess it's minor, but it should be measured.
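
For concreteness, a minimal sketch of a division-based check along these lines (hypothetical code, not the exact diff; assumes the dimensions are non-negative):

#include <limits>

// Returns true iff a * b fits in T; optionally stores the product in *result.
// Division method: for a > 0, a * b <= max holds exactly when b <= max / a.
template<typename T>
inline bool mult_not_overflow_sketch(T a, T b, T *result = nullptr) {
  if (a != 0 && b > std::numeric_limits<T>::max() / a)
    return false;  // a * b would overflow T
  if (result)
    *result = a * b;
  return true;
}

The cost is one comparison and at most one integer division per call, independent of the tensor contents, which is why it is O(1) next to the gemm itself.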

larroy added a commit to larroy/mxnet that referenced this pull request Apr 2, 2019
larroy added a commit to larroy/mxnet that referenced this pull request Apr 2, 2019
* Uses division method
*/
template<typename T>
inline bool mult_not_overflow_binary(T a, T b, T *result = nullptr) {
Contributor commented:

Could we use a GCC built-in to check for multiplication overflow? https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/Integer-Overflow-Builtins.html

@larroy (Author) replied:

Good point; that could be an optimization when building under GCC once the PR is working. There are some GPU problems at the moment, and I'm not sure whether they are related to the CI instability we've had on GPU these weeks or to a deeper problem.
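
For reference, a sketch of the builtin-based variant (hypothetical; __builtin_mul_overflow is available in GCC >= 5 and Clang):

// Same contract as the division-based check, but lets the compiler detect
// the overflow directly; typically compiles to a multiply plus a flag test.
template<typename T>
inline bool mult_not_overflow_builtin(T a, T b, T *result = nullptr) {
  T tmp;
  if (__builtin_mul_overflow(a, b, &tmp))
    return false;  // multiplication overflowed
  if (result)
    *result = tmp;
  return true;
}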

larroy added a commit to larroy/mxnet that referenced this pull request Apr 5, 2019
larroy added a commit to larroy/mxnet that referenced this pull request May 21, 2019
@szha (Member) commented Aug 4, 2019

This code base has been donated to the Apache MXNet project per #373, and this repo is deprecated. Future development should continue in Apache MXNet.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@szha closed this Jul 26, 2020