Bug: CLBlast: Run-time error: -55 (500 is not divisible by 8) #340

quanghuy1258 · 2018-11-30T09:05:27Z

When I run Xgemm (through clblast::Gemm<float>), I get this error:
CLBlast: Run-time error: -55 (500 is not divisible by 8)
Here is the test code that reproduces it:

#include "common.h"

#include "gtest/gtest.h"

#include "cl.hpp"

#include <clblast.h>

TEST(TestCLBlast, Bug_kInvalidLocalThreadsDim) {
  // Use command clinfo to select the most suitable platform and device
  int platform_id = 1;
  int device_id = 0;

  // Initializes the OpenCL platform
  auto platforms = std::vector<cl::Platform>();
  cl::Platform::get(&platforms);
  ASSERT_FALSE(platforms.size() == 0 || platform_id >= platforms.size());
  auto platform = platforms[platform_id];

  // Initializes the OpenCL device
  auto devices = std::vector<cl::Device>();
  platform.getDevices(CL_DEVICE_TYPE_ALL, &devices);
  ASSERT_FALSE(devices.size() == 0 || device_id >= devices.size());
  auto device = devices[device_id];

  // Creates the OpenCL context, queue, and an event
  auto device_as_vector = std::vector<cl::Device>{device};
  auto context = cl::Context(device_as_vector);
  auto queue = cl::CommandQueue(context, device);
  auto event = cl_event{nullptr};

  int M = 4000;
  int K = 6000;
  int N = 8000;
  std::unique_ptr<float[]> A(new float[M * K]);
  std::unique_ptr<float[]> B(new float[K * N]);
  std::unique_ptr<float[]> C(new float[M * N]);
  int count = 1;
  for (int i = 0; i < M * K; i++)
    A[i] = 1.0 / (count++);
  for (int i = 0; i < K * N; i++)
    B[i] = 1.0 / (count++);
  for (int i = 0; i < M * N; i++)
    C[i] = 0.0;

  auto device_a =
      cl::Buffer(context, CL_MEM_READ_WRITE, (M * K) * sizeof(float));
  auto device_b =
      cl::Buffer(context, CL_MEM_READ_WRITE, (K * N) * sizeof(float));
  auto device_c =
      cl::Buffer(context, CL_MEM_READ_WRITE, (M * N) * sizeof(float));
  queue.enqueueWriteBuffer(device_a, CL_TRUE, 0, (M * K) * sizeof(float),
                           A.get());
  queue.enqueueWriteBuffer(device_b, CL_TRUE, 0, (K * N) * sizeof(float),
                           B.get());
  queue.enqueueWriteBuffer(device_c, CL_TRUE, 0, (M * N) * sizeof(float),
                           C.get());

  auto queue_plain = queue();
  auto status = clblast::Gemm<float>(
      clblast::Layout::kColMajor, clblast::Transpose::kNo,
      clblast::Transpose::kNo, M, N, K, 1.0, device_a(), 0, M, device_b(), 0, K,
      0.0, device_c(), 0, M, &queue_plain, &event);

  ASSERT_TRUE(status == clblast::StatusCode::kSuccess);
  clWaitForEvents(1, &event);
  clReleaseEvent(event);
  queue.enqueueReadBuffer(device_c, CL_TRUE, 0, (M * N) * sizeof(float),
                          C.get());
  clblast::ClearCache();

  for (int i = 0; i < M * N; i++) {
    int r = i % M;
    int c = i / M;
    float val = 0.0;
    for (int j = 0; j < K; j++) {
      val += A[j * M + r] * B[c * K + j];
    }
    ASSERT_TRUE(std::abs(C[i] - val) < 1e-10);
  }
}

Here, common.h only includes C/C++ standard library headers, and gtest/gtest.h is the Google Test library.
These are the tuning parameters I retrieved:

VWM : 2
STRM : 0
SB : 0
VWN : 2
SA : 0
KWI : 1
NDIMB : 8
MWG : 32
KWG : 1
KREG : 2
GEMMK : 1
MDIMA : 4
MDIMC : 4
STRN : 0
NDIMC : 8
NWG : 64

I think this bug comes from the file CLBlast/src/routines/level3/xgemm.cpp, specifically:

  CalculateInternalDimensions(m, n, k, db_["MWG"], db_["NWG"], db_["KWG"] * db_["KREG"],
                              a_one_i, a_two_i, b_one_i, b_two_i, c_one_i, c_two_i,
                              db_["GEMMK"]);

and

  const auto global = std::vector<size_t>{
    (c_one_i * db_["MDIMC"]) / db_["MWG"],
    (c_two_i * db_["NDIMC"]) / db_["NWG"]
  };

because CalculateInternalDimensions rotates (swaps) c_one_i and c_two_i, but the computation of global does nothing to account for that. Please check this part of the source code. Thank you.
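To make the arithmetic concrete, here is a minimal sketch of the global-size computation using the tuning parameters listed above. It assumes that the GEMMK == 1 path sets c_one_i to the ceiled N and c_two_i to the ceiled M, which is how the rotation described above is read here. GlobalBeforeFix, CeilTo, and Global2D are hypothetical names for illustration and are not part of CLBlast.

```cpp
#include <cassert>
#include <cstddef>

// Tuning parameters copied from the list reported above.
const std::size_t MWG = 32, NWG = 64, MDIMC = 4, NDIMC = 8;

// Rounds x up to the next multiple of `multiple`.
std::size_t CeilTo(std::size_t x, std::size_t multiple) {
  assert(multiple > 0);
  return ((x + multiple - 1) / multiple) * multiple;
}

// Holds the two global work sizes (hypothetical type, not from CLBlast).
struct Global2D { std::size_t one, two; };

// Hypothetical helper that mirrors the quoted `global` expression.
Global2D GlobalBeforeFix(std::size_t m, std::size_t n) {
  const std::size_t m_ceiled = CeilTo(m, MWG);
  const std::size_t n_ceiled = CeilTo(n, NWG);
  // Assumption: the GEMMK == 1 path swaps the C dimensions.
  const std::size_t c_one_i = n_ceiled;
  const std::size_t c_two_i = m_ceiled;
  // The divisors are not swapped, so MWG divides an N-sized dimension
  // and NWG divides an M-sized dimension.
  return Global2D{(c_one_i * MDIMC) / MWG, (c_two_i * NDIMC) / NWG};
}
```

With M = 4000 and N = 8000, GlobalBeforeFix(4000, 8000) yields global sizes of 1000 and 500. If the local size of the second dimension is NDIMC = 8 (an assumption, not shown in the quoted code), 500 is not a multiple of it, which matches the wording of the error message.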


quanghuy1258 · 2018-11-30T10:53:34Z

I think the bug can be fixed like this:

  const auto global = std::vector<size_t>{
    (c_one_i * db_["MDIMC"]) / ((c_one_i == m_ceiled) ? db_["MWG"] : db_["NWG"]),
    (c_two_i * db_["NDIMC"]) / ((c_two_i == n_ceiled) ? db_["NWG"] : db_["MWG"])
  };

It works for me.
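As a quick check of the proposed expression, the sketch below evaluates it for the reported case. The values passed for c_one_i and c_two_i assume the swapped layout discussed in this issue (c_one_i is the ceiled N, c_two_i is the ceiled M). Tuning, Global2D, and GlobalWithProposedFix are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cstddef>

// Subset of the tuning parameters used by the global-size expression.
struct Tuning { std::size_t MWG, NWG, MDIMC, NDIMC; };

// Holds the two global work sizes (hypothetical type, not from CLBlast).
struct Global2D { std::size_t one, two; };

// Hypothetical helper that mirrors the proposed fix: pick the divisor
// that belongs to whichever dimension ended up in each slot.
Global2D GlobalWithProposedFix(std::size_t c_one_i, std::size_t c_two_i,
                               std::size_t m_ceiled, std::size_t n_ceiled,
                               const Tuning& t) {
  const std::size_t div_one = (c_one_i == m_ceiled) ? t.MWG : t.NWG;
  const std::size_t div_two = (c_two_i == n_ceiled) ? t.NWG : t.MWG;
  assert(div_one > 0 && div_two > 0);
  return Global2D{(c_one_i * t.MDIMC) / div_one,
                  (c_two_i * t.NDIMC) / div_two};
}
```

For the reported parameters (MWG = 32, NWG = 64, MDIMC = 4, NDIMC = 8) and the swapped dimensions c_one_i = 8000, c_two_i = 4000, this gives global sizes of 500 and 1000, which are multiples of 4 and 8 respectively.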

CNugteren · 2018-11-30T19:30:53Z

Thanks a lot for your careful observation, code, and suggested solution. With your example I managed to reproduce it, and I fixed it more or less as you suggested in #341 (now merged into master).

Could you have a look and verify if that indeed fixes it for you as well? Thanks!

CNugteren · 2018-12-06T19:51:05Z

Closing this as I believe it is solved; feel free to reopen if you think otherwise.

CNugteren added the correctness label Nov 30, 2018

CNugteren mentioned this issue Nov 30, 2018: Fixed an issue for the GEMMK == 1 kernel #341 (Merged)

CNugteren closed this as completed Dec 6, 2018

This was referenced May 21, 2024:
Accuracy problem on Apple M1 and Intel(R) UHD Graphics 770 #542 (Closed)
Fix GEMMK=1 kernel #543 (Merged)
