[AUTOCUT] Distribution Build Failed for k-NN-2.9.0 (Unittest error on LINUX and FAISS compilation on Windows) #975

Closed
opensearch-ci-bot opened this issue Jul 12, 2023 · 10 comments
@opensearch-ci-bot
Collaborator

Received Error: Error building k-NN, retry with: ./build.sh manifests/2.9.0/opensearch-2.9.0.yml --component k-NN.
The distribution build for k-NN has failed for version: 2.9.0.
Please see build log at https://build.ci.opensearch.org/job/distribution-build-opensearch/8108/consoleFull

@peterzhuamazon
Member

peterzhuamazon commented Jul 12, 2023

As of now, two issues have been observed after debugging with @navneet1v:

  1. Unit tests failed on Linux after compilation of nmslib/faisslib/java. We were able to reproduce the failure on the exact CentOS7 image used in Jenkins, and @navneet1v is working on a fix.
    docker run -it -d -e 'JAVA_HOME=/opt/java/openjdk-17' -u 1000 opensearchstaging/ci-runner:ci-runner-centos7-opensearch-build-v3 bash

    Logs: linux-centos7-knn.log

  2. The faiss lib has recently been updated to a newer version here:

    Since then, it has failed to compile on Windows with these errors:
    Logs: linux-windows-knn.log

Thanks.

cc: @bbarani @prudhvigodithi

@peterzhuamazon
Member

peterzhuamazon commented Jul 13, 2023

We have figured out why the 1st issue above is failing.

The centos7 build image ships gcc 4.8.5, while the newer version of the faiss lib requires a newer gcc, at least 4.9.
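
A minimal sketch of how such a requirement could be enforced at compile time, so that an outdated toolchain fails early with a clear message instead of failing later in tests. The 4.9 threshold is taken from this thread; the helper name `version_at_least` is hypothetical and is not part of k-NN or Faiss.

```cpp
// Hypothetical guard: reject GCC versions older than the minimum named above.

// True when version (major.minor) is at least (req_major.req_minor).
// Written as a single return statement so it is valid C++11 constexpr.
constexpr bool version_at_least(int major, int minor,
                                int req_major, int req_minor) {
    return major > req_major ||
           (major == req_major && minor >= req_minor);
}

// Only check real GCC; clang also defines __GNUC__ for compatibility.
#if defined(__GNUC__) && !defined(__clang__)
static_assert(version_at_least(__GNUC__, __GNUC_MINOR__, 4, 9),
              "GCC >= 4.9 is required (threshold assumed from this thread)");
#endif

// Sanity checks against the versions discussed in this thread.
static_assert(!version_at_least(4, 8, 4, 9), "gcc 4.8 must be rejected");
static_assert(version_at_least(8, 3, 4, 9), "gcc 8.3 must be accepted");
```

With a guard like this in a shared header, the gcc 4.8.5 image would stop at the first translation unit that includes it.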

After manually installing devtoolset-8 on centos7, we are now able to build and test knn successfully:

BUILD SUCCESSFUL in 6m 41s
71 actionable tasks: 68 executed, 3 up-to-date


gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The next step is to update the OpenSearch build image to include a newer version of gcc.

More details on the issue per @navneet1v:

The failure happened because the downstream dependency (Faiss) added new functions (for regex matching). These functions are not available in the version of GCC (4.8.5) present in the Jenkins build image (CentOS7).
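
To illustrate the kind of problem described above, here is a small probe that checks whether `std::regex` works with the toolchain in use. It is a sketch under assumptions: the pattern and the function name `regex_works` are made up for illustration, and this is not the code Faiss actually added. The standard library bundled with GCC 4.8 is widely reported to have an incomplete `<regex>` implementation, so a probe like this is expected to fail there and pass on GCC 4.9 or newer.

```cpp
#include <regex>

// Hypothetical probe: returns true if std::regex handles a simple pattern
// with a character class correctly, false if construction or matching fails.
bool regex_works() {
    try {
        // Lowercase letters, an underscore, then digits (illustrative only).
        const std::regex pattern("[a-z]+_[0-9]+");
        return std::regex_match("faiss_123", pattern) &&
               !std::regex_match("faiss", pattern);
    } catch (const std::regex_error&) {
        // An incomplete <regex> implementation may throw on valid patterns.
        return false;
    }
}
```

Calling `regex_works()` from a small test binary inside the build image is one way to tell a toolchain problem apart from a problem in the plugin code itself.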

@peterzhuamazon peterzhuamazon changed the title [AUTOCUT] Distribution Build Failed for k-NN-2.9.0 [AUTOCUT] Distribution Build Failed for k-NN-2.9.0 (Unittest error on LINUX and FAISS compilation on Windows) Jul 13, 2023
@peterzhuamazon
Member

peterzhuamazon commented Jul 13, 2023

Update: @navneet1v has opened a new PR, #980, to fix the 2nd issue (Windows).
Thanks.

@peterzhuamazon
Member

We have seen a specific segfault failure on arm64 with gcc8; after changing to gcc7, the compilation passes:

gcc8

[ 70%] Building CXX object external/faiss/faiss/CMakeFiles/faiss.dir/impl/pq4_fast_scan_search_1.cpp.o
during RTL pass: expand
/tmp/tmpcbgylmu4/k-NN/jni/external/faiss/faiss/impl/pq4_fast_scan_search_1.cpp: In function 'void faiss::{anonymous}::kernel_accumulate_block(int, const uint8_t*, const uint8_t*, ResultHandler&, const Scaler&) [with int NQ = 2; int BB = 1; ResultHandler = faiss::simd_result_handlers::FixedStorageHandler<2, 2>; Scaler = faiss::DummyScaler]':
/tmp/tmpcbgylmu4/k-NN/jni/external/faiss/faiss/impl/pq4_fast_scan_search_1.cpp:60:65: internal compiler error: Segmentation fault
             simd32uint8 chi = simd32uint8(simd16uint16(c) >> 4) & mask;
                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
Preprocessed source stored into /tmp/cc6iitn1.out file, please attach this to your bugreport.
make[3]: *** [external/faiss/faiss/CMakeFiles/faiss.dir/build.make:972: external/faiss/faiss/CMakeFiles/faiss.dir/impl/pq4_fast_scan_search_1.cpp.o] Error 1
make[2]: *** [CMakeFiles/Makefile2:609: external/faiss/faiss/CMakeFiles/faiss.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:374: CMakeFiles/opensearchknn_faiss.dir/rule] Error 2
make: *** [Makefile:215: opensearchknn_faiss] Error 2

gcc7


Building CXX object external/faiss/faiss/CMakeFiles/faiss.dir/impl/pq4_fast_scan.cpp.o
[ 70%] Building CXX object external/faiss/faiss/CMakeFiles/faiss.dir/impl/pq4_fast_scan_search_1.cpp.o
[ 70%] Building CXX object external/faiss/faiss/CMakeFiles/faiss.dir/impl/pq4_fast_scan_search_qbs.cpp.o

It looks like compiling this file causes a sudden increase in memory usage, yet we observe the crash only on arm64 with gcc8.

@junqiu-lei
Member

Closing this issue as it got unblocked.

@peterzhuamazon
Member

peterzhuamazon commented Aug 1, 2023

I can confirm that there is no gcc7 in the rockylinux repos at all, not even in epel.
That means we either get the compilation to run on gcc8, which is the default on rockylinux8,
or we upgrade gcc to something even higher, like gcc 9/10/11/12, as on Windows where we go for the latest.

Related here:

Thanks.

@peterzhuamazon
Member

We have issues with gcc 8.3.1 on centos7 arm64; we will try gcc 8.5.0 on rockylinux8.
Thanks.

@peterzhuamazon
Member

We are good on Rockylinux8 with gcc 8.5.0 on arm64.
Thanks.

@peterzhuamazon
Member

This issue should be resolved for now until we check again in 2.10.0 to see if it still happens due to nodejs18 upgrade.
