SPU linux arm64 wheel? #224

Closed
lidh15 opened this issue Jun 20, 2023 · 21 comments · Fixed by #434
Assignees
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@lidh15

lidh15 commented Jun 20, 2023

Issue Type

Feature Request

Modules Involved

SPU runtime

Have you reproduced the bug with SPU HEAD?

Yes

Installation Kind

binary

SPU Version

spu0.4.0

OS Platform and Distribution

Linux Kylin 10 sp3

Python Version

3.8

Compiler Version

GCC 10.2.1

Current Behavior?

No supported wheels!

Standalone code to reproduce the issue

# https://pypi.org/project/spu/#files

Relevant log output

No response

@anakinxc anakinxc self-assigned this Jun 21, 2023
@anakinxc
Contributor

Hi @lidh15

We do not have a Linux arm64 build for now, but we can try to provide one later.

Thanks

@anakinxc anakinxc added the enhancement New feature or request label Jun 21, 2023
@lidh15
Author

lidh15 commented Jun 21, 2023

I know.
What I mean is: is that on the roadmap? What timeline should we expect for "later"?

@anakinxc
Contributor

I'm actually trying now.

There are just too many missing pieces on arm64 CentOS 7.

Out of curiosity, can you provide the glibc version of Linux Kylin 10 sp3? You can get it by running ldd --version

I'm seriously considering raising the glibc requirement for arm64 Linux...

@lidh15
Author

lidh15 commented Jun 21, 2023

Kylin 10 is basically a wrapped CentOS 8, so glibc is 2.28 and libstdc++ is 6.0.24. Raising the baseline from CentOS 7 to CentOS 8 looks good to me.
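
For anyone checking their own machine, the glibc comparison discussed here can also be done from Python. This is a minimal sketch, not part of SPU: `parse_version`, `glibc_at_least`, and `describe_host` are hypothetical helper names, and the 2.17 and 2.28 baselines are the glibc versions shipped with CentOS 7 and CentOS 8. It relies only on the standard-library `platform` module.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '2.28' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def glibc_at_least(detected, required):
    """Return True when the detected glibc version meets the required one."""
    return parse_version(detected) >= parse_version(required)


def describe_host():
    """Report the CPU architecture and C library seen by this interpreter."""
    libc_name, libc_version = platform.libc_ver()
    return platform.machine(), libc_name, libc_version


if __name__ == "__main__":
    machine, libc_name, libc_version = describe_host()
    print(f"machine={machine} libc={libc_name or 'unknown'} {libc_version}")
    if libc_name == "glibc":
        # 2.17 is the CentOS 7 baseline, 2.28 the CentOS 8 baseline.
        for required in ("2.17", "2.28"):
            print(required, glibc_at_least(libc_version, required))
```

Comparing tuples of integers rather than raw strings matters here, because a plain string comparison would rank "2.5" above "2.17".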

@anakinxc
Contributor

Stay tuned, it won't be a long wait :D

@anakinxc
Contributor

anakinxc commented Jun 25, 2023

Hi @lidh15

Just got an experimental build. Unfortunately, we do not have a native Linux arm machine to test everything, so there might be some surprises.

Can you try this build and let me know if anything does not work properly on your end?

link here

@lidh15
Author

lidh15 commented Jun 25, 2023

Okay, I'm working on it, but the first problem is that jaxlib doesn't provide an arm64 wheel.
Although jaxlib can reportedly be built from source, that is still a big obstacle for SPU users...

@anakinxc
Contributor

Jax has an open issue here

Linux arm still has a long way to go; our CI provider, CircleCI, has no Linux arm Docker executor either...

@lidh15
Author

lidh15 commented Jun 28, 2023

We built arm64 jax 0.4.8 and ran SPU on top of it. Unfortunately, yacl brpc is not working:

stacktrace:
#0 yacl::link::Context::ConnectToMesh()+0xfffbd6d4e7c4
#1 spu::BindLink()::{lambda()#18}::operator()()+0xfffbd3839c00
#2 pybind11::detail::argument_loader<>::call_impl<>()+0xfffbd3847b04
#3 _ZNO8pybind116detail15argument_loaderIJRKN4yacl4link11ContextDescEmEE4callISt10shared_ptrINS3_7ContextEENS0_9void_typeERZN3spu8BindLinkERNS_7module_EEUlS6_mE16_EENSt9enable_ifIXntsrSt7is_voidIT_E5valueESK_E4typeEOT1_+0xfffbd384652c
#4 pybind11::cpp_function::initialize<>()::{lambda()#3}::operator()()+0xfffbd384380c
#5 pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN()+0xfffbd38438a4
#6 pybind11::cpp_function::dispatcher()+0xfffbd385d9b4
#7 PyCFunction_Call+0xfffc13e82ea4
#8 __Pyx_PyObject_Call+0xfffbdba42348
#9 __pyx_pf_19unbalanced_ecdh_psi_15EcdhOprfPsiBase_2setup_link+0xfffbdba25e84
#10 __pyx_pw_19unbalanced_ecdh_psi_15EcdhOprfPsiBase_3setup_link+0xfffbdba22850
#11 __Pyx_CyFunction_CallMethod+0xfffc06a53550
#12 __Pyx_CyFunction_Call+0xfffc06a53634
#13 __Pyx_CyFunction_CallAsMethod+0xfffc06a537ac
#14 __Pyx_PyObject_Call+0xfffbdba42348
#15 __Pyx__PyObject_CallOneArg+0xfffbdba435bc

Shall I report this issue to the YACL repo?

@anakinxc
Contributor

Can you provide repro steps?

@lidh15
Author

lidh15 commented Jun 28, 2023

Well, that was my fault: I hadn't configured the link address correctly. Now the error is:

terminate called after throwing an instance of 'yacl::EnforceNotMet'
  what():  [Enforce fail at libspu/psi/core/ecdh_oprf/basic_ecdh_oprf.cc:262] (status). fourq ecc_mul error, status = false
Stacktrace:
#0 spu::psi::(anonymous namespace)::FourQPointMul()+0xfffbc63dc60c
#1 spu::psi::FourQBasicEcdhOprfServer::SimpleEvaluate[abi:cxx11]()+0xfffbc63dc838
#2 spu::psi::EcdhOprfPsiServer::FullEvaluate()+0xfffbc63d216c
#3 (unknown)+0xfffbc2fc5930
#4 (unknown)+0xfffc038388cc

What I'm doing is roughly:

conf = spu.psi.BucketPsiConfig(
    psi_type=spu.psi.PsiType.Value('ECDH_OPRF_UB_PSI_2PC_OFFLINE'),
    bucket_size=int(5e7),
    curve_type=spu.psi.CurveType.CURVE_FOURQ,
    # and some data IO configs...
)
spu.psi.bucket_psi(some_link, conf)

@anakinxc
Contributor

thanks for repro steps, will take a look :D

@lidh15
Author

lidh15 commented Jun 28, 2023

I looked into Microsoft's FourQlib and found lots of ARM-specific code... there are many implementations of ecc_mul, and I'm not sure which one is being called.

Maybe there are other curve choices for ECDH_OPRF_UB_PSI_2PC_OFFLINE?

@anakinxc
Contributor

SPU uses the FourQ implementation provided by APSI.

@lidh15
Author

lidh15 commented Jun 28, 2023

I found these here. Are they all usable in bucket_psi? How do they differ from each other in speed?
[image]

@lidh15
Author

lidh15 commented Jun 28, 2023

I tried other protocols and it seems okay, but there is another error:

error channel.h:~ChannelBase:112 ChannelBase destructor is called before WaitLinkTaskFinish, try stop send thread

Have I configured something wrong? I think the same code is working well on x86_64.

@anakinxc
Contributor

> I found these here. Are they all usable in bucket_psi? How do they differ from each other in speed?

@zhanglei486 Can you take a look?

@anakinxc
Contributor

> I tried other protocols and it seems okay, but there is another error:
>
> error channel.h:~ChannelBase:112 ChannelBase destructor is called before WaitLinkTaskFinish, try stop send thread
>
> Have I configured something wrong? I think the same code is working well on x86_64.

@warriorpaw any idea? Looks like a brpc bug?

@warriorpaw

Ignore this error for now; it should be a warning.

The warning comes from a change in the behavior of the internal network interface: stop_link now needs to be called before the link is destroyed. It should not cause any trouble, and we will fix it.
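
The ordering described here (stop the link first, then let it be destroyed) can be enforced with a small context manager. This is only a sketch under assumptions: it assumes the link object exposes a `stop_link()` method as named in this comment, and `stopping_link` and `FakeLink` are hypothetical names used for illustration, not SPU or YACL APIs.

```python
from contextlib import contextmanager


@contextmanager
def stopping_link(link):
    """Yield the link, then call stop_link() even if the body raises."""
    try:
        yield link
    finally:
        # Assumption: the real link object has a stop_link() method.
        link.stop_link()


class FakeLink:
    """Hypothetical stand-in for a real link, used only for this demo."""

    def __init__(self):
        self.events = []

    def stop_link(self):
        self.events.append("stop_link")


def run_demo():
    """Do some work with the link and return the recorded call order."""
    link = FakeLink()
    with stopping_link(link) as active:
        active.events.append("work")
    return link.events
```

With a real link in place of `FakeLink`, the PSI call would go inside the `with` block so that the link is stopped before the object goes out of scope.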

@lidh15
Author

lidh15 commented Jun 29, 2023

Cool! Then I think the only remaining concern for our arm64 PSI use case is that FourQ is not working.

@zhanglei486
Contributor

zhanglei486 commented Jun 29, 2023

> I found these here. Are they all usable in bucket_psi? How do they differ from each other in speed?
>
> @zhanglei486 Can you take a look?

For a benchmark of ecdh-psi using the 25519 curve, please refer to:
https://www.secretflow.org.cn/docs/spu/latest/en-US/development/psi

On the Intel Xeon Gen2 platform, ecdh-psi is fastest with fourq: fourq > 25519 >> sm2 = 256k1

On the Intel Xeon Gen3 platform (which supports AVX512-IFMA), ecdh-psi is fastest with 25519: 25519 > fourq >> sm2 = 256k1
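
As a rough illustration, the ordering above could be turned into a small selection helper. This is a sketch under stated assumptions, not an SPU API: `pick_curve` and `read_cpu_flags` are hypothetical names, the curve names are plain strings (`CURVE_FOURQ` appears in this thread, `CURVE_25519` is assumed to be the matching name for the 25519 curve), and avoiding FourQ on arm64 reflects only the failure reported in this thread. The `avx512ifma` flag is the name Linux uses in `/proc/cpuinfo`.

```python
def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the x86 CPU feature flags listed by Linux, or an empty set."""
    try:
        with open(path) as handle:
            for line in handle:
                key, _, value = line.partition(":")
                if key.strip() == "flags":
                    return set(value.split())
    except OSError:
        pass
    return set()


def pick_curve(machine, cpu_flags):
    """Choose a curve name from the architecture and CPU feature flags."""
    if machine in ("aarch64", "arm64"):
        # FourQ failed on arm64 in this thread, so fall back to 25519.
        return "CURVE_25519"
    if "avx512ifma" in cpu_flags:
        # With AVX512-IFMA, 25519 was reported as the fastest.
        return "CURVE_25519"
    # Otherwise FourQ was reported as the fastest.
    return "CURVE_FOURQ"


if __name__ == "__main__":
    import platform

    print(pick_curve(platform.machine(), read_cpu_flags()))
```

The helper takes the machine name and flags as arguments so the decision can be checked without access to the target hardware.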

@anakinxc anakinxc added the help wanted Extra attention is needed label Aug 15, 2023
anakinxc added a commit that referenced this issue Dec 12, 2023
4 participants