Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure storage linux support for https on non-RHEL distribution #514

Closed
Tracked by #740
phoebusm opened this issue Jun 23, 2023 · 6 comments · Fixed by #867
Closed
Tracked by #740

Azure storage linux support for https on non-RHEL distribution #514

phoebusm opened this issue Jun 23, 2023 · 6 comments · Fixed by #867
Assignees
Labels
enhancement New feature or request

Comments

@phoebusm
Copy link
Collaborator

phoebusm commented Jun 23, 2023

Related ticket of #284

Azure C++ sdk depends on libcurl on Linux, which has "burnt" the ssl certificate path during compile time, as libcurl is statically linked in the project.
As the wheel is built in the manylinux image, the certificate path is /etc/pki/tls/certs/ca-bundle.crt.
If the certificate cannot be found there, https connection to Azure cannot be made.
Azure c++ library does allow us to pass the ca cert path during runtime. However, we will need to figure out a smart way to get the directory of the ssl during runtime, so that no maintenance required and all linux distribution can be supported, or, we can
dynamically linked the libcurl library.

@phoebusm phoebusm added the enhancement New feature or request label Jun 23, 2023
@phoebusm phoebusm self-assigned this Jun 23, 2023
@phoebusm
Copy link
Collaborator Author

phoebusm commented Jun 23, 2023

Possible solution:

  1. Dynamically linked OS's libcurl library
  2. $(openssl version -d | grep -oh "/.*[^\"]") to get the default ssl path
  3. strace curl https://www.google.com |& grep open
  4. Hardcoded
  5. Pass from python? (curl?)
"/etc/ssl/certs/ca-certificates.crt",                // Debian/Ubuntu/Gentoo etc.
"/etc/pki/tls/certs/ca-bundle.crt",                  // Fedora/RHEL 6
"/etc/ssl/ca-bundle.pem",                            // OpenSUSE
"/etc/pki/tls/cacert.pem",                           // OpenELEC
"/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem", // CentOS/RHEL 7
"/etc/ssl/cert.pem",                                 // Alpine Linux

@phoebusm
Copy link
Collaborator Author

phoebusm commented Jun 26, 2023

Test shows that can bypass this issue by manually giving path of ca cert to Azure SDK. However, there is report that it may not work: Azure/azure-sdk-for-cpp#4645

@poodlewars
Copy link
Collaborator

Mainline idea: Dynamically link cUrl
Last resort: Check how Pandas / other Python projects load stuff from S3 and put that in the Python layer

@poodlewars poodlewars added this to the Release 2 milestone Jul 31, 2023
@phoebusm
Copy link
Collaborator Author

phoebusm commented Aug 9, 2023

  • Current state
    Ca cert path is set to be "/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem" by default. User can manually assign other location if necessary
  • Attempts to dynimcally link
    Two attempts:
  1. Using Vcpkg with dynamically linking option
  2. Use OS-installed libcurl-devel library for compilation
  • Why we are going to statically link
    Because difficulties are encountered in the above attempts:
  1. With manifest mode, Vcpkg put hash of the port to the naming of the output, e.g. libcurl-a5d13e77.so.4. The usual naming for libcurl library is libcurl.so.4. The difference in name makes user has to manually linked the os-installed library, in order to use it.
  2. The manylinux container used for GitHub action has only libcurl v7.29. However, AWS SDK, requires libcurl >v7.5. To fulfil the requirement, a separate compilation for libcurl is needed to be introduced. It is a complicated process and it undermines of having Vcpkg as the centralised source control.
  • How we’re going to statically link
    For the linking part, nothing will be changed.
  • Plan for conda
    At this moment, we are not sure how we are going to support azure sdk with condo. It will be picked up again once it is finalised.

@phoebusm phoebusm linked a pull request Aug 14, 2023 that will close this issue
5 tasks
@mehertz mehertz modified the milestones: 2.0.0, Release 3 Aug 14, 2023
@phoebusm phoebusm removed a link to a pull request Aug 15, 2023
5 tasks
@phoebusm
Copy link
Collaborator Author

Plan:
(One with higher order will be attempted first)
Discussion thread: https://arcticdb.slack.com/archives/C0553TEJ63F/p1692017895576129

  1. Follow the Arrow approach
  • Advantage: Probably the cleanest solution
  • Disadvantage: Need to adapt Azure SDK to allow capath
  • Needs to work on Conda too, so contributing a patch to Azure SDK probably easier than vcpkg overlay, and is the "right" thing to do anyway
  • Disadvantage: Assumes python, so same language binding disadvantage of the certifi way
  • Ref: ARROW-9261: [Python] Fix CA certificate lookup with S3 filesystem on manylinux apache/arrow#7580
  • As the solution requires a patch to vcpkg, if the patch cannot be merged quickly/easily, may need to drop this approach
  1. A patch to vcpkg to dynamically linked the libcurl in manifest mode
  • Advantage: Should work correctly out of the box for users
  • Advantage: Statically linking OpenSSL is a security issue that we should fix
  • Disadvantage: It's prohibitively difficult with vcpkg
  • Disadvantage: ArcticDB perf is sensitive to the exact cUrl version used eg Kestrel issue with the very old libcurl
  • To remove the checksum in the naming of the port
  • If it is proved to be too hard to do so, may need to drop this approach
  1. Use certifi to provide ca cert
  • Advantage: Will probably work, cross-OS
  • Advantage: The override is not too weird, cf REQUESTS_CA_BUNDLE
  • Disadvantage: Other language bindings will need to handle this complexity too
  • Disadvantage: Corporate users might be upset that we silently use a different .pem to their corporate one
  • It is done already (branch feature/auto select ca cert path)

@phoebusm
Copy link
Collaborator Author

#867 is raised for the Azure sdk overlay and corresponding ArcticDB changes for plan 1.
Documentation and auto-select OS-default CA cert directory will be given by a separate PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants