Problem with the SSL CA cert #8

Closed
djouallah opened this issue Sep 27, 2023 · 20 comments

@djouallah

It works fine on Windows, but when running from a notebook on Linux, I get this error:

duckdb.sql(''' create or replace view lineitem as select * from 'azure://tpch/lineitem/*.parquet';''')

Error: Invalid Error: Fail to get a new connection for: https://dddddddd.blob.core.windows.net/. Problem with the SSL CA cert (path? access rights?)
@Mause
Member

Mause commented Sep 27, 2023

I'm able to replicate this under both WSL and Ubuntu, with the following:

INSTALL azure;
LOAD azure;
SET azure_storage_connection_string='DefaultEndpointsProtocol=https;AccountName=azuresdkdocs;AccountKey=redacted;EndpointSuffix=core.windows.net';
SELECT count(*) FROM 'azure://development/testing_of_duckdb/file.snappy.parquet';

@samansmink
Collaborator

samansmink commented Oct 3, 2023

Partial workaround for users on Linux running into this: installing curl seems to fix the issue, at least on Ubuntu for me. The problem here is that libcurl is statically linked into the extension, and the certificates are either missing or in a different path than the one it expects. Installing curl may resolve this issue for some environments, but a more thorough solution is required.
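To see which certificate bundle locations actually exist on a given machine, a small check like the following may help. It is only a sketch: the candidate list is an assumption based on common distribution layouts, not the list that libcurl or the extension uses.

```python
import os
import ssl

# Candidate CA bundle locations. This list is an assumption drawn from common
# distribution layouts; it is not taken from libcurl or the azure extension.
CANDIDATE_BUNDLES = [
    "/etc/ssl/certs/ca-certificates.crt",  # Debian, Ubuntu
    "/etc/pki/tls/certs/ca-bundle.crt",    # RHEL, CentOS, Fedora
    "/etc/ssl/ca-bundle.pem",              # openSUSE
    "/etc/ssl/cert.pem",                   # Alpine
]


def existing_bundles(candidates=CANDIDATE_BUNDLES):
    """Return the candidate paths that exist as readable files."""
    return [p for p in candidates if os.path.isfile(p) and os.access(p, os.R_OK)]


if __name__ == "__main__":
    print("Bundles found:", existing_bundles())
    # For comparison: where Python's own OpenSSL build looks by default.
    print("Python default cafile:", ssl.get_default_verify_paths().cafile)
```

If the first list is non-empty but the error persists, the statically linked libcurl is probably looking somewhere else.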

People at ArcticDB seem to be running into the same issue here: man-group/ArcticDB#514. There's a PR up as we speak. They have already gone through the work of getting a PR into the Azure SDK for setting the path; it will be available on the 10th of November through vcpkg. We should be able to make use of their hard work by updating the Azure SDK by then and exposing the path through DuckDB.

@cholmes

cholmes commented Oct 9, 2023

This one seems to be affecting my workflows as well, see discussion at microsoft/PlanetaryComputer#278 (reply in thread)

Would love to be able to use this azure extension, it'll make working with a few GeoParquet datasets a lot easier. Thanks for all your great work on this so far!

@deanm0000

I'm using the python:3.10.13-bullseye image and I got the same issue. I already had the latest version of libcurl4-openssl-dev, and I also tried installing libcurl4-gnutls-dev, but I still got:

Error: Invalid Error: Fail to get a new connection for: https://stsynussp.blob.core.windows.net/. Problem with the SSL CA cert (path? access rights?)

@samansmink
Collaborator

Thanks for reporting, @cholmes and @deanm0000. This is definitely something that will need fixing.

@sulejmaninaim

The same happens to me on Ubuntu; there was no problem on Windows.

@kmatt

kmatt commented Dec 8, 2023

Are the binaries built on a RHEL distribution? See man-group/ArcticDB#514 (Azure/azure-sdk-for-cpp#4738).

The following "fixes" the error for me on Ubuntu 22.04, but I don't know if there are security implications:

mkdir -p /etc/pki/tls/certs
ln -s /etc/ssl/certs/ca-certificates.crt /etc/pki/tls/certs/ca-bundle.crt
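The same workaround can be scripted, for example in a container entrypoint. The sketch below assumes the Debian/Ubuntu bundle path from the commands above; the `root` parameter is hypothetical and exists only so the function can be tried against a scratch directory. Against the real `/` it needs root privileges. It does not add or change any trusted certificates; it only makes the existing system bundle reachable under a second path.

```python
import os


def link_ca_bundle(root="/"):
    """Expose the Debian/Ubuntu CA bundle at the RHEL-style path under `root`.

    Returns the link path, or None when the source bundle is missing.
    """
    source = os.path.join(root, "etc/ssl/certs/ca-certificates.crt")
    link = os.path.join(root, "etc/pki/tls/certs/ca-bundle.crt")
    if not os.path.isfile(source):
        return None  # nothing to link to; leave the system untouched
    os.makedirs(os.path.dirname(link), exist_ok=True)
    if not os.path.lexists(link):  # do not overwrite an existing file or link
        os.symlink(source, link)
    return link
```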

@sulejmaninaim

Are the binaries built on a RHEL distribution? See man-group/ArcticDB#514 (Azure/azure-sdk-for-cpp#4738).

The following "fixes" the error for me on Ubuntu 22.04, but I don't know if there are security implications:

mkdir -p /etc/pki/tls/certs
ln -s /etc/ssl/certs/ca-certificates.crt /etc/pki/tls/certs/ca-bundle.crt

This worked for me :)
Thanks @kmatt

@daviewales

People at ArcticDB seem to be running into the same issue here: man-group/ArcticDB#514. There's a PR up as we speak. They have already gone through the work of getting a PR into the Azure SDK for setting the path; it will be available on the 10th of November through vcpkg. We should be able to make use of their hard work by updating the Azure SDK by then and exposing the path through DuckDB.

The PR mentioned above appears to be released now:
Azure/azure-sdk-for-cpp#4982

@samansmink
Collaborator

@daviewales Thanks for the ping. The latest vcpkg release now has this version of the Azure SDK as well; I will try to find some time to look into this issue in the near future!

@quentingodeau
Contributor

Hello,
Just for info, I also ran into this annoying issue, so I have opened PR #35 to make it possible to:

  1. configure the path
  2. try to find the right path directly

Hope it helps :)

@brianwyka

brianwyka commented Feb 27, 2024

Is this fix now released in the latest Azure extension for DuckDB v0.10.0?

@samansmink
Collaborator

@brianwyka not yet!

however, once this job has succeeded, you can use the nightly build of azure, which does contain these fixes with: force install azure from 'http://nightly-extensions.duckdb.org';

@quentingodeau
Contributor

The Azure page of the documentation website is not up to date, so here you go:

Name | Description | Type | Default
azure_transport_option_type | Underlying adapter to use in the Azure SDK. Valid values are: default or curl. | VARCHAR | default

Setting azure_transport_option_type explicitly to curl will have the following effects:

  • On Linux, this may solve the certificate issue (Error: Invalid Error: Fail to get a new connection for: https://<storage account name>.blob.core.windows.net/. Problem with the SSL CA cert (path? access rights?)). With this setting, the extension tries to find the certificate bundle in various paths; curl does not do this by default, and its built-in path might be wrong due to static linking (see the issue).
  • On Windows, this replaces the default adapter (WinHTTP), allowing you to use all curl capabilities (for example, using a SOCKS proxy).
  • On all operating systems, it will honor the following environment variables:
    • CURL_CA_INFO: Path to a PEM-encoded file containing the certificate authorities sent to libcurl. Note that this option is known to only work on Linux and might throw if set on other platforms.
    • CURL_CA_PATH: Path to a directory which holds PEM-encoded files containing the certificate authorities sent to libcurl.
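The settings listed above could be applied from Python roughly as follows. This is a sketch under assumptions, not the extension's documented usage: the helper name `use_curl_transport` is hypothetical, and it assumes the extension reads CURL_CA_INFO from the process environment when a query runs, so the variable is set before any azure:// query.

```python
import os


def use_curl_transport(con, ca_bundle=None):
    """Switch the azure extension to the curl adapter on connection `con`.

    `con` is anything with an `execute(sql)` method, such as the object
    returned by `duckdb.connect()`. `ca_bundle` is an optional path to a PEM
    file; when given, it is exported as CURL_CA_INFO.
    """
    if ca_bundle is not None:
        if not os.path.isfile(ca_bundle):
            raise FileNotFoundError(ca_bundle)
        # Assumption: the extension reads this variable from the process
        # environment, so it is set before any azure:// query runs.
        os.environ["CURL_CA_INFO"] = ca_bundle
    con.execute("SET azure_transport_option_type = 'curl';")
    return con


# Usage (requires the duckdb package and an azure extension build that has
# this setting; the extension must be loaded before the SET statement):
#   import duckdb
#   con = duckdb.connect()
#   con.execute("LOAD azure;")
#   use_curl_transport(con, "/etc/ssl/certs/ca-certificates.crt")
```

Passing the bundle path explicitly is optional; without it, the extension's own path search is left to do its job.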

@paradiddle-luuk

@brianwyka not yet!

however, once this job has succeeded, you can use the nightly build of azure, which does contain these fixes with: force install azure from 'http://nightly-extensions.duckdb.org';

Thanks for the great work!

I am not sure this fixed it for me. I ran duckdb.sql("force install azure from 'http://nightly-extensions.duckdb.org';"), but I still had to create the cert symlink for it to work on WSL Ubuntu 22.04.

@quentingodeau
Contributor

Hi @luuk-codebeez

Just to be sure, did you set the variable?

SET azure_transport_option_type = 'curl';

@paradiddle-luuk

Hi @luuk-codebeez

Just to be sure, did you set the variable?

SET azure_transport_option_type = 'curl';

Whoops, that worked with the nightly build.

@samansmink
Collaborator

@luuk-codebeez Great to hear!

I just deployed the azure nightly binaries, so from now on with force install azure you will get the updated extension with this feature in it.

Also, this has now been added to the docs https://duckdb.org/docs/extensions/azure.html.
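For reference, the statements mentioned in this thread can be collected in the order they need to run. The helper below is a hypothetical sketch that only builds the SQL strings; the connection string argument is optional and is escaped as a SQL string literal.

```python
def azure_setup_statements(connection_string=None):
    """Return the SQL statements from this thread, in the order to run them."""
    statements = [
        "FORCE INSTALL azure;",  # re-download the extension, replacing any cached copy
        "LOAD azure;",
        "SET azure_transport_option_type = 'curl';",
    ]
    if connection_string is not None:
        escaped = connection_string.replace("'", "''")  # SQL string-literal escaping
        statements.append(f"SET azure_storage_connection_string = '{escaped}';")
    return statements


# Usage (requires the duckdb package and network access):
#   import duckdb
#   con = duckdb.connect()
#   for stmt in azure_setup_statements():
#       con.execute(stmt)
```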

Thanks a lot for the effort here @quentingodeau!

@daviewales

Would it make sense to make SET azure_transport_option_type = 'curl'; the default when running on Linux?

@quentingodeau
Contributor

I thought about that but didn't do it. Here is my opinion, based on experience: when you change the default behavior of something, it is really hard to roll that change back.
Once you change a default, you have to keep it. Suppose the Azure SDK later evolves to handle our use case, for example by adding an AZURE_SSL_BUNDLE_PATH environment variable. Users who know the SDK will then expect that variable to behave exactly as described in the Azure docs, so I would have to support it while still keeping the other parameters I have already exposed. I agree it may not look like a big deal on its own, but small things like this pile up and make the code more and more complex for newcomers who do not have the full history.
But once again, it's only an opinion ^^
