Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

azure-core-cpp version 1.11.0 introduced a binary breaking change. #5322

Closed
3 tasks done
teo-tsirpanis opened this issue Feb 5, 2024 · 6 comments
Closed
3 tasks done
Labels
Azure.Core Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that

Comments

@teo-tsirpanis
Copy link
Contributor

Describe the bug
Since publishing azure-core-cpp 1.11.0 to Conda, users have hit segfaults when the destructor of CurlTransport is called, from a library compiled with version 1.10.3.

The issue was originally reported in conda-forge/azure-core-cpp-feedstock#10 (comment). Because the Azure SDK declares to be binary-compatible across minor releases, we had followed suit when we created the Conda feedstock, and the minor version bump did not cause downstreams to be rebuilt.

Exception or Stack Trace

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff7d2a3fe in free () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff7d2a3fe in free () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fffacc2236a in Azure::Core::Http::CurlTransport::~CurlTransport() ()
   from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../../././libazure-core.so
#2  0x00007fffacc58092 in Azure::Core::Http::Policies::_internal::TransportPolicy::~TransportPolicy() ()
   from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../../././libazure-core.so
#3  0x00007fffaccf7d5a in std::_Sp_counted_ptr_inplace<Azure::Core::Http::_internal::HttpPipeline, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../../././libazure-storage-blobs.so
#4  0x00007fffb03146f7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() ()
   from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../.././libtiledb.so.2.19
#5  0x00007fffb04085bd in void tiledb::common::tiledb_delete<Azure::Storage::Blobs::BlobServiceClient>(Azure::Storage::Blobs::BlobServiceClient*) ()
   from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../.././libtiledb.so.2.19
#6  0x00007fffb04088cc in tiledb::sm::Azure::~Azure() ()
   from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../.././libtiledb.so.2.19
#7  0x00007fffb0c46820 in std::_Sp_counted_ptr_inplace<tiledb_ctx_handle_t, tiledb::common::GovernedAllocator<tiledb_ctx_handle_t, tiledb::common::(anonymous namespace)::TiledbTracedAllocator, tiledb::common::Governor>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() ()
   from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../.././libtiledb.so.2.19
#8  0x00007fffb0c475c2 in tiledb_ctx_free () from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../.././libtiledb.so.2.19
#9  0x00007fffb38016b4 in tiledb::Context::free(tiledb_ctx_handle_t*) ()
   from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../../libgdal.so.34
#10 0x00007fffb38012de in std::_Sp_counted_deleter<tiledb_ctx_handle_t*, void (*)(tiledb_ctx_handle_t*), std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../../libgdal.so.34
#11 0x00007fffb3806c8e in TileDBDataset::Identify(GDALOpenInfo*) ()
   from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../../libgdal.so.34
#12 0x00007fffb3e81246 in GDALOpenEx () from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../../libgdal.so.34
#13 0x00007fffac571c3f in __pyx_f_5fiona_6ogrext_gdal_open_vector ()

To Reproduce
The bug reproduces when using TileDB built with Azure support and azure-core-cpp 1.10.3, and linked to 1.11.0 at runtime, and calling the tiledb_ctx_free API, which under the hood will free the Azure HttpTransport.

Code Snippet
TileDB creates the CurlTransport in question here:

https://github.com/TileDB-Inc/TileDB/blob/ad0371fb8cc87abf9aa7c23e053d40b1f87fa80d/tiledb/sm/filesystem/azure.cc#L1147-L1168

Expected behavior
Updating azure-core-cpp to 1.11.0 without recompiling does not segfault.

Screenshots
N/A

Setup (please complete the following information):

  • OS: Linux
  • IDE : N/A
  • Version of the Library used: 1.11.0

Additional context
For now we changed the Conda feedstock for azure-core-cpp (and azure-storage-common-cpp for good measure) to be compatible only across patch versions, and pinned TileDB to azure-core-cpp versions less than 1.11.0.

I am not very familiar with what constitutes a binary breaking change in C++, but after comparing versions 1.10.3 and 1.11.0, I am strongly suspecting #4982 to have caused the break. At the time TileDB got compiled, the type layout of CurlTransportOptions is different from at the time the app runs.

Information Checklist
Kindly make sure that you have added all the following information above and checkoff the required fields otherwise we will treat the issuer as an incomplete report

  • Bug Description Added
  • Repro Steps Added
  • Setup information Added
@github-actions github-actions bot added customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Feb 5, 2024
@RickWinter RickWinter added Azure.Core Client This issue points to a problem in the data-plane of the library. labels Feb 5, 2024
@github-actions github-actions bot removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Feb 5, 2024
@LarryOsterman
Copy link
Member

To my knowledge, there are no expectations of binary stability across versions of the Azure SDK. There is an expectation of source stability across minor versions and patches. But whenever a new release of the Azure SDK is made (even patch releases), we expect that customers will recompile the libraries to consume the release.

Could you help us understand how we can improve our documentation to ensure that we help other customers to avoid making similar assumptions?

@teo-tsirpanis
Copy link
Contributor Author

teo-tsirpanis commented Feb 5, 2024

Could you help us understand how we can improve our documentation to ensure that we help other customers to avoid making similar assumptions?

You should change this line to COMPATIBILITY ExactVersion.

@antkmsft
Copy link
Member

antkmsft commented Feb 6, 2024

@teo-tsirpanis, we read it as "API (not ABI) is compatible within the same Major version". I.e. if you had your code written against Azure Core 1.0, you can expect to rebuild it with 1.1 and everything will compile and work the same. We do not read it as binary compatibility without recompilation. Why do you read that CMake's compatibility is talking about binary compatibility?

Binary compatibility is way harder to achieve - basically, sizeof(AnyClassOrStruct) needs to stay the same between versions (essentially, all classes and structs will have one single pointer of m_pImpl, and all the class members will live there), inline functions are also pretty much a no go (or severely limited). Plus there is almost no tooling exist to warn a developer on breaking binary changes, break the build in CI, etc. The library needs to be written with this goal in mind, a guidelines on allowed practices should exist there from the beginning of development, to minimize errors.
And, C++ does not have a standardized ABI, so even with all that it won't be possible to take DLL that was compiled by MSVC and swap it with a library compiled by Clang - we would basically need to distribute compiled DLLs (signed) for maximum customer success, and compile them for different architectures, with different compilers, different CRT options, CPU architectures etc, so that the customer has options to chose from.

I don't think majority of C++ (not C) libraries do provide binary compatibility.

@teo-tsirpanis
Copy link
Contributor Author

teo-tsirpanis commented Feb 6, 2024

Why do you read that CMake's compatibility is talking about binary compatibility?

My understanding might be wrong, but because the COMPATIBILITY option affects calls to find_package, which will find already built binaries, I interpreted it as specifying binary compatibility.

@LarryOsterman
Copy link
Member

FWIW, my reading of the CMAKE documentation says that the COMPATIBILITY section defines whether or not find_package will find an existing package. It says nothing about whether the package needs to have been rebuilt after a version change.

And as Anton said above, for C++ packages, the only safe choice is to recompile the package on every new version. And honestly, you should also recompile C++ packages every new version of the compiler as well - because (as I've said many times), C++ has no stable binary contract. Heck, I'm half-heartedly considering getting that as a tattoo, I say it so often.

@teo-tsirpanis
Copy link
Contributor Author

Thanks for the comments, turns out my understanding of COMPATIBILITY was wrong. find_package runs at build time, so the consuming binary will have to be rebuilt either way.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Azure.Core Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that
Projects
None yet
Development

No branches or pull requests

4 participants