GH-41004: [C++][FS][Azure] Don't run TestGetFileInfoGenerator() with Valgrind #41163
… with Valgrind
Because `GetFileInfo()` with a generator reports a false positive memory leak in the Azure SDK for C++.
@github-actions crossbow submit test-conda-cpp-valgrind
Revision: 5f4255f
Submitted crossbow builds: ursacomputing/crossbow @ actions-29b32e4857
+1
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 6e8ac43. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 20 possible false positives for unstable benchmarks that are known to sometimes produce them.
@@ -57,6 +57,7 @@
#cmakedefine ARROW_GCS
#cmakedefine ARROW_HDFS
#cmakedefine ARROW_S3
#cmakedefine ARROW_TEST_MEMCHECK
We already have `ARROW_VALGRIND`; I don't understand why we would need to introduce this new constant.
Oh, sorry. I missed it. Can we move `ARROW_VALGRIND` from `add_definitions(-DARROW_VALGRIND)` to `config.h.cmake`?
This is not information that third-party code needs, so I don't see why moving it would be useful.
This will not be useful to third-party code, but it helps us know which macros are defined in a build. If we use `add_definitions(-DARROW_VALGRIND)`, we need to run `ninja -v` and find `-DARROW_VALGRIND` in a long command line.
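For context on the macro discussion: when CMake's `configure_file()` processes `config.h.cmake`, a `#cmakedefine ARROW_VALGRIND` line becomes `#define ARROW_VALGRIND` if the CMake variable is set and `/* #undef ARROW_VALGRIND */` otherwise, so the generated header records the decision, whereas `add_definitions(-DARROW_VALGRIND)` only shows up on the compiler command line. Consuming code checks the macro the same way in both cases. A minimal sketch, where `kValgrindBuild` and `ShouldRunFileInfoGeneratorTest()` are hypothetical names, not Arrow's actual code:

```cpp
namespace sketch {

// ARROW_VALGRIND may come from a generated config header (#cmakedefine)
// or from a -D compiler flag; this check is identical either way.
#ifdef ARROW_VALGRIND
constexpr bool kValgrindBuild = true;
#else
constexpr bool kValgrindBuild = false;
#endif

// Hypothetical helper: Valgrind-sensitive tests consult this instead of
// scattering #ifdef blocks through individual test bodies.
constexpr bool ShouldRunFileInfoGeneratorTest() { return !kValgrindBuild; }

}  // namespace sketch
```

In a GoogleTest test body this could be used as `if (!sketch::ShouldRunFileInfoGeneratorTest()) GTEST_SKIP() << "leak report under Valgrind";` (`GTEST_SKIP()` is available in GoogleTest 1.10 and later).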
@kou Did we try to understand why a memory leak was reported? We shouldn't just silence the errors that annoy us. cc @felipecrv
Ok... This seems to assume that Valgrind is buggy and doesn't handle a `calloc()` call correctly, while our Azure FS implementation (and/or the Azure SDK) is spotless and cannot contain any bugs. I'm quite skeptical. There are a couple of search results online for this, but very few, so I'm not sure it's really an issue in libxml or in the Azure SDK. Have you tried opening a bug report there? Perhaps they know more about the issue. Really, deciding that errors are "false positives" without any further analysis does not strike me as a serious way to deal with CI issues. And even if we decide it's a false positive, we can add a Valgrind suppression for the case of
…Valgrind (#41163)
### Rationale for this change
`GetFileInfo()` with generator reports false positive memory leak in Azure SDK for C++.
### What changes are included in this PR?
Don't run `TestGetFileInfoGenerator()` with Valgrind.
### Are these changes tested?
Yes.
### Are there any user-facing changes?
No.
* GitHub Issue: #41004
Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
The reason I didn't make progress on the issue was also a sense that this is an actual leak, not a misreport that we can just suppress. But I suspect it's something in the Azure SDK (an exception-safety failure?) because of arrow/cpp/src/arrow/filesystem/azurefs.cc, lines 1683 to 1687 (in 48a9639).
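To illustrate the exception-safety hypothesis in general terms: if code throws between acquiring a C-style resource and releasing it, the release is skipped and a leak checker reports the allocation. The sketch below uses a hypothetical `OpenHandle()`/`FreeHandle()` pair, standing in for C APIs such as libxml2's `xmlReaderForMemory()`/`xmlFreeTextReader()`; it only counts live handles and is not the Azure SDK's code.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

namespace sketch {

// Hypothetical C-style API: it only tracks how many handles are open,
// so the example itself never allocates or leaks memory.
struct Handle {};
int g_live_handles = 0;
Handle g_dummy_handle;

Handle* OpenHandle() {
  ++g_live_handles;
  return &g_dummy_handle;
}

void FreeHandle(Handle*) {
  assert(g_live_handles > 0);
  --g_live_handles;
}

// Not exception-safe: when `fail` is true, FreeHandle() is never reached.
void LeakyUse(bool fail) {
  Handle* handle = OpenHandle();
  if (fail) throw std::runtime_error("simulated parse error");
  FreeHandle(handle);
}

// Exception-safe: the unique_ptr deleter calls FreeHandle() on every path,
// including stack unwinding after a throw.
void SafeUse(bool fail) {
  std::unique_ptr<Handle, void (*)(Handle*)> handle(OpenHandle(), &FreeHandle);
  if (fail) throw std::runtime_error("simulated parse error");
}

}  // namespace sketch
```

After one failing `LeakyUse(true)` the live-handle count stays at 1, while failing and succeeding `SafeUse()` calls leave it unchanged.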
That is definitely possible (in which case the issue should still be reported upstream), but it would be nice to investigate a bit to make sure we're not missing anything on our side.
Note that we see this in
Where does
Answering myself: it's version 2.12.6 from conda-forge.
And the Azure SDK depends on it:
Judging by the disassembly of
Ok, that's because
I've reported conda-forge/libxml2-feedstock#117
I've also investigated this more. I think I could reproduce this with the following program, which uses only libxml2 and the C++ standard library:
// g++ -fsanitize=address -g3 -O0 -o aaa aaa.cxx $(pkg-config --cflags --libs libxml-2.0) && ./aaa
#include <condition_variable>
#include <mutex>
#include <thread>

#include <libxml/xmlreader.h>

int main(void) {
  xmlInitParser();
  std::mutex mutex;
  std::condition_variable variable;
  // Flags make the handshake immune to lost and spurious wakeups.
  bool reader_done = false;
  bool cleanup_done = false;
  std::thread thread([&] {
    // Use the libxml2 API on a non-main thread.
    xmlTextReaderPtr reader =
        xmlReaderForMemory("<root/>", 7, nullptr, nullptr, 0);
    xmlFreeTextReader(reader);
    {
      std::lock_guard<std::mutex> lock(mutex);
      reader_done = true;
    }
    variable.notify_one();
    // Keep this thread alive until the main thread has called xmlCleanupParser().
    std::unique_lock<std::mutex> lock(mutex);
    variable.wait(lock, [&] { return cleanup_done; });
  });
  {
    std::unique_lock<std::mutex> lock(mutex);
    variable.wait(lock, [&] { return reader_done; });
  }
  xmlCleanupParser();
  {
    std::lock_guard<std::mutex> lock(mutex);
    cleanup_done = true;
  }
  variable.notify_one();
  // The worker thread finishes only after xmlCleanupParser() has run.
  thread.join();
  return 0;
}
The point of this program is that a thread that uses the libxml2 API finishes after
This also happens in our test. A thread created by
Note that both of them are triggered by C++ destructors that are called on process exit. If we shut down a thread of
diff --git a/cpp/src/arrow/filesystem/azurefs_test.cc b/cpp/src/arrow/filesystem/azurefs_test.cc
index ed09bfc2fa..d87e3e0731 100644
--- a/cpp/src/arrow/filesystem/azurefs_test.cc
+++ b/cpp/src/arrow/filesystem/azurefs_test.cc
@@ -58,6 +58,7 @@
#include "arrow/util/logging.h"
#include "arrow/util/pcg_random.h"
#include "arrow/util/string.h"
+#include "arrow/util/thread_pool.h"
#include "arrow/util/unreachable.h"
#include "arrow/util/value_parsing.h"
@@ -371,6 +372,8 @@ class TestGeneric : public ::testing::Test, public GenericFileSystemTest {
if (azure_fs_) {
ASSERT_OK(azure_fs_->DeleteDir(container_name_));
}
+ // Dirty
+ ASSERT_OK(reinterpret_cast<::arrow::internal::ThreadPool *>(io_context_->executor())->Shutdown());
}
protected:
@@ -379,7 +382,8 @@ class TestGeneric : public ::testing::Test, public GenericFileSystemTest {
random::pcg32_fast rng((std::random_device()()));
container_name_ = PreexistingData::RandomContainerName(rng);
ASSERT_OK_AND_ASSIGN(auto options, MakeOptions(env_));
- ASSERT_OK_AND_ASSIGN(azure_fs_, AzureFileSystem::Make(options));
+ io_context_ = std::make_unique<io::IOContext>();
+ ASSERT_OK_AND_ASSIGN(azure_fs_, AzureFileSystem::Make(options, *io_context_));
ASSERT_OK(azure_fs_->CreateDir(container_name_, true));
fs_ = std::make_shared<SubTreeFileSystem>(container_name_, azure_fs_);
}
@@ -417,6 +421,7 @@ class TestGeneric : public ::testing::Test, public GenericFileSystemTest {
private:
std::string container_name_;
+ std::unique_ptr<io::IOContext> io_context_;
};
class TestAzuriteGeneric : public TestGeneric {
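The workaround above comes down to an ordering rule: every worker thread that touched the library should exit (running its thread-local destructors) before any global cleanup runs, instead of being torn down by destructors at process exit. A minimal sketch of that ordering, with a hypothetical `PerThreadState` standing in for a library's per-thread data; this is not how libxml2 or Arrow's thread pool are implemented.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

namespace sketch {

// Hypothetical stand-in for per-thread state a C library may keep.
std::atomic<int> g_live_thread_states{0};

struct PerThreadState {
  PerThreadState() { g_live_thread_states.fetch_add(1); }
  ~PerThreadState() {
    assert(g_live_thread_states.load() > 0);
    g_live_thread_states.fetch_sub(1);
  }
};

void UseLibraryOnThisThread() {
  // Created on first use in each thread; destroyed when that thread exits.
  thread_local PerThreadState state;
  (void)state;
}

// Runs `num_workers` threads that use the "library", shuts them all down,
// and returns how many per-thread states are still alive at the point where
// a global cleanup (e.g. xmlCleanupParser()) would run. 0 is the safe result.
int RunWorkersThenCleanup(int num_workers) {
  std::vector<std::thread> workers;
  for (int i = 0; i < num_workers; ++i) {
    workers.emplace_back(UseLibraryOnThisThread);
  }
  // Join first: a thread's thread_local destructors have run by the time
  // join() returns, so no worker touches library state after cleanup.
  for (auto& worker : workers) worker.join();
  return g_live_thread_states.load();
}

}  // namespace sketch
```

In the diff above, shutting down the filesystem's own thread pool in `TearDown()` plays the role of the `join()` loop.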
Well, we can wait for an updated conda-forge package to make sure that's the case.
Looks like yet another case of a library polluting the thread-local storage area :)
It's more a case of DLL finalization coming before whatever unit of work is still being executed in some worker thread, AFAIU. We had a similar problem with S3 (but resulting in worse misbehavior aka crashes) and had to jump through hoops to (hopefully) solve the problem.
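When the worker threads belong to a shared pool that outlives the caller, joining them may not be an option; the finalization step can instead wait until no task is still running library code. A minimal sketch of that idea using a hypothetical `InFlightTracker`; it is not Arrow's S3 finalization code and only shows the ordering.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

namespace sketch {

// Hypothetical tracker of in-flight work: finalization waits on it so that
// library teardown never races with tasks still running on worker threads.
class InFlightTracker {
 public:
  void Enter() {
    std::lock_guard<std::mutex> lock(mutex_);
    ++in_flight_;
  }
  void Exit() {
    std::lock_guard<std::mutex> lock(mutex_);
    assert(in_flight_ > 0);
    if (--in_flight_ == 0) idle_.notify_all();
  }
  void WaitForIdle() {
    std::unique_lock<std::mutex> lock(mutex_);
    idle_.wait(lock, [this] { return in_flight_ == 0; });
  }
  int in_flight() {
    std::lock_guard<std::mutex> lock(mutex_);
    return in_flight_;
  }

 private:
  std::mutex mutex_;
  std::condition_variable idle_;
  int in_flight_ = 0;
};

// Starts `num_tasks` tasks, then "finalizes" only once all of them are done.
// Returns the number of tasks still in flight at finalization (expected 0).
int RunTasksThenFinalize(int num_tasks) {
  InFlightTracker tracker;
  std::vector<std::thread> workers;
  for (int i = 0; i < num_tasks; ++i) {
    tracker.Enter();  // register before the task can start running
    workers.emplace_back([&tracker] { tracker.Exit(); });
  }
  tracker.WaitForIdle();  // finalization point: no task is mid-flight
  int remaining = tracker.in_flight();
  for (auto& worker : workers) worker.join();
  return remaining;
}

}  // namespace sketch
```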
…Valgrind again
We use conda's libxml2. It wasn't built with `--with-tls`, but 2.12.6-2 and later are built with `--with-tls`. This may suppress the detected leak.
It seems that