Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HttpIntergrationTest fail at assert failure: !merge_in_progress #3287

Closed
qiwzhang opened this issue May 4, 2018 · 8 comments
Closed

HttpIntergrationTest fail at assert failure: !merge_in_progress #3287

qiwzhang opened this issue May 4, 2018 · 8 comments
Assignees
Labels
Milestone

Comments

@qiwzhang
Copy link
Contributor

qiwzhang commented May 4, 2018

Issue Template

Title: One line description

Description:

#3262

add an assertion in

envoy/source/common/stats/thread_local_store.cc:99] assert failure: !merge_in_progress_

Some HTTPIntegration tests run into that assertion. Its backtrace is:

IpVersions/JwtVerificationFilterIntegrationTestWithInjectedJwtResult.InjectedJwtResultSanitized/0
[2018-05-04 01:10:43.161][20626][critical][assert] external/envoy/source/common/stats/thread_local_store.cc:99] assert failure: !merge_in_progress_
[2018-05-04 01:10:43.161][20626][critical][backtrace] bazel-out/k8-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] Caught Aborted, suspect faulting address 0x3e900004fe8
[2018-05-04 01:10:43.162][20626][critical] Backtrace (most recent call first) from thread 20626:
#1 ?? ??:0
#2 ?? ??:0
#3
#4 Envoy::Stats::ThreadLocalStoreImpl::mergeHistograms(std::function<void ()>) at thread_local_store.cc:99 (discriminator 4)
#5 Envoy::Server::InstanceImpl::flushStats() at server.cc:159
#6 Envoy::Server::InstanceImpl::terminate() at server.cc:432
#7 Envoy::Server::InstanceImpl::run() at server.cc:410 (discriminator 2)
#8 Envoy::IntegrationTestServer::threadRoutine(Envoy::Network::Address::IpVersion, bool) at server.cc:109
#9 Envoy::IntegrationTestServer::start(Envoy::Network::Address::IpVersion, std::function<void ()>, bool)::{lambda()#1}::operator()() const at server.cc:38
#10 std::_Function_handler<void (), Envoy::IntegrationTestServer::start(Envoy::Network::Address::IpVersion, std::function<void ()>, bool)::{lambda()#1}>::_M_invoke(std::_Any_data const&) at functional:1871
#11 std::function<void ()>::operator()() const at functional:2267
#12 Envoy::Thread::Thread::Thread(std::function<void ()>)::{lambda(void*)#1}::operator()(void*) const at thread.cc:22
#13 Envoy::Thread::Thread::Thread(std::function<void ()>)::{lambda(void*)#1}::_FUN(void*) at thread.cc:24
#14
#15 ?? ??:0
#16
#17 ?? ??:0
#18

The test code is here

https://github.com/istio/proxy/blob/master/src/envoy/http/jwt_auth/integration_test/http_filter_integration_test.cc#L359

@qiwzhang
Copy link
Contributor Author

qiwzhang commented May 4, 2018

@htuch could you help?

Maybe my testing code have not been updated for too long.

@ramaraochavali
Copy link
Contributor

@mattklein123 could it be because of very low flush interval(50 ms) in integration tests https://github.com/envoyproxy/envoy/blob/master/test/integration/server.h#L56? Would the timer kick in again while flush process is not done with in 50 ms?

@mattklein123 mattklein123 added this to the 1.7.0 milestone May 4, 2018
@mattklein123
Copy link
Member

@ramaraochavali in the current code it's possible that when the server shuts down, it will attempt a flush while a current merge is happening. The best solution here is during the terminate flush, don't merge histograms, and just flush what you have. Can you look into making this change and adding tests?

@ramaraochavali
Copy link
Contributor

@mattklein123 I think we already have that guard condition while merging the histogram. I guess the problem is with the placement of ASSERT, it should be inside shutting_down_ if . https://github.com/envoyproxy/envoy/blob/master/source/common/stats/thread_local_store.cc#L99. WDYT?

@mattklein123
Copy link
Member

No, this is an actual bug. It has nothing to do with the ASSERT. See the call stack above. The issue is in server shutdown path.

@ramaraochavali
Copy link
Contributor

ramaraochavali commented May 4, 2018

Correct. But during shutdown path, stats_store's shutdownThreading is called before flushStats call which sets the shutting_down_ to true. If it is true, we do not try to merge at all. See https://github.com/envoyproxy/envoy/blob/master/source/common/stats/thread_local_store.cc#L100. But I think the problem is the ASSERT since it is outside of if condition, it is firing. Am I missing some thing here?

@mattklein123
Copy link
Member

Ah yes, correct, sorry. However, I think you should still flush on shutdown. IMO if you are shutting down, I would fire the completion callback immediately so that the existing stats get flushed.

@ramaraochavali
Copy link
Contributor

I agree. Here is the PR #3289

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants