relay is sending data randomly while kill -HUP happens #204

Closed
toni-moreno opened this issue Jul 18, 2016 · 17 comments

@toni-moreno

Hi @grobian, last week I updated to carbon-c-relay v2.1 (b8e663), and I found an unpleasant surprise when I added a new match line and sent kill -HUP to the process.

We are currently processing 940K input metrics with 425 matching rules over 5 clusters. All matching rules follow the same format:

match ^m0001\. send to cluster01 stop ;
match ^m0002\. send to cluster01 stop ;
match ^m0003\. send to cluster02 stop ;
match ^m0004\. send to cluster03 stop ;
match ^m0005\. send to cluster03 stop ;
match ^m0006\. send to cluster05 stop ;
..
..
match ^m0425\. send to cluster02 stop ;

All clusters are forward clusters like this one:

cluster  cluster01
forward
mycluster01.mydomain.es:2003
;
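To make the expected routing explicit, here is a minimal Python sketch of first-match routing with `stop`, assuming each `match` is a regular-expression search over the metric name applied in file order and that `stop` ends rule evaluation for that metric. `RULES` and `route` are hypothetical names for illustration only; this is not carbon-c-relay's implementation.

```python
import re

# Hypothetical rule table mirroring a few of the match lines above:
# (compiled regex, destination cluster, stop flag).
RULES = [
    (re.compile(r"^m0001\."), "cluster01", True),
    (re.compile(r"^m0002\."), "cluster01", True),
    (re.compile(r"^m0003\."), "cluster02", True),
    (re.compile(r"^m0004\."), "cluster03", True),
    (re.compile(r"^m0005\."), "cluster03", True),
    (re.compile(r"^m0006\."), "cluster05", True),
]


def route(metric_name):
    """Return the clusters a metric should be sent to, in rule order."""
    destinations = []
    for pattern, cluster, stop in RULES:
        # re.search is unanchored; the ^ in each pattern pins it to the start.
        if pattern.search(metric_name):
            destinations.append(cluster)
            if stop:
                break  # 'stop' ends rule evaluation for this metric
    return destinations


if __name__ == "__main__":
    print(route("m0004.host1.cpu.idle"))  # ['cluster03']
    print(route("m0006.host2.mem.free"))  # ['cluster05']
    print(route("x.m0001.cpu"))           # [] (no rule matches)
```

Under these semantics a metric starting with `m0004.` can only ever reach cluster03, so metrics from such rules arriving on cluster04 point at misdirection inside the relay rather than at an ambiguous rule.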

After my kill -HUP, and only for a brief moment (the exact second in which I sent the HUP signal), carbon-c-relay sent metrics to the wrong clusters, seemingly at random.

For example:

On cluster04 we received metrics from 10 match rules configured to send to cluster03 (not all of their metrics, only 886, and only at that moment).
On cluster03 we received metrics from 3 match rules configured to send to cluster04 (191 metrics).
On cluster05 we received metrics from 7 match rules configured to send to cluster02 (823 metrics).

Although very few metrics were sent erroneously, it is enough to make us avoid kill -HUP in the future, and I will need this feature to allow dynamic configuration for new systems.

Thank you very much for your patience.

@grobian grobian added the bug label Jul 18, 2016
@grobian
Owner

grobian commented Jul 23, 2016

some questions:
your config doesn't involve aggregations, does it?
are your clusters really single target forward clusters, or do you use any_of, failover, ...?

@toni-moreno
Author

We have no aggregations, and we have only single-target forward clusters.

IMHO this seems related to another issue (#199).

@grobian
Owner

grobian commented Jul 24, 2016

Could be, perhaps, yet I'm searching for how this could happen, since you see misdirected traffic.

@sbengo

sbengo commented Jul 28, 2016

Hi,

I'm working with @toni-moreno on the same project.
Following up on the comments above, with the same configuration: after the cluster definitions we have 37 rules that exclude specific hosts.
Each rule filters a unique host name.
Each metric is received as: a.b.c.hostname.product.*.metric$

match hostname01 send to blackhole stop ;
...
match hostname036 send to blackhole stop ;
match hostname037 send to blackhole stop ;
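A minimal sketch of what these exclusion rules are expected to do, assuming a `match` expression without `^` is an unanchored regular-expression search over the full metric name, so `hostname037` matches anywhere in `a.b.c.hostname037.product.*.metric`. `BLACKHOLED` and `is_blackholed` are hypothetical names used only for this illustration.

```python
import re

# Hypothetical subset of the 37 exclusion rules; each pattern is unanchored.
BLACKHOLED = [re.compile(p) for p in ("hostname01", "hostname036", "hostname037")]


def is_blackholed(metric_name):
    """True if any exclusion rule matches somewhere in the metric name."""
    return any(p.search(metric_name) for p in BLACKHOLED)


if __name__ == "__main__":
    print(is_blackholed("a.b.c.hostname037.product.disk.metric"))  # True
    print(is_blackholed("a.b.c.hostname040.product.disk.metric"))  # False
```

Because the patterns are unanchored, `hostname01` would also match a host such as `hostname010`; surrounding the name with escaped dots (for example `\.hostname01\.`) restricts it to that exact path component.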

They worked perfectly until the upgrade.
It seems that carbon-c-relay is not matching the rules, so we are receiving these unwanted metrics.

For more information: as you can see, the metrics fell from 31K to 2K after the kill -HUP described by @toni-moreno.

[image: metrics graph]

Thanks for everything!

@grobian
Owner

grobian commented Jul 29, 2016

Hmmm, that's absolutely no good!
What version were you upgrading from and to? That might help me narrow down the cause.

@grobian
Owner

grobian commented Jul 31, 2016

FYI: I'm working on #199, but my time is limited at the moment. Meanwhile, I continue to look for clues as to how misdirection can happen.

@sbengo

sbengo commented Aug 2, 2016

Hi, grobian.

Sorry, I couldn't retrieve the info earlier. The following lines are the direct output of carbon-c-relay -v before and after the upgrade:

Version FROM: carbon-c-relay v0.40 (2015-05-18)
Version TO: carbon-c-relay v2.1 (b8e663)

Thanks for everything!

@grobian
Owner

grobian commented Aug 2, 2016

thanks

@grobian
Owner

grobian commented Aug 11, 2016

hmmm, metricsBlackholed only appeared in v0.45, so are you sure you went from 0.40?

@sbengo

sbengo commented Aug 16, 2016

Sorry @grobian, I couldn't answer earlier.

Yes, we upgraded from 0.40:

· We upgraded from v0.40 (2015-05-18) to carbon-c-relay v2.1 (b8e663) at 7:32, so the new metric metricsBlackholed was created at the moment of the upgrade/service start.

· At 7:57, we sent kill -HUP to the process, as @toni-moreno explained in the first comment.

· After that, we saw the strange behaviour described in this issue.

[image: metrics graph around the upgrade and the reload]

Thanks for everything,
Greetings

@grobian
Copy link
Owner

grobian commented Aug 18, 2016

I see, so the reload (SIGHUP) causes the problem

@Civil

Civil commented Oct 3, 2016

We have the same problem with SIGHUP on recent 2.2 version. It starts sending data randomly during the reload.

@azhiltsov

azhiltsov commented Oct 3, 2016

In addition to @Civil's comment: we upgraded from 2.1 (with some patches) to 2.2-2e76c18 one day before this happened, so 2.2-2e76c18 is definitely affected.

@Civil

Civil commented Oct 3, 2016

That actually was 2.1 + cherry-picked patches to fix previous issues with SIGHUP (#188)

grobian added a commit that referenced this issue Dec 31, 2016
An embarassing logic error caused queues from totally unrelated servers
to be swapped, causing all kinds of random erroneously delivered
metrics.
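As a purely hypothetical illustration of the class of error the commit message describes (queues of unrelated servers being swapped on reload), the sketch below contrasts carrying pending queues over to the new configuration by list position versus by destination identity. It is not carbon-c-relay's code; `make_server`, `carry_over_by_position`, `carry_over_by_identity` and the hosts other than `mycluster01.mydomain.es` are invented for the example.

```python
from collections import deque


def make_server(host, port, pending=()):
    # Hypothetical server record: a destination plus its queue of pending metrics.
    return {"host": host, "port": port, "queue": deque(pending)}


def carry_over_by_position(old, new):
    # Fragile: assumes the i-th server in the new config is the i-th in the old one.
    for old_srv, new_srv in zip(old, new):
        new_srv["queue"] = old_srv["queue"]
    return new


def carry_over_by_identity(old, new):
    # Safer: only reuse a queue when the destination (host, port) is unchanged.
    by_key = {(s["host"], s["port"]): s["queue"] for s in old}
    for srv in new:
        key = (srv["host"], srv["port"])
        if key in by_key:
            srv["queue"] = by_key.pop(key)
    return new


def demo_configs():
    # Old config has two servers with pending data; the new one adds a server in between.
    old = [make_server("mycluster01.mydomain.es", 2003, ["m0001.a 1 0"]),
           make_server("mycluster03.mydomain.es", 2003, ["m0004.b 1 0"])]
    new = [make_server("mycluster01.mydomain.es", 2003),
           make_server("mycluster04.mydomain.es", 2003),
           make_server("mycluster03.mydomain.es", 2003)]
    return old, new


if __name__ == "__main__":
    old, new = demo_configs()
    bad = carry_over_by_position(old, new)
    print(bad[1]["host"], list(bad[1]["queue"]))  # cluster03's data lands on mycluster04
    old, new = demo_configs()
    good = carry_over_by_identity(old, new)
    print(good[2]["host"], list(good[2]["queue"]))  # stays with mycluster03
```

Matching on identity means a server that is new in the reloaded configuration starts with an empty queue, while any metric already queued keeps its original destination.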
@grobian
Owner

grobian commented Dec 31, 2016

@toni-moreno: I found a logic error which explains your observed misbehaviour. I'm confident that fix is going to help a lot. For the other people on this issue, I'm not sure it's the same problem (it feels like it isn't), so I'll keep the issue open for now.

@grobian
Owner

grobian commented Dec 31, 2016

Civil seems to be talking about the same issue; @sbengo seems to hint at a problem with match rules no longer working after the HUP.

@grobian
Owner

grobian commented Jan 26, 2017

@sbengo @toni-moreno is this problem still happening, or another problem now? I believe the original problem for this bug was fixed in v2.4. If you still have this problem, please reopen. If you see another problem, please file a new issue.

@grobian grobian closed this as completed Jan 26, 2017