relay is sending data randomly while kill -HUP happens #204

toni-moreno · 2016-07-18T07:21:10Z

Hi @grobian, I updated to carbon-c-relay v2.1 (b8e663) last week, and I found an unpleasant surprise when I added a new match line and sent kill -HUP to the process.

We are currently processing 940K input metrics with 425 matching rules over 5 clusters. All matching rules follow the same format:

match ^m0001\. send to cluster01 stop ;
match ^m0002\. send to cluster01 stop ;
match ^m0003\. send to cluster02 stop ;
match ^m0004\. send to cluster03 stop ;
match ^m0005\. send to cluster03 stop ;
match ^m0006\. send to cluster05 stop ;
..
..
match ^m0425\. send to cluster02 stop ;

And all clusters are forward clusters like this:

cluster  cluster01
forward
mycluster01.mydomain.es:2003
;

After my kill -HUP, and only for a brief moment (the exact second in which I sent the HUP signal), carbon-c-relay sent metrics to random clusters.

For example:

On cluster04 we received metrics from 10 match rules configured to send to cluster03 (not all metrics, only 886, and only at that moment).
On cluster03 we received metrics from 3 match rules configured to send to cluster04 (191 metrics).
On cluster05 we received metrics from 7 match rules configured to send to cluster02 (823 metrics).
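
One way to quantify misdirected traffic like this is to replay the metric names each cluster received against the match rules and flag any name whose first matching rule points elsewhere. The following is a minimal Python sketch, assuming every rule has the single-line `match <regex> send to <cluster> stop ;` form shown above. The names `parse_rules`, `expected_cluster` and `find_misrouted`, as well as the sample data, are hypothetical and only for illustration, and Python's `re` merely approximates the relay's own regex matching.

```python
import re

# Matches one rule of the form: match <regex> send to <cluster> stop ;
RULE_RE = re.compile(r"^\s*match\s+(\S+)\s+send\s+to\s+(\S+)\s+stop\s*;\s*$")


def parse_rules(config_text):
    """Return an ordered list of (compiled_regex, cluster) pairs."""
    rules = []
    for line in config_text.splitlines():
        m = RULE_RE.match(line)
        if m:
            rules.append((re.compile(m.group(1)), m.group(2)))
    return rules


def expected_cluster(rules, metric):
    """First matching rule wins, because every rule ends in `stop`."""
    for pattern, cluster in rules:
        if pattern.search(metric):
            return cluster
    return None


def find_misrouted(rules, received):
    """received maps cluster name -> iterable of metric names seen there.

    Returns (metric, seen_on, expected) tuples for every mismatch.
    """
    bad = []
    for seen_on, metrics in received.items():
        for metric in metrics:
            want = expected_cluster(rules, metric)
            if want != seen_on:
                bad.append((metric, seen_on, want))
    return bad


CONFIG = r"""
match ^m0003\. send to cluster02 stop ;
match ^m0004\. send to cluster03 stop ;
"""

# Made-up sample: one m0004 metric turned up on the wrong cluster.
RECEIVED = {
    "cluster02": ["m0003.cpu.load"],
    "cluster03": ["m0004.cpu.load"],
    "cluster04": ["m0004.disk.used"],
}

if __name__ == "__main__":
    for metric, seen_on, want in find_misrouted(parse_rules(CONFIG), RECEIVED):
        print(f"{metric}: seen on {seen_on}, expected {want}")
```

Feeding it the real rule file and the metric names seen on each backend during the reload second would give per-cluster counts comparable to the ones listed above.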

Although very few metrics were sent erroneously, it is enough to make us avoid kill -HUP in the future. And I need this feature to allow dynamic configuration for new systems.

Thank you very much for your patience.

grobian · 2016-07-23T15:32:43Z

some questions:
your config doesn't involve aggregations, does it?
are your clusters really single target forward clusters, or do you use any_of, failover, ...?

toni-moreno · 2016-07-23T19:56:31Z

We have no aggregations, and we have only single-target forward clusters.

IMHO this seems something related to this other issue (#199)

grobian · 2016-07-24T06:34:20Z

Could be, perhaps, yet I'm searching for how this could happen, since you see misdirected traffic.

sbengo · 2016-07-28T09:38:33Z

Hi,

I'm working with @toni-moreno on the same project.
Following on from the above comments, with the same configuration: after the cluster definitions we had a few rules to exclude some hosts, exactly 37 rules.
Each rule filters a unique host name.
Each metric is received as: a.b.c.hostname.product.*.metric$

match hostname01 send to blackhole stop ;
...
match hostname036 send to blackhole stop ;
match hostname037 send to blackhole stop ;
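
To check offline whether a given metric should have been dropped by exclusion rules like these, the patterns can be tested against sample metric names. The sketch below assumes the expressions are unanchored regular expressions (so a pattern such as `hostname01` matches anywhere in the metric path) and uses Python's `re.search` as a stand-in for the relay's matcher. The helper names `is_blackholed` and `leaked` and the sample metrics are made up for illustration.

```python
import re

# Exclusion patterns in the same spirit as the blackhole rules above.
BLACKHOLE_PATTERNS = [re.compile(p) for p in ("hostname01", "hostname037")]


def is_blackholed(metric):
    """True if any exclusion pattern occurs anywhere in the metric name."""
    return any(p.search(metric) for p in BLACKHOLE_PATTERNS)


def leaked(metrics):
    """Metrics that reached storage although an exclusion rule matches them."""
    return [m for m in metrics if is_blackholed(m)]


if __name__ == "__main__":
    stored = [
        "a.b.c.hostname01.product.x.metric",   # should have been dropped
        "a.b.c.hostname500.product.x.metric",  # no rule matches
    ]
    for metric in leaked(stored):
        print("unexpectedly stored:", metric)
```

Under the unanchored assumption, a pattern like `hostname01` would also match `hostname010`; delimiting it as `\.hostname01\.` avoids that.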

They worked perfectly until the upgrade.
It seems that carbon-c-relay is not matching the rules, so we are receiving these unwanted metrics.

To add more information, as you can see, the metrics had fallen from 31K to 2K after the kill -HUP described by @toni-moreno.

Thanks for all!

grobian · 2016-07-29T06:44:19Z

Hmmm, that's absolutely no good!
What version were you upgrading from and to? That might help me narrow down the cause.

grobian · 2016-07-31T08:07:17Z

FYI: I'm working on #199, but my time is limited at the moment. In the meantime I'll continue to look for clues as to how misdirection can happen.

sbengo · 2016-08-02T07:52:50Z

Hi, grobian.

Sorry, I couldn't retrieve the info earlier. The following lines are the direct output of carbon-c-relay -v before and after the upgrade:

Version FROM: carbon-c-relay v0.40 (2015-05-18)
Version TO: carbon-c-relay v2.1 (b8e663)

Thanks for all!

grobian · 2016-08-02T08:09:28Z

thanks

grobian · 2016-08-11T07:52:06Z

hmmm, metricsBlackholed only appeared in v0.45, so are you sure you went from 0.40?

sbengo · 2016-08-16T06:59:38Z

Sorry @grobian, couldn't answer you earlier.

Yes, we upgraded from 0.40:

· We upgraded from v0.40 (2015-05-18) to carbon-c-relay v2.1 (b8e663) at 7:32, so the new metric, metricsBlackholed, was created at the moment of the upgrade/start of the service.

· At 7:57, we sent kill -HUP to the process, as @toni-moreno explained in the first comment.

· After that, we saw the strange behaviour described in this issue.

Thanks for all,
Greetings

grobian · 2016-08-18T06:41:58Z

I see, so the reload (SIGHUP) causes the problem

Civil · 2016-10-03T10:24:30Z

We have the same problem with SIGHUP on the recent 2.2 version. It starts sending data randomly during the reload.

azhiltsov · 2016-10-03T10:27:04Z

In addition to @Civil's comment: we upgraded from 2.1 (with some patches) to 2.2-2e76c18 one day before this happened. So 2.2-2e76c18 is definitely affected.

Civil · 2016-10-03T10:30:08Z

That actually was 2.1 + cherry-picked patches to fix previous issues with SIGHUP (#188)

grobian · 2016-12-31T16:43:13Z

@toni-moreno: I found a logic error which explains your observed misbehaviour. I'm confident that one's going to help a lot. For the other people on this issue, I'm not sure it's the same problem (it feels like it isn't), so I'll keep the issue open for now.

grobian · 2016-12-31T16:53:00Z

Civil seems to be talking about the same issue, @sbengo seems to hint at a problem with match rules no longer working after the HUP.

grobian · 2017-01-26T07:31:36Z

@sbengo @toni-moreno is this problem still happening, or another problem now? I believe the original problem for this bug was fixed in v2.4. If you still have this problem, please reopen. If you see another problem, please file a new issue.

grobian added the bug label Jul 18, 2016

grobian added a commit that referenced this issue Dec 31, 2016

router_transplant_queues: fix inversed logic, issue #204

611450a

An embarrassing logic error caused queues from totally unrelated servers to be swapped, causing all kinds of random erroneously delivered metrics.
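
The class of bug described in this commit message can be illustrated with a small model. The Python sketch below is not carbon-c-relay's C code; `Server`, `transplant_queues`, the `inverted` flag and the addresses are hypothetical. It assumes that on reload each new server should inherit the pending queue of the old server with the same address, and shows how negating that comparison hands a queue to an unrelated server.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    address: str                      # e.g. "host:port"
    queue: list = field(default_factory=list)


def transplant_queues(old_servers, new_servers, inverted=False):
    """Move pending queues from the old config's servers to the new ones.

    Correct behaviour: a new server takes the queue of the old server with
    the same address. With inverted=True the comparison is negated, which
    models the kind of logic error the commit message describes.
    """
    for new in new_servers:
        for old in old_servers:
            same = old.address == new.address
            if same != inverted and old.queue:
                new.queue, old.queue = old.queue, []
                break


def demo(inverted):
    old = [Server("cluster03:2003", ["m0004.cpu"]),
           Server("cluster04:2003", ["m0100.cpu"])]
    new = [Server("cluster03:2003"), Server("cluster04:2003")]
    transplant_queues(old, new, inverted=inverted)
    return {s.address: s.queue for s in new}


if __name__ == "__main__":
    print("correct: ", demo(inverted=False))
    print("inverted:", demo(inverted=True))
```

In this model the inverted comparison leaves each new server holding the other server's pending metrics, so they are delivered to the wrong destination only until those transplanted queues drain. That would be consistent with misrouting seen only around the moment of the reload.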

grobian closed this as completed Jan 26, 2017
