Aggregator doesn't expire metrics when rules split across multiple files #180

Closed
iain-buclaw-sociomantic opened this issue Jun 3, 2016 · 3 comments

Comments

@iain-buclaw-sociomantic
Contributor

iain-buclaw-sociomantic commented Jun 3, 2016

(Snipping original comment)

Rules:

include /etc/carbon-c-relay/routes.d/*.conf
    ;

Here routes.d contains, say, three files:

05-team1.conf
10-team2.conf
15-team3.conf

Each file has its own aggregate rules.

When aggregate rules are spread across multiple files like this, it looks like not all buckets get expired: very quickly after start-up I get bogus "dropping metric too far in the future" messages.

First error:
Date: 2016-06-03 08:45:30
Metric Timestamp: 2016-06-03 08:45:30
Last bucket start time: 2016-06-03 08:45:00

[2016-06-03 08:45:30] (ERR) aggregator: dropping metric too far in the future (1464943530 > 1464943500): _aggregator_stub_0x925ee0__aggregates.host.webservers-source.if_packets.rx from host-069.interface-eth0.if_packets.rx 1575.432766 1464943530
[2016-06-03 08:46:00] (ERR) aggregator: dropping metric too far in the future (1464943560 > 1464943500): _aggregator_stub_0x925ee0__aggregates.host.webservers-source.if_packets.rx from host-069.interface-eth0.if_packets.rx 1551.733283 1464943560
[2016-06-03 08:46:30] (ERR) aggregator: dropping metric too far in the future (1464943590 > 1464943500): _aggregator_stub_0x925ee0__aggregates.host.webservers-source.if_packets.rx from host-069.interface-eth0.if_packets.rx 1642.504282 1464943590
[2016-06-03 08:47:00] (ERR) aggregator: dropping metric too far in the future (1464943620 > 1464943500): _aggregator_stub_0x925ee0__aggregates.host.webservers-source.if_packets.rx from host-069.interface-eth0.if_packets.rx 1568.024141 1464943620
[2016-06-03 08:47:30] (ERR) aggregator: dropping metric too far in the future (1464943650 > 1464943500): _aggregator_stub_0x925ee0__aggregates.host.webservers-source.if_packets.rx from host-069.interface-eth0.if_packets.rx 1555.037178 1464943650
[2016-06-03 08:48:00] (ERR) aggregator: dropping metric too far in the future (1464943680 > 1464943500): _aggregator_stub_0x925ee0__aggregates.host.webservers-source.if_packets.rx from host-069.interface-eth0.if_packets.rx 2334.634330 1464943680
[2016-06-03 08:48:30] (ERR) aggregator: dropping metric too far in the future (1464943710 > 1464943500): _aggregator_stub_0x925ee0__aggregates.host.webservers-source.if_packets.rx from host-069.interface-eth0.if_packets.rx 1745.192256 1464943710
[2016-06-03 08:49:00] (ERR) aggregator: dropping metric too far in the future (1464943740 > 1464943500): _aggregator_stub_0x925ee0__aggregates.host.webservers-source.if_packets.rx from host-069.interface-eth0.if_packets.rx 1576.161690 1464943740
[2016-06-03 08:49:30] (ERR) aggregator: dropping metric too far in the future (1464943770 > 1464943500): _aggregator_stub_0x925ee0__aggregates.host.webservers-source.if_packets.rx from host-069.interface-eth0.if_packets.rx 1542.275990 1464943770
@iain-buclaw-sociomantic
Contributor Author

I made the following change to get a little more detail in the error log.

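time_t now;
time(&now);
/* start of the oldest bucket that should still exist right now */
now = ((now - s->expire) / s->interval) * s->interval;
/* second pair: expected oldest start vs. actual buckets[0].start */
logerr("aggregator: dropping metric too far in the "
                "future (%lld > %lld) (%lld == %lld): %s from %s", epoch,
                invocation->buckets[s->bucketcnt - 1].start,
                (long long)now, invocation->buckets[0].start,
                ometric, metric);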

I get the following:

(ERR) aggregator: dropping metric too far in the future (1464952267 > 1464952230) (1464952170 == 1464952080)

Looks like the first entry in invocation->buckets is 90 seconds out of sync?

@iain-buclaw-sociomantic changed the title from "Aggregator drops stats where timestamp == now()" to "Aggregator doesn't expire metrics when rules split across multiple files" on Jun 3, 2016
@iain-buclaw-sociomantic
Contributor Author

iain-buclaw-sociomantic commented Jun 3, 2016

Moving all aggregation rules into one file fixes the problem. So maybe this is a bug with include?

@iain-buclaw-sociomantic
Contributor Author

iain-buclaw-sociomantic commented Jun 3, 2016

Got it! With a breakpoint set in aggregator_expire, I noticed that only the aggregates from the last file read (included) were having their buckets expired. I tracked it down to the initialization in router_readconfig. It makes me wonder whether slightly more complex configurations are also broken, for example:

match foo
    send to foo
    ;
include a/file
    ;
match bar
    send to bar
    ;
