
sampling: use a data stream for sampled trace docs #4707

Merged
merged 9 commits into elastic:master from axw:sampling-datastream
Feb 11, 2021

Conversation

axw
Member

@axw axw commented Feb 9, 2021

Motivation/summary

Update tail-based sampling to index into and search a data stream. The data stream is associated with an ILM policy that takes care of rollover and deletion.

When running in Fleet-managed mode, apm-server expects the data stream `traces-sampled-<namespace>` and its ILM policy to already exist. Servers participating in tail-based sampling must be configured with the same namespace.

When running in standalone mode, apm-server attempts to create an index template and ILM policy for a data stream called `apm-sampled-traces`. This provides minimal support while we transition to Fleet, and is intended to be removed in a future release. The data stream is not intended to adhere to the standard indexing strategy.
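For reference, the standalone-mode setup amounts to two Elasticsearch REST calls: one to create the ILM policy and one to create the data stream's index template. Below is a minimal Go sketch of equivalent calls, not the server's actual implementation; the localhost URL and JSON bodies are illustrative, and the 1h rollover/delete thresholds are borrowed from the apmpackage commit later in this PR.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// put issues a PUT with a JSON body and prints the response status.
func put(url, body string) {
	req, err := http.NewRequest(http.MethodPut, url, bytes.NewBufferString(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(url, "->", resp.Status)
}

func main() {
	const es = "http://localhost:9200" // assumed local Elasticsearch

	// ILM policy: roll over after 1h, then delete after 1h. The 1h
	// thresholds come from the apmpackage commit in this PR; the exact
	// values used by standalone apm-server are an assumption here.
	put(es+"/_ilm/policy/apm-sampled-traces", `{
	  "policy": {
	    "phases": {
	      "hot": {"actions": {"rollover": {"max_age": "1h"}}},
	      "delete": {"min_age": "1h", "actions": {"delete": {}}}
	    }
	  }
	}`)

	// Index template that turns "apm-sampled-traces" into a data stream
	// wired to the ILM policy above.
	put(es+"/_index_template/apm-sampled-traces", `{
	  "index_patterns": ["apm-sampled-traces"],
	  "data_stream": {},
	  "template": {
	    "settings": {"index.lifecycle.name": "apm-sampled-traces"}
	  }
	}`)
}
```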

Checklist

How to test these changes

cd systemtest && go test -v -run TailSampling

For manual testing:

  • see Tail-sampling: transactions are being duplicated #4584 for how to test the fix for the original issue
  • run apm-server with tail sampling enabled:
    • ensure an index template and ILM policy, both called "apm-sampled-traces", are created
    • ensure a data stream called "apm-sampled-traces" is created

It's not currently possible to configure tail sampling with Fleet, so no testing steps are described for those parts. Just check that installing the package installs the "sampled_traces" data stream index template, ILM policy, and ingest pipeline.
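As a rough sanity check that the package installed its assets, the Elasticsearch listing endpoints can be queried directly. A hedged Go sketch; the localhost URL and the "sampled" substring match are assumptions, since the exact asset names aren't spelled out above.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// contains fetches a listing endpoint and reports whether the response
// mentions the given substring; a crude but dependency-free check.
func contains(url, substr string) bool {
	resp, err := http.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	return strings.Contains(string(body), substr)
}

func main() {
	const es = "http://localhost:9200" // assumed local cluster
	// "sampled" should appear in each listing once the package is installed.
	fmt.Println("index template: ", contains(es+"/_index_template", "sampled"))
	fmt.Println("ilm policy:     ", contains(es+"/_ilm/policy", "sampled"))
	fmt.Println("ingest pipeline:", contains(es+"/_ingest/pipeline", "sampled"))
}
```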

Related issues

Closes #4584

@axw axw changed the title from "Sampling datastream" to "sampling: use a data stream for sampled trace docs" Feb 9, 2021
beater: add Managed and Namespace to ServerParams

This enables x-pack/apm-server code to alter behaviour
based on whether APM Server is managed or not, and to
create data streams with the configured namespace.

sampling: use a data stream for sampled trace docs

Update tail-based sampling to index into and search a
data stream. The data stream will be associated with an
ILM policy that takes care of rollover and deletion.

When running in Fleet-managed mode, apm-server will expect
the data stream and ILM policy to exist for the data stream
called `traces-sampled-<namespace>`. Servers participating
in tail-based sampling are required to be configured with
the same namespace.

When running in standalone mode, apm-server will attempt
to create an index template and ILM policy for a data
stream called `apm-sampled-traces`. This is added for
minimal support while we transition things to Fleet, and
is intended to be removed in a future release. The data
stream is not intended to adhere to the standard indexing
strategy.

apmpackage: add traces-sampled-* data stream

Add a data stream for sampled trace documents,
along with an ILM policy which rolls over after
1h, and then deletes after 1h.

systemtest: fetch most recent beats monitoring doc

When searching for beats monitoring docs, make sure
we get the most recent one by sorting on 'timestamp'
(a sketch of such a query follows these commit messages).

systemtest: update tail-based sampling test

Update test to rely on apm-server to create its
own data stream index template.
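The "fetch most recent beats monitoring doc" change above boils down to a size-1 search sorted on 'timestamp' descending. A minimal Go sketch of such a query, assuming a local cluster and the conventional .monitoring-beats-* index pattern (both assumptions, not taken from this PR):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	const es = "http://localhost:9200" // assumed local cluster

	// Ask for the single most recent doc by sorting on 'timestamp'
	// descending, as the systemtest commit describes.
	query := `{
	  "size": 1,
	  "sort": [{"timestamp": {"order": "desc"}}]
	}`
	resp, err := http.Post(
		es+"/.monitoring-beats-*/_search", // index pattern is an assumption
		"application/json",
		bytes.NewBufferString(query),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // hits.hits[0] is the newest monitoring doc
}
```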
@apmmachine
Contributor

apmmachine commented Feb 9, 2021

💔 Tests Failed


Build stats

  • Build Cause: Pull request #4707 updated

  • Start Time: 2021-02-11T07:47:36.164+0000

  • Duration: 49 min 50 sec

  • Commit: 548dd7b

Test stats 🧪

  • Failed: 1
  • Passed: 4720
  • Skipped: 124
  • Total: 4845


Test errors 1


Build and Test / APM Integration Tests / test_conc_req_all_agents – tests.agent.test_multiple_agents
    Error details

     AssertionError: queried for [('processor.event', 'transaction'), ('service.name', ['dotnetapp', 'dotnetapp', 'flaskapp', 'flaskapp', 'djangoapp', 'djangoapp', 'expressapp', 'expressapp', 'railsapp', 'railsapp', 'gonethttpapp', 'gonethttpapp', 'springapp', 'springapp'])], expected 7000, got 6999 
    

    Stacktrace

     es = <tests.fixtures.es.es.<locals>.Elasticsearch object at 0x7f2bef34e190>
    apm_server = <tests.fixtures.apm_server.apm_server.<locals>.APMServer object at 0x7f2bef34e4d0>
    flask = <tests.fixtures.agents.Agent object at 0x7f2bb47ad490>
    django = <tests.fixtures.agents.Agent object at 0x7f2bef99b950>
    dotnet = <tests.fixtures.agents.Agent object at 0x7f2bef34e1d0>
    express = <tests.fixtures.agents.Agent object at 0x7f2bb47b3990>
    rails = <tests.fixtures.agents.Agent object at 0x7f2bef34ffd0>
    go_nethttp = <tests.fixtures.agents.Agent object at 0x7f2bec09fcd0>
    java_spring = <tests.fixtures.agents.Agent object at 0x7f2bd445f810>
    
        def test_conc_req_all_agents(es, apm_server, flask, django, dotnet, express, rails, go_nethttp, java_spring):
            dotnet_f = Concurrent.Endpoint(dotnet.foo.url,
                                           dotnet.app_name,
                                           ["foo"],
                                           "GET /foo",
                                           events_no=500)
            dotnet_b = Concurrent.Endpoint(dotnet.bar.url,
                                           dotnet.app_name,
                                           ["bar", "extra"],
                                           "GET /bar",
                                           events_no=500)
            flask_f = Concurrent.Endpoint(flask.foo.url,
                                          flask.app_name,
                                          ["app.foo"],
                                          "GET /foo",
                                          events_no=500)
            flask_b = Concurrent.Endpoint(flask.bar.url,
                                          flask.app_name,
                                          ["app.bar", "app.extra"],
                                          "GET /bar",
                                          events_no=500)
            django_f = Concurrent.Endpoint(django.foo.url,
                                           django.app_name,
                                           ["foo.views.foo"],
                                           "GET foo.views.show",
                                           events_no=500)
            django_b = Concurrent.Endpoint(django.bar.url,
                                           django.app_name,
                                           ["bar.views.bar", "bar.views.extra"],
                                           "GET bar.views.show",
                                           events_no=500)
            express_f = Concurrent.Endpoint(express.foo.url,
                                            express.app_name,
                                            ["app.foo"],
                                            "GET /foo",
                                            events_no=500)
            express_b = Concurrent.Endpoint(express.bar.url,
                                            express.app_name,
                                            ["app.bar", "app.extra"],
                                            "GET /bar",
                                            events_no=500)
            rails_f = Concurrent.Endpoint(rails.foo.url,
                                          rails.app_name,
                                          ["ApplicationController#foo"],
                                          "ApplicationController#foo",
                                          events_no=500)
            rails_b = Concurrent.Endpoint(rails.bar.url,
                                          rails.app_name,
                                          ["ApplicationController#bar", "app.extra"],
                                          "ApplicationController#bar",
                                          events_no=500)
            go_nethttp_f = Concurrent.Endpoint(go_nethttp.foo.url,
                                               go_nethttp.app_name,
                                               ["foo"],
                                               "GET /foo",
                                               events_no=500)
            go_nethttp_b = Concurrent.Endpoint(go_nethttp.bar.url,
                                               go_nethttp.app_name,
                                               ["bar", "extra"],
                                               "GET /bar",
                                               events_no=500)
            java_spring_f = Concurrent.Endpoint(java_spring.foo.url,
                                                java_spring.app_name,
                                                ["foo"],
                                                "GreetingController#foo",
                                                events_no=500)
            java_spring_b = Concurrent.Endpoint(java_spring.bar.url,
                                                java_spring.app_name,
                                                ["bar", "extra"],
                                                "GreetingController#bar",
                                                events_no=500)
        
            Concurrent(es, [
                dotnet_f, dotnet_b,
                flask_f, flask_b,
                django_f, django_b,
                express_f, express_b,
                rails_b, rails_f,
                go_nethttp_f, go_nethttp_b,
                java_spring_f, java_spring_b,
    >       ], iters=1).run()
    
    tests/agent/test_multiple_agents.py:84: 
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    tests/agent/concurrent_requests.py:254: in run
        self.check_counts(it)
    tests/agent/concurrent_requests.py:138: in check_counts
        assert_count([("processor.event", "transaction"), ("service.name", service_names)], transactions_count)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    
    terms = [('processor.event', 'transaction'), ('service.name', ['dotnetapp', 'dotnetapp', 'flaskapp', 'flaskapp', 'djangoapp', 'djangoapp', ...])]
    expected = 7000
    
        def assert_count(terms, expected):
            """wait a bit for doc count to reach expectation"""
            @timeout_decorator.timeout(max_wait)
            def check_count(mut_actual):
                while True:
                    rsp = self.es.count(index=self.index, body=self.elasticsearch.term_q(terms))
                    mut_actual[0] = rsp["count"]
                    if mut_actual[0] >= expected:
                        return
                    time.sleep(backoff)
        
            mut_actual = [-1]  # keep actual count in this mutable
            try:
                check_count(mut_actual)
            except timeout_decorator.TimeoutError:
                pass
            actual = mut_actual[0]
    >       assert actual == expected, err.format(terms, expected, actual)
    E       AssertionError: queried for [('processor.event', 'transaction'), ('service.name', ['dotnetapp', 'dotnetapp', 'flaskapp', 'flaskapp', 'djangoapp', 'djangoapp', 'expressapp', 'expressapp', 'railsapp', 'railsapp', 'gonethttpapp', 'gonethttpapp', 'springapp', 'springapp'])], expected 7000, got 6999
    
    tests/agent/concurrent_requests.py:132: AssertionError 
    

Steps errors 4


Run Window tests
  • Took 11 min 47 sec
Compress
  • Took 0 min 0 sec
  • Description: tar --exclude=coverage-files.tgz -czf coverage-files.tgz coverage
Compress
  • Took 0 min 0 sec
  • Description: tar --exclude=system-tests-linux-files.tgz -czf system-tests-linux-files.tgz system-tests
Test Sync
  • Took 3 min 26 sec
  • Description: ./.ci/scripts/sync.sh

Log output

Last 100 lines of log output:

[2021-02-11T08:21:44.674Z] --- PASS: TestRUMErrorSourcemapping (2.70s)
[2021-02-11T08:21:44.674Z] === RUN   TestKeepUnsampled
[2021-02-11T08:21:44.674Z] === RUN   TestKeepUnsampled/false
[2021-02-11T08:21:44.674Z] === RUN   TestKeepUnsampled/true
[2021-02-11T08:21:44.674Z] --- PASS: TestKeepUnsampled (5.14s)
[2021-02-11T08:21:44.674Z]     --- PASS: TestKeepUnsampled/false (2.59s)
[2021-02-11T08:21:44.674Z]     --- PASS: TestKeepUnsampled/true (2.55s)
[2021-02-11T08:21:44.674Z] === RUN   TestKeepUnsampledWarning
[2021-02-11T08:21:44.674Z] --- PASS: TestKeepUnsampledWarning (2.25s)
[2021-02-11T08:21:44.674Z] === RUN   TestTailSampling
[2021-02-11T08:21:44.674Z]     sampling_test.go:135: waiting for 100 "parent" transactions
[2021-02-11T08:21:44.674Z]     sampling_test.go:135: waiting for 100 "child" transactions
[2021-02-11T08:21:44.674Z] --- PASS: TestTailSampling (3.62s)
[2021-02-11T08:21:44.674Z] === RUN   TestTailSamplingUnlicensed
[2021-02-11T08:21:44.674Z] 2021/02/11 08:21:06 Starting container id: f0c0514dab39 image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT
[2021-02-11T08:21:44.674Z] 2021/02/11 08:21:07 Waiting for container id f0c0514dab39 image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT
[2021-02-11T08:21:44.674Z] 2021/02/11 08:21:27 Container is ready id: f0c0514dab39 image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT
[2021-02-11T08:21:44.674Z] --- PASS: TestTailSamplingUnlicensed (31.24s)
[2021-02-11T08:21:44.674Z] PASS
[2021-02-11T08:21:44.674Z] ok  	github.com/elastic/apm-server/systemtest	206.559s
[2021-02-11T08:21:44.674Z] === RUN   TestAPMServer
[2021-02-11T08:21:44.674Z] 2021/02/11 08:18:10 Building apm-server...
[2021-02-11T08:21:44.674Z] 2021/02/11 08:18:13 Built /var/lib/jenkins/workspace/pm-server_apm-server-mbp_PR-4707/src/github.com/elastic/apm-server/apm-server
[2021-02-11T08:21:44.674Z] --- PASS: TestAPMServer (6.10s)
[2021-02-11T08:21:44.674Z] === RUN   TestUnstartedAPMServer
[2021-02-11T08:21:44.674Z] --- PASS: TestUnstartedAPMServer (0.00s)
[2021-02-11T08:21:44.674Z] === RUN   TestAPMServerStartTLS
[2021-02-11T08:21:44.674Z] --- PASS: TestAPMServerStartTLS (0.40s)
[2021-02-11T08:21:44.674Z] === RUN   TestExpvar
[2021-02-11T08:21:44.674Z] --- PASS: TestExpvar (0.43s)
[2021-02-11T08:21:44.674Z] PASS
[2021-02-11T08:21:44.674Z] ok  	github.com/elastic/apm-server/systemtest/apmservertest	6.943s
[2021-02-11T08:21:44.675Z] ?   	github.com/elastic/apm-server/systemtest/estest	[no test files]
[2021-02-11T08:21:44.675Z] ?   	github.com/elastic/apm-server/systemtest/fleettest	[no test files]
[2021-02-11T08:21:44.675Z] + cleanup
[2021-02-11T08:21:44.675Z] + rm -rf /tmp/tmp.TvPUqMIWEv
[2021-02-11T08:21:44.675Z] + .ci/scripts/docker-get-logs.sh
[2021-02-11T08:21:45.715Z] Post stage
[2021-02-11T08:21:45.729Z] Running in /var/lib/jenkins/workspace/pm-server_apm-server-mbp_PR-4707/src/github.com/elastic/apm-server
[2021-02-11T08:21:45.771Z] Archiving artifacts
[2021-02-11T08:21:46.144Z] Recording test results
[2021-02-11T08:21:46.891Z] [Checks API] No suitable checks publisher found.
[2021-02-11T08:21:46.970Z] [WARN] tar: pathPrefix parameter is deprecated.
[2021-02-11T08:21:47.327Z] + tar --version
[2021-02-11T08:21:47.680Z] + tar --exclude=system-tests-linux-files.tgz -czf system-tests-linux-files.tgz system-tests
[2021-02-11T08:21:47.681Z] tar: system-tests: Cannot stat: No such file or directory
[2021-02-11T08:21:47.681Z] tar: Exiting with failure status due to previous errors
[2021-02-11T08:21:47.701Z] [INFO] system-tests-linux-files.tgz was not compressed or archived : script returned exit code 2
[2021-02-11T08:22:32.821Z] tests\system\test_requests.py ...........................                [ 73%]
[2021-02-11T08:22:32.821Z] tests\system\test_setup_index_management.py sssssssssssssss              [ 84%]
[2021-02-11T08:22:32.821Z] tests\system\test_tls.py ssssssssssssssssssssss                          [100%]
[2021-02-11T08:22:32.821Z] 
[2021-02-11T08:22:32.821Z] ============================== warnings summary ===============================
[2021-02-11T08:22:32.821Z] c:\users\jenkin~1.pac\appdata\local\temp\python-env\build\ve\windows\lib\site-packages\_pytest\junitxml.py:446
[2021-02-11T08:22:32.821Z]   c:\users\jenkin~1.pac\appdata\local\temp\python-env\build\ve\windows\lib\site-packages\_pytest\junitxml.py:446: PytestDeprecationWarning: The 'junit_family' default value will change to 'xunit2' in pytest 6.0. See:
[2021-02-11T08:22:32.821Z]     https://docs.pytest.org/en/stable/deprecations.html#junit-family-default-value-change-to-xunit2
[2021-02-11T08:22:32.821Z]   for more information.
[2021-02-11T08:22:32.821Z]     _issue_warning_captured(deprecated.JUNIT_XML_DEFAULT_FAMILY, config.hook, 2)
[2021-02-11T08:22:32.821Z] 
[2021-02-11T08:22:32.821Z] -- Docs: https://docs.pytest.org/en/stable/warnings.html
[2021-02-11T08:22:32.821Z] - generated xml file: C:\Users\jenkins\workspace\pm-server_apm-server-mbp_PR-4707\src\github.com\elastic\apm-server\build\TEST-python-unit.xml -
[2021-02-11T08:22:32.821Z] ============================ slowest 20 durations =============================
[2021-02-11T08:22:32.821Z] 8.09s call     tests/system/test_requests.py::RateLimitTest::test_rate_limit_small_hit
[2021-02-11T08:22:32.821Z] 7.68s call     tests/system/test_requests.py::ClientSideTest::test_ok
[2021-02-11T08:22:32.821Z] 7.68s call     tests/system/test_requests.py::CorsTest::test_ok
[2021-02-11T08:22:32.821Z] 5.33s call     tests/system/test_auth.py::TestAccessDefault::test_full_access
[2021-02-11T08:22:32.821Z] 4.19s call     tests/system/test_auth.py::TestAccessWithSecretToken::test_backend_intake
[2021-02-11T08:22:32.821Z] 4.16s call     tests/system/test_requests.py::CorsTest::test_preflight_bad_headers
[2021-02-11T08:22:32.821Z] 4.13s call     tests/system/test_requests.py::RateLimitTest::test_multiple_ips_rate_limit
[2021-02-11T08:22:32.821Z] 4.09s call     tests/system/test_requests.py::RateLimitTest::test_rate_limit
[2021-02-11T08:22:32.821Z] 3.63s call     tests/system/test_requests.py::RateLimitTest::test_multiple_ips_rate_limit_hit
[2021-02-11T08:22:32.821Z] 3.61s call     tests/system/test_requests.py::RateLimitTest::test_rate_limit_hit
[2021-02-11T08:22:32.821Z] 3.59s call     tests/system/test_requests.py::RateLimitTest::test_rate_limit_only_metadata
[2021-02-11T08:22:32.821Z] 3.18s call     tests/system/test_requests.py::Test::test_gzip
[2021-02-11T08:22:32.821Z] 3.16s call     tests/system/test_requests.py::Test::test_not_existent
[2021-02-11T08:22:32.821Z] 3.16s call     tests/system/test_requests.py::CorsTest::test_bad_origin
[2021-02-11T08:22:32.821Z] 3.16s call     tests/system/test_requests.py::Test::test_validation_fail
[2021-02-11T08:22:32.821Z] 3.16s call     tests/system/test_requests.py::Test::test_gzip_error
[2021-02-11T08:22:32.821Z] 3.16s call     tests/system/test_requests.py::Test::test_empty
[2021-02-11T08:22:32.821Z] 3.15s call     tests/system/test_requests.py::Test::test_rum_default_disabled
[2021-02-11T08:22:32.821Z] 3.15s call     tests/system/test_requests.py::CorsTest::test_preflight
[2021-02-11T08:22:32.821Z] 3.15s call     tests/system/test_requests.py::Test::test_bad_json
[2021-02-11T08:22:32.821Z] =========== 29 passed, 113 skipped, 1 warning in 113.23s (0:01:53) ============
[2021-02-11T08:22:32.821Z] >> python test: Unit Testing Complete
[2021-02-11T08:22:33.727Z] Post stage
[2021-02-11T08:22:33.748Z] Recording test results
[2021-02-11T08:22:34.505Z] [Checks API] No suitable checks publisher found.
[2021-02-11T08:36:22.252Z] [INFO] For detailed information see: https://apm-ci.elastic.co/job/apm-integration-tests-selector-mbp/job/master/13992/display/redirect
[2021-02-11T08:36:23.681Z] Copied 19 artifacts from "APM Integration Test MBP Selector » master" build number 13992
[2021-02-11T08:36:24.521Z] Post stage
[2021-02-11T08:36:24.531Z] Recording test results
[2021-02-11T08:36:25.253Z] [Checks API] No suitable checks publisher found.
[2021-02-11T08:36:25.597Z] Running on Jenkins in /var/lib/jenkins/workspace/pm-server_apm-server-mbp_PR-4707
[2021-02-11T08:36:25.651Z] [INFO] getVaultSecret: Getting secrets
[2021-02-11T08:36:25.807Z] Masking supported pattern matches of $VAULT_ADDR or $VAULT_ROLE_ID or $VAULT_SECRET_ID
[2021-02-11T08:36:26.475Z] + chmod 755 generate-build-data.sh
[2021-02-11T08:36:26.475Z] + ./generate-build-data.sh https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4707/ https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4707/runs/9 UNSTABLE 2930048
[2021-02-11T08:36:26.726Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4707/runs/9/steps/?limit=10000 -o steps-info.json
[2021-02-11T08:36:27.276Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4707/runs/9/tests/?status=FAILED -o tests-errors.json
[2021-02-11T08:36:27.276Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4707/runs/9/log/ -o pipeline-log.txt

@axw axw marked this pull request as ready for review February 9, 2021 07:24
@axw axw requested a review from a team February 9, 2021 07:24
@axw
Member Author

axw commented Feb 9, 2021

jenkins run the tests please

@codecov-io

Codecov Report

Merging #4707 (bb1aa84) into master (ad89a47) will decrease coverage by 0.06%.
The diff coverage is 64.17%.

@@            Coverage Diff             @@
##           master    #4707      +/-   ##
==========================================
- Coverage   76.20%   76.14%   -0.07%     
==========================================
  Files         163      164       +1     
  Lines        9898     9935      +37     
==========================================
+ Hits         7543     7565      +22     
- Misses       2355     2370      +15     
Impacted Files | Coverage Δ
--- | ---
beater/server.go | 63.33% <ø> (ø)
x-pack/apm-server/main.go | 0.00% <ø> (ø)
x-pack/apm-server/sampling/pubsub/datastream.go | 0.00% <0.00%> (ø)
x-pack/apm-server/sampling/pubsub/pubsub.go | 87.61% <95.00%> (+2.47%) ⬆️
beater/beater.go | 69.81% <100.00%> (+0.22%) ⬆️
x-pack/apm-server/sampling/config.go | 100.00% <100.00%> (ø)
x-pack/apm-server/sampling/processor.go | 79.54% <100.00%> (ø)
x-pack/apm-server/sampling/pubsub/config.go | 100.00% <100.00%> (ø)
...ack/apm-server/aggregation/txmetrics/aggregator.go | 93.36% <0.00%> (ø)
... and 2 more

x-pack/apm-server/main.go (review thread, resolved)
@axw
Member Author

axw commented Feb 11, 2021

jenkins run the tests please

@axw
Member Author

axw commented Feb 11, 2021

On second thoughts, merging - the apm-it failure is clearly not related to this. It's been flaky recently.

@axw axw merged commit 0c7bd22 into elastic:master Feb 11, 2021
@axw axw deleted the sampling-datastream branch February 11, 2021 08:52
axw added a commit to axw/apm-server that referenced this pull request Feb 19, 2021
* beater: add Managed and Namespace to ServerParams

This enables x-pack/apm-server code to alter behaviour
based on whether APM Server is managed or not, and to
create data streams with the configured namespace.

* sampling: use a data stream for sampled trace docs

Update tail-based sampling to index into and search a
data stream. The data stream will be associated with an
ILM policy that takes care of rollover and deletion.

When running in Fleet-managed mode, apm-server will expect
the data stream and ILM policy to exist for the data stream
called `traces-sampled-<namespace>`. Servers participating
in tail-based sampling are required to be configured with
the same namespace.

When running in standalone mode, apm-server will attempt
to create an index template and ILM policy for a data
stream called `apm-sampled-traces`. This is added for
minimal support while we transition things to Fleet, and
is intended to be removed in a future release. The data
stream is not intended to adhere to the standard indexing
strategy.

* apmpackage: add traces-sampled-* data stream

Add a data stream for sampled trace documents,
along with an ILM policy which rolls over after
1h, and then deletes after 1h.

* systemtest: fetch most recent beats monitoring doc

When searching for beats monitoring docs, make sure
we get the most recent one by sorting on 'timestamp'.

* systemtest: update tail-based sampling test

Update test to rely on apm-server to create its
own data stream index template.

* Cross-reference sampling/pubsub and apmpackage
# Conflicts:
#	changelogs/head.asciidoc
#	systemtest/elasticsearch.go
axw added a commit to axw/apm-server that referenced this pull request Feb 19, 2021 (same commit message as above)
axw added a commit that referenced this pull request Feb 19, 2021 (same commit message as above)
axw added a commit that referenced this pull request Feb 19, 2021 (same commit message as above)
@jalvz jalvz assigned jalvz and unassigned jalvz Feb 24, 2021