
sampling: use a data stream for sampled trace docs #4707

Merged
merged 9 commits into elastic:master from axw:sampling-datastream
Feb 11, 2021

Conversation

axw
Member

@axw axw commented Feb 9, 2021

Motivation/summary

Update tail-based sampling to index into and search a data stream. The data stream is associated with an ILM policy that takes care of rollover and deletion.

When running in Fleet-managed mode, apm-server expects the data stream `traces-sampled-<namespace>` and its ILM policy to already exist. Servers participating in tail-based sampling must be configured with the same namespace.

When running in standalone mode, apm-server attempts to create an index template and ILM policy for a data stream called `apm-sampled-traces`. This provides minimal support while we transition to Fleet, and is intended to be removed in a future release. The data stream is not intended to adhere to the standard indexing strategy.
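For reference, the standalone-mode setup amounts to two Elasticsearch REST calls: one to create the ILM policy and one to create the data stream's index template. Below is a minimal Go sketch of equivalent calls, not the server's actual implementation; the localhost URL and JSON bodies are illustrative, and the 1h rollover/delete thresholds are borrowed from the apmpackage commit later in this PR.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// put issues a PUT with a JSON body and prints the response status.
func put(url, body string) {
	req, err := http.NewRequest(http.MethodPut, url, bytes.NewBufferString(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(url, "->", resp.Status)
}

func main() {
	const es = "http://localhost:9200" // assumed local Elasticsearch

	// ILM policy: roll over after 1h, then delete after 1h. The 1h
	// thresholds come from the apmpackage commit in this PR; the exact
	// values used by standalone apm-server are an assumption here.
	put(es+"/_ilm/policy/apm-sampled-traces", `{
	  "policy": {
	    "phases": {
	      "hot": {"actions": {"rollover": {"max_age": "1h"}}},
	      "delete": {"min_age": "1h", "actions": {"delete": {}}}
	    }
	  }
	}`)

	// Index template that turns "apm-sampled-traces" into a data stream
	// wired to the ILM policy above.
	put(es+"/_index_template/apm-sampled-traces", `{
	  "index_patterns": ["apm-sampled-traces"],
	  "data_stream": {},
	  "template": {
	    "settings": {"index.lifecycle.name": "apm-sampled-traces"}
	  }
	}`)
}
```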

Checklist

How to test these changes

cd systemtest && go test -v -run TailSampling

For manual testing:

  • see Tail-sampling: transactions are being duplicated #4584 for how to test the fix for the original issue
  • run apm-server with tail sampling enabled:
    • ensure an index template and ILM policy, both called "apm-sampled-traces", are created
    • ensure a data stream called "apm-sampled-traces" is created

It's not currently possible to configure tail sampling with Fleet, so no testing steps are described for those parts. Just check that installing the package installs the "sampled_traces" data stream index template, ILM policy, and ingest pipeline.
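As a rough sanity check that the package installed its assets, the Elasticsearch listing endpoints can be queried directly. A hedged Go sketch; the localhost URL and the "sampled" substring match are assumptions, since the exact asset names aren't spelled out above.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// contains fetches a listing endpoint and reports whether the response
// mentions the given substring; a crude but dependency-free check.
func contains(url, substr string) bool {
	resp, err := http.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	return strings.Contains(string(body), substr)
}

func main() {
	const es = "http://localhost:9200" // assumed local cluster
	// "sampled" should appear in each listing once the package is installed.
	fmt.Println("index template: ", contains(es+"/_index_template", "sampled"))
	fmt.Println("ilm policy:     ", contains(es+"/_ilm/policy", "sampled"))
	fmt.Println("ingest pipeline:", contains(es+"/_ingest/pipeline", "sampled"))
}
```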

Related issues

Closes #4584

@axw axw changed the title from "Sampling datastream" to "sampling: use a data stream for sampled trace docs" Feb 9, 2021
beater: add Managed and Namespace to ServerParams

This enables x-pack/apm-server code to alter behaviour
based on whether APM Server is managed or not, and to
create data streams with the configured namespace.

sampling: use a data stream for sampled trace docs

Update tail-based sampling to index into and search a
data stream. The data stream will be associated with an
ILM policy that takes care of rollover and deletion.

When running in Fleet-managed mode, apm-server will expect
the data stream and ILM policy to exist for the data stream
called `traces-sampled-<namespace>`. Servers participating
in tail-based sampling are required to be configured with
the same namespace.

When running in standalone mode, apm-server will attempt
to create an index template and ILM policy for a data
stream called `apm-sampled-traces`. This is added for
minimal support while we transition things to Fleet, and
is intended to be removed in a future release. The data
stream is not intended to adhere to the standard indexing
strategy.

apmpackage: add traces-sampled-* data stream

Add a data stream for sampled trace documents,
along with an ILM policy which rolls over after
1h, and then deletes after 1h.

systemtest: fetch most recent beats monitoring doc

When searching for beats monitoring docs, make sure
we get the most recent one by sorting on 'timestamp'
(a sketch of such a query follows these commit messages).

systemtest: update tail-based sampling test

Update test to rely on apm-server to create its
own data stream index template.
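The "fetch most recent beats monitoring doc" change above boils down to a size-1 search sorted on 'timestamp' descending. A minimal Go sketch of such a query, assuming a local cluster and the conventional .monitoring-beats-* index pattern (both assumptions, not taken from this PR):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	const es = "http://localhost:9200" // assumed local cluster

	// Ask for the single most recent doc by sorting on 'timestamp'
	// descending, as the systemtest commit describes.
	query := `{
	  "size": 1,
	  "sort": [{"timestamp": {"order": "desc"}}]
	}`
	resp, err := http.Post(
		es+"/.monitoring-beats-*/_search", // index pattern is an assumption
		"application/json",
		bytes.NewBufferString(query),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // hits.hits[0] is the newest monitoring doc
}
```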
@apmmachine
Contributor

apmmachine commented Feb 9, 2021

💔 Tests Failed


Build stats

  • Build Cause: Pull request #4707 updated

  • Start Time: 2021-02-11T07:47:36.164+0000

  • Duration: 49 min 50 sec

  • Commit: 548dd7b

Test stats 🧪

  • Failed: 1
  • Passed: 4720
  • Skipped: 124
  • Total: 4845


Test errors 1


Build and Test / APM Integration Tests / test_conc_req_all_agents – tests.agent.test_multiple_agents
    Error details

     AssertionError: queried for [('processor.event', 'transaction'), ('service.name', ['dotnetapp', 'dotnetapp', 'flaskapp', 'flaskapp', 'djangoapp', 'djangoapp', 'expressapp', 'expressapp', 'railsapp', 'railsapp', 'gonethttpapp', 'gonethttpapp', 'springapp', 'springapp'])], expected 7000, got 6999 
    

    Stacktrace

     es = <tests.fixtures.es.es.<locals>.Elasticsearch object at 0x7f2bef34e190>
    apm_server = <tests.fixtures.apm_server.apm_server.<locals>.APMServer object at 0x7f2bef34e4d0>
    flask = <tests.fixtures.agents.Agent object at 0x7f2bb47ad490>
    django = <tests.fixtures.agents.Agent object at 0x7f2bef99b950>
    dotnet = <tests.fixtures.agents.Agent object at 0x7f2bef34e1d0>
    express = <tests.fixtures.agents.Agent object at 0x7f2bb47b3990>
    rails = <tests.fixtures.agents.Agent object at 0x7f2bef34ffd0>
    go_nethttp = <tests.fixtures.agents.Agent object at 0x7f2bec09fcd0>
    java_spring = <tests.fixtures.agents.Agent object at 0x7f2bd445f810>
    
        def test_conc_req_all_agents(es, apm_server, flask, django, dotnet, express, rails, go_nethttp, java_spring):
            dotnet_f = Concurrent.Endpoint(dotnet.foo.url,
                                           dotnet.app_name,
                                           ["foo"],
                                           "GET /foo",
                                           events_no=500)
            dotnet_b = Concurrent.Endpoint(dotnet.bar.url,
                                           dotnet.app_name,
                                           ["bar", "extra"],
                                           "GET /bar",
                                           events_no=500)
            flask_f = Concurrent.Endpoint(flask.foo.url,
                                          flask.app_name,
                                          ["app.foo"],
                                          "GET /foo",
                                          events_no=500)
            flask_b = Concurrent.Endpoint(flask.bar.url,
                                          flask.app_name,
                                          ["app.bar", "app.extra"],
                                          "GET /bar",
                                          events_no=500)
            django_f = Concurrent.Endpoint(django.foo.url,
                                           django.app_name,
                                           ["foo.views.foo"],
                                           "GET foo.views.show",
                                           events_no=500)
            django_b = Concurrent.Endpoint(django.bar.url,
                                           django.app_name,
                                           ["bar.views.bar", "bar.views.extra"],
                                           "GET bar.views.show",
                                           events_no=500)
            express_f = Concurrent.Endpoint(express.foo.url,
                                            express.app_name,
                                            ["app.foo"],
                                            "GET /foo",
                                            events_no=500)
            express_b = Concurrent.Endpoint(express.bar.url,
                                            express.app_name,
                                            ["app.bar", "app.extra"],
                                            "GET /bar",
                                            events_no=500)
            rails_f = Concurrent.Endpoint(rails.foo.url,
                                          rails.app_name,
                                          ["ApplicationController#foo"],
                                          "ApplicationController#foo",
                                          events_no=500)
            rails_b = Concurrent.Endpoint(rails.bar.url,
                                          rails.app_name,
                                          ["ApplicationController#bar", "app.extra"],
                                          "ApplicationController#bar",
                                          events_no=500)
            go_nethttp_f = Concurrent.Endpoint(go_nethttp.foo.url,
                                               go_nethttp.app_name,
                                               ["foo"],
                                               "GET /foo",
                                               events_no=500)
            go_nethttp_b = Concurrent.Endpoint(go_nethttp.bar.url,
                                               go_nethttp.app_name,
                                               ["bar", "extra"],
                                               "GET /bar",
                                               events_no=500)
            java_spring_f = Concurrent.Endpoint(java_spring.foo.url,
                                                java_spring.app_name,
                                                ["foo"],
                                                "GreetingController#foo",
                                                events_no=500)
            java_spring_b = Concurrent.Endpoint(java_spring.bar.url,
                                                java_spring.app_name,
                                                ["bar", "extra"],
                                                "GreetingController#bar",
                                                events_no=500)
        
            Concurrent(es, [
                dotnet_f, dotnet_b,
                flask_f, flask_b,
                django_f, django_b,
                express_f, express_b,
                rails_b, rails_f,
                go_nethttp_f, go_nethttp_b,
                java_spring_f, java_spring_b,
    >       ], iters=1).run()
    
    tests/agent/test_multiple_agents.py:84: 
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    tests/agent/concurrent_requests.py:254: in run
        self.check_counts(it)
    tests/agent/concurrent_requests.py:138: in check_counts
        assert_count([("processor.event", "transaction"), ("service.name", service_names)], transactions_count)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    
    terms = [('processor.event', 'transaction'), ('service.name', ['dotnetapp', 'dotnetapp', 'flaskapp', 'flaskapp', 'djangoapp', 'djangoapp', ...])]
    expected = 7000
    
        def assert_count(terms, expected):
            """wait a bit for doc count to reach expectation"""
            @timeout_decorator.timeout(max_wait)
            def check_count(mut_actual):
                while True:
                    rsp = self.es.count(index=self.index, body=self.elasticsearch.term_q(terms))
                    mut_actual[0] = rsp["count"]
                    if mut_actual[0] >= expected:
                        return
                    time.sleep(backoff)
        
            mut_actual = [-1]  # keep actual count in this mutable
            try:
                check_count(mut_actual)
            except timeout_decorator.TimeoutError:
                pass
            actual = mut_actual[0]
    >       assert actual == expected, err.format(terms, expected, actual)
    E       AssertionError: queried for [('processor.event', 'transaction'), ('service.name', ['dotnetapp', 'dotnetapp', 'flaskapp', 'flaskapp', 'djangoapp', 'djangoapp', 'expressapp', 'expressapp', 'railsapp', 'railsapp', 'gonethttpapp', 'gonethttpapp', 'springapp', 'springapp'])], expected 7000, got 6999
    
    tests/agent/concurrent_requests.py:132: AssertionError 
    

Steps errors 4


Run Window tests
  • Took 11 min 47 sec
Compress
  • Took 0 min 0 sec
  • Description: tar --exclude=coverage-files.tgz -czf coverage-files.tgz coverage
Compress
  • Took 0 min 0 sec
  • Description: tar --exclude=system-tests-linux-files.tgz -czf system-tests-linux-files.tgz system-tests
Test Sync
  • Took 3 min 26 sec
  • Description: ./.ci/scripts/sync.sh

Log output

Last 100 lines of log output:

[2021-02-11T08:21:44.674Z] --- PASS: TestRUMErrorSourcemapping (2.70s)
[2021-02-11T08:21:44.674Z] === RUN   TestKeepUnsampled
[2021-02-11T08:21:44.674Z] === RUN   TestKeepUnsampled/false
[2021-02-11T08:21:44.674Z] === RUN   TestKeepUnsampled/true
[2021-02-11T08:21:44.674Z] --- PASS: TestKeepUnsampled (5.14s)
[2021-02-11T08:21:44.674Z]     --- PASS: TestKeepUnsampled/false (2.59s)
[2021-02-11T08:21:44.674Z]     --- PASS: TestKeepUnsampled/true (2.55s)
[2021-02-11T08:21:44.674Z] === RUN   TestKeepUnsampledWarning
[2021-02-11T08:21:44.674Z] --- PASS: TestKeepUnsampledWarning (2.25s)
[2021-02-11T08:21:44.674Z] === RUN   TestTailSampling
[2021-02-11T08:21:44.674Z]     sampling_test.go:135: waiting for 100 "parent" transactions
[2021-02-11T08:21:44.674Z]     sampling_test.go:135: waiting for 100 "child" transactions
[2021-02-11T08:21:44.674Z] --- PASS: TestTailSampling (3.62s)
[2021-02-11T08:21:44.674Z] === RUN   TestTailSamplingUnlicensed
[2021-02-11T08:21:44.674Z] 2021/02/11 08:21:06 Starting container id: f0c0514dab39 image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT
[2021-02-11T08:21:44.674Z] 2021/02/11 08:21:07 Waiting for container id f0c0514dab39 image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT
[2021-02-11T08:21:44.674Z] 2021/02/11 08:21:27 Container is ready id: f0c0514dab39 image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT
[2021-02-11T08:21:44.674Z] --- PASS: TestTailSamplingUnlicensed (31.24s)
[2021-02-11T08:21:44.674Z] PASS
[2021-02-11T08:21:44.674Z] ok  	github.com/elastic/apm-server/systemtest	206.559s
[2021-02-11T08:21:44.674Z] === RUN   TestAPMServer
[2021-02-11T08:21:44.674Z] 2021/02/11 08:18:10 Building apm-server...
[2021-02-11T08:21:44.674Z] 2021/02/11 08:18:13 Built /var/lib/jenkins/workspace/pm-server_apm-server-mbp_PR-4707/src/github.com/elastic/apm-server/apm-server
[2021-02-11T08:21:44.674Z] --- PASS: TestAPMServer (6.10s)
[2021-02-11T08:21:44.674Z] === RUN   TestUnstartedAPMServer
[2021-02-11T08:21:44.674Z] --- PASS: TestUnstartedAPMServer (0.00s)
[2021-02-11T08:21:44.674Z] === RUN   TestAPMServerStartTLS
[2021-02-11T08:21:44.674Z] --- PASS: TestAPMServerStartTLS (0.40s)
[2021-02-11T08:21:44.674Z] === RUN   TestExpvar
[2021-02-11T08:21:44.674Z] --- PASS: TestExpvar (0.43s)
[2021-02-11T08:21:44.674Z] PASS
[2021-02-11T08:21:44.674Z] ok  	github.com/elastic/apm-server/systemtest/apmservertest	6.943s
[2021-02-11T08:21:44.675Z] ?   	github.com/elastic/apm-server/systemtest/estest	[no test files]
[2021-02-11T08:21:44.675Z] ?   	github.com/elastic/apm-server/systemtest/fleettest	[no test files]
[2021-02-11T08:21:44.675Z] + cleanup
[2021-02-11T08:21:44.675Z] + rm -rf /tmp/tmp.TvPUqMIWEv
[2021-02-11T08:21:44.675Z] + .ci/scripts/docker-get-logs.sh
[2021-02-11T08:21:45.715Z] Post stage
[2021-02-11T08:21:45.729Z] Running in /var/lib/jenkins/workspace/pm-server_apm-server-mbp_PR-4707/src/github.com/elastic/apm-server
[2021-02-11T08:21:45.771Z] Archiving artifacts
[2021-02-11T08:21:46.144Z] Recording test results
[2021-02-11T08:21:46.891Z] [Checks API] No suitable checks publisher found.
[2021-02-11T08:21:46.970Z] [WARN] tar: pathPrefix parameter is deprecated.
[2021-02-11T08:21:47.327Z] + tar --version
[2021-02-11T08:21:47.680Z] + tar --exclude=system-tests-linux-files.tgz -czf system-tests-linux-files.tgz system-tests
[2021-02-11T08:21:47.681Z] tar: system-tests: Cannot stat: No such file or directory
[2021-02-11T08:21:47.681Z] tar: Exiting with failure status due to previous errors
[2021-02-11T08:21:47.701Z] [INFO] system-tests-linux-files.tgz was not compressed or archived : script returned exit code 2
[2021-02-11T08:22:32.821Z] tests\system\test_requests.py ...........................                [ 73%]
[2021-02-11T08:22:32.821Z] tests\system\test_setup_index_management.py sssssssssssssss              [ 84%]
[2021-02-11T08:22:32.821Z] tests\system\test_tls.py ssssssssssssssssssssss                          [100%]
[2021-02-11T08:22:32.821Z] 
[2021-02-11T08:22:32.821Z] ============================== warnings summary ===============================
[2021-02-11T08:22:32.821Z] c:\users\jenkin~1.pac\appdata\local\temp\python-env\build\ve\windows\lib\site-packages\_pytest\junitxml.py:446
[2021-02-11T08:22:32.821Z]   c:\users\jenkin~1.pac\appdata\local\temp\python-env\build\ve\windows\lib\site-packages\_pytest\junitxml.py:446: PytestDeprecationWarning: The 'junit_family' default value will change to 'xunit2' in pytest 6.0. See:
[2021-02-11T08:22:32.821Z]     https://docs.pytest.org/en/stable/deprecations.html#junit-family-default-value-change-to-xunit2
[2021-02-11T08:22:32.821Z]   for more information.
[2021-02-11T08:22:32.821Z]     _issue_warning_captured(deprecated.JUNIT_XML_DEFAULT_FAMILY, config.hook, 2)
[2021-02-11T08:22:32.821Z] 
[2021-02-11T08:22:32.821Z] -- Docs: https://docs.pytest.org/en/stable/warnings.html
[2021-02-11T08:22:32.821Z] - generated xml file: C:\Users\jenkins\workspace\pm-server_apm-server-mbp_PR-4707\src\github.com\elastic\apm-server\build\TEST-python-unit.xml -
[2021-02-11T08:22:32.821Z] ============================ slowest 20 durations =============================
[2021-02-11T08:22:32.821Z] 8.09s call     tests/system/test_requests.py::RateLimitTest::test_rate_limit_small_hit
[2021-02-11T08:22:32.821Z] 7.68s call     tests/system/test_requests.py::ClientSideTest::test_ok
[2021-02-11T08:22:32.821Z] 7.68s call     tests/system/test_requests.py::CorsTest::test_ok
[2021-02-11T08:22:32.821Z] 5.33s call     tests/system/test_auth.py::TestAccessDefault::test_full_access
[2021-02-11T08:22:32.821Z] 4.19s call     tests/system/test_auth.py::TestAccessWithSecretToken::test_backend_intake
[2021-02-11T08:22:32.821Z] 4.16s call     tests/system/test_requests.py::CorsTest::test_preflight_bad_headers
[2021-02-11T08:22:32.821Z] 4.13s call     tests/system/test_requests.py::RateLimitTest::test_multiple_ips_rate_limit
[2021-02-11T08:22:32.821Z] 4.09s call     tests/system/test_requests.py::RateLimitTest::test_rate_limit
[2021-02-11T08:22:32.821Z] 3.63s call     tests/system/test_requests.py::RateLimitTest::test_multiple_ips_rate_limit_hit
[2021-02-11T08:22:32.821Z] 3.61s call     tests/system/test_requests.py::RateLimitTest::test_rate_limit_hit
[2021-02-11T08:22:32.821Z] 3.59s call     tests/system/test_requests.py::RateLimitTest::test_rate_limit_only_metadata
[2021-02-11T08:22:32.821Z] 3.18s call     tests/system/test_requests.py::Test::test_gzip
[2021-02-11T08:22:32.821Z] 3.16s call     tests/system/test_requests.py::Test::test_not_existent
[2021-02-11T08:22:32.821Z] 3.16s call     tests/system/test_requests.py::CorsTest::test_bad_origin
[2021-02-11T08:22:32.821Z] 3.16s call     tests/system/test_requests.py::Test::test_validation_fail
[2021-02-11T08:22:32.821Z] 3.16s call     tests/system/test_requests.py::Test::test_gzip_error
[2021-02-11T08:22:32.821Z] 3.16s call     tests/system/test_requests.py::Test::test_empty
[2021-02-11T08:22:32.821Z] 3.15s call     tests/system/test_requests.py::Test::test_rum_default_disabled
[2021-02-11T08:22:32.821Z] 3.15s call     tests/system/test_requests.py::CorsTest::test_preflight
[2021-02-11T08:22:32.821Z] 3.15s call     tests/system/test_requests.py::Test::test_bad_json
[2021-02-11T08:22:32.821Z] =========== 29 passed, 113 skipped, 1 warning in 113.23s (0:01:53) ============
[2021-02-11T08:22:32.821Z] >> python test: Unit Testing Complete
[2021-02-11T08:22:33.727Z] Post stage
[2021-02-11T08:22:33.748Z] Recording test results
[2021-02-11T08:22:34.505Z] [Checks API] No suitable checks publisher found.
[2021-02-11T08:36:22.252Z] [INFO] For detailed information see: https://apm-ci.elastic.co/job/apm-integration-tests-selector-mbp/job/master/13992/display/redirect
[2021-02-11T08:36:23.681Z] Copied 19 artifacts from "APM Integration Test MBP Selector » master" build number 13992
[2021-02-11T08:36:24.521Z] Post stage
[2021-02-11T08:36:24.531Z] Recording test results
[2021-02-11T08:36:25.253Z] [Checks API] No suitable checks publisher found.
[2021-02-11T08:36:25.597Z] Running on Jenkins in /var/lib/jenkins/workspace/pm-server_apm-server-mbp_PR-4707
[2021-02-11T08:36:25.651Z] [INFO] getVaultSecret: Getting secrets
[2021-02-11T08:36:25.807Z] Masking supported pattern matches of $VAULT_ADDR or $VAULT_ROLE_ID or $VAULT_SECRET_ID
[2021-02-11T08:36:26.475Z] + chmod 755 generate-build-data.sh
[2021-02-11T08:36:26.475Z] + ./generate-build-data.sh https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4707/ https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4707/runs/9 UNSTABLE 2930048
[2021-02-11T08:36:26.726Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4707/runs/9/steps/?limit=10000 -o steps-info.json
[2021-02-11T08:36:27.276Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4707/runs/9/tests/?status=FAILED -o tests-errors.json
[2021-02-11T08:36:27.276Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4707/runs/9/log/ -o pipeline-log.txt

@axw axw marked this pull request as ready for review February 9, 2021 07:24
@axw axw requested a review from a team February 9, 2021 07:24
@axw
Member Author

axw commented Feb 9, 2021

jenkins run the tests please

@codecov-io

Codecov Report

Merging #4707 (bb1aa84) into master (ad89a47) will decrease coverage by 0.06%.
The diff coverage is 64.17%.

@@            Coverage Diff             @@
##           master    #4707      +/-   ##
==========================================
- Coverage   76.20%   76.14%   -0.07%     
==========================================
  Files         163      164       +1     
  Lines        9898     9935      +37     
==========================================
+ Hits         7543     7565      +22     
- Misses       2355     2370      +15     
Impacted Files | Coverage Δ
--- | ---
beater/server.go | 63.33% <ø> (ø)
x-pack/apm-server/main.go | 0.00% <ø> (ø)
x-pack/apm-server/sampling/pubsub/datastream.go | 0.00% <0.00%> (ø)
x-pack/apm-server/sampling/pubsub/pubsub.go | 87.61% <95.00%> (+2.47%) ⬆️
beater/beater.go | 69.81% <100.00%> (+0.22%) ⬆️
x-pack/apm-server/sampling/config.go | 100.00% <100.00%> (ø)
x-pack/apm-server/sampling/processor.go | 79.54% <100.00%> (ø)
x-pack/apm-server/sampling/pubsub/config.go | 100.00% <100.00%> (ø)
...ack/apm-server/aggregation/txmetrics/aggregator.go | 93.36% <0.00%> (ø)
... and 2 more

x-pack/apm-server/main.go (review thread, resolved)
@axw
Member Author

axw commented Feb 11, 2021

jenkins run the tests please

@axw
Member Author

axw commented Feb 11, 2021

On second thoughts, merging - the apm-it failure is clearly not related to this. It's been flaky recently.

@axw axw merged commit 0c7bd22 into elastic:master Feb 11, 2021
@axw axw deleted the sampling-datastream branch February 11, 2021 08:52
axw added a commit to axw/apm-server that referenced this pull request Feb 19, 2021
* beater: add Managed and Namespace to ServerParams

This enables x-pack/apm-server code to alter behaviour
based on whether APM Server is managed or not, and to
create data streams with the configured namespace.

* sampling: use a data stream for sampled trace docs

Update tail-based sampling to index into and search a
data stream. The data stream will be associated with an
ILM policy that takes care of rollover and deletion.

When running in Fleet-managed mode, apm-server will expect
the data stream and ILM policy to exist for the data stream
called `traces-sampled-<namespace>`. Servers participating
in tail-based sampling are required to be configured with
the same namespace.

When running in standalone mode, apm-server will attempt
to create an index template and ILM policy for a data
stream called `apm-sampled-traces`. This is added for
minimal support while we transition things to Fleet, and
is intended to be removed in a future release. The data
stream is not intended to adhere to the standard indexing
strategy.

* apmpackage: add traces-sampled-* data stream

Add a data stream for sampled trace documents,
along with an ILM policy which rolls over after
1h, and then deletes after 1h.

* systemtest: fetch most recent beats monitoring doc

When searching for beats monitoring docs, make sure
we get the most recent one by sorting on 'timestamp'.

* systemtest: update tail-based sampling test

Update test to rely on apm-server to create its
own data stream index template.

* Cross-reference sampling/pubsub and apmpackage
# Conflicts:
#	changelogs/head.asciidoc
#	systemtest/elasticsearch.go
axw added a commit to axw/apm-server that referenced this pull request Feb 19, 2021 (same commit message as above)
axw added a commit that referenced this pull request Feb 19, 2021 (same commit message as above)
axw added a commit that referenced this pull request Feb 19, 2021 (same commit message as above)
@jalvz jalvz assigned jalvz and unassigned jalvz Feb 24, 2021