
[8.15] [Automatic Import] Reproducible sampling of log entries (#191598) #192507

Merged 1 commit into elastic:8.15 on Sep 10, 2024

Conversation

kibanamachine (Contributor)

Backport

This will backport the following commits from main to 8.15:

Questions?

Please refer to the Backport tool documentation

## Release note

Automatic Import now performs reproducible sampling from the list of log
entries instead of just truncating them.

## Summary

Previously, when the user uploaded a log sample that was too large for us to handle, we simply truncated it at `MaxLogsSampleRows` entries. With this change, we perform reproducible random sampling instead.

The user notification remains the same ("truncated") for now, in part because it is already translated into different languages.

This sampling process:

1. Keeps the first entry as-is for header detection.
2. Selects the remaining entries at random from the list.
3. Shuffles the entries other than the first one (even if there are fewer entries than `MaxLogsSampleRows`).
4. Is reproducible, since the random seed is fixed.

**Sampling** allows us to extract more information from the user-provided data compared to truncation, while **reproducibility** is important for providing customer support.
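
For illustration, here is a minimal TypeScript sketch of this kind of seeded sampling. `MaxLogsSampleRows` is mentioned above, but the seed value and the function name are hypothetical, not the actual Kibana implementation:

```ts
import seedrandom from 'seedrandom';

// Hypothetical values for illustration; the real constant and seed live
// in the Kibana source.
const MaxLogsSampleRows = 10;
const SAMPLE_SEED = 'automatic-import';

function sampleLogEntries(entries: string[]): string[] {
  if (entries.length === 0) {
    return entries;
  }
  // A fixed seed makes the sampling reproducible across runs.
  const rng = seedrandom(SAMPLE_SEED);

  // Step 1: keep the first entry as-is for header detection.
  const [header, ...rest] = entries;

  // Steps 2-3: a seeded Fisher-Yates shuffle of the remaining entries;
  // this runs even when no truncation is needed.
  for (let i = rest.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [rest[i], rest[j]] = [rest[j], rest[i]];
  }

  // Taking the first N - 1 shuffled entries amounts to a uniform random
  // sample (the header occupies the remaining slot).
  return [header, ...rest.slice(0, MaxLogsSampleRows - 1)];
}
```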

This brings us another step towards the implementation of
elastic/security-team#9844

### Risk Matrix

| Risk | Probability | Severity | Mitigation/Notes |
|------|-------------|----------|------------------|
| Behaviour of the `seedrandom` package changes in the future, breaking the tests | Low | Low | This package is already used elsewhere in Kibana |
| Users misunderstand how the sampling works and upload non-anonymized data, expecting that only the first rows are sent to the LLM | Low | Low | We should change the text in a future PR |
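
The first risk row refers to the determinism the tests depend on: with a fixed seed, `seedrandom` produces the same sequence every time. A minimal check (the seed string here is arbitrary):

```ts
import seedrandom from 'seedrandom';

// Two generators created with the same seed yield identical sequences.
const a = seedrandom('fixed-seed');
const b = seedrandom('fixed-seed');
console.log(a() === b()); // true
console.log(a() === b()); // true
```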

---------

Co-authored-by: Elastic Machine <[email protected]>
(cherry picked from commit 444fc48)
@kibana-ci (Collaborator)

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules lead to a faster build time.

| id | before | after | diff |
|----|--------|-------|------|
| integrationAssistant | 547 | 556 | +9 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app.

| id | before | after | diff |
|----|--------|-------|------|
| integrationAssistant | 939.5KB | 947.3KB | +7.8KB |

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ilyannn

@kibanamachine kibanamachine merged commit 3671c5d into elastic:8.15 Sep 10, 2024
20 of 22 checks passed