[Security Solution] Stream-based installation of the package with prebuilt rules #192350
Closed
Tracked by #174168
Labels
- 8.17 candidate
- Feature:Prebuilt Detection Rules (Security Solution Prebuilt Detection Rules area)
- performance
- Team:Detection Rule Management (Security Detection Rule Management Team)
- Team:Detections and Resp (Security Detection Response Team)
- Team:Fleet (Team label for Observability Data Collection Fleet team)
- Team: SecuritySolution (Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc.)
- v8.17.0
Comments
banderror added the 8.16 candidate, Feature:Prebuilt Detection Rules, performance, Team: SecuritySolution, Team:Detection Rule Management, and Team:Detections and Resp labels on Sep 9, 2024
Pinging @elastic/security-detections-response (Team:Detections and Resp)
Pinging @elastic/security-solution (Team: SecuritySolution)
Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management)
xcrzx removed the performance, Team:Detections and Resp, Team: SecuritySolution, Team:Detection Rule Management, Feature:Prebuilt Detection Rules, 8.17 candidate, and v8.17.0 labels on Oct 28, 2024
banderror added the performance, Team:Fleet, Team:Detections and Resp, and Team: SecuritySolution labels and removed the needs-team label (Issues missing a team label) on Oct 30, 2024
banderror added the Team:Detection Rule Management, Feature:Prebuilt Detection Rules, 8.17 candidate, and v8.17.0 labels on Oct 30, 2024
Pinging @elastic/fleet (Team:Fleet)
xcrzx added a commit that referenced this issue on Nov 5, 2024
…am-based approach (#195888)

**Resolves: #192350**

## Summary

Implemented stream-based installation of the detection rules package.

**Background**: The installation of the detection rules package was causing OOM (Out of Memory) errors in Serverless environments, where available memory is limited to 1GB. The root cause was that, during installation, the package was read and unzipped entirely into memory. Given the large package size, this led to OOMs. To address these memory issues, the following changes were made:

1. Added branching logic to the `installPackageFromRegistry` and `installPackageByUpload` methods that decides, based on the package name, whether to use streaming. Currently, only the `security_detection_engine` package is hardcoded to use streaming.
2. Defined a separate set of state machine steps for stream-based package installation. At this stage, it covers only Kibana assets installation.
3. Added a new `stepInstallKibanaAssetsWithStreaming` step to handle assets installation. This method still reads the package archive into memory, since unzipping from a readable stream is [not possible due to the design of the .zip format](https://github.com/thejoshwolfe/yauzl?tab=readme-ov-file#no-streaming-unzip-api). However, once the archive is in a buffer, it is unzipped using streams, so only a small portion of it (100 saved objects at a time) is unpacked into memory, which reduces memory usage.
4. The new method also includes several optimizations, such as removing previously installed assets only if they are missing from the new package, and using `savedObjectClient.bulkCreate` instead of the less efficient `savedObjectClient.import`.

### Test environment

1. Prebuilt detection rules package with ~20k saved objects; 118MB zipped.
2. Local package registry.
3. Production build of Kibana running locally with a 700MB max old space limit, pointed to that registry.
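Items 1, 3, and 4 of the change list above combine three ideas: routing by package name, writing assets in fixed-size batches, and deleting only assets that disappeared from the new package. The following TypeScript sketch illustrates them under stated assumptions; it is not the Fleet code. `SavedObjectAsset`, `shouldUseStreaming`, `inBatches`, `staleAssetKeys`, and `installAssetsInBatches` are hypothetical names, and `bulkCreate` is an injected callback standing in for the saved objects client rather than its real signature. The batch size of 100 comes from the commit message.

```typescript
// Hypothetical shape of a Kibana asset parsed from the package archive.
interface SavedObjectAsset {
  type: string;
  id: string;
  attributes: Record<string, unknown>;
}

// Hypothetical allow-list mirroring the single hardcoded package name.
const STREAMING_PACKAGES: ReadonlySet<string> = new Set(['security_detection_engine']);

function shouldUseStreaming(packageName: string): boolean {
  return STREAMING_PACKAGES.has(packageName);
}

// Groups items from an async iterable into arrays of at most `size` elements,
// so only one batch of parsed assets is held in memory at a time.
async function* inBatches<T>(source: AsyncIterable<T>, size: number): AsyncGenerator<T[]> {
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length >= size) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) {
    yield batch;
  }
}

// Returns keys of previously installed assets that are absent from the new
// package; only these would need to be deleted.
function staleAssetKeys(previouslyInstalled: Iterable<string>, incoming: ReadonlySet<string>): string[] {
  return Array.from(previouslyInstalled).filter((key) => !incoming.has(key));
}

// Writes assets batch by batch through the injected (hypothetical) bulkCreate
// callback and returns the "type:id" keys of everything written.
async function installAssetsInBatches(
  assets: AsyncIterable<SavedObjectAsset>,
  bulkCreate: (batch: SavedObjectAsset[]) => Promise<void>,
  batchSize = 100,
): Promise<Set<string>> {
  const writtenKeys = new Set<string>();
  for await (const batch of inBatches(assets, batchSize)) {
    await bulkCreate(batch);
    for (const asset of batch) {
      writtenKeys.add(`${asset.type}:${asset.id}`);
    }
  }
  return writtenKeys;
}
```

Passing the set returned by `installAssetsInBatches` as `incoming` to `staleAssetKeys`, together with the keys recorded for the previous installation, yields the assets that could be deleted, so unchanged assets are never removed and re-created.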
Setting up a test environment is not completely straightforward. Here's a rough outline of the steps:

<details>
<summary>How to test this PR</summary>

1. Create a package containing a large number of prebuilt rules.
   1. I used the `package-storage` repository to find one of the previously released prebuilt rules packages.
   2. Multiplied the number of assets in the package to 20k historical versions.
   3. Built the package using `elastic-package build`.
2. Start a local package registry serving the built package using `elastic-package stack up --services package-registry`.
3. Create a production build of Kibana. To speed up the process, unnecessary artifacts can be skipped:
   ```
   node scripts/build --skip-cdn-assets --skip-docker-ubi --skip-docker-ubuntu --skip-docker-wolfi --skip-docker-fips
   ```
4. Provide the built Kibana with a config pointing to the local registry. The config is located in `build/default/kibana-9.0.0-SNAPSHOT-darwin-aarch64/config/kibana.yml`. You can use the following config:
   ```
   csp.strict: false
   xpack.security.encryptionKey: 've4Vohnu oa0Fu9ae Eethee8c oDieg4do Nohrah1u ao9Hu2oh Aeb4Ieyi Aew1aegi'
   xpack.encryptedSavedObjects.encryptionKey: 'Shah7nai Eew6izai Eir7OoW0 Gewi2ief eiSh8woo shoogh7E Quae6hal ce6Oumah'
   xpack.fleet.internal.registry.kibanaVersionCheckEnabled: false
   xpack.fleet.registryUrl: https://localhost:8080
   elasticsearch:
     username: 'kibana_system'
     password: 'changeme'
     hosts: 'http://localhost:9200'
   ```
5. Override the Node options Kibana starts with, so that it can connect to the local registry (`NODE_EXTRA_CA_CERTS`) and runs with the memory limit (`--max_old_space_size=700`). To do this, edit the `build/default/kibana-9.0.0-SNAPSHOT-darwin-aarch64/bin/kibana` file:
   ```
   NODE_OPTIONS="--no-warnings --max-http-header-size=65536 --unhandled-rejections=warn --dns-result-order=ipv4first --openssl-legacy-provider --max_old_space_size=700 --inspect" NODE_ENV=production NODE_EXTRA_CA_CERTS=~/.elastic-package/profiles/default/certs/ca-cert.pem exec "${NODE}" "${DIR}/src/cli/dist" "${@}"
   ```
6. Navigate to the build folder: `build/default/kibana-9.0.0-SNAPSHOT-darwin-aarch64`.
7. Start Kibana using `./bin/kibana`.
8. Kibana is now running in debug mode, with the debugger listening on port 9229. You can connect to it using a VS Code debug config or Chrome DevTools.
9. Install prebuilt detection rules by calling the `POST /internal/detection_engine/prebuilt_rules/_bootstrap` endpoint, which uses the new streaming installation under the hood.

</details>

### Test results locally

**Without the streaming approach**

OOM is guaranteed. Even smaller packages, up to 10k rules, caused sporadic OOM errors. So, for comparison, I tested the package installation without memory limits.

![Screenshot 2024-10-14 at 14 15 26](https://github.com/user-attachments/assets/131cb877-2404-4638-b619-b1370a53659f)

1. Heap memory usage spikes up to 2.5GB.
2. External memory consumes up to 450MB, which is about four times the archive size.
3. RSS (Resident Set Size) exceeds 4.5GB.

**With the streaming approach**

No OOM errors observed. The memory consumption chart looks like this:

![Screenshot 2024-10-14 at 11 15 21](https://github.com/user-attachments/assets/b47ba8c9-2ba7-42de-b921-c33104d4481e)

1. Heap memory remains stable at around 450MB, without any spikes.
2. External memory jumps to around 250MB at the beginning of the installation, then drops to around 120MB, which is roughly equal to the package archive size. I couldn't determine why external memory consumption is about 2x the package size when the installation starts. I checked the code for places where the package might be loaded into memory twice but found nothing suspicious. This might be worth investigating further.
3. RSS remains stable, peaking slightly above 1GB. I believe this is the upper limit for a package that can be handled without errors in a Serverless environment, where the 1GB memory limit is dictated by pod-level settings rather than Node settings. I'll verify this on a real Serverless instance to confirm.

### Test results on Serverless

![Screenshot 2024-10-31 at 12 31 34](https://github.com/user-attachments/assets/d20d2860-fa96-4e56-be2b-7b3c0b5c7b77)
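One way to collect heap, external, and RSS peaks like the ones reported above without a debugger attached is to sample memory periodically and keep the maximum of each value. The TypeScript sketch below is a hypothetical helper, not part of the PR: `MemorySample` mirrors three of the fields that Node's `process.memoryUsage()` returns (in bytes), and the sampler is injected so the class can run without Kibana.

```typescript
// Subset of the fields returned by Node's process.memoryUsage(), in bytes.
interface MemorySample {
  rss: number;
  heapUsed: number;
  external: number;
}

// Hypothetical helper that tracks the peak of each value across samples.
class PeakMemoryTracker {
  private readonly sample: () => MemorySample;
  private peak: MemorySample = { rss: 0, heapUsed: 0, external: 0 };

  constructor(sample: () => MemorySample) {
    this.sample = sample;
  }

  // Takes one sample and updates the running maxima.
  record(): void {
    const current = this.sample();
    this.peak = {
      rss: Math.max(this.peak.rss, current.rss),
      heapUsed: Math.max(this.peak.heapUsed, current.heapUsed),
      external: Math.max(this.peak.external, current.external),
    };
  }

  // Peak values converted to mebibytes (MiB) and rounded to whole numbers.
  peaksInMiB(): MemorySample {
    const toMiB = (bytes: number): number => Math.round(bytes / (1024 * 1024));
    return {
      rss: toMiB(this.peak.rss),
      heapUsed: toMiB(this.peak.heapUsed),
      external: toMiB(this.peak.external),
    };
  }
}
```

In a Node process, the sampler could simply be `() => process.memoryUsage()` (its result also carries `heapTotal` and `arrayBuffers`, which this sketch ignores), with `record()` called from a `setInterval` while the `_bootstrap` request runs. Note that the sketch reports MiB, while the figures above are quoted in MB.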
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue on Nov 5, 2024
…am-based approach (elastic#195888) (cherry picked from commit 67cdb93)
Epics: https://github.com/elastic/security-team/issues/1974 (internal), #174168
Related to: #187646
Summary
The Fleet team won't be able to implement stream-based package installation by mid-October, which means we will need to implement it on our side to complete Milestone 3 in time.
Rough plan:
- … `bootstrap` endpoint. The key point is that the implementation will be entirely on the Security Solution side.
- Switch from `savedObject.import` to `savedObject.bulkCreate` for better memory efficiency.

Details
An important note here is that we'll be using the EPR API directly to fetch package information and download package content (or read from disk if it's prebundled). To ensure compatibility with Fleet, we'll reuse the package saved object type, so even if the package is installed through the Security Solution endpoint, it will still be visible in the Integrations UI. The detection rules package will remain installable and upgradeable via Fleet's UI, but this will not be the recommended method. In Security Solution, we'll exclusively use the new installation endpoint.
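The choice between downloading the package from the registry and reading a prebundled archive from disk might look roughly like the TypeScript sketch below. `resolvePackageSource` and `PackageSource` are hypothetical names, bundled archives are assumed to be indexed by a `<name>-<version>.zip` file name, and the registry URL layouts (`/package/<name>/<version>/` for package information, `/epr/<name>/<name>-<version>.zip` for the archive) are assumptions based on what the public Elastic Package Registry serves, not something this issue specifies.

```typescript
// Where the package archive would be read from.
type PackageSource =
  | { kind: 'bundled'; filePath: string }
  | { kind: 'registry'; infoUrl: string; archiveUrl: string };

// Hypothetical resolver: prefers a prebundled archive when one exists,
// otherwise builds registry URLs (the URL layouts are assumptions).
function resolvePackageSource(
  name: string,
  version: string,
  registryUrl: string,
  bundledArchives: ReadonlyMap<string, string>, // file name -> path on disk
): PackageSource {
  const bundledPath = bundledArchives.get(`${name}-${version}.zip`);
  if (bundledPath !== undefined) {
    return { kind: 'bundled', filePath: bundledPath };
  }
  // Strip trailing slashes so the joined URLs never contain "//".
  const base = registryUrl.replace(/\/+$/, '');
  const n = encodeURIComponent(name);
  const v = encodeURIComponent(version);
  return {
    kind: 'registry',
    infoUrl: `${base}/package/${n}/${v}/`,
    archiveUrl: `${base}/epr/${n}/${n}-${v}.zip`,
  };
}
```

Keeping this decision in a pure function makes it testable separately from the HTTP and file system code that would actually read the archive.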