[Security Solution] Stream-based installation of the package with prebuilt rules #192350

Closed
Tracked by #174168
banderror opened this issue Sep 9, 2024 · 4 comments · Fixed by #195888
Labels
8.17 candidate Feature:Prebuilt Detection Rules Security Solution Prebuilt Detection Rules area performance Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team:Fleet Team label for Observability Data Collection Fleet team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.17.0

Comments

@banderror

banderror commented Sep 9, 2024

Epics: https://github.com/elastic/security-team/issues/1974 (internal), #174168
Related to: #187646

Summary

The Fleet team won't be able to implement stream-based package installation by mid-October, so we will need to implement it on our side to complete Milestone 3 in time.

Rough plan:

  1. Introduce a new endpoint in Security Solution for detection rule installation or reuse the existing bootstrap endpoint. The key point is that the implementation will be entirely on the Security Solution side.
  2. Copy the existing package installation logic from Fleet and strip out all code not related to saved object installation.
  3. Rewrite the saved object installation process, switching from savedObject.import to savedObject.bulkCreate for better memory efficiency.
  4. Implement incremental saved object installation without deleting existing objects.
  5. Add Stream Support
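
Steps 3 and 4 of the plan can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `RuleAsset`, `BulkCreate`, and `installInChunks` are hypothetical names, and the `bulkCreate` parameter only mimics the shape of the saved objects client method (an array of objects plus an `overwrite` option). The chunk size is an arbitrary default.

```typescript
// Hypothetical, simplified shapes; the real saved objects client has a richer API.
interface RuleAsset {
  id: string;
  type: string;
  attributes: Record<string, unknown>;
}

type BulkCreate = (
  objects: RuleAsset[],
  options: { overwrite: boolean }
) => Promise<void>;

// Writes assets in fixed-size chunks so that only one chunk is sent per request.
// `overwrite: true` replaces objects that already exist instead of deleting them first.
async function installInChunks(
  assets: RuleAsset[],
  bulkCreate: BulkCreate,
  chunkSize = 100
): Promise<number> {
  if (chunkSize < 1) {
    throw new Error('chunkSize must be at least 1');
  }
  let written = 0;
  for (let start = 0; start < assets.length; start += chunkSize) {
    const chunk = assets.slice(start, start + chunkSize);
    await bulkCreate(chunk, { overwrite: true });
    written += chunk.length;
  }
  return written;
}
```

Because existing objects are overwritten in place, nothing has to be deleted up front, which is the property step 4 asks for.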

Details

An important note here is that we'll be using the EPR API directly to fetch package information and download package content (or read from disk if it's prebundled). To ensure compatibility with Fleet, we'll reuse the package saved object type, so even if the package is installed through the Security Solution endpoint, it will still be visible in the Integrations UI. The detection rules package will remain installable and upgradeable via Fleet's UI, but this will not be the recommended method. In Security Solution, we'll exclusively use the new installation endpoint.

@banderror banderror added 8.16 candidate Feature:Prebuilt Detection Rules Security Solution Prebuilt Detection Rules area performance Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team labels Sep 9, 2024
@elasticmachine

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine

Pinging @elastic/security-solution (Team: SecuritySolution)

@elasticmachine

Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management)

@xcrzx xcrzx removed performance Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:Detection Rule Management Security Detection Rule Management Team Feature:Prebuilt Detection Rules Security Solution Prebuilt Detection Rules area 8.17 candidate v8.17.0 labels Oct 28, 2024
@botelastic botelastic bot added the needs-team Issues missing a team label label Oct 28, 2024
@banderror banderror added performance Team:Fleet Team label for Observability Data Collection Fleet team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. and removed needs-team Issues missing a team label labels Oct 30, 2024
@banderror banderror added Team:Detection Rule Management Security Detection Rule Management Team Feature:Prebuilt Detection Rules Security Solution Prebuilt Detection Rules area 8.17 candidate v8.17.0 labels Oct 30, 2024
@elasticmachine

Pinging @elastic/fleet (Team:Fleet)

xcrzx added a commit that referenced this issue Nov 5, 2024
…am-based approach (#195888)

**Resolves: #192350**

## Summary

Implemented stream-based installation of the detection rules package.

**Background**: The installation of the detection rules package was
causing OOM (Out of Memory) errors in Serverless environments where the
available memory is limited to 1GB. The root cause of the errors was
that during installation, the package was being read and unzipped
entirely into memory. Given the large package size, this led to OOMs. To
address these memory issues, the following changes were made:

1. Added branching logic to the `installPackageFromRegistry` and
`installPackageByUpload` methods that decides, based on the package name,
whether to use streaming. Currently only the `security_detection_engine`
package is hardcoded to use streaming.
2. Defined a separate set of state machine steps for the stream-based
package installation. At this stage it covers only Kibana assets
installation.
3. A new `stepInstallKibanaAssetsWithStreaming` step is added to handle
assets installation. While this method still reads the package archive
into memory (since unzipping from a readable stream is [not possible due
to the design of the .zip
format](https://github.com/thejoshwolfe/yauzl?tab=readme-ov-file#no-streaming-unzip-api)),
the package is unzipped using streams after being read into a buffer.
This allows only a small portion of the archive (100 saved objects at a
time) to be unpacked into memory, reducing memory usage.
4. The new method also includes several optimizations, such as only
removing previously installed assets if they are missing in the new
package and using `savedObjectClient.bulkCreate` instead of the less
efficient `savedObjectClient.import`.
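
The batching and cleanup ideas from items 3 and 4 can be sketched as below. This is a simplified illustration under stated assumptions, not the code from the PR: `inBatches` and `findStaleIds` are hypothetical helper names, and the async source stands in for entries read from the archive.

```typescript
// Groups items from any async source into arrays of at most `size` items,
// so the consumer never holds more than one batch at a time.
async function* inBatches<T>(
  source: AsyncIterable<T>,
  size: number
): AsyncGenerator<T[]> {
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  // Flush the final, possibly smaller batch.
  if (batch.length > 0) {
    yield batch;
  }
}

// Returns IDs that were installed previously but are absent from the new package.
// Only these would need to be removed after the new assets are written.
function findStaleIds(
  previouslyInstalled: string[],
  inNewPackage: Set<string>
): string[] {
  return previouslyInstalled.filter((id) => !inNewPackage.has(id));
}
```

A caller would iterate `inBatches(entries, 100)`, write each batch, record the IDs it saw, and pass that set to `findStaleIds` once the iteration finishes.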

### Test environment

1. Prebuilt detection rules package with ~20k saved objects; 118MB
zipped.
2. Local package registry.
3. Production build of Kibana running locally with a 700MB max old space
limit, pointed to that registry.

Setting up a test environment is not completely straightforward. Here's
a rough outline of the steps:
<details>
<summary>
How to test this PR
</summary>

1. Create a package containing a large number of prebuilt rules.
   1. I used the `package-storage` repository to find one of the previously
      released prebuilt rules packages.
   2. Multiplied the assets in the package until it contained 20k historical
      versions.
   3. Built the package using `elastic-package build`.
2. Start a local package registry serving the built package using
`elastic-package stack up --services package-registry`.
3. Create a production build of Kibana. To speed up the process,
unnecessary artifacts can be skipped:
    ```
    node scripts/build --skip-cdn-assets --skip-docker-ubi \
      --skip-docker-ubuntu --skip-docker-wolfi --skip-docker-fips
    ```
4. Provide the built Kibana with a config pointing to the local
registry. The config is located in
`build/default/kibana-9.0.0-SNAPSHOT-darwin-aarch64/config/kibana.yml`.
You can use the following config:
    ```
    csp.strict: false
    xpack.security.encryptionKey: 've4Vohnu oa0Fu9ae Eethee8c oDieg4do Nohrah1u ao9Hu2oh Aeb4Ieyi Aew1aegi'
    xpack.encryptedSavedObjects.encryptionKey: 'Shah7nai Eew6izai Eir7OoW0 Gewi2ief eiSh8woo shoogh7E Quae6hal ce6Oumah'

    xpack.fleet.internal.registry.kibanaVersionCheckEnabled: false
    xpack.fleet.registryUrl: https://localhost:8080

    elasticsearch:
      username: 'kibana_system'
      password: 'changeme'
      hosts: 'http://localhost:9200'
    ```
5. Override the Node options Kibana starts with to allow it to connect
to the local registry and set the memory limit. For this, you need to
edit the `build/default/kibana-9.0.0-SNAPSHOT-darwin-aarch64/bin/kibana`
file:
    ```
    NODE_OPTIONS="--no-warnings --max-http-header-size=65536 --unhandled-rejections=warn --dns-result-order=ipv4first --openssl-legacy-provider --max_old_space_size=700 --inspect" \
      NODE_ENV=production \
      NODE_EXTRA_CA_CERTS=~/.elastic-package/profiles/default/certs/ca-cert.pem \
      exec "${NODE}" "${DIR}/src/cli/dist" "${@}"
    ```
6. Navigate to the build folder:
`build/default/kibana-9.0.0-SNAPSHOT-darwin-aarch64`.
7. Start Kibana using `./bin/kibana`.
8. Kibana is now running in debug mode, with the debugger started on
port 9229. You can connect to it using VS Code's debug config or
Chrome's DevTools.
9. Now you can install prebuilt detection rules by calling the `POST
/internal/detection_engine/prebuilt_rules/_bootstrap` endpoint, which
uses the new streaming installation under the hood.
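
For the last step, a request to the bootstrap endpoint could be assembled along these lines. This is a hedged sketch: `buildBootstrapRequest` is a hypothetical helper, and the header set (`kbn-xsrf`, `x-elastic-internal-origin`, and API version `1`) is an assumption about what an internal Kibana route expects, so verify it against your build. The base URL and credentials are placeholders supplied by the caller.

```typescript
interface BootstrapRequest {
  url: string;
  init: { method: string; headers: Record<string, string> };
}

// Builds the URL and fetch options for the bootstrap call without sending it.
function buildBootstrapRequest(
  kibanaUrl: string,
  username: string,
  password: string
): BootstrapRequest {
  // Basic auth token; Buffer is available globally in Node.
  const credentials = Buffer.from(`${username}:${password}`).toString('base64');
  // Strip trailing slashes so the path is not doubled up.
  const base = kibanaUrl.replace(/\/+$/, '');
  return {
    url: `${base}/internal/detection_engine/prebuilt_rules/_bootstrap`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Basic ${credentials}`,
        'kbn-xsrf': 'true',
        // Assumed requirements for internal, versioned routes.
        'x-elastic-internal-origin': 'kibana',
        'elastic-api-version': '1',
      },
    },
  };
}
```

The result can be passed straight to `fetch(request.url, request.init)` from a script running on Node 18 or later.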

</details>

### Test results locally

**Without the streaming approach**

Guaranteed OOM. Even smaller packages, up to 10k rules, caused sporadic
OOM errors. So for comparison, I tested the package installation without
memory limits.

![Screenshot 2024-10-14 at 14 15
26](https://github.com/user-attachments/assets/131cb877-2404-4638-b619-b1370a53659f)

1. Heap memory usage spikes up to 2.5GB.
2. External memory reaches up to 450MB, roughly four times the
archive size (118MB).
3. RSS (Resident Set Size) exceeds 4.5GB.

**With the streaming approach**

No OOM errors observed. The memory consumption chart looks like the
following:

![Screenshot 2024-10-14 at 11 15
21](https://github.com/user-attachments/assets/b47ba8c9-2ba7-42de-b921-c33104d4481e)

1. Heap memory remains stable, around 450MB, without any spikes.
2. External memory jumps to around 250MB at the beginning of the
installation, then drops to around 120MB, which is roughly equal to the
package archive size. I couldn't determine why the external memory
consumption exceeds the package size by 2x when the installation starts.
I checked the code for places where the package might be loaded into
memory twice but found nothing suspicious. This might be worth
investigating further.
3. RSS remains stable, peaking slightly above 1GB. I believe this is the
upper limit for a package that can be handled without errors in a
Serverless environment, where the memory limit is dictated by pod-level
settings rather than Node settings and is set to 1GB. I'll verify this
on a real Serverless instance to confirm.
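
The three metrics discussed above (heap, external memory, RSS) are all exposed by Node's `process.memoryUsage()`. The thread does not say how the charts were produced, so the following is only one possible way to collect comparable numbers; `sampleMemoryMb` and `mergePeak` are hypothetical helper names.

```typescript
interface MemorySampleMb {
  heapUsed: number;
  external: number;
  rss: number;
}

const toMb = (bytes: number): number => Math.round(bytes / (1024 * 1024));

// Takes one snapshot of the current process memory, converted to megabytes.
function sampleMemoryMb(): MemorySampleMb {
  const { heapUsed, external, rss } = process.memoryUsage();
  return { heapUsed: toMb(heapUsed), external: toMb(external), rss: toMb(rss) };
}

// Keeps the per-metric maximum across samples, which is what a peak comparison needs.
function mergePeak(peak: MemorySampleMb, sample: MemorySampleMb): MemorySampleMb {
  return {
    heapUsed: Math.max(peak.heapUsed, sample.heapUsed),
    external: Math.max(peak.external, sample.external),
    rss: Math.max(peak.rss, sample.rss),
  };
}
```

Calling `sampleMemoryMb()` on a timer during installation and folding the results with `mergePeak` yields peak values that can be compared against the pod-level limit.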

### Test results on Serverless

![Screenshot 2024-10-31 at 12 31
34](https://github.com/user-attachments/assets/d20d2860-fa96-4e56-be2b-7b3c0b5c7b77)
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Nov 5, 2024
…am-based approach (elastic#195888)

(cherry picked from commit 67cdb93)