
Performance benchmark workflow #4282

Closed · wants to merge 1 commit

Conversation

abraakie

Description

This adds a benchmark workflow that runs for each push and for each pull request against a base branch (main, 1., 2.).
For pull requests, the benchmark results are compared to the results of previous runs on the target branch. If the threshold (currently 150%) is exceeded, the benchmark workflow fails.

This workflow helps identify performance impacts of code changes at an early stage.
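
For illustration, the comparison step described above could look roughly like the Python sketch below. The result-file layout (a JSON list of {"name", "value"} entries), the file names, and the exact comparison are assumptions made for the example, not code taken from this PR.

import json
import sys

THRESHOLD = 1.5  # fail if a metric exceeds 150% of the baseline value

def check_regression(current_path, previous_path):
    # Both files are assumed to be JSON lists of {"name": ..., "value": ...}
    # entries, where higher values (e.g. latency) are worse.
    with open(current_path) as f:
        current = {m["name"]: m["value"] for m in json.load(f)}
    with open(previous_path) as f:
        previous = {m["name"]: m["value"] for m in json.load(f)}

    failures = []
    for name, prev_value in previous.items():
        cur_value = current.get(name)
        if cur_value is None or prev_value <= 0:
            continue
        if cur_value / prev_value > THRESHOLD:
            failures.append(f"{name}: {prev_value} -> {cur_value}")

    if failures:
        print("Benchmark regression detected:")
        print("\n".join(failures))
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(check_regression(sys.argv[1], sys.argv[2]))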

Issues Resolved

Testing

The workflow was tested manually; the fluctuation in results is within a reasonable range.

Check List

  • New functionality includes testing
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@peternied (Member) left a comment

Thanks for creating the PR, @abraakie. Maybe you could help me understand more of the vision for these changes by extracting the parts of the RFC [1] that you are addressing, vs. those you aren't planning, vs. those you are still implementing in this PR?

- name: Prepare Security Plugin
  run: mv ./build/distributions/opensearch-security-*.zip ./benchmark/docker/
- name: Build Docker Image
  run: docker build --platform linux/amd64 --build-arg VERSION=${{ env.OPENSEARCH_VERSION }} -f benchmark/docker/benchmark.dockerfile -t opensearchproject/security-benchmark:latest benchmark/docker/
Member

Why containerize with Docker? We already have a supported model for starting and executing tests against a cluster; see this workflow for an example:
https://github.com/opensearch-project/security/blob/main/.github/workflows/plugin_install.yml

  run: mv ./build/distributions/opensearch-security-*.zip ./benchmark/docker/
- name: Build Docker Image
  run: docker build --platform linux/amd64 --build-arg VERSION=${{ env.OPENSEARCH_VERSION }} -f benchmark/docker/benchmark.dockerfile -t opensearchproject/security-benchmark:latest benchmark/docker/
- name: Run Benchmark Cluster
Member

I don't think these tests will produce consistent metrics when run in a cluster. If we are trying to detect 'large' issues, a cluster adds more variables. What results have you seen in your testing? Can you help build my confidence in these choices?
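
As one way to help build that confidence, run-to-run fluctuation can be quantified with something as simple as the relative standard deviation across repeated runs of the same commit. A rough Python sketch follows; the sample numbers are made up purely for illustration and are not measured results.

import statistics

def relative_std_dev(values):
    # Coefficient of variation in percent: how much the benchmark results
    # fluctuate between repeated runs of the same build.
    mean = statistics.mean(values)
    return statistics.stdev(values) / mean * 100 if mean else float("inf")

# Hypothetical median latencies (ms) from five repeated runs of one build.
runs = [812.4, 798.1, 825.9, 804.3, 818.7]
print(f"run-to-run fluctuation: {relative_std_dev(runs):.1f}%")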

- name: Install Benchmark
  run: pip install opensearch-benchmark
- name: Execute Benchmarks
  run: opensearch-benchmark execute-test --pipeline=benchmark-only --results-format=csv --results-file=./results.csv --on-error=abort --workload=percolator --target-host=https://localhost:9200 --client-options=basic_auth_user:admin,basic_auth_password:${{ env.OPENSEARCH_ADMIN_PASSWORD }},verify_certs:false,timeout:60
Member

This workflow uses the percolator workload, which doesn't exercise any of the security plugin's functionality. Until we are doing tests with multiple users and different feature configurations, this isn't any more useful than the nightly benchmarks that are already published and managed via https://opensearch.org/benchmarks/

  with:
    path: ./cache
    key: ${{ runner.os }}-benchmark
- name: Store Benchmark Results
Member

Can you link to a couple of workflows that show no-regression vs regression so we can better understand how this looks?

Member

+1

@@ -0,0 +1,53 @@
import json
Member

Can we see about making a PR to benchmarks to support another output format natively rather than baking it into the security repo?
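
For context, a conversion along the lines of what benchmark/result_rewriter.py appears to perform (rewriting the opensearch-benchmark CSV summary into a small JSON list) might look like the sketch below. The CSV column names (Metric, Task, Value, Unit) and the output schema are assumptions, not taken from the actual script.

import csv
import json
import sys

def rewrite(csv_path, json_path):
    # Keep only the CSV rows that carry a numeric value.
    entries = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            try:
                value = float(row.get("Value", ""))
            except (TypeError, ValueError):
                continue
            entries.append({
                "name": f"{row.get('Metric', 'unknown')} ({row.get('Task') or 'all'})",
                "unit": row.get("Unit", ""),
                "value": value,
            })
    # Write a plain JSON list that a comparison step can consume.
    with open(json_path, "w") as f:
        json.dump(entries, f, indent=2)

if __name__ == "__main__":
    rewrite(sys.argv[1], sys.argv[2])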


codecov bot commented Apr 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 66.05%. Comparing base (c09fad5) to head (06015f0).
Report is 197 commits behind head on main.


@@            Coverage Diff             @@
##             main    #4282      +/-   ##
==========================================
- Coverage   66.06%   66.05%   -0.01%     
==========================================
  Files         301      301              
  Lines       21711    21711              
  Branches     3506     3506              
==========================================
- Hits        14343    14341       -2     
- Misses       5607     5610       +3     
+ Partials     1761     1760       -1     

see 3 files with indirect coverage changes

  run: opensearch-benchmark execute-test --pipeline=benchmark-only --results-format=csv --results-file=./results.csv --on-error=abort --workload=percolator --target-host=https://localhost:9200 --client-options=basic_auth_user:admin,basic_auth_password:${{ env.OPENSEARCH_ADMIN_PASSWORD }},verify_certs:false,timeout:60
- name: Prepare Benchmark Results
  run: python benchmark/result_rewriter.py ./results.csv ./results.json
- name: Download previous benchmark data
Member

Can this eventually be extended to the last n runs? I.e., take the average or median of the last 10 runs instead of only the previous run?
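
A rough sketch of that idea, assuming each prior run is kept as a JSON result file in a history directory (the file layout, naming, and schema here are invented for illustration):

import json
import statistics
from pathlib import Path

def median_baseline(history_dir, metric_name, n=10):
    # Take the newest n result files and return the median value of the
    # given metric across them, or None if no data points were found.
    files = sorted(Path(history_dir).glob("results-*.json"))[-n:]
    values = []
    for path in files:
        for entry in json.loads(path.read_text()):
            if entry.get("name") == metric_name:
                values.append(entry["value"])
    return statistics.median(values) if values else None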

@DarshitChanpura
Member

@abraakie Closing this PR since there hasn't been any activity for 2+ months. Feel free to open a new one.

Development

Successfully merging this pull request may close these issues:
[RFC] Security Performance Test Suite