
Charles rendle/issue563 #622

Merged: 43 commits merged into main from CharlesRendle/issue563 on Nov 9, 2022

Conversation

CharlesRendle (Contributor):

This PR adds two scripts to be used when stress testing csvcubed, along with setup and cleanup scripts, to address ticket #563.

The user must have some essential tooling installed; an installation guide is available on Confluence.

Copied from the Confluence page, the stress test follows this structure:

  1. Start running a performance monitor in the background
  2. Generate a "maximally complex" CSV file with the number of rows the user has given as a parameter to the bash script.
  3. The inspect command test then runs 1 build command on this CSV file so that a .csv-metadata.json file can be used.
  4. Run the test's designated command 5 times using the results of the preprocess step above as inputs.
  5. The performance monitor records CPU and Memory usage throughout the duration of the test's 5 runs.
  6. A postprocess removes any unwanted resultant files.
  7. Steps 2–4 are repeated for the second command.
  8. The metrics files are cleaned up and placed into relevant folders.
  9. Finally, the performance monitor is terminated.
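Assuming each run is a plain subprocess invocation, steps 4–5 can be sketched in Python as follows. This is a minimal illustrative sketch, not the PR's actual scripts: the placeholder command stands in for the real csvcubed `build`/`inspect` calls, and all names are hypothetical.

```python
# Minimal sketch of "run the command N times in series and time it".
# The placeholder workload stands in for the real csvcubed build/inspect calls.
import subprocess
import sys
import time
from datetime import timedelta

def run_command_n_times(command, n_runs=5):
    """Run `command` to completion n_runs times in series, returning per-run durations."""
    durations = []
    for _ in range(n_runs):
        start = time.perf_counter()
        subprocess.run(command, check=True)  # each run finishes before the next starts
        durations.append(time.perf_counter() - start)
    return durations

# Placeholder workload: a no-op Python process.
durations = run_command_n_times([sys.executable, "-c", "pass"])
average_time = sum(durations) / len(durations)
total_test_time = timedelta(seconds=sum(durations))
```

The runs are deliberately sequential so that each run's CPU/memory readings are attributable to a single invocation.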

The stress test has currently only been run locally; there are plans to run it via a VM in the future, but for now that aspect has been removed from the ticket.

If there is anything which needs further explanation (no doubt there will be), please get in contact with me or @nimshi89.

CharlesRendle and others added 30 commits September 26, 2022 10:16
@GDonRanasinghe (Contributor) left a comment:

@CharlesRendle I've left a few comments.

- Start running a performance monitor in the background
- Generate a "maximally complex" CSV file with the number of rows the user has given as a parameter to the bash script.
1. This is a preprocess which is a groovy script which in turn runs a python script because JMeter will not execute `.py` files.
- The inspect command test then runs 1 build command on this CSV file so that a `.csv-metadata.json` file can be used.
Contributor:

Meant to say "The inspect command test then runs the build command on this CSV file so that the json-ld output generated by the build command can be used as the input to the inspect command"?

CharlesRendle (Author):

Changed wording. This is far more succinct, thank you.

assert (tmp_dir / "stress.csv").exists()


def test_generated_csv_shape_and_num_unique_values():
Contributor:

Why not use assert_frame_equal provided by pandas test utils to assert the entire data frame?

CharlesRendle (Author):

Reworked the test so that it uses assert_frame_equal. I have kept some of the original assertions because it will be useful to see whether these specific checks fail due to future changes.
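For reference, the pandas helper mentioned here can be combined with targeted assertions like this (a minimal sketch; the DataFrames are illustrative stand-ins, not the actual stress-test data):

```python
# Compare a whole DataFrame in one call, then keep a targeted assertion for
# a specific property. The frames here are illustrative stand-ins.
import pandas as pd
from pandas.testing import assert_frame_equal

generated = pd.DataFrame({"Dim0": ["a", "b", "c"], "Value": [1, 2, 3]})
expected = pd.DataFrame({"Dim0": ["a", "b", "c"], "Value": [1, 2, 3]})

# Checks shape, column names, dtypes and values together.
assert_frame_equal(generated, expected)

# A specific check still gives a clearer failure message if, say,
# value uniqueness regresses in a future change.
assert generated["Dim0"].nunique() == 3
```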

average_time = temp_array[len(temp_array) - 1] / n_runs
total_test_time = timedelta(seconds=temp_array[len(temp_array) - 1])

max_value_cpu = round(df2["localhost CPU"].max(), 2)
Contributor:

Will we have the same values when these tests run on a different env (e.g. on a pipeline or on a user's machine, Linux/Windows)?

CharlesRendle (Author):

I would expect the stress test results to differ when run in a different environment and on different hardware. This portion of the ticket was pushed out of scope for generating preliminary results. Perhaps this will form part of ticket #563.

For the unit tests' purposes, we use a prewritten metrics file from which we have calculated the expected results, and we use this as our comparison.
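As a sketch of how those expected values fall out of a prewritten metrics file (column names follow the quoted code; the numbers are illustrative, and it is assumed, as the quoted code implies, that the elapsed-time column is cumulative):

```python
# Derive the reported metrics from a fixed, prewritten metrics frame so the
# expected values are deterministic. The numbers are illustrative only.
import pandas as pd
from datetime import timedelta

n_runs = 5
df2 = pd.DataFrame({
    "elapsed seconds": [10.0, 20.0, 30.0, 40.0, 50.0],  # assumed cumulative elapsed time
    "localhost CPU": [12.5, 55.129, 43.0, 20.0, 5.0],
})

temp_array = df2["elapsed seconds"].tolist()
average_time = temp_array[-1] / n_runs            # idiomatic form of temp_array[len(temp_array) - 1]
total_test_time = timedelta(seconds=temp_array[-1])
max_value_cpu = round(df2["localhost CPU"].max(), 2)
```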


robons commented Nov 8, 2022

Here's a script you can use to set everything up on OS X:

```bash
#!/bin/bash

# Script to install jmeter and associated tools necessary for running the stress test on OSX.

brew install jmeter wget

# Install perfmon ServerAgent
SERVER_AGENT_VERSION="2.2.3"
SERVER_AGENT_ID="ServerAgent-$SERVER_AGENT_VERSION"
wget "https://github.com/undera/perfmon-agent/releases/download/$SERVER_AGENT_VERSION/$SERVER_AGENT_ID.zip"
unzip "$SERVER_AGENT_ID.zip"
rm "$SERVER_AGENT_ID.zip"
chmod +x "$SERVER_AGENT_ID/startAgent.sh"
mv "$SERVER_AGENT_ID" /usr/local/share
# Use printf so the newline and the literal "$@" are written correctly into the wrapper script.
printf '#!/bin/bash\n/usr/local/share/%s/startAgent.sh "$@"\n' "$SERVER_AGENT_ID" > /usr/local/bin/startAgent.sh
chmod +x /usr/local/bin/startAgent.sh

# Install the perfmon plugin for jmeter
PLUGIN_VERSION="2.1"
PLUGIN_IDENTIFIER="jpgc-perfmon-$PLUGIN_VERSION"

wget "https://jmeter-plugins.org/files/packages/$PLUGIN_IDENTIFIER.zip"
# Add the plugin to the jmeter installation directory
JMETER_INSTALLATION_DIR="$(brew --prefix jmeter)"
unzip "$PLUGIN_IDENTIFIER.zip" -d "$JMETER_INSTALLATION_DIR/libexec"
rm "$PLUGIN_IDENTIFIER.zip"
```

Could you include this somewhere in your documentation?

@robons (Contributor) left a review:

Really good work. I know this ticket was a tough one, but you've got something that's useful and will help us validate whether our future performance improvements are effective or not.

I imagine you've learnt quite a bit in this ticket too.

Take a look at my comments before merging. They're mostly about clarifying things in code comments.


- Buildmetrics-timestamp.csv
- Inspectmetrics-timestamp.csv
- jmeter.log
Contributor:

Isn't this jmeter.Build.log and jmeter.Inspect.log now?

- Generate a "maximally complex" CSV file with the number of rows the user has given as a parameter to the bash script.
1. This is a preprocess which is a groovy script which in turn runs a python script because JMeter will not execute `.py` files.
- The inspect command test then runs the build command on this CSV file so that the json-ld output generated by the build command can be used as the input to the inspect command.
- Run the test's designated command 5 times using the results of the preprocess step above as inputs.
Contributor:

Can you make sure to mention that they're run in series and not in parallel?

## Installation Guide

### From Bash Script
#!/bin/bash
Contributor:

You should put this in backticks (`) to ensure it's easy to copy and paste and isn't subject to formatting. Also, can you make it clear to the user that this script will only work on OSX? It won't work in any other environment, which might confuse someone.

numb_rows: int, temp_dir: Path = Path("temp_dir"), max_num_measures: int = 20
):

temp_dir.mkdir(exist_ok=True)
Contributor:

Could you add a comment to make it clear that the output of this function is to be paired with test-qube-config.json to ensure we correctly identify the type of each column when we build the CSV-W?
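A hypothetical sketch of that pairing is below. The function name, columns, and file layout here are placeholders, not the PR's actual generator; only the intent, that the generated CSV's columns are identified via test-qube-config.json at build time, comes from the thread.

```python
# Hypothetical generator sketch: writes a CSV into temp_dir whose columns are
# meant to be typed by an accompanying test-qube-config.json when the CSV-W
# is built. All names and columns here are illustrative placeholders.
import csv
from pathlib import Path

def generate_stress_csv(numb_rows: int, temp_dir: Path = Path("temp_dir"),
                        max_num_measures: int = 20) -> Path:
    # NOTE: the output is intended to be paired with test-qube-config.json so
    # that each column's role (dimension, measure, observation) is identified
    # correctly at build time.
    temp_dir.mkdir(exist_ok=True)
    out_path = temp_dir / "stress.csv"
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Dim0", "Measure", "Value"])
        for i in range(numb_rows):
            writer.writerow([f"code-{i}", f"measure-{i % max_num_measures}", i])
    return out_path
```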

data = pd.read_csv(csv_metrics_in)

if data.shape[0] == 1:
    raise IndexError(
Contributor:

Nice choice of exception type.


robons commented Nov 8, 2022

Oh, and add a follow-up ticket to make sure we cover the limited RAM/CPU VM testing we wanted to look into.


github-actions bot commented Nov 9, 2022

ubuntu-latest-python3.9 test results

360 tests  +5   360 ✔️ +5   3m 44s ⏱️ +45s
    8 suites ±0       0 💤 ±0 
    8 files   ±0       0 ±0 

Results for commit 259bd85. ± Comparison against base commit 77501e1.

This pull request removes 70 and adds 68 tests. Note that renamed tests count towards both.

Removed tests (truncated):
- CatalogMetadata.Testing CatalogMetadata ‑ py3.9.15 (main, Oct 18 2022, 07:15:17) [GCC 9.4.0]
- ConfigSchema.Testing cube from config json ‑ py3.9.15 (main, Oct 18 2022, 07:15:17) [GCC 9.4.0]
- A QbCube configured by convention should contain appropriate datatypes
- A QbCube configured with Standard URI style should include file endings in URIs
- A QbCube configured with WithoutFileExtensions URI style should exclude file endings in URIs
- A QbCube should fail to validate where foreign key constraints are not met.
- A QbCube should generate appropriate DCAT Metadata
- A QbCube should generate csvcubed version specific rdf
- A QbCube should validate successfully where foreign key constraints are met.
- A QbCube which references a legacy composite code list should pass all tests
- …

Added tests (truncated):
- CatalogMetadata.Testing CatalogMetadata ‑ py3.9.15 (main, Oct 18 2022, 07:15:17) [GCC 9.4.0] This should succeed when multiple landing pages are supported
- ConfigSchema.Testing cube from config json ‑ py3.9.15 (main, Oct 18 2022, 07:15:17) [GCC 9.4.0] This should succeed when a a valid csv and config.json file are provided
- cli.Test the csvcubed Command Line Interface. ‑ py3.9.15 (main, Oct 18 2022, 07:15:17) [GCC 9.4.0] The csvcubed build command should output validation errors file
- cube.Cube! ‑ py3.9.15 (main, Oct 18 2022, 07:15:17) [GCC 9.4.0] Output a cube and errors when created data only
- cube.Cube! ‑ py3.9.15 (main, Oct 18 2022, 07:15:17) [GCC 9.4.0] Output a cube and errors when created from both config and data
- cube.Cube! ‑ py3.9.15 (main, Oct 18 2022, 07:15:17) [GCC 9.4.0] Output a cube combining config and convention
- cube.Cube! ‑ py3.9.15 (main, Oct 18 2022, 07:15:17) [GCC 9.4.0] Output a cube when an inline code list is defined using code list config schema v1.0 and when there are references to concepts defined elsewhere.
- cube.Cube! ‑ py3.9.15 (main, Oct 18 2022, 07:15:17) [GCC 9.4.0] Output a cube when an inline code list is defined using code list config schema v1.0, and the sort order is defined with sort object
- cube.Cube! ‑ py3.9.15 (main, Oct 18 2022, 07:15:17) [GCC 9.4.0] Output a cube when the code list is defined using code list config schema v1.0 and when the concepts are hierarchical, and the sort order is defined with sort object
- cube.Cube! ‑ py3.9.15 (main, Oct 18 2022, 07:15:17) [GCC 9.4.0] Output a cube when the code list is defined using code list config schema v1.0 and when the concepts are not hierarchical, and the sort order is defined with sort object
- …

♻️ This comment has been updated with latest results.


github-actions bot commented Nov 9, 2022

ubuntu-latest-python3.10 test results

360 tests  +5   360 ✔️ +5   2m 42s ⏱️ - 1m 21s
    8 suites ±0       0 💤 ±0 
    8 files   ±0       0 ±0 

Results for commit 259bd85. ± Comparison against base commit 77501e1.

♻️ This comment has been updated with latest results.


github-actions bot commented Nov 9, 2022

windows-latest-python3.10 test results

    9 files  ±0    10 suites  ±0   6m 58s ⏱️ -12s
374 tests +5  374 ✔️ +5  0 💤 ±0  0 ±0 
387 runs  +5  387 ✔️ +5  0 💤 ±0  0 ±0 

Results for commit 259bd85. ± Comparison against base commit 77501e1.

♻️ This comment has been updated with latest results.


sonarcloud bot commented Nov 9, 2022

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

Coverage: 0.0%
Duplication: 0.0%

@CharlesRendle CharlesRendle merged commit 0d777fc into main Nov 9, 2022
@CharlesRendle CharlesRendle deleted the CharlesRendle/issue563 branch November 9, 2022 17:14
4 participants