Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stats: New argument "--csv-full-history" appends stats entries every interval in a new "_stats_history.csv" File #1146

Merged
merged 7 commits into from
Dec 5, 2019

Conversation

mehta-ankit
Copy link
Contributor

@mehta-ankit mehta-ankit commented Nov 14, 2019

  • Adds --csv-full-history flag to enable appending of the stats entries to a new CSV log file (_stats_history.csv). It allows for tracking changes in response times over the course of long test runs.
  • Renames _requests to _stats.csv
  • merges response times from _distribution.csv to _stats.csv file so there will no longer be a distribution.csv

@mehta-ankit
Copy link
Contributor Author

This PR does the same work started in : #1007.
Author of the above mentioned PR cannot continue to work on it, and since we had a need for these changes for our team's project i created a PR.

@codecov
Copy link

codecov bot commented Nov 14, 2019

Codecov Report

Merging #1146 into master will decrease coverage by 0.12%.
The diff coverage is 62.5%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1146      +/-   ##
==========================================
- Coverage   79.26%   79.13%   -0.13%     
==========================================
  Files          20       20              
  Lines        1895     1912      +17     
  Branches      294      299       +5     
==========================================
+ Hits         1502     1513      +11     
- Misses        321      323       +2     
- Partials       72       76       +4
Impacted Files Coverage Δ
locust/web.py 88.37% <100%> (ø) ⬆️
locust/main.py 35.06% <33.33%> (+0.28%) ⬆️
locust/stats.py 84.35% <59.37%> (-0.83%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4aaedfc...f0c6faa. Read the comment docs.

@mehta-ankit
Copy link
Contributor Author

Rebased to resolve merge conflicts.

@mehta-ankit
Copy link
Contributor Author

@heyman Can you please take a look at this PR.
You made a comment on the other PR that this PR replaces and i want to make sure i resolve all your concerns.
As far as configuring the interval goes let me know if you still want a command line flag ?
or if it's ok for the users to set overwrite it themselves by importing locust.stats ?

import locust.stats

@mehta-ankit
Copy link
Contributor Author

mehta-ankit commented Nov 15, 2019

Also not sure why py38 job failed on travis. Running tox locally seem to be passing:
image

@heyman
Copy link
Member

heyman commented Nov 15, 2019

This is what I think we should do:

  • Rename the CSV files that we currently create to CSVBASENAME_stats.csv and CSVBASENAME_response_times.csv. Or some other name if we can come up with something better, but _requests and _distribution that we currently use is not very good, and if we're changing the CSV files we should take the opportunity to choose better names.
  • Change so that both files have a Type column and a Name column. At the moment the _distribution-file has a single Name column that is the type and name concatenated, while the _requests-file has two columns called Method and Name.
  • Make locust automatically create a new file (when --csv is specified) called something like CSVBASENAME_stats_history.csv (perhaps there's a better name?), and in this file we'll append a row for the Aggregated stats entry each interval. So by default we'll only append a single row every interval. Also we should probably have a timestamp column in this CSV file. These rows should use the StatsEntry.current_rps and StatsEntry.current_fail_per_sec instead of total_rps and total_fail_per_sec(total_fail_per_sec should also be added to the _stats CSV file, because it's currently missing).
  • Add a new command line flag to enable locust to also append a row for each other stats entries at every iteration. We could maybe call this --csv-full-history. The reason this has to be an option is because we otherwise risk creating huge files for long running tests with many stats entries.

I know this is much more job, but the current state of Locust's CSV handling is not very good, and I don't want to add a whole new CSV feature before we've improved on it. Let me know if you're interested in working on this.

or if it's ok for the users to set overwrite it themselves by importing locust.stats

I don't think it's necessary to have a command line argument for it. However, if we're doing the above changes, I think we should move out the CSV-related code into a separate python module (locust.csv perhaps).

Also not sure why py38 job failed on travis

I think it was just some random network error at Travis. Restarted the build and it completed without errors.

@mehta-ankit
Copy link
Contributor Author

@heyman Thanks for the reply with the detailed suggestions.
I ll be more than happy to work on this since this is something we want and it would be best if we have this as part of the main locust fork, rather than a tweak in our own fork of locust.

Will make more commits with all changes as suggested and tag you once i am done.

@mehta-ankit
Copy link
Contributor Author

@heyman Just to confirm - do you mean we should have Type, Name columns in both files and where Type will be the same value as method in _requests-file ?

  • Change so that both files have a Type column and a Name column. At the moment the _distribution-file has a single Name column that is the type and name concatenated, while the _requests-file has two columns called Method and Name.

@heyman
Copy link
Member

heyman commented Nov 18, 2019

I ll be more than happy to work on this

That's great! Just let me know if you have any questions.

do you mean we should have Type, Name columns in both files and where Type will be the same value as method in _requests-file ?

Exactly! From the beginning Locust was HTTP-only so it made sense to call it Method, while Type is more generic.

@mehta-ankit mehta-ankit force-pushed the csvAppend branch 2 times, most recently from c899756 to 3e2374e Compare November 20, 2019 19:40
@mehta-ankit
Copy link
Contributor Author

@heyman Couple of clarifying questions:

  1. For the _stats_history.csv file without the --csv-full-history flag (that means only printing the aggregate entries), do we show current_rps and current_fail_per_sec or the total ones. You mention in your comment it should be the current_rps but making sure.

  2. if --csv-full-history flag is provided, that means we will add each stats entry every iteration, do we still add the aggregated entries to this file ? Having discussed it with my team i feel we should, it would help in analyzing the data. But follow up question is what rps and failure_per_sec would make sense in this case (current or total).

Thanks in advance!

@mehta-ankit
Copy link
Contributor Author

mehta-ankit commented Nov 22, 2019

@matthiaslee @mckornfield Can you review this please.

@mehta-ankit
Copy link
Contributor Author

FYI
Example of new CSV file with --csv-full-history flag passed in:

"Type","Name","Timestamp","# requests","# failures","Requests/s","Requests Failed/s,"Median response time","Average response time","Min response time","Max response time","Average Content Size","50%","66%","75%","80%","90%","95%","98%","99%","99.9%","99.99%","99.999","100%"
"None","Aggregated","1574373141",0,0,0.00,0.00,0,0,0,0.00,0.00,"N/A","N/A","N/A","N/A","N/A","N/A","N/A","N/A","N/A","N/A","N/A","N/A"
"GET","Login","1574373143",1,0,0.00,0.00,28,28,28,28.74,0.00,28,28,28,28,28,28,28,28,28,28,28,28
"GET","/some/endpoint","1574373143",1,0,0.00,0.00,77,77,77,77.15,427827.00,77,77,77,77,77,77,77,77,77,77,77,77
"GET","aboutUs","1574373143",1,0,0.00,0.00,105,105,105,105.89,11383.00,110,110,110,110,110,110,110,110,110,110,110,110
"POST","page1","1574373143",1,0,0.00,0.00,520,520,520,520.62,452939.00,520,520,520,520,520,520,520,520,520,520,520,520
"GET","page2","1574373143",2,0,0.00,0.00,120,140,119,161.40,35908.00,160,160,160,160,160,160,160,160,160,160,160,160
"GET","logout","1574373143",1,0,0.00,0.00,107,107,107,107.87,107909.00,110,110,110,110,110,110,110,110,110,110,110,110
"None","Aggregated","1574373143",7,0,0.00,0.00,110,160,28,520.62,153124.86,110,120,160,160,520,520,520,520,520,520,520,520

Copy link

@mckornfield mckornfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, some questions

locust/main.py Outdated Show resolved Hide resolved
locust/stats.py Outdated Show resolved Hide resolved
locust/stats.py Outdated Show resolved Hide resolved
locust/stats.py Show resolved Hide resolved
locust/stats.py Show resolved Hide resolved
@heyman
Copy link
Member

heyman commented Nov 25, 2019

For the _stats_history.csv file without the --csv-full-history flag (that means only printing the aggregate entries), do we show current_rps and current_fail_per_sec or the total ones. You mention in your comment it should be the current_rps but making sure.

Yes, I think it makes most sense to use the current_rps and current_fail_per_sec here.

if --csv-full-history flag is provided, that means we will add each stats entry every iteration, do we still add the aggregated entries to this file ? Having discussed it with my team i feel we should, it would help in analyzing the data. But follow up question is what rps and failure_per_sec would make sense in this case (current or total).

I agree that we should also output a row for the aggregated StatsEntry when --csv-full-history is set. We should use the current rps/failure_per_sec.

@heyman
Copy link
Member

heyman commented Nov 25, 2019

Hmm, we should probably be able to merge the _response_times columns into the _stats CSV file. I think this was proposed by someone else some time ago, and I don't know why I didn't think about that when I wrote my previous comment.

@heyman
Copy link
Member

heyman commented Nov 25, 2019

Also we should make sure to use the StatsEntry.get_current_response_time_percentile() when retrieving response time stats for the _history CSV file (so that we'll output the current response times instead of the total for the whole test run).

@mehta-ankit
Copy link
Contributor Author

@heyman The _stats_history file anyways has columns from _response_times.csv and _stats.csv combined. So are you saying we will not need the _response_times.csv anymore ? And we will only have 2 files in total. Otherwise i don't get what we will achieve from [this](Do you think this is needed in the current PR ?) change.

@mehta-ankit
Copy link
Contributor Author

mehta-ankit commented Nov 25, 2019

Also we should make sure to use the StatsEntry.get_current_response_time_percentile() when retrieving response time stats for the _history CSV file (so that we'll output the current response times instead of the total for the whole test run).

To use this, don't we have to pass in use_response_times_cache=True. Right now for Aggregated entries it's passed in as true, self.total = StatsEntry(self, "Aggregated", None, use_response_times_cache=True) but otherwise it uses the default entry = StatsEntry(self, name, method) which is False.
So i guess we will need to set use_response_times_cache to true when calling StatsEntry.
image

@mehta-ankit
Copy link
Contributor Author

mehta-ankit commented Dec 2, 2019

@heyman Do you have any update on my 2 clarifying comments/questions regarding your last suggestions:
#1146 (comment)
#1146 (comment)

EDIT:
I made 2 commits based on your suggestion and my understanding of it. Let me know if that is what we need.

Also, since _response_times.csv is no longer a file we create, i removed it from web.py and index.html so it is not available for download from web ui.
Do we would want stats_history file to be added to web.py as a route, so it can be downloaded from the web UI. I created a commit to implement that. Let me know if that is ok.

@mehta-ankit mehta-ankit force-pushed the csvAppend branch 2 times, most recently from 9ed2f10 to 30060bb Compare December 3, 2019 16:15
@mehta-ankit mehta-ankit force-pushed the csvAppend branch 2 times, most recently from d208867 to bd8e973 Compare December 3, 2019 19:40
@mehta-ankit
Copy link
Contributor Author

@cyberw Will you be able to take a look at my last comment and last 2 commits. @heyman suggested those changes and i haven't heard back so was wondering if any of the other contributors want to take a look.
Thanks 🙏

@cyberw
Copy link
Collaborator

cyberw commented Dec 5, 2019

I have just one question: The header line in the CSV now says "Type" but you are logging s.method? Shouldnt say "Method"?

Other than that it looks good to me and I have no issues merging this.

@mehta-ankit
Copy link
Contributor Author

mehta-ankit commented Dec 5, 2019

@cyberw this was a suggestion from Heyman and i asked a clarifying question and here is the reply from him.

Also should i squash all my commits, or is it ok to keep separate commits ? The only commit i want to squash even if we keep all of then would be this one: 84bb2d3

@cyberw
Copy link
Collaborator

cyberw commented Dec 5, 2019

@cyberw this was a suggestion from Heyman and i asked a clarifying question and here is the reply from him.

Also should i squash all my commits, or is it ok to keep separate commits ? The only commit i want to squash even if we keep all of then would be this one: 84bb2d3

Ah, I didnt see that one in this long history, sorry :)

No need to squash. Thank you for your contribution!

@cyberw
Copy link
Collaborator

cyberw commented Dec 5, 2019

Oh, one more thing before I hit the merge button, could you update the PR title?

@mehta-ankit mehta-ankit changed the title Stats: New argument "--csv-append" appends instead of replacing Stats: New argument "--csv-full-history" appends stats entries every interval in a new "_stats_history.csv" File Dec 5, 2019
@mehta-ankit
Copy link
Contributor Author

mehta-ankit commented Dec 5, 2019

@cyberw Done!
One more question, will you make a new release as well ? The reason we wanted to do this was so that we can start using main Locust fork instead of a fork of it we use for our purposes.

@cyberw cyberw merged commit 3ea6822 into locustio:master Dec 5, 2019
@cyberw
Copy link
Collaborator

cyberw commented Dec 5, 2019

Unforutunately I can't make releases. But I'm sure @heyman or @mbeacom can make one shortly.

@mehta-ankit
Copy link
Contributor Author

Unforutunately I can't make releases. But I'm sure @heyman or @mbeacom can make one shortly.

Thanks. 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants