Stats: New argument "--csv-full-history" appends stats entries every interval in a new "_stats_history.csv" File #1146

mehta-ankit · 2019-11-14T19:18:58Z

Adds a --csv-full-history flag that appends the stats entries to a new CSV log file (_stats_history.csv) every interval. This makes it possible to track changes in response times over the course of long test runs.
Renames _requests.csv to _stats.csv.
Merges the response times from _distribution.csv into _stats.csv, so there is no longer a _distribution.csv file.

mehta-ankit · 2019-11-14T19:21:58Z

This PR continues the work started in #1007.
The author of that PR cannot continue working on it, and since our team needs these changes for our project, I created this PR.

codecov · 2019-11-14T19:26:25Z

Codecov Report

Merging #1146 into master will decrease coverage by 0.12%.
The diff coverage is 62.5%.

@@            Coverage Diff             @@
##           master    #1146      +/-   ##
==========================================
- Coverage   79.26%   79.13%   -0.13%     
==========================================
  Files          20       20              
  Lines        1895     1912      +17     
  Branches      294      299       +5     
==========================================
+ Hits         1502     1513      +11     
- Misses        321      323       +2     
- Partials       72       76       +4

Impacted Files	Coverage Δ
locust/web.py	`88.37% <100%> (ø)`	⬆️
locust/main.py	`35.06% <33.33%> (+0.28%)`	⬆️
locust/stats.py	`84.35% <59.37%> (-0.83%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4aaedfc...f0c6faa. Read the comment docs.

mehta-ankit · 2019-11-15T15:59:57Z

Rebased to resolve merge conflicts.

mehta-ankit · 2019-11-15T16:15:50Z

@heyman Can you please take a look at this PR?
You commented on the other PR that this one replaces, and I want to make sure I resolve all your concerns.
As for configuring the interval, let me know if you still want a command line flag,
or if it's OK for users to override it themselves by importing locust.stats?

import locust.stats
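
For illustration only, here is a minimal sketch of the "override it by importing the module" approach. It uses a stand-in module built with `types.ModuleType` so that it runs without Locust installed; the attribute name `HISTORY_INTERVAL_SEC` and the helper `seconds_between_rows` are hypothetical and are not claimed to be the names `locust.stats` exposes.

```python
import types

# Stand-in for a stats module. The attribute name below is hypothetical
# and only illustrates the pattern; it is not taken from locust.stats.
stats = types.ModuleType("fake_stats")
stats.HISTORY_INTERVAL_SEC = 2  # assumed default, in seconds


def seconds_between_rows():
    # The attribute is looked up on the module at call time, so an
    # override made after import is honoured by the writer loop.
    return stats.HISTORY_INTERVAL_SEC


# What a user's test file would do instead of passing a command line flag:
stats.HISTORY_INTERVAL_SEC = 10
```

The pattern only works if the consuming code reads the value through the module at call time; a `from module import NAME` copy taken earlier would not see the override.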

mehta-ankit · 2019-11-15T16:49:28Z

Also, I'm not sure why the py38 job failed on Travis. Running tox locally seems to pass:

heyman · 2019-11-15T23:22:17Z

This is what I think we should do:

- Rename the CSV files that we currently create to CSVBASENAME_stats.csv and CSVBASENAME_response_times.csv. Or some other names if we can come up with something better, but the _requests and _distribution names that we currently use are not very good, and if we're changing the CSV files we should take the opportunity to choose better names.
- Change both files so that they have a Type column and a Name column. At the moment the _distribution file has a single Name column that is the type and name concatenated, while the _requests file has two columns called Method and Name.
- Make Locust automatically create a new file (when --csv is specified) called something like CSVBASENAME_stats_history.csv (perhaps there's a better name?), and in this file we'll append a row for the Aggregated stats entry each interval. So by default we'll only append a single row every interval. We should probably also have a timestamp column in this CSV file. These rows should use StatsEntry.current_rps and StatsEntry.current_fail_per_sec instead of total_rps and total_fail_per_sec (total_fail_per_sec should also be added to the _stats CSV file, because it's currently missing).
- Add a new command line flag that makes Locust also append a row for each of the other stats entries at every interval. We could maybe call this --csv-full-history. The reason this has to be an option is that we otherwise risk creating huge files for long-running tests with many stats entries.

I know this is much more work, but the current state of Locust's CSV handling is not very good, and I don't want to add a whole new CSV feature before we've improved it. Let me know if you're interested in working on this.
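
The history-file behaviour proposed here could be sketched roughly as follows. This is only an illustration under stated assumptions: `Entry` and `append_history` are hypothetical names, the column set is reduced, and nothing here is the code that ended up in the PR. It appends one Aggregated row per interval, and one row per stats entry as well when full history is enabled.

```python
import csv
import io
import time
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical stand-in for a stats entry; not Locust's StatsEntry.
    type: str
    name: str
    num_requests: int
    current_rps: float
    current_fail_per_sec: float


def append_history(out, entries, aggregated, full_history=False, now=None):
    """Append one interval's rows to an open, file-like CSV target.

    Columns: Type, Name, Timestamp, # requests, Requests/s, Requests Failed/s.
    Returns the number of rows written.
    """
    timestamp = int(time.time() if now is None else now)
    writer = csv.writer(out)
    # Per-entry rows are opt-in, the Aggregated row is always written.
    rows = list(entries) if full_history else []
    rows.append(aggregated)
    for e in rows:
        writer.writerow([e.type, e.name, timestamp, e.num_requests,
                         round(e.current_rps, 2),
                         round(e.current_fail_per_sec, 2)])
    return len(rows)
```

With ten stats entries and a two-second interval, the default mode writes 30 rows per minute while full history writes 330, which is why the per-entry rows sit behind a flag.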

> or if it's ok for the users to override it themselves by importing locust.stats

I don't think it's necessary to have a command line argument for it. However, if we're doing the above changes, I think we should move out the CSV-related code into a separate python module (locust.csv perhaps).

> Also not sure why py38 job failed on travis

I think it was just some random network error at Travis. Restarted the build and it completed without errors.

mehta-ankit · 2019-11-18T13:44:33Z

@heyman Thanks for the reply with the detailed suggestions.
I'll be more than happy to work on this, since this is something we want and it would be best to have it as part of the main Locust fork rather than as a tweak in our own fork of Locust.

I will make more commits with all the changes as suggested and tag you once I am done.

mehta-ankit · 2019-11-18T14:13:46Z

@heyman Just to confirm - do you mean we should have Type, Name columns in both files and where Type will be the same value as method in _requests-file ?

> Change so that both files have a Type column and a Name column. At the moment the _distribution-file has a single Name column that is the type and name concatenated, while the _requests-file has two columns called Method and Name.

heyman · 2019-11-18T14:17:35Z

> I'll be more than happy to work on this

That's great! Just let me know if you have any questions.

> do you mean we should have Type, Name columns in both files and where Type will be the same value as method in _requests-file ?

Exactly! From the beginning Locust was HTTP-only so it made sense to call it Method, while Type is more generic.

mehta-ankit · 2019-11-21T21:42:10Z

@heyman A couple of clarifying questions:

- For the _stats_history.csv file without the --csv-full-history flag (that is, when only the aggregated entries are written), do we show current_rps and current_fail_per_sec, or the total ones? Your comment says it should be current_rps, but I want to make sure.
- If the --csv-full-history flag is provided, we add each stats entry every interval. Do we still add the aggregated entries to this file? Having discussed it with my team, I feel we should, since it would help in analyzing the data. The follow-up question is which rps and failure_per_sec values would make sense in this case (current or total).

Thanks in advance!

mehta-ankit · 2019-11-22T12:51:14Z

@matthiaslee @mckornfield Can you review this please.

mehta-ankit · 2019-11-22T13:04:25Z

FYI
Example of new CSV file with --csv-full-history flag passed in:

"Type","Name","Timestamp","# requests","# failures","Requests/s","Requests Failed/s","Median response time","Average response time","Min response time","Max response time","Average Content Size","50%","66%","75%","80%","90%","95%","98%","99%","99.9%","99.99%","99.999","100%"
"None","Aggregated","1574373141",0,0,0.00,0.00,0,0,0,0.00,0.00,"N/A","N/A","N/A","N/A","N/A","N/A","N/A","N/A","N/A","N/A","N/A","N/A"
"GET","Login","1574373143",1,0,0.00,0.00,28,28,28,28.74,0.00,28,28,28,28,28,28,28,28,28,28,28,28
"GET","/some/endpoint","1574373143",1,0,0.00,0.00,77,77,77,77.15,427827.00,77,77,77,77,77,77,77,77,77,77,77,77
"GET","aboutUs","1574373143",1,0,0.00,0.00,105,105,105,105.89,11383.00,110,110,110,110,110,110,110,110,110,110,110,110
"POST","page1","1574373143",1,0,0.00,0.00,520,520,520,520.62,452939.00,520,520,520,520,520,520,520,520,520,520,520,520
"GET","page2","1574373143",2,0,0.00,0.00,120,140,119,161.40,35908.00,160,160,160,160,160,160,160,160,160,160,160,160
"GET","logout","1574373143",1,0,0.00,0.00,107,107,107,107.87,107909.00,110,110,110,110,110,110,110,110,110,110,110,110
"None","Aggregated","1574373143",7,0,0.00,0.00,110,160,28,520.62,153124.86,110,120,160,160,520,520,520,520,520,520,520,520

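A file in this layout can be consumed with the standard `csv` module. The sketch below is only an illustration: the sample is trimmed to six of the columns, with values copied from the example above, and `aggregated_series` is a hypothetical helper name. It pulls out the Aggregated rows as a time series.

```python
import csv
import io

# Trimmed copy of the example output: six columns, four of the rows.
SAMPLE = '''"Type","Name","Timestamp","# requests","# failures","Median response time"
"None","Aggregated","1574373141",0,0,0
"GET","Login","1574373143",1,0,28
"POST","page1","1574373143",1,0,520
"None","Aggregated","1574373143",7,0,110
'''


def aggregated_series(csv_text, column):
    """Return (timestamp, value) pairs for the Aggregated rows only."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(int(row["Timestamp"]), float(row[column]))
            for row in reader
            if row["Name"] == "Aggregated"]


series = aggregated_series(SAMPLE, "# requests")
```

Filtering on the Name column alone would also match a request that happens to be named "Aggregated"; checking the Type column as well would be a safer choice in real use.
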
mckornfield

A few comments, some questions

locust/main.py

locust/stats.py

heyman · 2019-11-25T15:13:57Z

> For the _stats_history.csv file without the --csv-full-history flag (that means only printing the aggregate entries), do we show current_rps and current_fail_per_sec or the total ones. You mention in your comment it should be the current_rps but making sure.

Yes, I think it makes most sense to use the current_rps and current_fail_per_sec here.

> if --csv-full-history flag is provided, that means we will add each stats entry every iteration, do we still add the aggregated entries to this file ? Having discussed it with my team i feel we should, it would help in analyzing the data. But follow up question is what rps and failure_per_sec would make sense in this case (current or total).

I agree that we should also output a row for the aggregated StatsEntry when --csv-full-history is set. We should use the current rps/failure_per_sec.
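
To make the difference between "current" and "total" rates concrete, here is a small sketch. `RateTracker` is a hypothetical class with an assumed 10-second window, not Locust's StatsEntry; it only illustrates why a per-interval history file is more informative with a windowed rate than with a whole-run average.

```python
from collections import deque


class RateTracker:
    """Hypothetical request-rate tracker; not Locust's StatsEntry."""

    def __init__(self, start_time, window=10):
        self.start_time = start_time
        self.window = window      # seconds covered by the "current" rate
        self.total = 0            # requests since the start of the run
        self.recent = deque()     # timestamps of recent requests

    def log(self, t):
        self.total += 1
        self.recent.append(t)

    def total_rps(self, now):
        # Average over the whole run: flattens out as the run gets longer.
        elapsed = now - self.start_time
        return self.total / elapsed if elapsed > 0 else 0.0

    def current_rps(self, now):
        # Average over the last `window` seconds only.
        while self.recent and self.recent[0] <= now - self.window:
            self.recent.popleft()
        return len(self.recent) / self.window
```

After an early burst followed by a quiet period, `total_rps` still reflects the burst while `current_rps` has already dropped, which is the behaviour wanted for rows that are appended every interval.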

heyman · 2019-11-25T15:24:57Z

Hmm, we should probably be able to merge the _response_times columns into the _stats CSV file. I think this was proposed by someone else some time ago, and I don't know why I didn't think about that when I wrote my previous comment.

heyman · 2019-11-25T15:29:05Z

Also we should make sure to use the StatsEntry.get_current_response_time_percentile() when retrieving response time stats for the _history CSV file (so that we'll output the current response times instead of the total for the whole test run).

mehta-ankit · 2019-11-25T20:01:57Z

@heyman The _stats_history file already has the columns from _response_times.csv and _stats.csv combined. So are you saying we will not need _response_times.csv anymore, and we will only have 2 files in total? Otherwise I don't see what we would achieve with this change. Do you think this is needed in the current PR?

mehta-ankit · 2019-11-25T21:44:23Z

> Also we should make sure to use the StatsEntry.get_current_response_time_percentile() when retrieving response time stats for the _history CSV file (so that we'll output the current response times instead of the total for the whole test run).

To use this, don't we have to pass in use_response_times_cache=True? Right now it is passed as True for the Aggregated entry, self.total = StatsEntry(self, "Aggregated", None, use_response_times_cache=True), but the other entries are created with entry = StatsEntry(self, name, method), which uses the default of False.
So I guess we will need to set use_response_times_cache to True when creating each StatsEntry.
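
As an illustration of why a "current" percentile needs some kind of cache, here is a minimal sketch. It keeps a cumulative response-time histogram plus periodic snapshots, and subtracts an old snapshot to get the recent window. `WindowedResponseTimes` and `percentile` are hypothetical names with an assumed 10-second window; this is not claimed to be how StatsEntry implements use_response_times_cache.

```python
from collections import Counter, OrderedDict


def percentile(histogram, percent):
    """Smallest response time covering `percent` (0..1) of the samples."""
    total = sum(histogram.values())
    if total == 0:
        return None
    threshold = total * percent
    seen = 0
    for response_time in sorted(histogram):
        seen += histogram[response_time]
        if seen >= threshold:
            return response_time


class WindowedResponseTimes:
    """Hypothetical sketch: cumulative histogram plus per-second snapshots."""

    def __init__(self, window=10):
        self.window = window
        self.histogram = Counter()      # response time (ms) -> count, whole run
        self.snapshots = OrderedDict()  # second -> copy of the histogram

    def log(self, response_time_ms):
        self.histogram[response_time_ms] += 1

    def snapshot(self, second):
        # Without these copies there is nothing to subtract later on.
        self.snapshots[second] = Counter(self.histogram)
        while len(self.snapshots) > self.window + 1:
            self.snapshots.popitem(last=False)  # drop the oldest snapshot

    def current_percentile(self, now, percent):
        old = self.snapshots.get(now - self.window)
        if old is None:
            return None  # no snapshot from one window ago
        recent = self.histogram - old  # keeps only samples logged since then
        return percentile(recent, percent)
```

An entry that never takes snapshots can only report whole-run percentiles, which matches the concern raised here about entries created without the cache enabled. The trade-off is memory: every entry then holds up to `window + 1` histogram copies.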

mehta-ankit · 2019-12-02T21:26:10Z

@heyman Do you have any update on my 2 clarifying comments/questions regarding your last suggestions:
#1146 (comment)
#1146 (comment)

EDIT:
I made 2 commits based on your suggestion and my understanding of it. Let me know if that is what we need.

Also, since _response_times.csv is no longer a file we create, I removed it from web.py and index.html, so it is no longer available for download from the web UI.
Do we want the stats_history file to be added to web.py as a route, so it can be downloaded from the web UI? I created a commit that implements that. Let me know if that is OK.

mehta-ankit · 2019-12-04T15:38:49Z

@cyberw Would you be able to take a look at my last comment and my last 2 commits? @heyman suggested those changes and I haven't heard back, so I was wondering whether any of the other contributors would like to take a look.
Thanks 🙏

cyberw · 2019-12-05T09:13:48Z

I have just one question: the header line in the CSV now says "Type", but you are logging s.method. Shouldn't it say "Method"?

Other than that it looks good to me and I have no issues merging this.

mehta-ankit · 2019-12-05T13:04:00Z

@cyberw This was a suggestion from Heyman. I asked a clarifying question, and here is the reply from him.

Also, should I squash all my commits, or is it OK to keep separate commits? The only commit I would want to squash, even if we keep all of them, is this one: 84bb2d3

cyberw · 2019-12-05T13:10:03Z

> @cyberw this was a suggestion from Heyman and i asked a clarifying question and here is the reply from him.
>
> Also should i squash all my commits, or is it ok to keep separate commits ? The only commit i want to squash even if we keep all of them would be this one: 84bb2d3

Ah, I didn't see that one in this long history, sorry :)

No need to squash. Thank you for your contribution!

cyberw · 2019-12-05T13:11:37Z

Oh, one more thing before I hit the merge button, could you update the PR title?

mehta-ankit · 2019-12-05T13:13:46Z

@cyberw Done!
One more question: will you make a new release as well? The reason we wanted to do this was so that we can start using the main Locust fork instead of the fork of it that we use for our purposes.

cyberw · 2019-12-05T13:28:22Z

Unfortunately I can't make releases. But I'm sure @heyman or @mbeacom can make one shortly.

mehta-ankit · 2019-12-05T13:39:08Z

> Unfortunately I can't make releases. But I'm sure @heyman or @mbeacom can make one shortly.

Thanks. 🙏

mehta-ankit force-pushed the csvAppend branch from 29deb28 to fb358ef Compare November 15, 2019 15:59

mehta-ankit force-pushed the csvAppend branch 2 times, most recently from c899756 to 3e2374e Compare November 20, 2019 19:40

mckornfield reviewed Nov 23, 2019


cyberw mentioned this pull request Nov 28, 2019

Statistics: Provide distribution trend over time in CSV format #1007

Closed

Add failed request/s as a new column and changed CSV filenames

85c8051

mehta-ankit force-pushed the csvAppend branch 2 times, most recently from 9ed2f10 to 30060bb Compare December 3, 2019 16:15

mehta-ankit added 4 commits December 3, 2019 14:21

Write Aggregated entry to _stats_history.csv file per interval

1cde723

Write each stats entry to _stats_history.csv file per interval

7e7f3bc

Address CR comments by Matt K

84bb2d3

Use get_current_response_time_percentile

d7c7656

Merge response times into _stats.csv file

44c3fff

mehta-ankit force-pushed the csvAppend branch 2 times, most recently from d208867 to bd8e973 Compare December 3, 2019 19:40

Make csv_history file available for download on web UI

f0c6faa

mehta-ankit force-pushed the csvAppend branch from bd8e973 to f0c6faa Compare December 3, 2019 19:49

mehta-ankit changed the title ~~Stats: New argument "--csv-append" appends instead of replacing~~ Stats: New argument "--csv-full-history" appends stats entries every interval in a new "_stats_history.csv" File Dec 5, 2019

cyberw merged commit 3ea6822 into locustio:master Dec 5, 2019

This was referenced Dec 16, 2019

Use response_time_percentile for <name>_stats.csv file instead of current_response_time_percentile #1197

Merged

Fix percentiles printed in _stats.csv file #1198

Closed
