Stats Download And Site Overload #5524

skilfullycurled · 2019-04-18T16:57:47Z

Hi, last night, after we had sorted out the date issue with #5490, I went to download the rest of the data. While downloading 2/26/13 - 07/01/16, all of the downloads went smoothly except the users download, which took a very long time (relative to other user downloads I've done) and broke the site. (For what it's worth for diagnosis, before that I had tried to download 1/10/2010 - 4/24/13.)

Anyway, after the site went back up, I decided to try a smaller date range, 4/26/13 - 1/1/14 (7 mo). Everything worked as expected, but I noticed something about the download. The file I downloaded a while back for 2.5 years (7/1/16 - ~4/2019) is 71MB, but the file for this shorter range was 91MB for just those 6 mo! The JSON file might have been even larger. I didn't download it, but my experience has been that JSON files tend to be larger just by their nature.

Aside from figuring out the downloads issue, this has led me to wonder if it would be better to simply have zipped files prepared by year for large downloads. If people want the whole archive, they download each year, and then use the interface to download the rest of the year they are in. A zipped version of the 91MB CSV brought it down to 31MB.
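A minimal sketch of the per-year archive idea, assuming the rows have already been grouped by year. It uses gzip from the Python standard library as a stand-in for whatever zip format would actually be chosen, and the column names (`uid`, `username`, `created_at`) and sample rows are hypothetical, not the site's real schema.

```python
import csv
import gzip
import io


def rows_to_csv_bytes(rows, header):
    """Serialize rows to CSV (with a header line) and return UTF-8 bytes."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def build_yearly_archives(rows_by_year, header):
    """Return {year: gzip-compressed CSV bytes}, one archive per year."""
    return {
        year: gzip.compress(rows_to_csv_bytes(rows, header))
        for year, rows in sorted(rows_by_year.items())
    }


# Hypothetical sample data; real rows would come from the stats export.
HEADER = ("uid", "username", "created_at")
sample = {
    2013: [(59627, "user_a", 1366934400), (59628, "user_b", 1366934500)],
    2014: [(420115, "user_c", 1388534400)],
}

archives = build_yearly_archives(sample, HEADER)
for year, blob in archives.items():
    raw_size = len(rows_to_csv_bytes(sample[year], HEADER))
    print(year, "raw bytes:", raw_size, "gzipped bytes:", len(blob))
```

Since past years never change, archives like these would only need to be built once and could then be served as static files. On inputs as tiny as this sample the gzipped output can be larger than the raw CSV; the savings only show up on large files.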

(Forgot to mention #3498 which is where the larger conversation about the stats feature is taking place.)

skilfullycurled · 2019-04-18T17:48:13Z

Some other info: Basically, sometime between 4/26/13 and 1/1/14 we either became immensely popular, or we had a ridiculously large number of spam signups.

Some figures:

1/1/2013: 1356998400
4/24/2013: 1366847999
UID Range: 59296 - 59296
Users: 12020

4/25/2013: 1366848000
4/25/2013: 1366934399
UID Range: 59297 - 59626
Users: 330

4/26/2013: 1366934400
1/1/2014: 1388534400
UID Range: 59627 - 420114
Users: 360466

1/1/14: 1388534400
4/24/14: 1398297600
UID Range: 420115 - 422688
Users: 2572
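The boundaries in the figures above can be sanity-checked with a few lines of Python. This is only a sketch, and it assumes the numbers are Unix timestamps in seconds interpreted as UTC. The `uid_span` helper is a hypothetical name: it counts every ID in a range inclusively, so it gives an upper bound on the number of users, since some IDs in a range may be unused.

```python
from datetime import datetime, timezone


def epoch_to_utc(ts):
    """Format a Unix timestamp (seconds) as a UTC date-time string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def uid_span(first_uid, last_uid):
    """Number of IDs in an inclusive UID range (upper bound on user count)."""
    return last_uid - first_uid + 1


# Timestamps copied from the figures above.
boundaries = [
    1356998400,
    1366847999,
    1366848000,
    1366934399,
    1366934400,
    1388534400,
    1398297600,
]

for ts in boundaries:
    print(ts, "->", epoch_to_utc(ts))

# UID ranges copied from the figures above.
for first, last in [(59297, 59626), (59627, 420114), (420115, 422688)]:
    print(f"{first} - {last}: at most {uid_span(first, last)} users")
```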

jywarren · 2019-04-18T20:04:54Z

Just on the performance/slowness portion, it could be useful to look at https://oss.skylight.io/app/applications/GZDPChmcfm1Q/recent/6h/endpoints and see if it lines up with your queries? and looping in @icarito too!

jywarren · 2019-04-18T20:06:01Z

Hmm, that could have been either in the final days of the Drupal site, or before/after some change in our login sequence!

skilfullycurled · 2019-04-18T20:26:22Z

Here's the time period.

Maybe this request (2.1 min) was for searching/aggregating the users so the website charts and figures could be updated, and then this request (7.4 min) was the csv download?

skilfullycurled · 2019-05-12T21:10:12Z

Hi everyone, circling back on this since I'm doing some planning on some work I'd like to try to do this summer.

This doesn't replace the caching issue, but I thought one way to get around download overload is by creating pre-made csv/json files for every six months. It's not like the data is going to change.
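One way to picture the six-month split is as a list of calendar half-year windows, where any window that has fully ended could be exported once and never rebuilt. The sketch below is an illustration under that assumption: `half_year_windows` and `is_frozen` are hypothetical helper names, and aligning windows to Jan-Jun / Jul-Dec is just one possible choice.

```python
from datetime import date


def half_year_windows(start, end):
    """Return (window_start, window_end) pairs covering [start, end),
    aligned to calendar half-years (Jan-Jun and Jul-Dec).
    Window ends are exclusive."""
    windows = []
    year = start.year
    half = 0 if start.month <= 6 else 1
    while True:
        w_start = date(year, 1, 1) if half == 0 else date(year, 7, 1)
        w_end = date(year, 7, 1) if half == 0 else date(year + 1, 1, 1)
        if w_start >= end:
            break
        # Clip the first and last windows to the requested range.
        windows.append((max(w_start, start), min(w_end, end)))
        if half == 0:
            half = 1
        else:
            year, half = year + 1, 0
    return windows


def is_frozen(window, today):
    """A window whose exclusive end is not after today will not gain new rows,
    so its export could be generated once and reused."""
    return window[1] <= today


for window in half_year_windows(date(2013, 4, 26), date(2014, 1, 1)):
    print(window[0], "to", window[1], "frozen:", is_frozen(window, date(2019, 5, 12)))
```

Only the current, still-open window would need to go through the live download interface; everything earlier could be served from pre-built files.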

If people are into it, should I make a new issue or keep it here? I could use some discussion around implementation and how to break it down into steps.

skilfullycurled · 2019-06-07T20:13:14Z

Bringing in @icarito as comeuppance for (not entirely unfounded) accusations of stats misuse on the 27th of May, 2019. ; )

Kidding aside, wondering about the idea of pre-packaged 6 mo JSON/CSV downloads. This doesn't take care of the other problem I've brought into the discussion here: someone who just wants to view a large set of data. Even with a reasonable time period, choosing one that happens to include an unusually large set of data may still overload the site.

Side question, how are we to test solutions (even on unstable) which tend to break the site without breaking the site?

jywarren · 2019-06-18T03:34:47Z

Re: testing, what are the drawbacks of testing on stable/unstable, even to the point of breaking those sites? Thanks!

skilfullycurled · 2019-06-18T03:54:13Z

As I am not the one who will have to restart the sites (cough, cough, @icarito eh-hem, sorry got something stuck in my throat) I don't know. Having said that, I wrote this issue when we had less information about rsessions and the spam discussion. So, testing may not be an issue once rsessions is removed. #5817 (comment)

We'll have an opportunity to find out since we (@cesswairimu and I) weren't sure if it was just the date issue or the large user issue as well that was giving her trouble with the "all time" query #5904. I'm going to be gone for the next two days but I'm adding @cesswairimu to #5817 and as soon as @icarito is finished then she can give it a try...?

If there's still a problem then we can be more aggressive on planning the removal of spam users from the chunk of ~350,000 and see how that helps with the overload issue.

grvsachdeva added the discussion label Apr 28, 2019

cesswairimu mentioned this issue May 8, 2019: Caching of data #4138 (Closed)

This was referenced May 31, 2019: Stats page bug #5728 (Closed); Stats downloading returns "Page does not exist" for dates prior to early 2013 #5490 (Closed)

icarito mentioned this issue Jun 8, 2019: Memory issues (leak?) investigation #5817 (Closed)

This was referenced Jun 11, 2019: Spam account detection/reduction planning #5450 (Open); Add 'all time' option for stats #5904 (Closed)

cesswairimu mentioned this issue Jun 17, 2019: Stats Page Query Bug #5917 (Open)

skilfullycurled mentioned this issue Dec 13, 2019: Include Emails of Banned Users in Moderator Email Search #6962 (Closed)

stale bot added the stale label Oct 7, 2020

publiclab deleted a comment from stale bot Oct 8, 2020

stale bot removed the stale label Oct 8, 2020

skilfullycurled mentioned this issue Jan 13, 2021: Stats page overload mitigation, passwording next steps discussion #9002 (Closed)
