Stats Download And Site Overload #5524
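Some other info: Basically, sometime between 4/26/13 and 1/1/14 we either became immensely popular, or we had a ridiculously large number of spam signups. Some figures:

| Period | Start timestamp | End timestamp | UID range | Users |
|---|---|---|---|---|
| 1/1/2013 - 4/24/2013 | 1356998400 | 1366847999 | 59296 - 59296 | 12020 |
| 4/25/2013 | 1366848000 | 1366934399 | 59297 - 59626 | 330 |
| 4/26/2013 - 1/1/2014 | 1366934400 | 1388534400 | 59627 - 420114 | 360466 |
| 1/1/2014 - 1/24/2014 | 1388534400 | 1398297600 | 420115 - 422688 | 2572 |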
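The boundaries in these figures are Unix timestamps at midnight UTC (for example, 1366848000 is the start of 4/25/2013). A minimal sketch, assuming UTC and inclusive UID ranges, that reproduces the timestamps and turns each period into an average signups-per-day rate; the helper names are illustrative only, not part of plots2:

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400

def utc_ts(year: int, month: int, day: int) -> int:
    """Unix timestamp for midnight UTC on the given calendar date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())

def uid_span(first: int, last: int) -> int:
    """How many UIDs an inclusive range covers; deleted accounts can make the real count lower."""
    return last - first + 1

def signups_per_day(start_ts: int, end_ts: int, users: int) -> float:
    """Average signups per day over the half-open interval [start_ts, end_ts)."""
    return users / ((end_ts - start_ts) / SECONDS_PER_DAY)

# (label, start, exclusive end, user count as reported in the figures)
PERIODS = [
    ("1/1/2013 - 4/24/2013", utc_ts(2013, 1, 1), utc_ts(2013, 4, 25), 12020),
    ("4/25/2013", utc_ts(2013, 4, 25), utc_ts(2013, 4, 26), 330),
    ("4/26/2013 - 12/31/2013", utc_ts(2013, 4, 26), utc_ts(2014, 1, 1), 360466),
]

for label, start, end, users in PERIODS:
    print(f"{label}: about {signups_per_day(start, end, users):.0f} signups/day")
```

On these numbers, 4/26/2013 - 12/31/2013 averages roughly 1,400 signups a day, against roughly 100 a day earlier in 2013.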
Just on the performance/slowness portion, it could be useful to look at https://oss.skylight.io/app/applications/GZDPChmcfm1Q/recent/6h/endpoints and see if it lines up with your queries. Looping in @icarito too!
Hmm, that could have been either in the final days of the Drupal site, or before/after some change in our login sequence!
Here's the time period. Maybe this request (2.1 min) was searching/aggregating the users so the website's charts and figures could be updated, and then this request (7.4 min) was the CSV download?
Hi everyone, circling back on this since I'm planning some work I'd like to try to do this summer. This doesn't replace the caching issue, but I thought one way to get around download overload is to pre-generate CSV/JSON files for every six-month period. It's not like the historical data is going to change. If people are into it, should I make a new issue or keep it here? I could use some discussion around implementation and how to break it down into steps.
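A rough sketch of the pre-made files idea, assuming the export can iterate over records that each carry a Unix `created` timestamp. The record shape, the `FIELDS` columns, the `export_period` helper and the `users_<start>_<end>` filenames are all hypothetical, not plots2's actual export code:

```python
import csv
import json
from datetime import date, datetime, timezone
from pathlib import Path

FIELDS = ["uid", "created"]  # hypothetical columns; a real export would have more

def half_year_windows(first_year: int, last_year: int):
    """Yield (start, end) dates for every half year; end is exclusive."""
    for year in range(first_year, last_year + 1):
        yield date(year, 1, 1), date(year, 7, 1)
        yield date(year, 7, 1), date(year + 1, 1, 1)

def to_ts(d: date) -> int:
    """Midnight UTC on d as a Unix timestamp."""
    return int(datetime(d.year, d.month, d.day, tzinfo=timezone.utc).timestamp())

def export_period(records, start: date, end: date, out_dir: Path) -> int:
    """Write records created in [start, end) to matching CSV and JSON files; return the row count."""
    lo, hi = to_ts(start), to_ts(end)
    rows = [{k: r[k] for k in FIELDS} for r in records if lo <= r["created"] < hi]
    base = f"users_{start.isoformat()}_{end.isoformat()}"
    with open(out_dir / f"{base}.csv", "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    with open(out_dir / f"{base}.json", "w") as fh:
        json.dump(rows, fh)
    return len(rows)
```

For example, looping `for start, end in half_year_windows(2010, 2018): export_period(users, start, end, Path("exports"))` (with `users` standing in for whatever record source the site has) would build the archive once. Since finished periods don't change, only the current half year would ever need regenerating.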
Bringing in @icarito as comeuppance for (not entirely unfounded) accusations of stats misuse on the 27th of May, 2019. ; ) Kidding aside, I'm wondering what people think of pre-packaged six-month JSON/CSV downloads. This doesn't take care of the other problem I've raised in this discussion: someone who just wants to view a large set of data. Even a reasonable time period may still overload the site if it happens to include an unusually large amount of data. Side question: how can we test solutions (even on unstable) that tend to break the site without breaking the site?
Re: testing, what are the drawbacks of testing on stable/unstable, even to the point of breaking those sites? Thanks!
As I am not the one who will have to restart the sites (cough, cough, @icarito, ahem, sorry, got something stuck in my throat), I don't know. Having said that, I wrote this issue when we had less information about rsessions and the spam discussion, so testing may not be an issue once rsessions is removed (#5817 (comment)). We'll have an opportunity to find out, since @cesswairimu and I weren't sure whether it was just the date issue, or the large number of users as well, that was giving her trouble with the "all time" query in #5904. I'm going to be gone for the next two days, but I'm adding @cesswairimu to #5817, and as soon as @icarito is finished she can give it a try...? If there's still a problem, we can be more aggressive in planning the removal of spam users from the chunk of ~350,000 and see how that helps with the overload issue.
Hi, last night, after we had sorted out the date issue with #5490, I went to download the rest of the data. While I was downloading 2/26/13 - 07/01/16, all of the downloads went smoothly except the users, which took a very long time (relative to other user downloads I've done) and broke the site. (For what it's worth for diagnosis: before that, I had tried to download 1/10/2010 - 4/24/13.)
Anyway, after the site came back up, I tried a smaller date range, 4/26/13 - 1/1/14 (about 8 months). Everything worked as expected, but I noticed something about the download. The file I downloaded a while back covering about 2.5 years (7/1/16 - ~4/2019) was 71MB, but the file for these ~8 months alone was 91MB! The JSON file might have been even larger. I didn't download it, but in my experience JSON files tend to be larger just by their nature.
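To put the two file sizes side by side, here is a quick per-month comparison. It assumes "~4/2019" means 4/1/2019 and uses an average month of 30.44 days; both are approximations, so only the rough ratio is meaningful:

```python
from datetime import date

AVG_MONTH_DAYS = 30.44  # average Gregorian month length, an approximation

def months_between(start: date, end: date) -> float:
    """Approximate number of months between two dates."""
    return (end - start).days / AVG_MONTH_DAYS

def mb_per_month(size_mb: float, start: date, end: date) -> float:
    """Average export size per month of covered data."""
    return size_mb / months_between(start, end)

older = mb_per_month(71, date(2016, 7, 1), date(2019, 4, 1))   # 7/1/16 - ~4/2019 CSV (end date assumed)
spike = mb_per_month(91, date(2013, 4, 26), date(2014, 1, 1))  # 4/26/13 - 1/1/14 CSV

print(f"7/1/16 - ~4/2019: {older:.1f} MB/month")
print(f"4/26/13 - 1/1/14: {spike:.1f} MB/month")
print(f"ratio: {spike / older:.1f}x")
```

Under these assumptions the 4/26/13 - 1/1/14 export carries around five times as much data per month as the more recent one, which lines up with the jump in signups discussed in this thread.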
Aside from figuring out the download issue, this has led me to wonder whether it would be better to simply have zipped files prepared by year for large downloads. Anyone who wants the whole archive would download each year's file, and then use the interface to download the rest of the current year. Zipping the 91MB CSV brought it down to 31MB.
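A minimal sketch of the yearly zipped archives, using Python's standard `zipfile` module with DEFLATE compression. The `users_<year>.csv` naming and the `zip_year` / `compression_ratio` helpers are hypothetical; how much a given export shrinks depends on its contents (the 91MB to 31MB result is the only measured data point here):

```python
import zipfile
from pathlib import Path

def zip_year(csv_path: Path) -> Path:
    """Compress one yearly CSV export into a .zip next to it and return the archive path."""
    zip_path = csv_path.with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
        # Store the CSV under its bare filename so it unpacks without any directory structure.
        zf.write(csv_path, arcname=csv_path.name)
    return zip_path

def compression_ratio(csv_path: Path, zip_path: Path) -> float:
    """Compressed size as a fraction of the original size (31/91 is about 0.34)."""
    return zip_path.stat().st_size / csv_path.stat().st_size
```

Because each yearly archive is built once and served as a static file, the expensive query would no longer run on every large download request.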
(Forgot to mention #3498, which is where the larger conversation about the stats feature is taking place.)