Stats downloading returns "Page does not exist" for dates prior to early 2013 #5490

Closed
skilfullycurled opened this issue Apr 15, 2019 · 56 comments
Labels
bug the issue is regarding one of our programs which faces problems when a certain task is executed help wanted requires help by anyone willing to contribute
Milestone
Metrics

Comments

@skilfullycurled
Contributor

skilfullycurled commented Apr 15, 2019

I was trying to download stats from the beginning of the site (which was sometime in 2010) until July of 2016 and I received a "Page does not exist" error. I did some experimentation and while I'm not sure of the exact date before which this occurs, I can say it is sometime between 01-01-2013 and 01-07-2013 (in stats page date format DD-MM-YYYY). For what it's worth to the diagnosis of the problem, I was "only" downloading up until July of 2016.

Let me know if there's anything else I can tell you (FF 66.0.2, macOS 10.12.6).

Oh, a possibly related issue this could be merged with: Raw data from stats page, #4654

NOTE: for high-impact URLs that could slow the main website down, please test them on stable.publiclab.org instead of publiclab.org; that way you'll only slow or break the stable test server, which should have very similar code and data. (note by @jywarren)

@cesswairimu
Collaborator

cesswairimu commented Apr 17, 2019

🤔 Will take a look at why this is happening thanks @skilfullycurled

@cesswairimu
Collaborator

@skilfullycurled, did you receive the error when submitting the date range, or was the data returned fine and the error occurred when you tried to "download as"?

@skilfullycurled
Contributor Author

No, @cesswairimu, it was the search itself that returned the error.

@skilfullycurled
Contributor Author

Oh, I forgot about that date. @cesswairimu, we're never supposed to mention that date or even speak about what happened on it. It's sort of the Voldemort of dates.

But seriously folks.

Very interesting.

I'll download around it.

@cesswairimu
Collaborator

Aha, gotcha, sorry! I'll delete the comment.

@cesswairimu
Copy link
Collaborator

@skilfullycurled, maybe we can close this now since it's not a code issue?

@jywarren
Member

lol are you folks having too much fun in here? Voldemort dates? 🔮 🤐 🙊 I hope Cess has seen Harry Potter and knows you're joking? Cess, @skilfullycurled has a strange sense of humor please don't hold it against him.

No but for real, thanks for looking into this. I'm sure it was a tough one to track down esp. given the mysterious date.

@cesswairimu
Collaborator

😆 😆

@skilfullycurled
Contributor Author

Oh my gosh. @cesswairimu, I'm so sorry. So sorry. I feel terrible. I was hoping to convey I was kidding when I said "but seriously folks". PS: @jywarren, I'll have you know that all the people who consistently laugh at my jokes think I have a great sense of humor.

@skilfullycurled
Contributor Author

Actually, on a legitimately serious note, I'll try to make my humor a little clearer. I've seen enough furrowed brows to readily admit that my sense of humor can be confusing even in person, because I would have said that with a completely straight face.

@cesswairimu
Copy link
Collaborator

cesswairimu commented Apr 17, 2019

@skilfullycurled no, it's fine, blame it on English being my second language 😆

@skilfullycurled
Contributor Author

I appreciate that. In the meantime, I should probably learn to chill out just a bit when I first meet people.

It looks like you actually deleted the comment with the date, so for the record, and for anyone reading this while trying to track down a bug: it was April 25, 2013, I think...? Also, I'd still like to know if there's a reason! What I really thought was funny was that there would be one specific day before and after which everything worked fine, but just not that day. So please, if anyone knows, do end the mystery!

@jywarren
Member

jywarren commented Apr 17, 2019 via email

@skilfullycurled
Contributor Author

And I am a great admirer of yours, too!

@skilfullycurled
Contributor Author

skilfullycurled commented Apr 18, 2019

Well, not great news.

I got the "That page does not exist" error for a new date range: 01-01-2012 to 01-01-2013.

@skilfullycurled
Contributor Author

Oh, also, the users.csv download for 01-01-2013 to 25-04-2013 returns data that is both incorrect and incomplete: it contains only 154 users, scattered seemingly at random between UID 1 and UID 58354.

@skilfullycurled
Contributor Author

Hey all, let me know if there is anything I can do to help diagnose the situation beyond simply telling you which dates aren't working. I figured since I'm using the interface, I might as well be of some use!

@grvsachdeva grvsachdeva added bug the issue is regarding one of our programs which faces problems when a certain task is executed help wanted requires help by anyone willing to contribute labels Apr 28, 2019
@skilfullycurled
Contributor Author

Hi, returning to this conversation since I'm trying to plan a bit for the summer. Can I be of help on this? And if so, what would be the most systematic way to figure out which dates are causing the problem? Also, how can I avoid taking down the whole site? Do people use unstable?

@jywarren
Member

No, unstable and stable are both fine to hammer on as much as you'd like!

As for how to debug, I'm not sure... we could try to pull logs for when it happens. Wait, let me link some Sentry issues and see if they shed any light?

@sentry-io

sentry-io bot commented May 14, 2019

Sentry issue: PLOTS2-6H

@skilfullycurled
Contributor Author

@jywarren, not sure if I did this right, but I just signed up for Sentry and requested access. I wasn't sure what permissions to request, so I just requested the default ones that Sentry provided for me. Let me know if I need to change anything.

@skilfullycurled
Contributor Author

(also, thanks!)

@jywarren
Member

Hmm, weird that you can't see the error log; it's supposed to be public! But OK, here it is:

ActionController::UnknownFormat: ActionController::UnknownFormat
  from action_controller/metal/mime_responds.rb:205:in `respond_to'
  from app/controllers/stats_controller.rb:141:in `format'
  from app/controllers/stats_controller.rb:110:in `tags'
  from action_controller/metal/basic_implicit_render.rb:6:in `send_action'

But I'm not convinced this is for the same page; it says it's for:

https://publiclab.org/stats/tags

@skilfullycurled
Contributor Author

Just accepted the invite, this is awesome, thank you!

@skilfullycurled
Contributor Author

Well, I've spent a fair amount of time with Sentry and Skylight, and I can't find any error registered when I reproduce the problem on the site with the dates in question. I even turned on a VPN so I could be confident I'd recognize the correct IP address, but nothing appears in the events log in Sentry. Something might be happening in Skylight, but I can't figure out how to see the timestamps of the errors. Thoughts?

@ebarry
Member

ebarry commented May 30, 2019

I have also encountered this error when attempting to guesstimate the start date for our website so that I could view data "For All Time".

I'm wondering if it's really a bug, or whether it could be worked around so that people don't have to guess when we started logging data for our website?

If it's the latter, here's an idea:
On the "Choose a Start Date" interface, could we add the exact start date of our website as a "quick pick" OR perhaps make an option to choose "for all time" as the period of inquiry?

@jywarren
Member

Now that dates are DD-MM-YYYY, this should work: https://publiclab.org/stats?start=01-01-2012&end=25-04-2013
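For anyone scripting these checks: the stats page takes `start` and `end` as DD-MM-YYYY. A minimal plain-Ruby sketch for parsing and building such URLs, assuming only that format; `parse_stats_date` and `stats_url` are hypothetical helpers, not part of plots2, and the host defaults to the stable test server per the note at the top of this issue.

```ruby
require "date"
require "uri"

# Hypothetical helper: parse a DD-MM-YYYY string like the stats page's
# `start`/`end` parameters. Raises ArgumentError (Date::Error on Ruby 2.7+)
# if the string does not match the format.
def parse_stats_date(str)
  Date.strptime(str, "%d-%m-%Y")
end

# Hypothetical helper: build a stats URL for a date range, defaulting to the
# stable test server so heavy queries don't hit the main site.
def stats_url(start_date, end_date, host: "stable.publiclab.org")
  query = URI.encode_www_form([
    ["start", start_date.strftime("%d-%m-%Y")],
    ["end", end_date.strftime("%d-%m-%Y")]
  ])
  "https://#{host}/stats?#{query}"
end

start_date = parse_stats_date("01-01-2012")
end_date = parse_stats_date("25-04-2013")
puts stats_url(start_date, end_date)
# prints https://stable.publiclab.org/stats?start=01-01-2012&end=25-04-2013
```

Parsing with an explicit format also guards against day/month mix-ups: `parse_stats_date("01-07-2013")` is 1 July 2013, not 7 January.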

@jywarren
Member

But it doesn't. Something similar perhaps:

[0b0b480c-084e-45a5-9805-ef990116fd68] Completed 404 Not Found in 4838ms (ActiveRecord: 1788.7ms)
[0b0b480c-084e-45a5-9805-ef990116fd68]   
[0b0b480c-084e-45a5-9805-ef990116fd68] ActiveRecord::RecordNotFound (Couldn't find Tag with 'tid'=3015):
[0b0b480c-084e-45a5-9805-ef990116fd68]   
[0b0b480c-084e-45a5-9805-ef990116fd68] app/models/tag.rb:65:in `block in nodes_frequency'
[0b0b480c-084e-45a5-9805-ef990116fd68] app/models/tag.rb:65:in `map'
[0b0b480c-084e-45a5-9805-ef990116fd68] app/models/tag.rb:65:in `nodes_frequency'
[0b0b480c-084e-45a5-9805-ef990116fd68] app/controllers/stats_controller.rb:36:in `block in range'
[0b0b480c-084e-45a5-9805-ef990116fd68] app/controllers/stats_controller.rb:19:in `range'
[0b0b480c-084e-45a5-9805-ef990116fd68] app/controllers/stats_controller.rb:41:in `index'

I think these range stats queries are just good at finding lonesome db records, because they try to gather ALL such records across huge swaths of time.
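Why one orphan sinks the whole page: the frames `block in nodes_frequency` and `map` suggest that `nodes_frequency` maps over records and looks up a Tag for each, and Rails by default answers an unrescued `ActiveRecord::RecordNotFound` with 404 Not Found, which matches the "Page does not exist" message. A plain-Ruby sketch of strict versus tolerant counting, with hashes standing in for the tables; all data and method names here are hypothetical, not the actual `tag.rb` code.

```ruby
# Stand-ins for the tag table (tid => name) and NodeTag rows (nid, tid).
# All values are made up for illustration.
TAGS = { 1 => "tag-1", 14 => "tag-14", 125 => "tag-125" }.freeze

NodeTagRow = Struct.new(:nid, :tid)

NODE_TAGS = [
  NodeTagRow.new(10, 1),
  NodeTagRow.new(11, 14),
  NodeTagRow.new(12, 14),
  NodeTagRow.new(13, 3015) # orphan: no tag has tid 3015
].freeze

# Count how many rows point at each tag name.
def count_names(names)
  names.each_with_object(Hash.new(0)) { |name, counts| counts[name] += 1 }
end

# Strict: Hash#fetch raises KeyError on a missing tid, much as a find-by-id
# raises RecordNotFound, so a single orphan aborts the whole count.
def tag_frequency_strict(rows)
  count_names(rows.map { |row| TAGS.fetch(row.tid) })
end

# Tolerant: count only rows whose tid exists, and return the orphaned tids
# so they can be reported and cleaned up rather than silently ignored.
def tag_frequency_tolerant(rows)
  orphans, valid = rows.partition { |row| !TAGS.key?(row.tid) }
  [count_names(valid.map { |row| TAGS[row.tid] }), orphans.map(&:tid).uniq]
end

begin
  tag_frequency_strict(NODE_TAGS)
rescue KeyError => e
  puts "strict version failed: #{e.message}"
end

counts, orphans = tag_frequency_tolerant(NODE_TAGS)
p counts
p orphans # => [3015]
```

Whether the real query should skip orphans or keep failing loudly is a judgment call; deleting the bad rows fixes the data instead of hiding it.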

@jywarren
Member

I'll find and kill that NodeTag record too...

@jywarren
Member

OK, deleted lonesome NodeTag records pointing at non-existent tids of 3015 and 3016, and https://publiclab.org/stats?start=01-01-2012&end=25-04-2013 now works. Is that it, then? 🕵️‍♀️ 🕵

@skilfullycurled
Contributor Author

Should be, for this issue... PS: where in Sentry can you see the 404s? I thought it didn't show them?

@skilfullycurled
Contributor Author

Oh, and thanks @jywarren! This has been like a splinter in my data finger. Thanks for tweez-ing it out!

@jywarren
Member

jywarren commented Jun 18, 2019 via email

@skilfullycurled
Contributor Author

Ah, I see, the 404s are still logged somewhere though. Good to know!

@jywarren
Member

jywarren commented Jun 18, 2019 via email

@ebarry ebarry added this to the Metrics milestone Nov 16, 2019
@jywarren
Member

https://stable.publiclab.org/stats?start=20-07-2010&end=20-10-2020 is still returning a 404 "Page does not exist" error, so I will leave this open for reference! Thanks!

@jywarren jywarren reopened this Oct 20, 2020
@jywarren
Member

And noting @skilfullycurled's note on a "workaround date range" in #5904 (comment) -

this, in #6050, means that the "all time" option will only go back to Jan 1, 2014.

Next steps summarized by @skilfullycurled here:

#6050 (comment)

@jywarren
Member

jywarren commented Oct 20, 2020

OK, just an update: while looking for NodeTag records with no associated Tag record, as @skilfullycurled mentioned in #6050, I found these tids to look at:

irb(main):010:0> NodeTag.where('date > 1366851600 AND date < 1366990345').size
=> 70
irb(main):011:0> NodeTag.where('date > 1366851600 AND date < 1366990345').collect(&:tid)
=> [1, 14, 125, 446, 578, 578, 579, 579, 1316, 2421, 3049, 3049, 3049, 3049, 3049, 3049, 3050, 3050, 3050, 3050, 3050, 3050, 3051, 3051, 3051, 3051, 3051, 3051, 3051, 3051, 3052, 3053, 3054, 3055, 3057, 3057, 3058, 3059, 3060, 3061, 3062, 3063, 3064, 3070, 3071, 3072, 3082, 3088, 3089, 3091, 3092, 3093, 3094, 3095, 3097, 3097, 3098, 3098, 3099, 3100, 3101, 3102, 3103, 3104, 3105, 3106, 3109, 3111, 3112, 3114]
irb(main):012:0> NodeTag.where('date > 1366851600 AND date < 1366990345').collect(&:tid).size
=> 70
irb(main):013:0> NodeTag.where('date > 1366851600 AND date < 1366990345').collect(&:tid).uniq.size
=> 48
irb(main):014:0> NodeTag.where('date > 1366851600 AND date < 1366990345').collect(&:tid).uniq
=> [1, 14, 125, 446, 578, 579, 1316, 2421, 3049, 3050, 3051, 3052, 3053, 3054, 3055, 3057, 3058, 3059, 3060, 3061, 3062, 3063, 3064, 3070, 3071, 3072, 3082, 3088, 3089, 3091, 3092, 3093, 3094, 3095, 3097, 3098, 3099, 3100, 3101, 3102, 3103, 3104, 3105, 3106, 3109, 3111, 3112, 3114]

indeed:

ActiveRecord::RecordNotFound (Couldn't find all Tags with 'tid': (1, 14, 125, 446, 578, 579, 1316, 2421, 3049, 3050, 3051, 3052, 3053, 3054, 3055, 3057, 3058, 3059, 3060, 3061, 3062, 3063, 3064, 3070, 3071, 3072, 3082, 3088, 3089, 3091, 3092, 3093, 3094, 3095, 3097, 3098, 3099, 3100, 3101, 3102, 3103, 3104, 3105, 3106, 3109, 3111, 3112, 3114) (found 46 results, but was looking for 48).)

I'll try figuring out which ones are missing. OK, it's only these two:

ActiveRecord::RecordNotFound (Couldn't find Tag with 'tid'=3088)
ActiveRecord::RecordNotFound (Couldn't find Tag with 'tid'=3089)

Deleting them.
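The `date` bounds in the irb session above look like Unix timestamps; read as UTC (an assumption, since the column could reflect server-local time), they span 25 April 2013 01:00 to 26 April 2013 15:32, the same date that kept coming up earlier. The "found 46 results, but was looking for 48" step can be narrowed mechanically by subtracting the tids that were found from the tids that were requested. A plain-Ruby sketch; the tid lists are a hypothetical shortened subset, and `missing_tids` is an illustrative name.

```ruby
require "set"
require "time"

# Bounds from the NodeTag.where query above, read as UTC (an assumption).
WINDOW_START = Time.at(1366851600).utc
WINDOW_END = Time.at(1366990345).utc

# Tids that were requested but not returned by the Tag lookup, sorted.
def missing_tids(requested, found)
  (requested.to_set - found.to_set).to_a.sort
end

requested = [1, 14, 125, 3049, 3088, 3089, 3114] # hypothetical subset of the 48
found = [1, 14, 125, 3049, 3114]                  # hypothetical lookup result

puts WINDOW_START.iso8601 # prints 2013-04-25T01:00:00Z
puts WINDOW_END.iso8601   # prints 2013-04-26T15:32:25Z
p missing_tids(requested, found) # => [3088, 3089]
```

In the Rails console, the `found` list could come from something like `Tag.where(tid: requested).pluck(:tid)`, which avoids probing each tid one at a time.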

@jywarren
Member

https://stable.publiclab.org/stats?start=20-07-2010&end=20-10-2020 still shows a 404, so I'll keep looking for extra NodeTag records with no associated Tag record.

@jywarren
Member

Hmm, I tried a bunch of ways like https://stackoverflow.com/questions/5319400/want-to-find-records-with-no-associated-records-in-rails, but didn't have much success:

NodeTag.includes(:tag).where(term_data: {name: nil})
=> #<ActiveRecord::Relation []>

I also just tried collecting ALL NodeTag tids and subtracting all valid Tag tids, which took a while:

nodetag_tids = NodeTag.select(&:tid).collect(&:tid).uniq
tag_tids = Tag.select(&:tid).collect(&:tid).uniq
tids_missing = nodetag_tids - tag_tids
=> []

So, doesn't that mean all NodeTags have valid tids?
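An empty difference is consistent with every NodeTag having a valid tid at that moment. For later rounds, it may help to tie any orphan back to the date ranges it would break, since a range query should only trip over orphans whose rows fall inside the range. A plain-Ruby sketch with made-up rows shaped like the `tid` and `date` columns from the irb session above; `DatedNodeTag`, `orphan_dates`, and `range_hits_orphan?` are hypothetical names.

```ruby
require "set"

# Stand-in for NodeTag rows: a tid plus the Unix-timestamp `date` column.
DatedNodeTag = Struct.new(:tid, :date)

# Map each orphaned tid (no matching tag) to the sorted dates of its rows.
def orphan_dates(rows, tag_tids)
  valid = tag_tids.to_set
  rows.reject { |row| valid.include?(row.tid) }
      .group_by(&:tid)
      .transform_values { |group| group.map(&:date).sort }
end

# True if any orphaned row falls inside the [from, to] timestamp range,
# i.e. a stats query over that range would be expected to hit it.
def range_hits_orphan?(orphans, from, to)
  orphans.values.flatten.any? { |date| date.between?(from, to) }
end

rows = [
  DatedNodeTag.new(1, 1366852000),
  DatedNodeTag.new(3088, 1366900000),
  DatedNodeTag.new(3089, 1366950000),
  DatedNodeTag.new(2421, 1400000000)
] # made-up rows
orphans = orphan_dates(rows, [1, 2421])

p range_hits_orphan?(orphans, Time.utc(2013, 1, 20).to_i, Time.utc(2014, 1, 20).to_i) # => true
p range_hits_orphan?(orphans, Time.utc(2014, 1, 20).to_i, Time.utc(2015, 1, 20).to_i) # => false
```

In the console itself, `NodeTag.distinct.pluck(:tid)` and `Tag.pluck(:tid)` fetch only the tid column, whereas `select(&:tid)` with a block loads every record as a full model first, which is likely why the comparison took a while.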

Maybe we should look in Sentry for another kind of error now?

https://stable.publiclab.org/stats?start=20-01-2013&end=20-01-2014 shows 404
https://stable.publiclab.org/stats?start=20-01-2014&end=20-01-2015 loads fine
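The pair of URLs above is a manual bisection: one year fails, the next loads. That search can be automated, which also answers the earlier question about a systematic way to find the problem dates. A sketch assuming a query over a range fails exactly when the range contains at least one "bad" day; `first_bad_day` is a hypothetical helper, and `fails` could in practice request the stats URL on stable.publiclab.org and check for a 404. Here it is an offline stub so the example runs anywhere.

```ruby
require "date"

# Find the earliest "bad" day in [from, to], assuming a stats query over a
# range fails exactly when the range contains at least one bad day.
# `fails` is any callable taking (from, to) Dates and returning true/false.
def first_bad_day(from, to, fails)
  return nil unless fails.call(from, to)

  lo, hi = from, to
  while lo < hi
    mid = lo + (hi - lo).to_i / 2
    if fails.call(lo, mid)
      hi = mid     # a bad day lies in [lo, mid]
    else
      lo = mid + 1 # [lo, mid] is clean, so search (mid, hi]
    end
  end
  lo
end

# Offline stub standing in for a real request: pretend 25 April 2013 is the
# only bad day.
bad_day = Date.new(2013, 4, 25)
stub = ->(from, to) { bad_day.between?(from, to) }

puts first_bad_day(Date.new(2010, 7, 20), Date.new(2020, 10, 20), stub) # prints 2013-04-25
```

Covering 2010 to 2020 this way takes about a dozen requests, which matters when each one can take several seconds. To find further bad days, rerun starting the day after the one returned; if failures are intermittent rather than tied to specific days, the assumption breaks and the result is unreliable.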

I searched the logs... didn't find much. Maybe this?

[ba658853-5c5b-4514-b69b-bbde1cf920c5] Processing by StatsController#index as */*
[ba658853-5c5b-4514-b69b-bbde1cf920c5]   Parameters: {"utf8"=>"✓", "options"=>"Week"}
[ba658853-5c5b-4514-b69b-bbde1cf920c5] Completed 500 Internal Server Error in 5ms (ActiveRecord: 0.0ms)
[ba658853-5c5b-4514-b69b-bbde1cf920c5] Sending event 79acc5f697a249f7aaddacfa69966bcf to Sentry
[ba658853-5c5b-4514-b69b-bbde1cf920c5]   
[ba658853-5c5b-4514-b69b-bbde1cf920c5] NoMethodError (undefined method `downcase' for nil:NilClass):
[ba658853-5c5b-4514-b69b-bbde1cf920c5]   
[ba658853-5c5b-4514-b69b-bbde1cf920c5] app/controllers/stats_controller.rb:150:in `to_keyword'
[ba658853-5c5b-4514-b69b-bbde1cf920c5] app/controllers/stats_controller.rb:20:in `range'
[ba658853-5c5b-4514-b69b-bbde1cf920c5] app/controllers/stats_controller.rb:48:in `index'

But that was on Oct 14th, six days ago. I can't seem to find the errors for today... strange.

@jywarren
Member

Ah! I was looking in the wrong directory! Got it!

[2c6a71d3-e104-44b7-98ab-44263e879b22] Processing by StatsController#index as HTML
[2c6a71d3-e104-44b7-98ab-44263e879b22]   Parameters: {"start"=>"20-01-2013", "end"=>"20-01-2014"}
[2c6a71d3-e104-44b7-98ab-44263e879b22] Completed 404 Not Found in 11475ms (ActiveRecord: 9149.4ms)
[2c6a71d3-e104-44b7-98ab-44263e879b22]   
[2c6a71d3-e104-44b7-98ab-44263e879b22] ActiveRecord::RecordNotFound (Couldn't find Tag with 'tid'=3088):
[2c6a71d3-e104-44b7-98ab-44263e879b22]   
[2c6a71d3-e104-44b7-98ab-44263e879b22] app/models/tag.rb:55:in `block in nodes_frequency'
[2c6a71d3-e104-44b7-98ab-44263e879b22] app/models/tag.rb:55:in `map'
[2c6a71d3-e104-44b7-98ab-44263e879b22] app/models/tag.rb:55:in `nodes_frequency'
[2c6a71d3-e104-44b7-98ab-44263e879b22] app/controllers/stats_controller.rb:40:in `block in range'
[2c6a71d3-e104-44b7-98ab-44263e879b22] app/controllers/stats_controller.rb:27:in `range'
[2c6a71d3-e104-44b7-98ab-44263e879b22] app/controllers/stats_controller.rb:55:in `index'

@jywarren
Member

Yes, https://stable.publiclab.org/stats?start=20-01-2010&end=20-01-2020 now works too. Thanks, all! And should we make "all time" now really go back to 2010, @cesswairimu?

#6050: maybe we could turn this into an FTO and close this issue now?

@cesswairimu
Collaborator

This is great 🎉 Yeah, that will be awesome. Creating an FTO. Thanks, Jeff.

@cesswairimu
Collaborator

cesswairimu commented Oct 21, 2020

I've created an FTO: #8652.

I also think it would be much faster to call .size on all the data when getting 'all time' stats, instead of using the where range clause (where timestamp between 2010 and NOW()). I will do a follow-up for that after the FTO is done. Thanks everyone, closing this.
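A minimal plain-Ruby sketch of that idea, assuming an `:all_time` marker is passed instead of a start date; `Record` and `stat_count` are hypothetical names. Whether the unfiltered count is actually faster in the app depends on the database and its indexes.

```ruby
require "date"

# Hypothetical record type: anything with a creation date.
Record = Struct.new(:id, :created_on)

# Count records for a stats query. For "all time" (range == :all_time) the
# plain size is returned with no per-record date check; an explicit Range
# falls back to filtering. In SQL terms this is COUNT(*) without a WHERE
# clause versus COUNT(*) with a date-range WHERE clause.
def stat_count(records, range)
  return records.size if range == :all_time

  records.count { |record| range.cover?(record.created_on) }
end

records = [
  Record.new(1, Date.new(2010, 7, 20)),
  Record.new(2, Date.new(2013, 4, 25)),
  Record.new(3, Date.new(2019, 1, 1))
]

p stat_count(records, :all_time)                                    # => 3
p stat_count(records, Date.new(2013, 1, 1)..Date.new(2013, 12, 31)) # => 1
```

A side benefit: an "all time" count with no date bounds sidesteps guessing the site's start date at all.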

@jywarren
Member

Awesome, thanks Cess!
