Raw data from stats page #4654
Cool!!!!!!!!
Hi @jywarren, what else would we potentially like to download from the page? I currently have
Hi @cesswairimu this is great! What does the output look like? Is it statistical output like the # per week, or per day, or is it the full raw data? I think @ebarry will be in on Monday and you can ask her what's most useful here. I think probably the count per period is more interesting than the full raw data of all the actual content... I think for the time range, we should try to offer these for the range page periods, does that make sense? Calculating it for the whole site might be... stress on the system. For types of data, perhaps a count per period of:
Also note that we may want to decide on a "bin" size to start with - so we can say I think @milaaraujo will be doing some caching work soon and may be a good person to connect with as you think about optimizing some of these queries, and note this guide: https://guides.rubyonrails.org/caching_with_rails.html Thanks! |
The output is currently raw data; I see that a count would be more helpful. I will change it to that while we wait for @ebarry to give us more direction on this. I will also check out the caching guide and talk to @milaaraujo. Thanks @jywarren
ok i see what you mean about "binning" -- for reference, in the past, i have downloaded by one month, three month, and yearly timespans. I have never downloaded by day. However, to get the graphs that @skilfullycurled created about which days of the week are busiest (https://publiclab.org/evaluation#Online+analytics), that would require the full raw data. |
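To make the "binning" trade-off concrete, here is a minimal sketch in plain Ruby, assuming a hypothetical list of creation dates (the sample values and the `count_by` helper are invented for illustration and are not the code in this PR). It shows that monthly or quarterly counts and day-of-week counts can all be derived from the raw dates, whereas pre-binned monthly counts alone could not produce the day-of-week view.

```ruby
require 'date'

# Hypothetical sample: creation dates of a few content items.
created = %w[2019-01-03 2019-01-17 2019-02-05 2019-02-06 2019-04-22]
          .map { |s| Date.parse(s) }

# Count items per bin; the bin label is whatever the block returns for a date.
def count_by(dates)
  dates.each_with_object(Hash.new(0)) { |d, counts| counts[yield(d)] += 1 }
end

per_month   = count_by(created) { |d| d.strftime('%Y-%m') }
per_quarter = count_by(created) { |d| "#{d.year}-Q#{(d.month - 1) / 3 + 1}" }
per_weekday = count_by(created) { |d| d.strftime('%A') }

puts per_month.inspect
puts per_quarter.inspect
puts per_weekday.inspect
```

The same helper covers any bin size, which is one argument for exporting raw timestamps and letting the analyst pick the bin afterwards.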
Hey everyone. Wow! So much data lately!! If I'm understanding the questions (put very roughly):
Full disclosure: I write this next part only from the experience of taking a csv file and manipulating it, so I recognize that not all of these ideas will be technically feasible, whether because of system constraints or the amount of programming required.

On the surface, it seems like anyone using the data would want as much as possible and then later decide what periods to aggregate it into. However, pre-aggregated periods of time would lower the bar for someone just getting started, be it through programming or a spreadsheet program, or for someone who needs a "quick and dirty" way to make a graph for a presentation.

For someone like myself, I want as much data as possible because I don't know what's interesting until I explore the data. I wanted to explore things such as the distribution of counts per user per day, so I exported the different tables as csvs and then joined on keys as needed. The schema.rb file was very helpful, as was the MySQL back end of the database when I installed it.

Would it be possible to have the counts as you have suggested above, but also a page that lists accessible tables (e.g. not the ones that have identifying data like email addresses) with csv download links for each table, and let the analyst handle it from there? The csv I have of all wiki edits from 2011 to summer 2016, with the unix timestamp, nid, vid, uid, and title, is only 1.2 MB.
Thanks so much @ebarry and @skilfullycurled for your input on this.
Yes, I think we're on the same (html) page with what I meant. My overarching idea is to have a place where the most comprehensive dataset can be retrieved while putting the least strain on both the developers and the servers. Of course, some of the tables will have to be sanitized, and to the extent that it's not very labor intensive to exclude unnecessary things like session tokens, that'd be great. In an ideal world, though, the developer would just export to csv and let the person downloading sort through the rest.

For example: suppose someone wants a list of tags, their counts, and creation dates. The site provides the community_tags and term_data tables (these tables may have changed) as downloadable csvs, and it's up to that person to do a join on "tid" to create a dataset with the tid, date, name, and count. Happy to help with the documentation at some point.

With regard to which data is needed, I could use some clarification on where the idea stands first. Is the idea that you choose a specific start and end date, or a start and end month and/or year?
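The join on "tid" described above could look roughly like the following sketch. It assumes two tiny inline CSV exports whose column names (`tid`, `name`, `nid`, `date`) are hypothetical stand-ins; the real community_tags and term_data tables have more columns and may be named differently.

```ruby
require 'csv'

# Hypothetical exports; column names are assumptions for illustration.
term_data_csv = <<~EOS
  tid,name
  1,water
  2,air
EOS

community_tags_csv = <<~EOS
  tid,nid,date
  1,10,1549368000
  1,11,1549454400
  2,12,1549540800
EOS

# Parse each export into an array of plain hashes keyed by column name.
terms = CSV.parse(term_data_csv, headers: true).map(&:to_h)
tags  = CSV.parse(community_tags_csv, headers: true).map(&:to_h)

# Lookup table: tid => tag name.
names = terms.each_with_object({}) { |row, h| h[row['tid']] = row['name'] }

# Join on tid: one output row per tag with its usage count and earliest date.
dataset = tags.group_by { |row| row['tid'] }.map do |tid, rows|
  {
    tid: tid,
    name: names[tid],
    count: rows.size,
    first_tagged: rows.map { |r| r['date'].to_i }.min
  }
end

dataset.each { |row| puts row.inspect }
```

An analyst could do the same join in a spreadsheet or with SQL; the point is only that per-table csv downloads are enough to rebuild this kind of dataset.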
Yes @skilfullycurled, the idea is to choose a specific start date and end date, with an option to download the data created within that range. I would also like to hear what @gauravano, @jywarren, and @SidharthBansal think about this, just to make sure I am not going out of scope code-wise.
Got it. Thanks @cesswairimu. I ask because if it's one specific date to the next then I think the question is, what type of user is the data geared towards? Will that user know how to aggregate it by other means? I think it'd be easy enough to have some documentation with how to do that in Excel. Also, I don't know if this would save you any computational resources, but I think you could just provide start and end windows by month and year. Again, not knowing how the server resources work, I can't think of a reason why you'd need to download exact dates like you're booking a flight. |
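As a rough illustration of the month-and-year windows suggested here, the sketch below expands a start month and an end month into an inclusive date range. The `month_window` helper is hypothetical and is not part of the PR; it only shows that a month-based picker still resolves to exact dates on the server side.

```ruby
require 'date'

# Expand a month window into an inclusive range of dates.
# Hypothetical helper for illustration only.
def month_window(start_year, start_month, end_year, end_month)
  first = Date.new(start_year, start_month, 1)
  last  = Date.new(end_year, end_month, -1) # day -1 means the last day of that month
  raise ArgumentError, 'window ends before it starts' if last < first

  first..last
end

range = month_window(2018, 11, 2019, 2)
puts "#{range.begin} to #{range.end}"
puts range.cover?(Date.new(2019, 2, 5))
```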
@skilfullycurled kindly take a look at the new code here https://unstable.publiclab.org/stats/json and see if it's anything close to what you had in mind... for now only
looks great! I think we could stick with the basic `btn-default` white bg
buttons, thanks!
On Tue, Feb 5, 2019 at 12:31 PM Cess wrote:
[image: raw-range] <https://user-images.githubusercontent.com/17081074/52292022-d061c100-2984-11e9-9a01-3c6f12508ede.png>
@jywarren how does that position look? Also, any ideas on button color?
Thanks. Should we restrict this download to admins alone?
Perhaps to start with, yes.
@jywarren I asked on Slack, but I am not sure if you understood my question about how I can refactor the code to address the Code Climate issue.
Oh sorry! First, the buttons look beautiful! I just went ahead and approved it. CodeClimate recs are helpful but we needn't follow every single one. Thanks! Is this ready then? |
Yeah, it's ready.
* Download data as json
* add comments to stats
* download stats with month ranges
* add maps as download content and style
* implement download as csv
* move download logic to range page
* restrict download of stats to admin
Fixes #963