.SD vignette based on SO answer #3412

MichaelChirico · 2019-02-17T11:50:17Z

Encouraged to include as vignette:

I want to be sure to add a bit about chaining, since I mentioned it but forgot to include it in the original answer.

Another hold-up is the use of Lahman data. I guess I can try and find a URL instead of adding a Suggests for that package...


arunsrinivasan · 2019-02-17T20:12:19Z

Sounds great! Why not go for a "special symbols" vignette altogether? Could you not use the same flights data we already use though?

jangorecki · 2019-02-18T04:53:40Z

A vignette only about special symbols sounds cryptic. Maybe a vignette about complex queries instead, which would cover special symbols and tricks like { inside j, or newcol := { ... }; printing from j; grouping columns being length 1 inside j; and combining columns with an .SD list: c(list(col=f1(col1)), lapply(.SD, f2), list(col5=f3(col1))).
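For readers unfamiliar with these tricks, here is a minimal sketch of three of them: a braced block in j, newcol := { ... }, and combining hand-built list columns with lapply(.SD, ...). The table DT and its columns grp, x and y are hypothetical toy data made up for illustration, not anything proposed in this thread.

```r
library(data.table)

# Hypothetical toy data (an assumption for illustration only)
DT = data.table(
  grp = c("a", "a", "b", "b"),
  x   = c(1, 2, 3, 4),
  y   = c(10, 20, 30, 40)
)

# 1. Combine hand-built columns with lapply(.SD, ...) in a single list.
#    c() concatenates the three lists, so j still returns one flat list.
res = DT[ , c(
  list(n = .N),                    # group size
  lapply(.SD, mean),               # mean of every column in .SDcols
  list(x_range = max(x) - min(x))  # an extra, hand-written summary
), by = grp, .SDcols = c("x", "y")]

# 2. A braced block in j: intermediate values stay local to the block,
#    and the last expression is what j returns.
spread = DT[ , {
  centered = x - mean(x)
  list(spread = max(centered) - min(centered))
}, by = grp]

# 3. newcol := { ... }: the same idea, but the result is added by reference.
DT[ , share := {
  s = x + y
  s / sum(s)                       # each row's share of its group total
}, by = grp]
```

In the first query the output columns appear in the order the lists are concatenated, after the grouping column: grp, n, x, y, x_range.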

Henrik-P · 2019-02-18T17:15:11Z

Because the choice of data is mentioned: in general, I think minimal data sets work better in examples and vignettes, since it is much easier to track results after each step of code. To be honest, I don't think we need 253316 rows (flights) or 44963 (Pitching) to explain and demonstrate the actual functionality of .SD (or other data.table functions, for that matter) ;)

Benchmarking on large data sets could be provided in a separate vignette.

Just my 2c.

MichaelChirico · 2019-04-03T04:24:11Z

@Henrik-P I have to disagree.

253K and 45K rows are tiny in the grand scheme of things. They are trivially small to fread:

system.time(fread('vignettes/flights14.csv'))
#   user  system elapsed 
#  0.102   0.012   0.037

library(data.table)
library(Lahman)
Pitching = as.data.table(Pitching)
Pitching = Pitching[ , .(playerID, yearID, teamID, W, L, G, ERA)]

tmp = tempfile()
system.time({fwrite(Pitching, tmp); fread(tmp)})
#    user  system elapsed 
#   0.020   0.002   0.023

Could we use a subset of the data? Sure, if we're careful to preserve reproducibility. In terms of "tracking results", this is easily superable -- the vignette writer can drill down to a specific group as needed.

Practical examples are strictly superior to contrived ones. Using foo and bar IMO is the quickest way to ensure a vignette stays unread. flights ✈️ data is pretty universal & easy to understand... Lahman baseball ⚾️ data is a bit more niche, but I don't think the basic analysis gets too abstruse, and things like ERA can be meaningfully conveyed in a few words.

jangorecki · 2019-04-03T04:54:58Z

They are not tiny for the human eye; I think this is what @Henrik-P means. If we can use 20-40 rows of data to present the same functionality, it will be easier for readers to follow.

MichaelChirico · 2019-05-09T05:58:31Z

I'm going to stick with the original data. I think small data is good for the examples section in documentation; vignettes (to me) are about telling/writing a story, a sort of interactive blog post.

Since Lahman is on GitHub, I'll just download.file/load the data & direct users to the package & original website as proper citation.

jangorecki added the documentation label Feb 17, 2019

MichaelChirico self-assigned this Apr 3, 2019

MichaelChirico pushed a commit that referenced this issue May 18, 2019

Closes #3412 -- adds .SD vignette from SO answer

024c891

MichaelChirico mentioned this issue May 18, 2019

New vignette -- Usages of .SD #3572

Merged

mattdowle added this to the 1.12.4 milestone May 22, 2019

mattdowle closed this as completed in #3572 May 22, 2019
