analyze function that collects various statistics #4478

Draft
wants to merge 34 commits into
base: master

jangorecki (Member) commented May 23, 2020

This PR is built on top of the lazy-forder branch, so the diff shown here also includes the lazy-forder changes.
For a diff limited to this PR, see https://github.com/Rdatatable/data.table/compare/lazy-forder...analyze?expand=1
Once lazy-forder is merged, this branch will need a rebase.
Closes #2879

x = data.table(id1=1:5, id2=c(NA,2:5), id3=rep(NA_integer_, 5L), id4=c("a","b","c","b","a"), id5=c("a","b","\u221A","d","e"), v1=1L/2, v2=c(NaN,2:5/2), v3=c(1:4/2, NaN), v4=c(-Inf,2:4,Inf))
cols = list("id1", "id2","id3", c("id1","id2"), "id4", "id5", c("id4","id5"), "v1", "v2", "v3", "v4")

a = analyze(x, cols, flat=TRUE)
a
#       cols ncols             coltype            colclass sorted uniqueN maxgrpN  anyNA anyInfNaN anyNotASCII anyNotUTF8  anyNF unique  const  allNA
#     <char> <int>              <char>              <char> <lgcl>   <int>   <int> <lgcl>    <lgcl>      <lgcl>     <lgcl> <lgcl> <lgcl> <lgcl> <lgcl>
# 1:     id1     1             integer             integer   TRUE       5       1  FALSE     FALSE       FALSE      FALSE  FALSE   TRUE  FALSE  FALSE
# 2:     id2     1             integer             integer   TRUE       5       1   TRUE     FALSE       FALSE      FALSE   TRUE   TRUE  FALSE  FALSE
# 3:     id3     1             integer             integer   TRUE       1       5   TRUE     FALSE       FALSE      FALSE   TRUE  FALSE   TRUE   TRUE
# 4: id1,id2     2     integer,integer     integer,integer   TRUE       5       1   TRUE     FALSE       FALSE      FALSE   TRUE   TRUE  FALSE  FALSE
# 5:     id4     1           character           character  FALSE       3       2  FALSE     FALSE       FALSE      FALSE  FALSE  FALSE  FALSE  FALSE
# 6:     id5     1           character           character  FALSE       5       1  FALSE     FALSE        TRUE      FALSE  FALSE   TRUE  FALSE  FALSE
# 7: id4,id5     2 character,character character,character  FALSE       5       1  FALSE     FALSE        TRUE      FALSE  FALSE   TRUE  FALSE  FALSE
# 8:      v1     1              double             numeric   TRUE       1       5  FALSE     FALSE       FALSE      FALSE  FALSE  FALSE   TRUE  FALSE
# 9:      v2     1              double             numeric   TRUE       5       1  FALSE      TRUE       FALSE      FALSE   TRUE   TRUE  FALSE  FALSE
#10:      v3     1              double             numeric  FALSE       5       1  FALSE      TRUE       FALSE      FALSE   TRUE   TRUE  FALSE  FALSE
#11:      v4     1              double             numeric   TRUE       5       1  FALSE      TRUE       FALSE      FALSE   TRUE   TRUE  FALSE  FALSE

a = analyze(x, cols) ## flat=FALSE returns first and last rows as nested DT
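
For readers who want to see what some of the columns in the flat output could correspond to, here is a minimal sketch using only existing data.table and base R functions. It is not the implementation in this PR: `col_stats` is a hypothetical helper, and the definitions of `anyNA` (missing but not NaN), `anyNF` (either of the two), `unique`, `const` and `anyNotASCII` are assumptions inferred from the example output above.

```r
library(data.table)

# Hypothetical helper, NOT the PR's analyze(): a few statistics for one set of columns.
col_stats = function(x, cols) {
  stopifnot(is.data.table(x), is.character(cols), all(cols %in% names(x)))
  grpN = x[, .N, by = cols]$N                    # group sizes over the column set
  n_unique = length(grpN)
  vals = lapply(cols, function(col) x[[col]])
  # assumption: anyNA counts NA but not NaN (is.nan() is FALSE for non-double types)
  any_na = any(unlist(lapply(vals, function(v) is.na(v) & !is.nan(v))))
  any_inf_nan = any(unlist(lapply(vals, function(v) is.nan(v) | is.infinite(v))))
  # assumption: a string is "not ASCII" if any of its bytes is above 127
  chars = unlist(Filter(is.character, vals))
  chars = chars[!is.na(chars)]
  not_ascii = vapply(chars, function(s) any(as.integer(charToRaw(s)) > 127L),
                     logical(1), USE.NAMES = FALSE)
  data.table(
    cols        = paste(cols, collapse = ","),
    ncols       = length(cols),
    uniqueN     = n_unique,
    maxgrpN     = max(grpN),
    anyNA       = any_na,
    anyInfNaN   = any_inf_nan,
    anyNotASCII = any(not_ascii),
    anyNotUTF8  = any(!validUTF8(chars)),
    anyNF       = any_na || any_inf_nan,         # assumption: NA, NaN or +/-Inf present
    unique      = n_unique == nrow(x),           # every row is its own group
    const       = n_unique == 1L                 # a single group covers all rows
  )
}

x = data.table(id1 = 1:5, id2 = c(NA, 2:5), id4 = c("a","b","c","b","a"),
               id5 = c("a","b","\u221A","d","e"), v2 = c(NaN, 2:5/2))
stats = rbindlist(lapply(list("id2", "id4", c("id4","id5"), "v2"), col_stats, x = x))
print(stats)
```

The sketch regroups the data once per column set, which is simple but repeats work. Other columns such as `sorted` are left out here because they depend on data.table's internal ordering.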

There are no tests and no manual page yet. I will add both after some initial feedback.

@jangorecki jangorecki added the WIP label May 23, 2020

codecov bot commented May 23, 2020

Codecov Report

Merging #4478 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #4478   +/-   ##
=======================================
  Coverage   99.42%   99.42%           
=======================================
  Files          72       72           
  Lines       14153    14153           
=======================================
  Hits        14071    14071           
  Misses         82       82           


@MichaelChirico MichaelChirico marked this pull request as draft February 19, 2024 04:20
Development

Successfully merging this pull request may close these issues.

collect more statistics about the data
2 participants