Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change constructor of the FilteredData to not depend on TealData #33

Closed
5 tasks done
Tracked by #32 ...
gogonzo opened this issue May 18, 2022 · 2 comments · Fixed by #49
Closed
5 tasks done
Tracked by #32 ...

Change constructor of the FilteredData to not depend on TealData #33

gogonzo opened this issue May 18, 2022 · 2 comments · Fixed by #49
Assignees
Labels
core enhancement New feature or request refactor

Comments

@gogonzo
Copy link
Contributor

gogonzo commented May 18, 2022

part of insightsengineering/NEST-roadmap#32

  • Initialization of the FilteredData should not depend on a TealData nor JoinKeys nor CodeClass. Initialize should require list of data, list of join keys.
  • Initialization of the FilteredDataset should not depend on a TealDataset but on a class of the dataset (data.frame or MultiAssayExperiment). FilteredDataset should receive dataname, (dataset) label, primary keys, col labels, metadata, parent.
  • FilteredDataset should not hold TealDataset but raw data and it's attributes.
  • We don't need to keep copy of the var_labels in the class - they are stored in column attributes and they are not dropped on filter nor on merge (only on mutate).
  • CodeClass should not be included in the FilteredData. But it's "silently" used in get_rcode to combine reproducible code. We need to find the way how to assure reproducibility in the teal_module(s) (to be addressed later)
  • Make sure the distinction between cdisc and non cdisc data.frames are handled correctly

So by default it should be possible to initialize FilterPanel using following:

init_filtered_data(
  data = list(
    adsl = synthetic_cdisc_data("latest")$adsl
    adtte = synthetic_cdisc_data("latest")$adtte
    mae = MultiAssayExperiment::miniACC
  ), 
  join_keys = list(...)
)

image

Things to resolve:

  • how to pass attributes of the dataset along with the list of data
@nikolas-burkoff
Copy link
Contributor

nikolas-burkoff commented Jun 10, 2022

In this post when I say FilteredData - I mean the object which contains the data used within the shiny app - at the moment this is the FilteredData object itself but in the future can be the data object which is passed into modules.

how to pass attributes of the dataset along with the list of data

If we specify data as a named list then we also need to specify: metadata, label and (primary) keys to FilteredDataset and then join_keys and (for cdisc) parent to FilteredData - as discussed with @kpagacz and @gogonzo we're not having CDISCFilteredDataset...

  1. The primary keys could be incorporated into the join_keys argument (lets rename it keys?) then you could do this:
df1 <- data.frame(id = 1:10, x = 1:10)
df2 <- data.frame(id2 = 1:10, y = 1:10)

init_filtered_data(
  data = list(
    DF1 = df1, DF2 = df2
  ),
  keys = list(primary = list(df1 = "id", df2 = "id2")), # or primary_keys and join_keys separately if we prefer
  metadata = list(df1 = list(author = "me"))   # optional
  label = list(df1 = "some_label") # optional
)

and

data <- list(
    adsl = synthetic_cdisc_data("latest")$adsl,
    adtte = synthetic_cdisc_data("latest")$adtte,
    mae = MultiAssayExperiment::miniACC,
  ), 

init_cdisc_filtered_data(
  data = data
  keys = get_cdisc_keys(names(data))), # would also give join_keys but need helper functions to adapt for non standard cases
  metadata = list(adsl = ...), # optional
  parent = function(dataname) if (dataname == "ADSL") character(0) else "ADSL",   # default
  label = ...  #optional
)

2 )But this is a little strange as parent, metadata belong with each dataset so you could do it this way:

init_filtered_data(
  DF1 = list(df1, label = "boo", metadata = list(), key = _ ,
  DF2 = ...,
  keys = ...
)

2+) But then you might as well wrap each one up into a TealDataset object and then you can specify FilteredData in exactly the same way as you specify a TealDataset in teal_data - except you cannot have connectors only TealDatasets. (Maybe you can simplify TealDataset if you just pass in a raw dataset and don't care about reproducility etc.) you also get the structure and validation as well. This then keeps a link between teal.slice and teal.data though you could pass a TealDataset directly into a module

  1. Alternatively we could create structures to more easily handle the schema:
init_filtered_data(
  data = list(DF1 = df1, DF2 = df2)
  schema = schema(
    keys = join_key = list() # or JoinKey object
    parents = ...
  )
)

And this schema could be read from say chevron's yaml file...but presumably you would want to use the same code to specify a teal_data object as well.

I guess a nice thing about doing it this way is that modules won't nec need the schema part (especially if no merge or merge happens before) and could just take the data - but maybe we don't want that actually.


For this issue I suspect we'll do 1) as it's easier and can be changed in the future but something feels a bit strange here - having two different ways of specifying the same thing - 1) if you want to use teal::init and teal_data to specify your data/schema/metadata and a different way of specifying it if you want to use the data by itself (even if you need to specify the same things).

@gogonzo
Copy link
Contributor Author

gogonzo commented Jun 13, 2022

@nikolas-burkoff I like (2) as dataset attributes goes with dataset. I'd prefer the object+attributes instead of list to be consistent with this.
I like (1) also as it's easy to explain in the documentation.

Not having CDISCFilteredDataset is a good thing, let's stick to have FilteredDatasets dispatched by raw object.

In a long term we expect FIlteredData to be managed by the API. We will have a FIlterStates object which holds all current states and filter_panel_srv/filter_panel_ui which renders and manages filterstates. FilterStates can be also changed in other places.

quosure-FilteredData

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core enhancement New feature or request refactor
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants