[Feature Request]: Extend `within` to accept a custom parameter that links all lines to a dataname #200

averissimo · 2024-02-20T12:46:25Z

Feature description

Extend within.qenv(data, expr, links_to = NULL, ...) so that links_to parameter will hint the code parser

The goal is that all the code in expr is hinted to one dataname (?or more?).

The get_code below would output the same code.

library(teal.code)
library(teal.data)

data_within <- teal_data() |> within({
  options("AN_OPTION" = iris)
  IRIS <- getOption("AN_OPTION")
}, links_to = "IRIS")

datanames(data_within) <- "IRIS"

get_code(data_within, datanames = "IRIS") |> cat()
#> IRIS <- getOption("AN_OPTION")

data_eval <- teal_data() |> eval_code("
  options(\"AN_OPTION\" = iris) # @linksto IRIS
  IRIS <- getOption(\"AN_OPTION\")
")

datanames(data_eval) <- "IRIS"

get_code(data_eval, datanames = "IRIS") |> cat()
#> options(AN_OPTION = iris)
#> IRIS <- getOption("AN_OPTION")

^{Created on 2024-02-20 with reprex v2.0.2}

Code of Conduct

I agree to follow this project's Code of Conduct.

Contribution Guidelines

I agree to follow this project's Contribution Guidelines.

Security Policy

I agree to follow this project's Security Policy.

The text was updated successfully, but these errors were encountered:

chlebowa · 2024-02-20T12:56:30Z

Note that
a) we tend to put all code into one compound expression and this would necessitate multiple within calls;
b) due to the way values are injected into expr through ..., adding another named argument to within further restricts the variable names that can be used in the code passed to expr.

averissimo · 2024-02-20T13:49:37Z

Some context: this idea was inspired by working with the python app in teal.gallery, where the code setting the environment is not traced back to the dataname. Hence, an eval_code call needed to be used.

This is an idea/discussion on how to bring it to within.

a) we tend to put all code into one compound expression and this would necessitate multiple within calls;

If this avoids executing string as code I think it's an acceptable trade-off

b) due to the way values are injected into expr through ..., adding another named argument to within further restricts the variable names that can be used in the code passed to expr.

Valid, we could further minimize this by adding a dot as a prefix to avoid names clashing, similar to what tidyverse does.

within.qenv(data, expr, .links_to = NULL, ...)

averissimo · 2024-02-20T13:50:00Z

An alternative solution might be a validation of the completeness of the code.

So that when (pseudo-code below) is not empty, it returns all the code.

setdiff(get_code(data), lapply(datanames(data), \(x) get_code(data, datanames = x)) # pseudo-code!

That is, fallback to the full code on the code parser blindspots

chlebowa · 2024-02-20T14:33:38Z

An alternative solution might be a validation of the completeness of the code.

So that when (pseudo-code below) is not empty, it returns all the code.

setdiff(get_code(data), lapply(datanames(data), \(x) get_code(data, datanames = x)) # pseudo-code!

That is, fallback to the full code on the code parser blindspots

I'm not a fan of this approach. The point of the datanames argument in get_code is to be exclusive and your proposal is to include orphan code by default.

chlebowa · 2024-02-20T14:39:10Z

Some context: this idea was inspired by working with the python app in teal.gallery, where the code setting the environment is not traced back to the dataname. Hence, an eval_code call needed to be used.

This is an idea/discussion on how to bring it to within.

We tried to remove eval_code altogether but decided to keep it, in part because of the necessity to tag lines with comments, which is only possible in text. An additional benefit to using eval_code is one can load code from a text file with a simple readLines.
Are you aiming to make eval_code obsolete (at least as an exported function)?

Tagging code in within would be slightly different to tagging in text form because one would be able to tag whole blocks, not single lines.

b) due to the way values are injected into expr through ..., adding another named argument to within further restricts the variable names that can be used in the code passed to expr.

Valid, we could further minimize this by adding a dot as a prefix to avoid names clashing, similar to what tidyverse does.

within.qenv(data, expr, .links_to = NULL, ...)

This would be an effective approach 👍 I don't quite like the mixed formal names, though (dot vs no dot). I'm on the fence on this.

averissimo · 2024-02-20T15:51:08Z

I don't think it should become obsolete. Especially if we have users executing code from other scripts or blocks of texts.

Having said that, I don't like using it inline as it obfuscates expressions inside a very long string. Among the cons are:

Harder to debug / execute specific lines/blocks
IDE's features to detect issues with code disappears
Need to escape quotation characters
No linter available for the code

averissimo · 2024-02-20T15:53:20Z

I'm not a huge fan of adding new non-standard parameters to within either, together we might come up with a better solution 💪

I'll think a bit more about it.

gogonzo · 2024-02-21T15:51:03Z

a) we tend to put all code into one compound expression and this would necessitate multiple within calls;

This is exactly why we decided to use comments tag instead of extra argument. Imagine this

# now
data |>
  within({
    open_connection() # @linksto df1 df2
    df1 <- get_data("df1")
    df2 <- get_data("df2")
    close_connection() # @linksto df1 df2
  })

# possibly
data |>
  within(open_connection(), .linksto = c("df1", "df2")) |>
  within({
     df1 <- get_data("df1")
     df2 <- get_data("df2")
   }) |>
 within(close_connection(), .linksto = c("df1", "df2"))

I think .linksto being a formal argument is fine as it won't be confused with .... I think also that we should not put so much attention on within generic. within.teal_code is exposed in teal.code documentation and .linksto can be easily explained (probably even easier than # @linksto). I also think .linksto argument is more robust.

I don't think it should become obsolete.
I'm not a huge fan of adding new non-standard parameters to within either

Maybe, we can have both # @linksto and .linksto in both within and eval_code?

However, there is fundamental question. We need to asses the cost of maintaining code-subsetting and decide whether this feature is worth efforts.

averissimo · 2024-02-21T17:30:26Z

AFAIK within currently has a limitation on detecting comments, hence the proposal.

my_within <- \(data, expr, ...) {
  expr <- substitute(expr) # first line of within.qenv
  expr
}
my_within(iris, {
  1+ 
    2 # a comment
})
#> {
#>     1 + 2
#> }

Maybe, we can have both # @linksto and .linksto in both within and eval_code?

Agree 👍

# possibly
data |>
  within(open_connection(), .linksto = c("df1", "df2")) |>
  within({
     df1 <- get_data("df1")
     df2 <- get_data("df2")
   }) |>
 within(close_connection(), .linksto = c("df1", "df2"))

As @chlebowa said, this is not great 😁 and I agree. My only grip is that using string code is the only alternative for the code parser limitations.

# Now
data |>
  eval_code(
    "
    open_connection() # @linksto df1 df2
    df1 <- read_lines(1L)
    df2 <- read_lines()
    close_connection() # @linksto df1 df2
    "
  )

We need to assess the cost of maintaining code-subsetting and decide whether this feature is worth efforts.

Totally. This is not a trivial feature as it needs to be tightly integrated with code parser.

averissimo · 2024-02-21T17:39:53Z

Looking at the commented code I wonder if we could give a hint to the code parser via a "side effect" function.

(sorry if this has already been discussed)

# Possibly (alt to subsetting)
teal_data() |> 
  within({
    open_connection() %@linksto% list(df1, df2)
    df1 <- get_data("df1")
    df2 <- get_data("df2")
    close_connection() %>% teal.code::links_to(df1, df2)
  })

%@linksto% or teal.code::links_to does absolutely nothing, it's a special function name just like # @linksto is a special comment.

`%@linksto%` <- \(x, ...) x # or %links_to% or just %hint%
1 + 1 %@linksto% list(df_not_exist, df2)
#> [1] 2
links_to <- `%@linksto%`
1 + 1 |> links_to(df_variable_doesnt_exist)
#> [1] 2

gogonzo · 2024-02-22T03:28:39Z

AFAIK within currently has a limitation on detecting comments, hence the proposal.

Yup, substitute, quote, expression discard comments. Only parse preserve comments in srcref attribute.

open_connection() %@linksto% list(df1, df2)

@averissimo Good luck with writing parser which handles this expression :D

If we introduce .linksto argument in within and eval_code it would mean that this additional information will have to be carried over with the code element (probably an attribute). Please note also that there is no parsing when code is inserted to the qenv class but only on using get_code. If we decide to add .linksto argument I would also recommend moving parser to the setters (instead of getter).

I imagine such thing, so that .linksto will always have a default value which can be replaced if someone doesn't like the result of the parsing.

within <- function(x, expr, .linksto = get_used_symbols(expr), ...) {
  parsed_expr <- substitute(expr)
  ...
  eval_code(x, parsed_expr, .linksto = .linksto)

}

chlebowa · 2024-02-22T09:21:41Z

However, there is fundamental question. We need to asses the cost of maintaining code-subsetting and decide whether this feature is worth efforts.

In eval_code existing code is collapsed to a single string and the incoming block is appended.

code <- paste(code, collapse = "\n")
object@code <- c(object@code, code)

Then, get_code collapses the code completely.

paste(object@code, collapse = "\n") # for qenv
paste(object$trace, collapse = "\n ") # for `qenv.error

The code is essentially collapsed for all intents and purposes, we just temporarily store it separated.

What if we do the collapsing on.exit, like the within prototype did? Then @code will always be a single string.

open_connection() %@linksto% list(df1, df2)

@averissimo Good luck with writing parser which handles this expression :D

I think this is doable.

If we introduce .linksto argument in within and eval_code it would mean that this additional information will have to be carried over with the code element (probably an attribute). Please note also that there is no parsing when code is inserted to the qenv class but only on using get_code. If we decide to add .linksto argument I would also recommend moving parser to the setters (instead of getter).

Now this is is a can of worms... 😫

averissimo added enhancement core labels Feb 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature Request]: Extend `within` to accept a custom parameter that links all lines to a dataname #200

[Feature Request]: Extend `within` to accept a custom parameter that links all lines to a dataname #200

averissimo commented Feb 20, 2024 •

edited

Loading

Code of Conduct

Contribution Guidelines

Security Policy

chlebowa commented Feb 20, 2024

averissimo commented Feb 20, 2024 •

edited

Loading

averissimo commented Feb 20, 2024

chlebowa commented Feb 20, 2024

chlebowa commented Feb 20, 2024

averissimo commented Feb 20, 2024

averissimo commented Feb 20, 2024

gogonzo commented Feb 21, 2024

averissimo commented Feb 21, 2024

averissimo commented Feb 21, 2024 •

edited

Loading

gogonzo commented Feb 22, 2024 •

edited

Loading

chlebowa commented Feb 22, 2024

[Feature Request]: Extend within to accept a custom parameter that links all lines to a dataname #200

[Feature Request]: Extend within to accept a custom parameter that links all lines to a dataname #200

Comments

averissimo commented Feb 20, 2024 • edited Loading

Feature description

Code of Conduct

Contribution Guidelines

Security Policy

chlebowa commented Feb 20, 2024

averissimo commented Feb 20, 2024 • edited Loading

averissimo commented Feb 20, 2024

chlebowa commented Feb 20, 2024

chlebowa commented Feb 20, 2024

averissimo commented Feb 20, 2024

averissimo commented Feb 20, 2024

gogonzo commented Feb 21, 2024

averissimo commented Feb 21, 2024

averissimo commented Feb 21, 2024 • edited Loading

gogonzo commented Feb 22, 2024 • edited Loading

chlebowa commented Feb 22, 2024

[Feature Request]: Extend `within` to accept a custom parameter that links all lines to a dataname #200

[Feature Request]: Extend `within` to accept a custom parameter that links all lines to a dataname #200

averissimo commented Feb 20, 2024 •

edited

Loading

averissimo commented Feb 20, 2024 •

edited

Loading

averissimo commented Feb 21, 2024 •

edited

Loading

gogonzo commented Feb 22, 2024 •

edited

Loading