Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request]: Extend within to accept a custom parameter that links all lines to a dataname #200

Open
3 tasks done
averissimo opened this issue Feb 20, 2024 · 12 comments

Comments

@averissimo
Copy link
Contributor

averissimo commented Feb 20, 2024

Feature description

Extend within.qenv(data, expr, links_to = NULL, ...) so that links_to parameter will hint the code parser

The goal is that all the code in expr is hinted to one dataname (?or more?).

The get_code below would output the same code.

library(teal.code)
library(teal.data)

data_within <- teal_data() |> within({
  options("AN_OPTION" = iris)
  IRIS <- getOption("AN_OPTION")
}, links_to = "IRIS")

datanames(data_within) <- "IRIS"

get_code(data_within, datanames = "IRIS") |> cat()
#> IRIS <- getOption("AN_OPTION")

data_eval <- teal_data() |> eval_code("
  options(\"AN_OPTION\" = iris) # @linksto IRIS
  IRIS <- getOption(\"AN_OPTION\")
")

datanames(data_eval) <- "IRIS"

get_code(data_eval, datanames = "IRIS") |> cat()
#> options(AN_OPTION = iris)
#> IRIS <- getOption("AN_OPTION")

Created on 2024-02-20 with reprex v2.0.2

Code of Conduct

  • I agree to follow this project's Code of Conduct.

Contribution Guidelines

  • I agree to follow this project's Contribution Guidelines.

Security Policy

  • I agree to follow this project's Security Policy.
@chlebowa
Copy link
Contributor

Note that
a) we tend to put all code into one compound expression and this would necessitate multiple within calls;
b) due to the way values are injected into expr through ..., adding another named argument to within further restricts the variable names that can be used in the code passed to expr.

@averissimo
Copy link
Contributor Author

averissimo commented Feb 20, 2024

Some context: this idea was inspired by working with the python app in teal.gallery, where the code setting the environment is not traced back to the dataname. Hence, an eval_code call needed to be used.

This is an idea/discussion on how to bring it to within.

a) we tend to put all code into one compound expression and this would necessitate multiple within calls;

If this avoids executing string as code I think it's an acceptable trade-off

b) due to the way values are injected into expr through ..., adding another named argument to within further restricts the variable names that can be used in the code passed to expr.

Valid, we could further minimize this by adding a dot as a prefix to avoid names clashing, similar to what tidyverse does.

within.qenv(data, expr, .links_to = NULL, ...)

@averissimo
Copy link
Contributor Author

An alternative solution might be a validation of the completeness of the code.

So that when (pseudo-code below) is not empty, it returns all the code.

setdiff(get_code(data), lapply(datanames(data), \(x) get_code(data, datanames = x)) # pseudo-code!

That is, fallback to the full code on the code parser blindspots

@chlebowa
Copy link
Contributor

An alternative solution might be a validation of the completeness of the code.

So that when (pseudo-code below) is not empty, it returns all the code.

setdiff(get_code(data), lapply(datanames(data), \(x) get_code(data, datanames = x)) # pseudo-code!

That is, fallback to the full code on the code parser blindspots

I'm not a fan of this approach. The point of the datanames argument in get_code is to be exclusive and your proposal is to include orphan code by default.

@chlebowa
Copy link
Contributor

Some context: this idea was inspired by working with the python app in teal.gallery, where the code setting the environment is not traced back to the dataname. Hence, an eval_code call needed to be used.

This is an idea/discussion on how to bring it to within.

We tried to remove eval_code altogether but decided to keep it, in part because of the necessity to tag lines with comments, which is only possible in text. An additional benefit to using eval_code is one can load code from a text file with a simple readLines.
Are you aiming to make eval_code obsolete (at least as an exported function)?

Tagging code in within would be slightly different to tagging in text form because one would be able to tag whole blocks, not single lines.

b) due to the way values are injected into expr through ..., adding another named argument to within further restricts the variable names that can be used in the code passed to expr.

Valid, we could further minimize this by adding a dot as a prefix to avoid names clashing, similar to what tidyverse does.

within.qenv(data, expr, .links_to = NULL, ...)

This would be an effective approach 👍 I don't quite like the mixed formal names, though (dot vs no dot). I'm on the fence on this.

@averissimo
Copy link
Contributor Author

I don't think it should become obsolete. Especially if we have users executing code from other scripts or blocks of texts.

Having said that, I don't like using it inline as it obfuscates expressions inside a very long string. Among the cons are:

  • Harder to debug / execute specific lines/blocks
  • IDE's features to detect issues with code disappears
  • Need to escape quotation characters
  • No linter available for the code

@averissimo
Copy link
Contributor Author

I'm not a huge fan of adding new non-standard parameters to within either, together we might come up with a better solution 💪

I'll think a bit more about it.

@gogonzo
Copy link
Contributor

gogonzo commented Feb 21, 2024

a) we tend to put all code into one compound expression and this would necessitate multiple within calls;

This is exactly why we decided to use comments tag instead of extra argument. Imagine this

# now
data |>
  within({
    open_connection() # @linksto df1 df2
    df1 <- get_data("df1")
    df2 <- get_data("df2")
    close_connection() # @linksto df1 df2
  })

# possibly
data |>
  within(open_connection(), .linksto = c("df1", "df2")) |>
  within({
     df1 <- get_data("df1")
     df2 <- get_data("df2")
   }) |>
 within(close_connection(), .linksto = c("df1", "df2"))

I think .linksto being a formal argument is fine as it won't be confused with .... I think also that we should not put so much attention on within generic. within.teal_code is exposed in teal.code documentation and .linksto can be easily explained (probably even easier than # @linksto). I also think .linksto argument is more robust.

I don't think it should become obsolete.
I'm not a huge fan of adding new non-standard parameters to within either

Maybe, we can have both # @linksto and .linksto in both within and eval_code?


However, there is fundamental question. We need to asses the cost of maintaining code-subsetting and decide whether this feature is worth efforts.

@averissimo
Copy link
Contributor Author

AFAIK within currently has a limitation on detecting comments, hence the proposal.

my_within <- \(data, expr, ...) {
  expr <- substitute(expr) # first line of within.qenv
  expr
}
my_within(iris, {
  1+ 
    2 # a comment
})
#> {
#>     1 + 2
#> }

Maybe, we can have both # @linksto and .linksto in both within and eval_code?

Agree 👍

# possibly
data |>
  within(open_connection(), .linksto = c("df1", "df2")) |>
  within({
     df1 <- get_data("df1")
     df2 <- get_data("df2")
   }) |>
 within(close_connection(), .linksto = c("df1", "df2"))

As @chlebowa said, this is not great 😁 and I agree. My only grip is that using string code is the only alternative for the code parser limitations.

# Now
data |>
  eval_code(
    "
    open_connection() # @linksto df1 df2
    df1 <- read_lines(1L)
    df2 <- read_lines()
    close_connection() # @linksto df1 df2
    "
  )

We need to assess the cost of maintaining code-subsetting and decide whether this feature is worth efforts.

Totally. This is not a trivial feature as it needs to be tightly integrated with code parser.

@averissimo
Copy link
Contributor Author

averissimo commented Feb 21, 2024

Looking at the commented code I wonder if we could give a hint to the code parser via a "side effect" function.

(sorry if this has already been discussed)

# Possibly (alt to subsetting)
teal_data() |> 
  within({
    open_connection() %@linksto% list(df1, df2)
    df1 <- get_data("df1")
    df2 <- get_data("df2")
    close_connection() %>% teal.code::links_to(df1, df2)
  })

%@linksto% or teal.code::links_to does absolutely nothing, it's a special function name just like # @linksto is a special comment.

`%@linksto%` <- \(x, ...) x # or %links_to% or just %hint%
1 + 1 %@linksto% list(df_not_exist, df2)
#> [1] 2
links_to <- `%@linksto%`
1 + 1 |> links_to(df_variable_doesnt_exist)
#> [1] 2

@gogonzo
Copy link
Contributor

gogonzo commented Feb 22, 2024

AFAIK within currently has a limitation on detecting comments, hence the proposal.

Yup, substitute, quote, expression discard comments. Only parse preserve comments in srcref attribute.

open_connection() %@linksto% list(df1, df2)

@averissimo Good luck with writing parser which handles this expression :D


If we introduce .linksto argument in within and eval_code it would mean that this additional information will have to be carried over with the code element (probably an attribute). Please note also that there is no parsing when code is inserted to the qenv class but only on using get_code. If we decide to add .linksto argument I would also recommend moving parser to the setters (instead of getter).

I imagine such thing, so that .linksto will always have a default value which can be replaced if someone doesn't like the result of the parsing.

within <- function(x, expr, .linksto = get_used_symbols(expr), ...) {
  parsed_expr <- substitute(expr)
  ...
  eval_code(x, parsed_expr, .linksto = .linksto)

}

@chlebowa
Copy link
Contributor

However, there is fundamental question. We need to asses the cost of maintaining code-subsetting and decide whether this feature is worth efforts.

In eval_code existing code is collapsed to a single string and the incoming block is appended.

code <- paste(code, collapse = "\n")
object@code <- c(object@code, code)

Then, get_code collapses the code completely.

paste(object@code, collapse = "\n") # for qenv
paste(object$trace, collapse = "\n ") # for `qenv.error

The code is essentially collapsed for all intents and purposes, we just temporarily store it separated.

What if we do the collapsing on.exit, like the within prototype did? Then @code will always be a single string.


open_connection() %@linksto% list(df1, df2)

@averissimo Good luck with writing parser which handles this expression :D

I think this is doable.


If we introduce .linksto argument in within and eval_code it would mean that this additional information will have to be carried over with the code element (probably an attribute). Please note also that there is no parsing when code is inserted to the qenv class but only on using get_code. If we decide to add .linksto argument I would also recommend moving parser to the setters (instead of getter).

Now this is is a can of worms... 😫

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants