Cached recipes #1906

nmay231 · 2024-02-16T09:15:41Z

Putting this out there to get some preliminary feedback, ask questions, and take advantage of self-inflicted peer pressure 😁

What's in this PR?

Adds a [cached] recipe attribute that marks that recipe as cacheable. A cacheable recipe only reruns when the body of the recipe changes after variables and other {{interpolations}} are evaluated. See examples in #867.

TODOs

Decisions, decisions... Edge cases and questions

Besides backtick expressions and calls to uuid() and just_pid(), are there other impure interpolations that you can think of off the top of your head? I've already scanned the README, but wanted to double check.
Cached recipes with dependencies do not have perfectly obvious behaviors. Here is how I am going to go with it (I'm looking for feedback on these):
- The prior dependencies of cached recipes must also be cached, otherwise, it's ambiguous if you would run a dependency if the main recipe matched the last cached run.
- Subsequent dependencies (&& after) do not need to be cached and do not run if the recipe matches the last run. I don't think they need to run because they are typically clean-up jobs and are only needed if the recipe actually ran. If the user wants them to run every time, they should set up a separate recipe that depends on the main recipe and subsequent in that order.
- An unchanged recipe with a changed dependency will re-run both.
- A changed recipe with an unchanged dependency only re-run the former and not the dependency.
Did I miss any stupidly obvious issue that needs addressing? I am writing this at 3 in the morning, so I wouldn't be surprised.

casey · 2024-02-16T19:46:41Z

Nice! Some feedback:

This initial PR should be as simple as possible, so it's easy to review and land, and additional features can be added in follow-up PRs. So I would leave --no-cache, --delete-cache, and customizing the cache filename for later.
What is the Error::InvalidCachedRecipe error for?
Can you describe the structure of the cache file JSON?

src/cache.rs

tgross35 · 2024-02-16T22:01:14Z

This looks awesome!

I still think it could be better to have an explicit "rerun recipe if the value of this thing changes" function so the user can determine what is important, rather than having just need to determine whether a function is impure. This would also mean that you could have backticked expressions that don't invalidate the cache.

Don't think this would be too bad if the indicator function also returns the value, so you could do:

config:
    # There is probably a snappier name than rerun_if_changed
    echo "running for configuration {{ rerun_if_changed(env("PROJECT_CONFIG")) }}"
    # ...

casey · 2024-02-16T22:04:33Z

I still think it could be better to have an explicit "rerun recipe if the value of this thing changes" function so the user can determine what is important, rather than having just need to determine whether a function is impure. This would also mean that you could have backticked expressions that don't invalidate the cache.

This is definitely an interesting idea, but I'd like to explore how things look without explicit annotations. If it works well without having to annotate anything, that would be ideal.

nmay231 · 2024-02-16T23:07:32Z

@casey

What is the Error::InvalidCachedRecipe error for?

Having a call to uuid(), and the like in a cached recipe means the recipe will always change and never be cached. So they should be disallowed completely until cache_ignore() is implemented.

I also disallowed backtick interpolation because currently the command would be executed twice (one of the backlog problems). Even when it is evaluated once, it won't be run in the same execution order as run_linewise which only runs interpolations line by line. If the user determines their backtick expression is pure (like a version check rustc --version) then they can put the expression outside the recipe body. If it is impure (relies on previous state) then they can wrap it in cache_ignore() to only run during the main execution.

Can you describe the structure of the cache file JSON?

The structure itself is JustfileCache. As mentioned in my response to @tgross35 above, I probably need to change things a bit.

After this comment, I'll make it so there is a unique cache file per justfile path and working directory combo. The filename on my system is ~/.cache/justfiles/{project_name}-{&path_hash[..16]}.json. The working_directory and (soon to be added) justfile_path will only be there for user debuggability.

I would leave [...] customizing the cache filename for later.

I'm actually using the filename customization to not pollute the contributor's workspace during tests, so I'll leave it for now.

casey · 2024-02-16T23:24:33Z

Having a call to uuid(), and the like in a cached recipe means the recipe will always change and never be cached. So they should be disallowed completely until cache_ignore() is implemented.

I think it's fine to just not detect this and the recipe doesn't get cached.

I also disallowed backtick interpolation because currently the command would be executed twice (one of the backlog problems). Even when it is evaluated once, it won't be run in the same execution order as run_linewise which only runs interpolations line by line. If the user determines their backtick expression is pure (like a version check rustc --version) then they can put the expression outside the recipe body. If it is impure (relies on previous state) then they can wrap it in cache_ignore() to only run during the main execution.

Let's punt on linewise recipes and only allow [cache] on shebang recipes for now. We can allow backticks in that case and figure out linewise recipes later. Shebang recipes will be easiest to cache, since we just evaluate the text once and can make a decision to execute or skip based on the evaluated text. I don't want to worry about a cache_ignore function for the time being.

The structure itself is JustfileCache. As mentioned in my response to @tgross35 above, I probably need to change things a bit.

I think we should aim for an initial very simple version which has shortcomings and then fix those one by one.

For now, what do you think about just having a single .justcache directory, which has files named after the sha hash of the evaluated recipe body, which contain the evaluated text of the recipe body?

My thinking is that:

It's useful to be able to see what's cached with ls .justcache and seeing evaluated recipe bodies with cat .justcache/HASH.
What exactly should be included in the cache key is tricky. There is the evaluated recipe body, the working directory, and justfile settings. Anything else?

We can initially land this feature behind set unstable, and then we can think about what needs to be in the cache keys one at a time. So for example, we can go, "Okay, the working directory needs to be in the cache key, let's do a PR for that.", and make incremental improvements to the feature, until it's ready to be made stable.

nmay231 · 2024-02-17T00:12:23Z

I think it's fine to just not detect this and the recipe doesn't get cached.

Alrighty. I'm just trying to prevent open issues like "[cached] doesn't work". If you prefer a simpler implementation, I'll do that.

For now, what do you think about just having a single .justcache directory, which has files named after the sha hash of the evaluated recipe body, which contain the evaluated text of the recipe body?

I am very opposed to this. While it sounds good in principle to be able to read the previous cached recipes, most of the time the hash is gonna change a lot because the recipe contains sha256_file(...) while the rest of the recipe body remains the same. I don't want to have to delete the cache all the time.

That said, I would not mind if the exact format changed somewhat. For example, I wouldn't mind having the cache file/directory in the working directory instead of the user's general cache directory.

What exactly should be included in the cache key is tricky. There is the evaluated recipe body, the working directory, and justfile settings. Anything else?

The recipe name and/or justfile path should also be included in the cache input. If we do put the cache file/directory in the working directory, that would not need to be part of the input. I can't think of anything else.

We can initially land this feature behind set unstable

👍

amarao · 2024-02-21T14:50:19Z

Don't forget to put version check into cache. Eventually it will be version 2 of the cache, and someone will run older just with newer cache.

Expected behavior: either invalidate cache, or refuse to run.
Bad behavior: ignore version 2 in the cache file and try to re-use it.

The main issue is that it's impossible to add behavior to older versions (they are old by definition), so safeguard should be from from the very beginning.

This reverts commit a649e1d.

nmay231 · 2024-03-12T05:44:58Z

@casey Two quick things:

Do we really need to constrain this to only shebang recipes to prevent double evaluation if this is unstable only? I feel it's better to allow shebang and linewise recipes since it requires --unstable anyways. Then users can keep the benifits of linewise recipes (early exit on failure, linewise command printing) while trying out caching. Of course, I'll work on preventing double evaluation in a separate PR.
Is the current cache format a show-stopper for you, or is the json file okay? I already expressed why I didn't like it in the previous comment. I did simplify it to a file in a local .justcache/ directory, where there's a json file for every justfile * working_dir combo.

(Still need to handle recipe dependencies and add docs, but this should be done other than that)

nmay231 · 2024-03-26T21:39:49Z

@amarao I decided to not include an explicit version in the cache format to simplify things and instead let the schema be the version. Obviously, we still handle outdated/invalid schemas by invalidating the cache.

Besides, running just with a cache file that is the wrong version is much rarer than you might think since .justcache/ should be local to each machine and not included in git history.

nmay231 · 2024-04-15T21:23:07Z

I still have the two previous questions that need to be answered, but it's ready besides that. Whenever you have time for this full-time unpaid job, of course :)

One of those questions is about double evaluation, but that feels much easier to handle in the context of how cache_ignore() will be implemented (since that will basically need to be evaluated once during cache checks and once again during runtime). I will probably implement that and future features as a stacked diff (multiple PRs).

The only other thing to consider (that I know about) is that this might heavily conflict with #1562, however, that seems to be paused at the moment due to the author's lack of time.

casey · 2024-05-20T21:03:48Z

Do we really need to constrain this to only shebang recipes to prevent double evaluation if this is unstable only? I feel it's better to allow shebang and linewise recipes since it requires --unstable anyways. Then users can keep the benifits of linewise recipes (early exit on failure, linewise command printing) while trying out caching. Of course, I'll work on preventing double evaluation in a separate PR.

I think I'd rather constrain it to shebang recipes at the moment. Linewise recipes seem much more complicated. One issue is that linewise recipes are evaluated line-by-line, so an error in an early line will prevent a later line from running. If a later line relies on an earlier line, for example because it creates a file that is read in an expression in a later line, then evaluating it ahead of time would produce an error. We could, in the case of cached recipes, evaluate linewise recipe lines ahead of time, and only re-run the recipe if no lines have changed, which is probably what we want to do, but would be a larger change, and I'd rather get this PR in sooner rather than later.

Is the current cache format a show-stopper for you, or is the json file okay? I already expressed why I didn't like it in the previous comment. I did simplify it to a file in a local .justcache/ directory, where there's a json file for every justfile * working_dir combo.

Ultimately, I think I'd like to use redb an embedded pure-rust key/value store, for the cache. I'm happy to do that myself though, so whatever format is easiest for you for now and easiest to review is fine. Since redb databases block on concurrent access, this would also prevent errors where two instances of just try to modify the cache at the same time, and potentially also allow for concurrent access, if something like #1562 is ever merged.

One of those questions is about double evaluation, but that feels much easier to handle in the context of how cache_ignore() will be implemented (since that will basically need to be evaluated once during cache checks and once again during runtime). I will probably implement that and future features as a stacked diff (multiple PRs).

I'm not sure I understand this point. Can't recipes be evaluated only once, before running, and the evaluated recipe body used both for cache checks and execution?

casey · 2024-05-20T21:05:12Z

There are some conflicts, but they shouldn't be too bad to resolve. Let me know when this is ready for review!

casey · 2024-05-25T08:10:20Z

Converted to a draft just so I can keep track of what's ready to go. Feel free to undraft!

nmay231 · 2024-06-09T20:25:25Z

Sounds good! I recently changed jobs, but I'll try to get back to this this week.

One issue is that linewise recipes are evaluated line-by-line, so an error in an early line will prevent a later line from running. If a later line relies on an earlier line, for example because it creates a file that is read in an expression in a later line, then evaluating it ahead of time would produce an error.

True, but that's only if the user is using backtick expressions to mutate things "during compile time" which should be discouraged. This is one reason I wanted to disallow backtick expressions initially until cache_ignore() was fully implemented. (To prevent confusion, I mean expr {{`with`}} backticks since regular backticks expr `with` backticks are passed to the shell directly and safely run line-by-line.)

In any case, I agree that cached line-by-line recipes should have the same semantics as shebang recipes (all expressions evaluated at start before passed to shell). Ideally, cached line-by-line recipes would run exactly the same way as cached shebang recipes, but you can still reap the benefits of changing line-echos with @, spreading commands over multiple lines with \, etc. In other words, line-by-line recipes would be the odd one out.

One of those questions is about double evaluation, but that feels much easier to handle in the context of how cache_ignore() will be implemented (since that will basically need to be evaluated once during cache checks and once again during runtime). I will probably implement that and future features as a stacked diff (multiple PRs).

I'm not sure I understand this point. Can't recipes be evaluated only once, before running, and the evaluated recipe body used both for cache checks and execution?

I initially thought I would do two evaluation runs to implement cache_ignore(), but it would be much simpler to produce two strings, one for hashing and comparing to the cache, and one to pass to the shell to run.

EDIT: So much for that deadline, haha. I think I'll have time to look at this again soon.

nmay231 added 6 commits February 13, 2024 04:06

Add ExpressionWalker and refactor Variables

a649e1d

Cache recipe by evaluated body, initial

47976a7

Tests

7d24508

Include working directory/project dir in serialized cache

626dd08

Rename RecipeCache.{hash => body_hash}

ed9f0aa

Shorten cache filename

a1f740c

tgross35 reviewed Feb 16, 2024

View reviewed changes

src/cache.rs Outdated Show resolved Hide resolved

src/cache.rs Outdated Show resolved Hide resolved

Rename JustfileCache.{recipe_caches => recipes}

d885519

Use working directory AND justfile path for cache filename

5a959d1

nmay231 added 12 commits March 6, 2024 21:32

Remove InvalidCachedRecipe error

d82e9e1

Print 'skipping cached recipe' message by default

c18a0cc

Remove CacheFilename for now

6307eb0

Use .as_encoded_bytes() on os strings

ed8c05d

Fix tests

bbf7f84

Make cached_recipes require unstable

cf3c95a

Put cache files in working directory by default

36482fd

Added versioning to cache files

d5edf65

Make errors look the same

f32cc86

Revert "Add ExpressionWalker and refactor Variables"

219c646

This reverts commit a649e1d.

Clarify error messages

3a341ea

Incompatible old versions of cache should also silently be ignored

9acd827

Merge branch 'master' into cached-recipes

29d3459

nmay231 added 12 commits March 14, 2024 14:39

Doc comments

0812866

Move cache_file into Search struct

707844c

Starting to implement cached dependencies

70fe6ac

Return instead of mutate

e014285

Cached dependencies mostly work

e9686df

Only impl Recipe::run() for recipes with resolved dependencies

e859bd6

Finalize cached recipes with deps

fbc87dd

Merge branch 'master' into cached-recipes

8e8e046

Rename src/{cache => justfile_cache}.rs

4961730

Simplify JustfileCache

a17b8dd

Docs

d12dbc3

note to add to gitignore

1ca5122

nmay231 added 3 commits April 15, 2024 15:45

Just make the cache mutable directly, don't use refcell

db89a9a

Run just check

aa8c2db

Merge branch 'master' into cached-recipes

e666bf5

nmay231 marked this pull request as ready for review April 15, 2024 21:23

casey marked this pull request as draft May 25, 2024 08:09

This was referenced Jun 12, 2024

DJB Redo in Just #600

Closed

Feature Idea: Git diff/changed based dependency resolution #1114

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cached recipes #1906

Cached recipes #1906

nmay231 commented Feb 16, 2024

casey commented Feb 16, 2024

tgross35 commented Feb 16, 2024 •

edited

Loading

casey commented Feb 16, 2024

nmay231 commented Feb 16, 2024 •

edited

Loading

casey commented Feb 16, 2024

nmay231 commented Feb 17, 2024

amarao commented Feb 21, 2024

nmay231 commented Mar 12, 2024

nmay231 commented Mar 26, 2024

nmay231 commented Apr 15, 2024

casey commented May 20, 2024

casey commented May 20, 2024

casey commented May 25, 2024

nmay231 commented Jun 9, 2024 •

edited

Loading

Cached recipes #1906

Are you sure you want to change the base?

Cached recipes #1906

Conversation

nmay231 commented Feb 16, 2024

What's in this PR?

TODOs

Decisions, decisions... Edge cases and questions

casey commented Feb 16, 2024

tgross35 commented Feb 16, 2024 • edited Loading

casey commented Feb 16, 2024

nmay231 commented Feb 16, 2024 • edited Loading

casey commented Feb 16, 2024

nmay231 commented Feb 17, 2024

amarao commented Feb 21, 2024

nmay231 commented Mar 12, 2024

nmay231 commented Mar 26, 2024

nmay231 commented Apr 15, 2024

casey commented May 20, 2024

casey commented May 20, 2024

casey commented May 25, 2024

nmay231 commented Jun 9, 2024 • edited Loading

tgross35 commented Feb 16, 2024 •

edited

Loading

nmay231 commented Feb 16, 2024 •

edited

Loading

nmay231 commented Jun 9, 2024 •

edited

Loading