Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(gc): record workspace manifest and target dir in global cache tracker #13846

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

baby230211
Copy link
Contributor

@baby230211 baby230211 commented May 2, 2024

What does this PR try to resolve?

Fix #13136
This PR aims to track of the following data in GC for further garbage collection use.

  1. root-manifest path
  2. target directory path

There will be three new table for this features.

  1. worksapce_manifest_index table - index and record the root manifest path
id workspace_manifest timestamp
? /foo/Cargo.toml ?
  1. target_dir_index table - index and record the target_dir path
id target_dir timestamp
? /foo/target ?
  1. workspace_src table - combine the previous two table
id worksapce_id target_dir_id timestamp
? ? ? ?

How should we test and review this PR?

  1. Run compile (cargo run build or else)
  2. inspect your root gc file (~/.cargo/.global-cache)
  3. see if the three table is record correctly.

@rustbot
Copy link
Collaborator

rustbot commented May 2, 2024

r? @ehuss

rustbot has assigned @ehuss.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@baby230211 baby230211 marked this pull request as draft May 2, 2024 12:48
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 2, 2024
@rustbot rustbot added the A-infrastructure Area: infrastructure around the cargo repo, ci, releases, etc. label May 2, 2024
@baby230211 baby230211 force-pushed the feat-13136 branch 3 times, most recently from 5f32f71 to 15e2863 Compare May 2, 2024 12:52
Comment on lines +328 to +350
basic_migration(
"CREATE TABLE workspace_manifest_index (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT UNIQUE NOT NULL,
timestamp INTEGER NOT NULL
)",
),
basic_migration(
"CREATE TABLE target_dir_index (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT UNIQUE NOT NULL,
timestamp INTEGER NOT NULL
)",
),
basic_migration(
"CREATE TABLE workspace_src (
workspace_id INTEGER NOT NULL,
target_dir_id INTEGER NOT NULL,
timestamp INTEGER NOT NULL,
PRIMARY KEY (workspace_id, target_dir_id),
FOREIGN KEY (workspace_id) REFERENCES workspace_manifest_index (id) ON DELETE CASCADE,
FOREIGN KEY (target_dir_id) REFERENCES target_dir_index (id) ON DELETE CASCADE
)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats the motivation for

  • Tracking this as 3 separate tables
  • Having timestamps for workspace dir, target dir, and workspace dir + target dir

Comment on lines 1482 to 1523
/// New workspace manifest entries to insert.
workspace_db_timestamps: HashMap<WorkspaceManifestIndex, Timestamp>,
/// New target dir entries to insert.
target_dir_db_timestamps: HashMap<TargetDirIndex, Timestamp>,
/// New workspace src entries to insert.
workspace_src_timestamps: HashMap<WorkspaceSrc, Timestamp>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does db mean in this context?

I think for git

  • git_db is the .git
  • git_checkout is a work dir of a shared .git

Comment on lines 230 to 231
pub encoded_workspace_manifest_name: InternedString,
pub encoded_target_dir_name: InternedString,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the others refer to "encoded" they are referring to how we translate package names to file names.

That doesn't quite make sense here and just naming these target_dir and workspace_manifest_path should work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll rename it with target_dir_path and workspace_manifest_path for these naming.
Thanks.

/// Indicates the given [`WorkspaceManifest`] has been used right now.
///
/// Also implicitly marks the workspace manifest used, too.
pub fn mark_workspace_src_used(&mut self, workspace_src: WorkspaceSrc) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I wonder why we exposed these in the API, rather than taking parameters. Seems like it creates more boilerplate for the callers

@@ -264,6 +264,16 @@ pub fn create_bcx<'a, 'gctx>(
HasDevUnits::No
}
};
let _ = &gctx.deferred_global_last_use()?.mark_workspace_src_used(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why let _ = &?

Copy link
Contributor Author

@baby230211 baby230211 May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove let _ = &, rustc would be warning as follow

rustc: unused borrow that must be used (unused_must_use)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it locally and didn't get any message...

    gctx.deferred_global_last_use()?
        .mark_workspace_src_used(global_cache_tracker::WorkspaceSrc {
            workspace_manifest_name: InternedString::new(ws.root_manifest().to_str().unwrap()),
            target_dir_name: InternedString::new(
                ws.target_dir().as_path_unlocked().to_str().unwrap(),
            ),
        });

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That has a leading & that can be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works. Thanks, I've updated.

@@ -1632,6 +1734,32 @@ impl DeferredGlobalLastUse {
);
}

// Flushes all of the `target_dir_db_timestamps` to the database,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review note: I've not yet dug into all of the DB operations

@baby230211 baby230211 force-pushed the feat-13136 branch 2 times, most recently from 9f069a9 to d55dda7 Compare May 15, 2024 12:39
@baby230211 baby230211 force-pushed the feat-13136 branch 5 times, most recently from 7d4fbbe to 481de08 Compare May 23, 2024 03:29
@bors
Copy link
Contributor

bors commented Jun 7, 2024

☔ The latest upstream changes (presumably #13979) made this pull request unmergeable. Please resolve the merge conflicts.

@epage
Copy link
Contributor

epage commented Jun 20, 2024

As this is marked as a draft, @baby230211 what do you see as the next steps before marking this ready for review?

@baby230211
Copy link
Contributor Author

@epage , Sorry for pending this PR so long, the final step here is to record the manifest lock file in manifest table i think.

@epage
Copy link
Contributor

epage commented Jun 21, 2024

You're ok. I just wanted to check if there was something waiting on me.

@baby230211
Copy link
Contributor Author

baby230211 commented Oct 16, 2024

Hello @epage , i would like to re-work on this PR recently.
Due to recording the manifest lock file is not related to tracking the target/, i wonder if it's fine to not to track that lock file here.

@baby230211 baby230211 marked this pull request as ready for review October 20, 2024 04:28
@baby230211 baby230211 changed the title [WIP] feat(gc): record workspace manifest and target dir in global cache tracker feat(gc): record workspace manifest and target dir in global cache tracker Oct 20, 2024
@epage
Copy link
Contributor

epage commented Oct 24, 2024

Lockfile tracking was mostly intended to prep for #13137. We'll likely need to database migrations anyways so probably fine to skip for now so long as we understand how we would support it in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-infrastructure Area: infrastructure around the cargo repo, ci, releases, etc. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Garbage collect whole target/
5 participants