Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose to the user the commit_key of the latest GC #2110

Merged
merged 4 commits into from
Oct 12, 2022

Conversation

icristescu
Copy link
Contributor

on top of #2102.

In order to add the commit offset to the control file, I added a version v4 for the control file and implemented what I understood by virtual migration from v3 to v4.

I'm not sure about many choices I made in this PR, including the migration, but also the different names.

src/irmin-pack/unix/control_file_intf.ml Outdated Show resolved Hide resolved
src/irmin-pack/unix/ext.ml Outdated Show resolved Hide resolved
@icristescu icristescu force-pushed the expose_gc_commit branch 3 times, most recently from aa91cc5 to 9334de9 Compare October 7, 2022 17:34
@icristescu icristescu force-pushed the expose_gc_commit branch 2 times, most recently from c3b98d6 to ad95a3a Compare October 10, 2022 13:03
Comment on lines 386 to 408
let upgrade_v3_to_v4 : Control_file.Payload_v3.t -> Payload.t =
fun pl ->
let status =
match pl.status with
| From_v1_v2_post_upgrade x -> Payload.From_v1_v2_post_upgrade x
| From_v3_no_gc_yet -> No_gc_yet
| From_v3_used_non_minimal_indexing_strategy ->
Used_non_minimal_indexing_strategy
| From_v3_gced x ->
Gced
{
suffix_start_offset = x.suffix_start_offset;
generation = x.generation;
oldest_live_commit_offset = x.suffix_start_offset;
}
| _ -> assert false
in
{
dict_end_poff = pl.dict_end_poff;
suffix_end_poff = pl.suffix_end_poff;
status;
upgraded_from_v3_to_v4 = true;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't this logic live in the control file since it doesn't depend on anything beyond its own scope?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes true, oldest_live_commit_offset = x.suffix_start_offset; is an operation that is local to the control file. I told @icristescu to move the upgrade out of the CF because I expected a more complicated process that needed to do stuff with other files.

Let's move that back to CF

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad I didn't miss something here. Happy it can live in the CF!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved it back in CF

@@ -250,12 +250,12 @@ module Maker (Config : Conf.S) = struct
(Irmin.Type.to_string XKey.t commit_key))
| Some (k, _kind) -> Ok k)
in
let offset =
let end_offset, oldest_live_commit_offset =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR but I lean towards thinking this logic should live in Gc.v since it can be calculated from the commit_key that we are already passing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I actually did this change with your previous comments in mind :) (#2102 (comment)) we need the same logic for both end_offset and oldest_live_commit_offset, both of which can be computed from commit_key. I would say either both computations are done in Gc.v or none. wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, yeah I've been looking at this code 🧐 since that comment. In my mind, it would be great to move as much GC-specific code calculations as we can out of ext and into gc (even possibly the code preceding this that checks the commit key validity and the check to the file manager). So, I'd say both in Gc.v.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved end_offset and oldest_live_commit_offset, but not check of the commit key. There are some other checks done in start, so I think it makes sense to do all those checks in one place. Feel free to change it in subsequent PRs though.

generation = x.generation;
oldest_live_commit_offset = x.suffix_start_offset;
}
| _ -> assert false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is typically the case where we want to exhaustively list the remaining tags instead of using a wildcard for the day we will change the definition. However, it is really not expected that we will touch the V3 definition again, so it's OK to leave it that way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, not sure why I didn't, thanks

@codecov-commenter
Copy link

codecov-commenter commented Oct 11, 2022

Codecov Report

Merging #2110 (277eddc) into main (46e46e0) will increase coverage by 0.04%.
The diff coverage is 83.33%.

❗ Current head 277eddc differs from pull request most recent head e9292da. Consider uploading reports for the commit e9292da to get more accurate results

@@            Coverage Diff             @@
##             main    #2110      +/-   ##
==========================================
+ Coverage   64.79%   64.83%   +0.04%     
==========================================
  Files         133      133              
  Lines       15851    15881      +30     
==========================================
+ Hits        10271    10297      +26     
- Misses       5580     5584       +4     
Impacted Files Coverage Δ
src/irmin-pack/version.ml 44.44% <25.00%> (-1.02%) ⬇️
src/irmin-pack/unix/control_file.ml 84.00% <83.33%> (-2.49%) ⬇️
src/irmin-pack/unix/ext.ml 62.15% <85.71%> (+0.20%) ⬆️
src/irmin-pack/unix/file_manager.ml 83.95% <95.00%> (-0.06%) ⬇️
src/irmin-pack/unix/control_file_intf.ml 85.71% <100.00%> (-14.29%) ⬇️
src/irmin-pack/unix/dispatcher.ml 75.00% <100.00%> (ø)
src/irmin-pack/unix/gc.ml 82.14% <100.00%> (+0.49%) ⬆️
src/irmin-test/store.ml 94.95% <0.00%> (+0.05%) ⬆️
... and 4 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Comment on lines 330 to 332
(* Stores which support gc, also have a length in the header
of all their entries. *)
assert false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stores which support gc, also have a length in the header of all their entries.

That sentence is unfortunately not true:

  • we will in the near future support GC for old Tezos archive nodes that have V1 entries and
  • the irmin-pack functor config allows contents with no length stored.

I believe that the way forward is:

  • to change the comment to state that this branch is reachable only in the unlikely event that a V1 commit was the target of a GC and
  • to implement this branch by looking into index for the key of the commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment was untrue, I changed it, thanks.

Are we going to support GC on with V1 commits as target? I missed that. The code is not ready for this I think, so I propose we change this when we do add the Gc support on V1.

the irmin-pack functor config allows contents with no length stored.

I changed the comment to only mention commits (which are the target of this operation).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to support GC on with V1 commits as target

Nothing would prevent an irmin-pack user to launch a GC targeting a V1 commit. However, it will not occur in Tezos because V1 commits haven't been generated for months and tezos only targets commits ~15 days in the past.

However, we might not have to handle this for commit objects but we will have to handle it for inode objects. A recent block most likely still targets 1 y/o inodes.

The code is not ready for this I think

I checked and it's partially handled. I think we should open an issue to not forget.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing would prevent an irmin-pack user to launch a GC targeting a V1 commit. However, it will not occur in Tezos ..

I think we are in agreement, that we will not add support for this (i.e. calling gc on V1 commits)? (I was only referring to V1 commits in my previous comment, as this code only targets commits).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I'm being unclear.

On master:

  • The code supports V1 gc-targets but the CF forbids it thanks to the from_v1_v2 tag
  • The code doesn't support V1 object in the tree of the GC target.

After your PR:

  • The code doesn't support V1 gc-targets, but that's OK since the CF already forbits this
  • Same for V1 inodes/contents

When implementing the lower layer:

  • We will have to choose if we support V1 gc-targets. Either we do or we raise a new dedicated error.
  • We will have to support V1 inodes/contents in the tree of the GC target

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thanks

src/irmin-pack/unix/s.ml Outdated Show resolved Hide resolved
src/irmin-pack/unix/gc.ml Outdated Show resolved Hide resolved
@icristescu icristescu merged commit af984bb into mirage:main Oct 12, 2022
@icristescu icristescu deleted the expose_gc_commit branch October 12, 2022 10:42
@metanivek metanivek added this to the Irmin 3.5 milestone Oct 13, 2022
metanivek added a commit to metanivek/opam-repository that referenced this pull request Dec 14, 2022
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.5.0)

CHANGES:

### Added

- **irmin-pack**
  - Add `Irmin_pack_unix.Stats.Latest_gc` which is now the parameter of GC's
    `finished` callback (mirage/irmin#2089, @Ngoguey42)
  - Add `Gc.oldest_live_commit` which returns the key of the commit on which the
    latest gc was called on. (mirage/irmin#2110, @icristescu)
  - Add `split` to create a new suffix chunk. Subsequent writes will append to
    this chunk until `split` is called again. (mirage/irmin#2118, @icristescu)
  - Add `create_one_commit_store` to create a new store from the existing one,
    containing only one commit. (mirage/irmin#2125, @icristescu)

### Changed

- **irmin-pack**
  - Upgraded on-disk format to version 4. (mirage/irmin#2110, @icristescu)
  - Detecting control file corruption with a checksum (mirage/irmin#2119, @art-w)
  - Change on-disk layout of the suffix from a single file to a multiple,
    chunked file design (mirage/irmin#2115, @metanivek)
  - Modify GC to work with new chunked suffix. See `examples/gc.ml` for a
    demonstration of how it works with the new `split` function. (mirage/irmin#2126,
    @metanivek)

### Fixed
metanivek added a commit to metanivek/opam-repository that referenced this pull request Dec 14, 2022
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.5.0)

CHANGES:

### Added

- **irmin-pack**
  - Add `Irmin_pack_unix.Stats.Latest_gc` which is now the parameter of GC's
    `finished` callback (mirage/irmin#2089, @Ngoguey42)
  - Add `Gc.oldest_live_commit` which returns the key of the commit on which the
    latest gc was called on. (mirage/irmin#2110, @icristescu)
  - Add `split` to create a new suffix chunk. Subsequent writes will append to
    this chunk until `split` is called again. (mirage/irmin#2118, @icristescu)
  - Add `create_one_commit_store` to create a new store from the existing one,
    containing only one commit. (mirage/irmin#2125, @icristescu)

### Changed

- **irmin-pack**
  - Upgraded on-disk format to version 4. (mirage/irmin#2110, @icristescu)
  - Detecting control file corruption with a checksum (mirage/irmin#2119, @art-w)
  - Change on-disk layout of the suffix from a single file to a multiple,
    chunked file design (mirage/irmin#2115, @metanivek)
  - Modify GC to work with new chunked suffix. See `examples/gc.ml` for a
    demonstration of how it works with the new `split` function. (mirage/irmin#2126,
    @metanivek)

### Fixed
@irmaTS irmaTS added tezos-support Support for bugs related to Tezos users/tezos Related to Tezos users labels Feb 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tezos-support Support for bugs related to Tezos users/tezos Related to Tezos users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants