-
Notifications
You must be signed in to change notification settings - Fork 153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose to the user the commit_key of the latest GC #2110
Conversation
aa91cc5
to
9334de9
Compare
c3b98d6
to
ad95a3a
Compare
src/irmin-pack/unix/file_manager.ml
Outdated
let upgrade_v3_to_v4 : Control_file.Payload_v3.t -> Payload.t = | ||
fun pl -> | ||
let status = | ||
match pl.status with | ||
| From_v1_v2_post_upgrade x -> Payload.From_v1_v2_post_upgrade x | ||
| From_v3_no_gc_yet -> No_gc_yet | ||
| From_v3_used_non_minimal_indexing_strategy -> | ||
Used_non_minimal_indexing_strategy | ||
| From_v3_gced x -> | ||
Gced | ||
{ | ||
suffix_start_offset = x.suffix_start_offset; | ||
generation = x.generation; | ||
oldest_live_commit_offset = x.suffix_start_offset; | ||
} | ||
| _ -> assert false | ||
in | ||
{ | ||
dict_end_poff = pl.dict_end_poff; | ||
suffix_end_poff = pl.suffix_end_poff; | ||
status; | ||
upgraded_from_v3_to_v4 = true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't this logic live in the control file since it doesn't depend on anything beyond its own scope?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes true, oldest_live_commit_offset = x.suffix_start_offset;
is an operation that is local to the control file. I told @icristescu to move the upgrade out of the CF because I expected a more complicated process that needed to do stuff with other files.
Let's move that back to CF
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad I didn't miss something here. Happy it can live in the CF!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved it back in CF
src/irmin-pack/unix/ext.ml
Outdated
@@ -250,12 +250,12 @@ module Maker (Config : Conf.S) = struct | |||
(Irmin.Type.to_string XKey.t commit_key)) | |||
| Some (k, _kind) -> Ok k) | |||
in | |||
let offset = | |||
let end_offset, oldest_live_commit_offset = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR but I lean towards thinking this logic should live in Gc.v
since it can be calculated from the commit_key
that we are already passing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I actually did this change with your previous comments in mind :) (#2102 (comment)) we need the same logic for both end_offset
and oldest_live_commit_offset
, both of which can be computed from commit_key
. I would say either both computations are done in Gc.v
or none. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha, yeah I've been looking at this code 🧐 since that comment. In my mind, it would be great to move as much GC-specific code calculations as we can out of ext and into gc (even possibly the code preceding this that checks the commit key validity and the check to the file manager). So, I'd say both in Gc.v
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved end_offset
and oldest_live_commit_offset
, but not check of the commit key. There are some other checks done in start
, so I think it makes sense to do all those checks in one place. Feel free to change it in subsequent PRs though.
ad95a3a
to
d126cbd
Compare
src/irmin-pack/unix/control_file.ml
Outdated
generation = x.generation; | ||
oldest_live_commit_offset = x.suffix_start_offset; | ||
} | ||
| _ -> assert false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is typically the case where we want to exhaustively list the remaining tags instead of using a wildcard for the day we will change the definition. However, it is really not expected that we will touch the V3 definition again, so it's OK to leave it that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, not sure why I didn't, thanks
d126cbd
to
40b182f
Compare
Codecov Report
@@ Coverage Diff @@
## main #2110 +/- ##
==========================================
+ Coverage 64.79% 64.83% +0.04%
==========================================
Files 133 133
Lines 15851 15881 +30
==========================================
+ Hits 10271 10297 +26
- Misses 5580 5584 +4
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
src/irmin-pack/unix/ext.ml
Outdated
(* Stores which support gc, also have a length in the header | ||
of all their entries. *) | ||
assert false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stores which support gc, also have a length in the header of all their entries.
That sentence is unfortunately not true:
- we will in the near future support GC for old Tezos archive nodes that have V1 entries and
- the irmin-pack functor config allows contents with no length stored.
I believe that the way forward is:
- to change the comment to state that this branch is reachable only in the unlikely event that a V1 commit was the target of a GC and
- to implement this branch by looking into index for the key of the commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment was untrue, I changed it, thanks.
Are we going to support GC on with V1 commits as target? I missed that. The code is not ready for this I think, so I propose we change this when we do add the Gc support on V1.
the irmin-pack functor config allows contents with no length stored.
I changed the comment to only mention commits (which are the target of this operation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we going to support GC on with V1 commits as target
Nothing would prevent an irmin-pack user to launch a GC targeting a V1 commit. However, it will not occur in Tezos because V1 commits haven't been generated for months and tezos only targets commits ~15 days in the past.
However, we might not have to handle this for commit objects but we will have to handle it for inode objects. A recent block most likely still targets 1 y/o inodes.
The code is not ready for this I think
I checked and it's partially handled. I think we should open an issue to not forget.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing would prevent an irmin-pack user to launch a GC targeting a V1 commit. However, it will not occur in Tezos ..
I think we are in agreement, that we will not add support for this (i.e. calling gc on V1 commits)? (I was only referring to V1 commits in my previous comment, as this code only targets commits).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if I'm being unclear.
On master:
- The code supports V1 gc-targets but the CF forbids it thanks to the
from_v1_v2
tag - The code doesn't support V1 object in the tree of the GC target.
After your PR:
- The code doesn't support V1 gc-targets, but that's OK since the CF already forbits this
- Same for V1 inodes/contents
When implementing the lower layer:
- We will have to choose if we support V1 gc-targets. Either we do or we raise a new dedicated error.
- We will have to support V1 inodes/contents in the tree of the GC target
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks
14006dc
to
e9292da
Compare
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.5.0) CHANGES: ### Added - **irmin-pack** - Add `Irmin_pack_unix.Stats.Latest_gc` which is now the parameter of GC's `finished` callback (mirage/irmin#2089, @Ngoguey42) - Add `Gc.oldest_live_commit` which returns the key of the commit on which the latest gc was called on. (mirage/irmin#2110, @icristescu) - Add `split` to create a new suffix chunk. Subsequent writes will append to this chunk until `split` is called again. (mirage/irmin#2118, @icristescu) - Add `create_one_commit_store` to create a new store from the existing one, containing only one commit. (mirage/irmin#2125, @icristescu) ### Changed - **irmin-pack** - Upgraded on-disk format to version 4. (mirage/irmin#2110, @icristescu) - Detecting control file corruption with a checksum (mirage/irmin#2119, @art-w) - Change on-disk layout of the suffix from a single file to a multiple, chunked file design (mirage/irmin#2115, @metanivek) - Modify GC to work with new chunked suffix. See `examples/gc.ml` for a demonstration of how it works with the new `split` function. (mirage/irmin#2126, @metanivek) ### Fixed
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.5.0) CHANGES: ### Added - **irmin-pack** - Add `Irmin_pack_unix.Stats.Latest_gc` which is now the parameter of GC's `finished` callback (mirage/irmin#2089, @Ngoguey42) - Add `Gc.oldest_live_commit` which returns the key of the commit on which the latest gc was called on. (mirage/irmin#2110, @icristescu) - Add `split` to create a new suffix chunk. Subsequent writes will append to this chunk until `split` is called again. (mirage/irmin#2118, @icristescu) - Add `create_one_commit_store` to create a new store from the existing one, containing only one commit. (mirage/irmin#2125, @icristescu) ### Changed - **irmin-pack** - Upgraded on-disk format to version 4. (mirage/irmin#2110, @icristescu) - Detecting control file corruption with a checksum (mirage/irmin#2119, @art-w) - Change on-disk layout of the suffix from a single file to a multiple, chunked file design (mirage/irmin#2115, @metanivek) - Modify GC to work with new chunked suffix. See `examples/gc.ml` for a demonstration of how it works with the new `split` function. (mirage/irmin#2126, @metanivek) ### Fixed
on top of #2102.
In order to add the commit offset to the control file, I added a version v4 for the control file and implemented what I understood by virtual migration from v3 to v4.
I'm not sure about many choices I made in this PR, including the migration, but also the different names.