Allow per project cache #15

Closed
buchgr opened this issue Mar 6, 2018 · 5 comments · Fixed by #339
@buchgr
Owner

buchgr commented Mar 6, 2018

@BenTheElder mentioned to me that k8s is using a different remote cache for different projects and they distinguish between projects via a path component in the URL:

https://cache.com/<project name>/(ac|cas)/

I think that's a good idea and I believe @nicolov already laid the foundations for that.
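The URL layout above could be parsed along these lines. This is only a sketch: `parseCachePath` and `pathRe` are hypothetical names, and it assumes a single optional project segment followed by `ac` or `cas` and a 64-character lowercase hex SHA-256 key. It is not taken from bazel-remote or greenhouse.

```go
package main

import (
	"fmt"
	"regexp"
)

// pathRe matches an optional single-segment project prefix, then "ac" or
// "cas", then a 64-character lowercase hex digest. (Assumed layout.)
var pathRe = regexp.MustCompile(`^/(?:([^/]+)/)?(ac|cas)/([a-f0-9]{64})$`)

// parseCachePath is a hypothetical helper that splits a request path into
// project name (empty if absent), cache kind, and hash key.
func parseCachePath(p string) (project, kind, hash string, ok bool) {
	m := pathRe.FindStringSubmatch(p)
	if m == nil {
		return "", "", "", false
	}
	return m[1], m[2], m[3], true
}

func main() {
	const h = "ed3360a7998f204de2c2f3311a2d8563d4f87bdfb9b4b442daee115f9c0bdb8a"
	for _, p := range []string{"/my-project/ac/" + h, "/cas/" + h, "/my-project/other/" + h} {
		project, kind, hash, ok := parseCachePath(p)
		fmt.Printf("project=%q kind=%q hash=%q ok=%v\n", project, kind, hash, ok)
	}
}
```

Keeping the project segment optional would let un-prefixed clients continue to share one cache.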

@buchgr buchgr added this to the 1.0 milestone Jan 12, 2019
@bayareabear
Contributor

That's a good feature. But it would be helpful to make it optional with a flag, as there is a use case for different projects sharing cache objects.

@gjasny
Contributor

gjasny commented May 26, 2019

I started a proof-of-concept patch for on-disk storage here: https://github.com/gjasny/bazel-remote/tree/project_name

I noticed that for cas objects the cache enforces that the key equals the hashed content, so adding project name support there does not make much sense.

@BenTheElder

The CAS only needs to enforce this for the last segment in the URL, so it's pretty easy to do. There's an implementation here that inspired this issue: https://github.com/kubernetes/test-infra/tree/master/greenhouse
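As a rough illustration of checking only the last URL segment: the sketch below compares the final path component against the SHA-256 of the uploaded body and ignores any project prefix. `casKey` and `casKeyMatches` are hypothetical helpers, not code from greenhouse or bazel-remote.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path"
)

// casKey returns the lowercase hex SHA-256 digest of body.
func casKey(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])
}

// casKeyMatches reports whether the last segment of urlPath equals the
// digest of body. Earlier segments (such as a project name) are ignored.
func casKeyMatches(urlPath string, body []byte) bool {
	return path.Base(urlPath) == casKey(body)
}

func main() {
	body := []byte("example blob")
	key := casKey(body)
	fmt.Println(casKeyMatches("/some-project/cas/"+key, body))
	fmt.Println(casKeyMatches("/cas/"+key, body))
	fmt.Println(casKeyMatches("/some-project/cas/"+key, []byte("different blob")))
}
```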

gjasny added a commit to gjasny/bazel-remote that referenced this issue Dec 28, 2019
mostynb added a commit to mostynb/bazel-remote that referenced this issue Aug 25, 2020
In some client setups, untracked local files can be used by an action
without being included in the Action message, which causes action cache
collisions: bazelbuild/bazel#4558

Ideally this should be fixed on the client side (either in the client,
or in the build configuration), but it is not always easy to do in
practice.

As a workaround, this patch adds a setting to mangle ActionCache keys
with the instance name provided by the client (if it is not empty), to
produce a new ActionCache key. Clients are then able to specify a
different instance name whenever a change is made that could affect
these untracked inputs. The instance name value could be something like
the hash of the compiler version. This allows multiple ActionCache items
to exist in the cache, without requiring a change to the on-disk storage
format.

This feature is disabled by default, since it would cause cache
invalidations for existing users.

Fixes buchgr#15.
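
The key mangling described in this commit message could look roughly like the sketch below. It assumes the new key is the SHA-256 of the original key followed by the instance name. That combining scheme and the name `mangleACKey` are assumptions for illustration, so its output should not be expected to match hashes produced by the real server.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// mangleACKey derives a new ActionCache key from the original key and the
// client's instance name. An empty instance name leaves the key unchanged.
// Hashing key followed by instance is an assumed scheme, not the project's.
func mangleACKey(key, instance string) string {
	if instance == "" {
		return key
	}
	h := sha256.New()
	h.Write([]byte(key))
	h.Write([]byte(instance))
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	const key = "ed3360a7998f204de2c2f3311a2d8563d4f87bdfb9b4b442daee115f9c0bdb8a"
	fmt.Println(mangleACKey(key, ""))           // unchanged
	fmt.Println(mangleACKey(key, "compiler-a")) // hypothetical instance name
	fmt.Println(mangleACKey(key, "compiler-b")) // a different instance name gives a different key
}
```

Because the result is still a 64-character hex string, it fits the existing key format, which is consistent with the commit's note that the on-disk storage format does not need to change.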
mostynb added a commit that referenced this issue Aug 28, 2020
@sitsofe

sitsofe commented Sep 8, 2020

@mostynb Looking through some logs with this in place I see the following:

2020/09/08 11:44:29 REMAP AC HASH ed3360a7998f204de2c2f3311a2d8563d4f87bdfb9b4b442daee115f9c0bdb8a : centos-v2-Amd => 4699610a00158646b15c49ae315da98c658db3cd2b0f376cc48de0c65f692993
2020/09/08 11:44:29  PUT 200     X.X.X.X /centos-v2-Amd/ac/ed3360a7998f204de2c2f3311a2d8563d4f87bdfb9b4b442daee115f9c0bdb8a
2020/09/08 11:44:29 S3 UPLOAD bazel-remote-prod bazel/cas/929d6d967df38d7d70128171a34ea8794603e05f5689d7996d06639eb05375b7 OK
2020/09/08 11:44:29 REMAP AC HASH a9c38c68068a703d5a3bcbba3729d0f21d78f515ef91c88fb5ebf798389caeaa : centos-v2-Amd => 105a6b03847b7df71ba1f2c7f3769f8869405aa859abf4965344ef1e81cb7902
2020/09/08 11:44:29 S3 UPLOAD bazel-remote-prod bazel/cas/ed3360a7998f204de2c2f3311a2d8563d4f87bdfb9b4b442daee115f9c0bdb8a OK
  • ed3360a7998f204de2c2f3311a2d8563d4f87bdfb9b4b442daee115f9c0bdb8a was mangled to 4699610a00158646b15c49ae315da98c658db3cd2b0f376cc48de0c65f692993
  • PUT 200 X.X.X.X /centos-v2-Amd/ac/ed3360[...] happens because the original request path is what is logged and that doesn't contain the mangled hash
  • 2020/09/08 11:44:29 S3 UPLOAD bazel-remote-prod bazel/cas/ed3360[...] OK is the one I don't understand. Why would we want to put the object to S3 with the original hash rather than a mangled one?

@mostynb
Collaborator

mostynb commented Sep 8, 2020

@sitsofe: 2020/09/08 11:44:29 S3 UPLOAD bazel-remote-prod bazel/cas/ed3360[...] OK is a "CAS" blob (identified by the SHA256 of the data), which we do not mangle because the risk of key collision is extremely small.

This flag only mangles "AC" hash keys, which are prone to key collisions if the client can't hash all of the inputs.

6 participants