Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor the cache.Cache interface #162

Merged
merged 1 commit into from
Jan 24, 2020

Conversation

mostynb
Copy link
Collaborator

@mostynb mostynb commented Jan 16, 2020

Since all the proxy backends use the same copy+pasted logic to use
the disk cache, refactor to put that functionality in the disk cache.
This simplifies the proxy backends.

Fixes #161.

@mostynb mostynb requested a review from buchgr January 16, 2020 12:10
cache/cache.go Outdated Show resolved Hide resolved
cache/cache.go Outdated Show resolved Hide resolved
cache/disk/disk.go Outdated Show resolved Hide resolved
@buchgr
Copy link
Owner

buchgr commented Jan 17, 2020

Please let me know when it's ready for review.

@mostynb mostynb force-pushed the cache_interface_refactor branch 4 times, most recently from 54dd298 to e7fc03b Compare January 20, 2020 07:04
@mostynb
Copy link
Collaborator Author

mostynb commented Jan 20, 2020

@buchgr: I think this is ready for review. I will try to stress test it some more today.

@mostynb mostynb changed the title refactor the cache.Cache interface (NOT READY TO LAND YET) refactor the cache.Cache interface Jan 20, 2020
@mostynb
Copy link
Collaborator Author

mostynb commented Jan 21, 2020

Seems to WFM building a large project like tensorflow.

@buchgr
Copy link
Owner

buchgr commented Jan 21, 2020

Did you get a chance to test the S3 and GCS implementations? If you haven't I can test the GCS implementation but I don't have an AWS account.

@mostynb
Copy link
Collaborator Author

mostynb commented Jan 21, 2020

I nave neither AWS nor GCS accounts to test- if you could test GCS that would be great.

I started working on some s3 tests with github.com/johannesboyne/gofakes3, but it seems to be broken with the minio-go client (if I Put and then Get an object, the bytes come back wrapped with some sort of signature-chunk info). I'll try setting up a minio server and do a full system test with it.

@buchgr
Copy link
Owner

buchgr commented Jan 21, 2020

I ll give GCS a try and report back.

@mostynb
Copy link
Collaborator Author

mostynb commented Jan 22, 2020

Pushed a couple of fixes and some log format changes, after testing with minio.

Building bazel-remote itself with a bazel-remote + minio works pretty well with gRPC, but it's quite a bit slower using HTTP. I need to investigate the difference.

@lukedirtwalker
Copy link
Contributor

lukedirtwalker commented Jan 23, 2020

Tested this quickly with gRPC and S3 and works fine.

@mostynb
Copy link
Collaborator Author

mostynb commented Jan 23, 2020

Tested this quickly with gRPC and S3 and works fine.

Background: Lukas was expecting cache hits from the proxy backend after removing the bazel-remote cache and restarting. This was fixed in this PR since it also proxies HEAD/Contains operations.

@buchgr
Copy link
Owner

buchgr commented Jan 24, 2020

I just tested this patch with GCS. Seems to work great too!

@mostynb
Copy link
Collaborator Author

mostynb commented Jan 24, 2020

Re the performance difference between using http and grpc between bazel and the cache, it seems to be unrelated to this change. I think it's just that uploading go stdlib (lots of small files) is much less efficient with http than with grpc:
bazel-contrib/rules_go#1531
bazelbuild/bazel#6091

So I will rebase and land this if there are no objections.

@buchgr
Copy link
Owner

buchgr commented Jan 24, 2020

So I will rebase and land this if there are no objections.

I am giving it a review right now. 1sec please :-).

Re the performance difference between using http and grpc between bazel and the cache, it seems to be unrelated to this change.

I believe the reason for that is different. The gRPC implementation in Bazel uploads all outputs of an action concurrently while the HTTP implementation does so serially. This can be fixed in Bazel.

@mostynb
Copy link
Collaborator Author

mostynb commented Jan 24, 2020

I am giving it a review right now. 1sec please :-).

Yes, of course.

I believe the reason for that is different. The gRPC implementation in Bazel uploads all outputs of an action concurrently while the HTTP implementation does so serially. This can be fixed in Bazel.

I figured that must be the case.

cache/disk/disk.go Outdated Show resolved Hide resolved
cache/disk/disk.go Outdated Show resolved Hide resolved
@buchgr
Copy link
Owner

buchgr commented Jan 24, 2020

LGTM. Thanks @mostynb

Since all the proxy backends use the same copy+pasted logic to use
the disk cache, refactor to put that functionality in the disk cache.
This simplifies the proxy backends.

Contains is now also proxied to the backend, instead of only
returning whether or not the local disk cache contains the item.

Fixes buchgr#161.
@mostynb mostynb merged commit 3f62926 into buchgr:master Jan 24, 2020
@mostynb mostynb deleted the cache_interface_refactor branch January 24, 2020 12:46
mostynb added a commit to mostynb/bazel-remote that referenced this pull request Feb 18, 2020
Confirm that we can fill the proxy backend via the disk cache, and
vice versa. Followup to buchgr#162.
mostynb added a commit to mostynb/bazel-remote that referenced this pull request Feb 18, 2020
Confirm that we can fill the proxy backend via the disk cache, and
vice versa. Followup to buchgr#162.
mostynb added a commit to mostynb/bazel-remote that referenced this pull request Feb 18, 2020
Confirm that we can fill the proxy backend via the disk cache, and
vice versa. Followup to buchgr#162.
mostynb added a commit to mostynb/bazel-remote that referenced this pull request Feb 18, 2020
Confirm that we can fill the proxy backend via the disk cache, and
vice versa. Followup to buchgr#162.
mostynb added a commit that referenced this pull request Feb 19, 2020
Confirm that we can fill the proxy backend via the disk cache, and
vice versa. Followup to #162.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

refactor the cache.Cache interface
4 participants