Implement persistent caching in SourceMgr #431
I can work on this issue. I've started sketching out a Bolt …

Looking at …
AMAZING!! 🎉 🎉 🎉 🎊 - this is something that would make a huge difference for users, and i've got nothing even close to the time required to work on it.
Unfortunately, while that's adequate for a written file manifest/lock's purposes (by design), it likely won't be for gps' internal purposes. There are two issues I can think of off the top of my head:
And that's just with the versions. I generally tried not to allow such gotchas, but it wasn't a design goal that I explicitly kept track of, so I can't say with certainty that there aren't more such issues. Notwithstanding all that, I think it's still the right call to prototype this with strings, where it works. Layering in an encoding system later is not difficult. And, more generally, I'd strongly prefer to keep this proceeding in smaller chunks with smaller PRs (even if they have no immediate effect), rather than building up something huge and unreviewable.

@sdboyer Have you given any thought to how the cache timeout should be configured/specified? My working branch is currently using a …
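One plausible instance of the string-encoding gotcha mentioned above is that typed versions collapse ambiguously when flattened to bare strings: a branch named "v1.0.0" and a semver tag "v1.0.0" become the same key. A minimal sketch of the "layered-in encoding" idea, using a type-tag prefix; the names here are illustrative, not gps' actual API:

```go
package main

import "fmt"

// versionType tags the flavor of a version so that, e.g., a branch named
// "v1.0.0" does not collide with a semver tag "v1.0.0" in the cache.
// (These names are invented for this sketch, not gps' real types.)
type versionType byte

const (
	branchVersion versionType = 'b' // a VCS branch name
	semverVersion versionType = 's' // a semver-parseable tag
	plainVersion  versionType = 'p' // a non-semver tag
)

// encode produces a cache key that preserves the version's type.
func encode(t versionType, v string) string {
	return string(t) + ":" + v
}

func main() {
	branch := encode(branchVersion, "v1.0.0")
	tag := encode(semverVersion, "v1.0.0")
	fmt.Println(branch, tag, branch != tag) // distinct keys despite equal strings
}
```

With a scheme like this, decoding is a single prefix check, so layering it in after a string-based prototype stays cheap.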
@jmank88 i'm avoiding thinking about TTLs thus far, as the information i'd like to see in the cache for the first pass should, i think, all be permanently cacheable by revision. only stuff like the version list lookups would be...oh wait, ugh. i guess it's unavoidable that we have to do something with that now. for now, let's just do something quick and dirty, like a 60s cache. initially, at least, i'd rather we not expose a choice around this to users. is it possible that we get some incremental PRs in, and do some branching by abstraction/feature flags, even if they're internal ones?
I thought users would at least need/want some way of running w/o the cache, or to force a clean cache, but one minute is a lot shorter than I was imagining.
I've been trying to get incremental PRs in as fast as I can. Currently blocked by #1024 and #1025.
I'm seeing a speedup in every area of 'solver wall time', except for ….

@sdboyer Can …
@jmank88 it's not strictly necessary to reach out and ping - we should be able to operate solely on information already in the cache (or none at all). this is basically a deficiency in the design of the FSM governing source interactions, but i opted for it initially just to simplify the model. there are two major aspects of that simplification.

first, right now, the only way to know which of the possible … but, this messes with a bunch of other things. for one, the … i think there's also a number of other assumptions woven into there that i can't even conjure in the abstract - it'd take me digging in to really see them, and i'd no doubt discover more along the way. the assurance of there being exactly one object in memory responsible for managing each unique location on disk seems like a major one.

the second big simplification is that always requiring us to hit the net kept us in the comparatively-simpler-to-think-about world where we were really only ever operating on the most current version of upstream's reality. purely local information mostly just doesn't matter in dep today. this allowed me to defer on a bunch of trickier modeling questions that, while quite crucial in the long run (if we can't trust local reality, then dep is still crippled by left-pad events, even when code is committed to …), …. prior to the introduction of persistent caching, the benefits of relying on local realities were also far fewer, and didn't outweigh the costs. now that you're working on persistent caching, though, the balance there has changed, and it's worth looking at this seriously.

side note: another benefit to revisiting the FSM and not requiring network activity (or even a local repo) will be that we can teach …
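To make the FSM discussion concrete, here is a hedged, illustrative-only sketch of what modeling a source's local state explicitly might look like, with the network touched only when a query actually demands it. The state names and transition rule are invented for this sketch and do not match gps' real code:

```go
package main

import "fmt"

// sourceState models how much of a source exists locally. These states are
// invented for illustration; gps' actual source FSM differs.
type sourceState int

const (
	sourceNonexistent sourceState = iota // no local data at all
	sourceCached                         // persistent cache data only, no repo
	sourceHasRepo                        // local clone exists, possibly stale
	sourceSynced                         // local clone freshly synced upstream
)

// needsNetwork reports whether answering a query from the given state would
// require network activity. Only a fully absent source forces it; a cached or
// cloned source can answer most queries from purely local information, unless
// the caller explicitly demands upstream freshness (e.g. a version-list TTL
// expiry).
func needsNetwork(s sourceState, wantFresh bool) bool {
	switch {
	case s == sourceNonexistent:
		return true
	case wantFresh && s != sourceSynced:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(needsNetwork(sourceCached, false)) // serve from cache, no net
	fmt.Println(needsNetwork(sourceCached, true))  // freshness forces a sync
}
```

The point of an explicit table like this is that "operate solely on information already in the cache" becomes a checkable property of each transition, rather than an implicit consequence of always hitting the net first.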
I did some preliminary refactoring and I'm seeing much better results now. Here are some example solver times (with the insignificant items omitted) from running ….

Without cache:
…

With cache:
…
I still haven't fully solved the case where the mapped URL changes. I'll try to isolate some of the other changes into separate PRs first.
a bit weird that there'd be so much of a drop in ….

tbh, i'm a bit surprised we're not seeing numbers in the <100ms, or even <10ms range. not a huge concern for right now, though - we're still in the run/right phases of "make it run, make it right, make it fast" within the scope of this particular feature. for now, i'll take <1s and be happy 😄
Indeed, after more tinkering I'm seeing total runtime around …
huh. redundant network calls in the currently-active logic, or in part of the new layer you're introducing? (what did you change when tinkering?)
I'm working on isolating it now to find out for sure, but I have only been messing with the gateways, not the cache itself. |
that's (tentatively) great news. I would love nothing more than to have written some bugs that result in unnecessary net requests - that'd mean across-the-board speedups for everybody when you find them! the huge gain here will be obviating the need for any network calls on git projects in Gopkg.lock and already in the cachedir by deferring the initial network call until sometime after the …
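Deferring the initial network call fits a lazy-sync pattern: the source gateway answers from the persistent cache when it can, and only syncs upstream (at most once) when an operation genuinely needs the network. A sketch under those assumptions; `gateway`, `requireSync`, and `listVersions` are illustrative names, not dep's actual API:

```go
package main

import (
	"fmt"
	"sync"
)

// gateway sketches a source whose upstream sync is deferred until some
// operation actually requires the network, instead of happening eagerly
// at construction time.
type gateway struct {
	syncOnce sync.Once
	synced   bool
}

// requireSync performs the (expensive) upstream sync at most once, no matter
// how many cache misses trigger it.
func (g *gateway) requireSync() {
	g.syncOnce.Do(func() {
		// real code would clone or fetch the upstream repo here
		g.synced = true
	})
}

// listVersions serves from the persistent cache when an entry exists; only a
// miss forces the deferred network sync.
func (g *gateway) listVersions(cached []string) []string {
	if cached != nil {
		return cached // cache hit: no network activity at all
	}
	g.requireSync() // cache miss: now we must touch the network
	return nil
}

func main() {
	g := &gateway{}
	g.listVersions([]string{"v1.0.0"})
	fmt.Println(g.synced) // cache hit never synced
	g.listVersions(nil)
	fmt.Println(g.synced) // miss forced exactly one sync
}
```

For projects already pinned in Gopkg.lock and present in the cachedir, every query takes the cache-hit path, so the network is never touched at all.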
From @sdboyer on December 30, 2016 11:46
With additional testing being done as part of the new tool, it's increasingly clear that we really, really need to get persistent, disk-backed caching into place. There are four basic things we need to cache:
- …
- the `go-get` metadata query

The biggest immediate question is how to actually store it on disk. For the first three, at least, discrete files would probably work. However, I'm inclined to store everything together using bolt. That'll cover the last case more easily, especially because it is likely to require prefix queries; it'll also probably make it a bit cleaner to store TTLs.
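The prefix-query point is what makes bolt attractive: bolt keeps keys in byte-sorted order, so a prefix query is a cursor `Seek(prefix)` followed by iteration while the key still carries the prefix. The stdlib-only sketch below mimics that cursor behavior over a sorted slice, with a hypothetical `<import-path>\x00<field>` key layout for the go-get metadata (the layout is an assumption, not dep's actual scheme):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// prefixScan mimics a bolt cursor's prefix query: position at the first key
// >= prefix (like Cursor.Seek), then collect keys in order until the prefix
// no longer matches. Keys are laid out as "<import-path>\x00<field>", so all
// metadata for one import path shares a prefix.
func prefixScan(keys []string, prefix string) []string {
	sort.Strings(keys) // bolt keeps keys byte-sorted; emulate that here
	i := sort.SearchStrings(keys, prefix)
	var out []string
	for ; i < len(keys) && strings.HasPrefix(keys[i], prefix); i++ {
		out = append(out, keys[i])
	}
	return out
}

func main() {
	keys := []string{
		"github.com/foo/bar\x00root",
		"github.com/foo/bar\x00vcs",
		"github.com/foo/baz\x00root",
	}
	// fetch all cached go-get metadata fields for one import path
	fmt.Println(prefixScan(keys, "github.com/foo/bar\x00"))
}
```

Discrete files would force either one file per key or ad hoc indexing to get the same "everything under this import path" lookup, which is the case that tips the balance toward a single bolt database.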
(supersedes #8)
Copied from original issue: sdboyer/gps#130