Skip to content
This repository has been archived by the owner on Sep 9, 2020. It is now read-only.

internal/gps: Add prune functions to gps #1020

Merged
merged 5 commits into from
Sep 22, 2017

Conversation

ibrasho
Copy link
Collaborator

@ibrasho ibrasho commented Aug 16, 2017

What does this do / why do we need it?

Add different functions to gps that implement different pruning modes.

The different modes are as follows:

  • pruneNestedVendorDirs: Delete any directory named vendor.
  • pruneNonGoFiles: Delete any file that doesn't end with .go (will keep files matching common names for legal documents)
  • pruneGoTestFiles: Prune Go test files (any file ending with _test.go)
  • pruneUnusedPackages: Delete all files in unused Go packages. Since we might import a sub package we only delete files and keep directories.

What should your reviewer look out for in this PR?

Correctness.

Do you need help or clarification on anything?

Refer to https://github.com/sgotti/glide-vc/ for comparison and ideas.

Is adding an option to allow overriding deletion of specific files a good idea?

Which issue(s) does this PR fix?

Breaking down #952

Copy link
Collaborator

@jmank88 jmank88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks correct. Just a few nits.


if err := pruneEmptyDirs(baseDir, logger); err != nil {
return errors.Wrap(err, "failed to prune empty dirs")
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this one extra, since there is one at the end?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

// on the PruneOptions passed.
//
// A Lock must be passed if PruneUnusedPackages is toggled on.
func Prune(baseDir string, options PruneOptions, l Lock, logger *log.Logger) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How fast is this? Would it be worthwhile to try to do some of the filepath.Walks concurrently and/or combine them?

Copy link
Collaborator Author

@ibrasho ibrasho Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already asked about this in #952. I believe that we can merge them but I was wondering why they were separated in the first place.

Also, I still haven't done any benchmarking on this yet. I wanted to ensure that the behavior is correct first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll likely, though not certainly, be able to interlace the logic eventually - but i want to take it one step at a time. verification needs to be integrated first, which opens the door to partial writes of the tree. combining that with pruning will be fairly complicated, and keeping pruning standalone initially will help us walk before we run.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we are talking about the same thing. I was, and I assume @jmank88, talking about the fact that we do multiple filepath.Walk per prune option. Can we do one filepath.Walk and calculate all the files to be pruned?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohhhhh, i see. sorry, i understand now. yeah, definitely different things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so, yeah, we're definitely going to need to combine the filepath.Walk() operations. maybe it's OK not to do it for this first PR, but the cost of even one walk will be order(s) of magnitude higher than any of the actual ops being performed within any of the walks.

if err := os.Remove(dir); err != nil {
return err
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could call return deleteFiles(empty). Otherwise, can the loops be combined?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should have been one loop.


if err := deleteFiles(files); err != nil {
return err
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just return deleteFiles(files).


if err := deleteFiles(files); err != nil {
return err
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just return deleteFiles(files).

// on the PruneOptions passed.
//
// A Lock must be passed if PruneUnusedPackages is toggled on.
func Prune(baseDir string, options PruneOptions, l Lock, logger *log.Logger) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so here's one big thing: this is currently focused on an entire dep/vendor tree. we need to split it up, so that it instead deals with a single project. this is arguably a better design in general, but whether you agree with that or not, our Gopkg.toml-driven pruning implementation will need to allow different pruning options to be specified globally and per dep. that means calling this func with different PruneOptions.

in terms of the interface, i think it may be as simple as changing from taking an l Lock to an lp LockedProject.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think doing per-project prune options is a good thing, but I'm wondering what implications it might have.

I'll take an attempt at that tomorrow to see what I can do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the implications should be fairly contained, but i've not thought it all the way through. hopefully not too bad 😄

@ibrasho
Copy link
Collaborator Author

ibrasho commented Sep 2, 2017

I'm back to working on this. Should update it today.

@ibrasho
Copy link
Collaborator Author

ibrasho commented Sep 5, 2017

@carolynvs Added some simple tests now.

false,
},
{
"nested-package",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉


var (
// licenseFilePrefixes is a list of name prefixes for license files.
licenseFilePrefixes = []string{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest you include "contributors" and "authors" to the file list here or in legalFileSubstrings. These files are legal in nature. For example, dep's own AUTHORS file starts with

This source code refers to The Go Authors for copyright purposes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

// PruneUnusedPackages indicates if unused Go packages should be pruned.
PruneUnusedPackages
// PruneNonGoFiles indicates if non-Go files should be pruned.
// LICENSE & COPYING files are kept for convience.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to update this note to include the contributors & authors files. also, not for convenience, but "to try to comply with legal requirements".


// pruneProject remove excess files according to the options passed, from
// the lp directory in baseDir.
func pruneProject(baseDir string, lp LockedProject, options PruneOptions, logger *log.Logger) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it's worth exporting this. even if we don't have immediate plans to use it in dep, it just feels shortsighted to only expose the top-level call.

if err != nil {
return errors.Wrap(err, "unexpected error while calculating unused packages")
}
fmt.Println(pkg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debugging artifact?

return files, err
}

// isPreservedFile checks if the file name idicates that the file should be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment needs cleaning up, it seems duplicated right now. also s/idicates/indicates/

func collectUnusedPackagesFiles(projectDir string, unusedPackages map[string]struct{}) ([]string, error) {
// TODO(ibrasho): is this useful?
files := make([]string, 0, len(unusedPackages))
fmt.Println(unusedPackages)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debugging artifact


// collectGoTestsFile returns a slice contains all Go test files (any files
// prefixed with _test.go) in baseDir.
func collectGoTestsFile(baseDir string) ([]string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/TestsFile/TestFiles/

// on the PruneOptions passed.
//
// A Lock must be passed if PruneUnusedPackages is toggled on.
func Prune(baseDir string, options PruneOptions, l Lock, logger *log.Logger) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so, yeah, we're definitely going to need to combine the filepath.Walk() operations. maybe it's OK not to do it for this first PR, but the cost of even one walk will be order(s) of magnitude higher than any of the actual ops being performed within any of the walks.

@@ -16,29 +16,27 @@ func stripVendor(path string, info os.FileInfo, err error) error {
return err
}

if info.Name() == "vendor" {
if _, err := os.Lstat(path); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it OK to drop this lstat?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was pretty much not doing anything. lstat has the same error responses as stat. If we already have a FileInfo then stat has succeeded.

Anyhow, I can restore it if we are unsure and don't want to break anything.

cc: @spenczar Any idea what's the original intent of this?

Copy link
Contributor

@spenczar spenczar Sep 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ibrasho Just checking in to confirm that the original purpose of this got lost after a few rewrites of the surrounding code (including my own rewrites!). Your fix here looks correct to me. 👍

@ibrasho ibrasho self-assigned this Sep 9, 2017
@ibrasho
Copy link
Collaborator Author

ibrasho commented Sep 9, 2017

@sdboyer
To improve performance, I was thinking of doing a single filepath.Walk to collect file and directory names and then running the operations on that list.

All the current rules are based on file names anyway so stating multiple times is wasteful.

@sdboyer
Copy link
Member

sdboyer commented Sep 10, 2017

@ibrasho if you think that general approach is feasible, then i'd say it's vastly preferable.

seems like it might be good to reuse dirwalk code for this purpose, once #1084 is done.

@sdboyer
Copy link
Member

sdboyer commented Sep 18, 2017

kicking CI to retest, that last one seems like an error?

i'd like to merge this soon, even if it's sorta WIP, as i think we can improve as we go - in particular, i don't think we need to fix the filepath.Walk() duplication right now. however, i do want to get the various smaller cleanup issues addressed, as it seems quite unlikely that we'll remember to grab those in a later PR.

@ibrasho
Copy link
Collaborator Author

ibrasho commented Sep 20, 2017

I've pushed a patch of changes.

I'll keep the changes related to deduplicating filepath.Walk for another PR.

@sdboyer
Copy link
Member

sdboyer commented Sep 20, 2017

ahh, some new linting errors, as we added some linters - once we have fixes for those, i think we should be ready to go with this.

@ibrasho
Copy link
Collaborator Author

ibrasho commented Sep 21, 2017

I'm not sure these linter errors are valid...

/home/travis/gopath/src/github.com/golang/dep/internal/gps/vcs_source.go:379:2: redundant if ...; err != nil check, just return error instead.
/home/travis/gopath/src/github.com/golang/dep/internal/gps/vcs_source.go:460:2: redundant if ...; err != nil check, just return error instead.
internal/gps/prune.go:21:2: const PruneNestedVendorDirs is unused (U1000)
internal/gps/prune.go:23:2: const PruneUnusedPackages is unused (U1000)
internal/gps/prune.go:27:2: const PruneNonGoFiles is unused (U1000)
internal/gps/prune.go:29:2: const PruneGoTestFiles is unused (U1000)
internal/gps/prune.go:60:6: func Prune is unused (U1000)
internal/gps/prune.go:75:6: func PruneProject is unused (U1000)
internal/gps/prune.go:106:6: func pruneNestedVendorDirs is unused (U1000)

@ibrasho ibrasho force-pushed the add-prune-to-gps branch 3 times, most recently from 56acbf0 to 45126f4 Compare September 21, 2017 22:07
@sdboyer
Copy link
Member

sdboyer commented Sep 22, 2017

ahhhhh so the errors are at least in part because this predates #1157, which changed some of the logic. however, there is one valid complaint in there, which actually points to a real bug:

internal/gps/prune.go:162:19: ToSlash is a pure function but its return value is ignored (SA4017)

once we address that, we can merge - the other complaints will go away. (if we want to be really sure, though, we can rebase)

@ibrasho
Copy link
Collaborator Author

ibrasho commented Sep 22, 2017

I just added that a few hours ago because calculateUnusedPackages had a bug on Windows, but forgot to assign it to pkg. 😓

I've rebased. Should be fixed and ready to merge now.

@sdboyer
Copy link
Member

sdboyer commented Sep 22, 2017

ok so it really seems like those travis linting failures must be erroneous, even though i can't immediately figure out how that's possible post-rebase. merging this now, and we'll keep an eye on subsequent PRs to see if we have a real problem or not.

@sdboyer sdboyer merged commit f749c0c into golang:master Sep 22, 2017
@ibrasho ibrasho deleted the add-prune-to-gps branch November 29, 2017 06:18
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants