Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: creating files logs information for use in chisel.db #105

Merged
merged 27 commits into from
Feb 5, 2024

Conversation

letFunny
Copy link
Collaborator

@letFunny letFunny commented Dec 5, 2023

  • Have you signed the CLA?

When using the fsutil.Creator, it will store information about the files and directories created. This will then be integrated into a report which will be part of chisel.db in a separate PR.

When creating files as part of the cut command, we will use a proxy that
logs information about the hash, path, mode, etc. This will then be
ingrated into a report which will be part of chisel.db.
@letFunny letFunny changed the title Chisel db proxy Add proxy for use in chisel.db Dec 5, 2023
@letFunny letFunny marked this pull request as ready for review December 5, 2023 09:44
@letFunny letFunny changed the title Add proxy for use in chisel.db feat: add proxy for use in chisel.db Dec 8, 2023
Copy link
Member

@rebornplusplus rebornplusplus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks nice. I left a few comments and suggestions. Let me know what you think.

internal/fsutil/create_test.go Outdated Show resolved Hide resolved
internal/fsutil/proxy_test.go Outdated Show resolved Hide resolved
internal/fsutil/create.go Outdated Show resolved Hide resolved
internal/slicer/report.go Outdated Show resolved Hide resolved
internal/slicer/report.go Outdated Show resolved Hide resolved
internal/slicer/report.go Outdated Show resolved Hide resolved
internal/slicer/report.go Outdated Show resolved Hide resolved
internal/slicer/slicer.go Outdated Show resolved Hide resolved
internal/deb/extract.go Outdated Show resolved Hide resolved
@letFunny letFunny changed the title feat: add proxy for use in chisel.db feat: creating a file logs the hash for use in chisel.db Jan 16, 2024
Copy link
Member

@rebornplusplus rebornplusplus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks trimmed and much better now, thanks!

@@ -153,11 +153,13 @@ func Run(options *RunOptions) error {
if reader == nil {
continue
}
fsCreator := fsutil.NewCreator()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a different fsCreator for every slice?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. That is one of the bits that can no longer be seen in this trimmed down version of the PR. We need to create one for each because we want to report the files created for each one. If we used the same fsCreator we would not know which slice created which file (unless we resorted to emptying the map or some other hack).

@letFunny letFunny changed the title feat: creating a file logs the hash for use in chisel.db feat: creating files logs information for use in chisel.db Jan 16, 2024
@cjdcordeiro cjdcordeiro added the Priority Look at me first label Jan 17, 2024
Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good, thanks Alberto. Just a few details/questions.

@@ -25,6 +25,7 @@ type ExtractOptions struct {
TargetDir string
Extract map[string][]ExtractInfo
Globbed map[string][]string
FSCreator *fsutil.Creator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be just Creator, given the context, and although this is minor we can use a locally defined interface to allow for variation on the types, even more for testing purposes if required.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree on the interface but can we do it when the need arises? Right now the tests depend on the default creator and the tree dump function that inspects the filesystem.

internal/deb/extract.go Outdated Show resolved Hide resolved
internal/fsutil/create.go Outdated Show resolved Hide resolved
internal/fsutil/create.go Outdated Show resolved Hide resolved
internal/fsutil/create.go Outdated Show resolved Hide resolved
internal/fsutil/create.go Outdated Show resolved Hide resolved
internal/fsutil/create.go Outdated Show resolved Hide resolved
internal/fsutil/create_test.go Outdated Show resolved Hide resolved
internal/fsutil/create_test.go Show resolved Hide resolved
Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one trivial, and a question about the organization, but otherwise this seems ready for merging.

@@ -58,8 +59,12 @@ func Extract(pkgReader io.Reader, options *ExtractOptions) (err error) {
if err != nil {
return err
}
validOpts := *options
if validOpts.Creator == nil {
validOpts.Creator = fsutil.NewCreator()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not seem great as it's doing a shallow copy and we have no idea of the consequences. If there's intended mutation going on to the values of the options, they won't be moved out anymore. If on the other hand, the goal is to not mutate, this won't help in cases where the fields are pointers.

There are multiple options here, but the easiest here seems to be simply forcing the call site to provide the creator. We don't have many call sites anyway, and tests are already doing this in a custom way I believe.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. I added it because of your comment about the panic but I agree that, because it is an internal function, we can just add it in each call site.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's a bit of a bind because it'd be fine to either 1) Use a default global value that doesn't increase in memory size over time, or 2) Use a local value that is cheap to create. I'm fine with either, but would prefer to not be creating maps, populating maps, and then destroying (GCing) them on every iteration of a loop. Seems wasteful for no greater good.

@@ -84,3 +123,20 @@ func createSymlink(o *CreateOptions) error {
}
return os.Symlink(o.Link, o.Path)
}

// readerProxy implements the io.Reader interface proxying the calls to its inner io.Reader. On each read, the proxy
// calculates the file size and hash.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/calculates/keeps track of/

@@ -137,7 +137,7 @@ func extractTar(dataReader io.Reader, targetDir string) error {

//debugf("Extracting header: %#v", tarHeader)

err = fsutil.Create(&fsutil.CreateOptions{
err = fsutil.NewCreator().Create(&fsutil.CreateOptions{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in a loop, creating and discarding the creator and its map on every iteration, not even keeping track of it. Is the idea here that the follow up PR will unify all of these calls and track these properly?

Copy link
Collaborator Author

@letFunny letFunny Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that you have one creator per slice to know what files were created for each one. The other option will be cleaning the creator.Created map each iteration to know which files come from the new slice, I think that is more convoluted to follow. From the report PR:

<loop over slice>
                creator := fsutil.NewCreator()
		err := deb.Extract(reader, &deb.ExtractOptions{
			Package:   slice.Package,
			Extract:   extract[slice.Package],
			TargetDir: targetDir,
			Globbed:   globbedPaths,
			Creator:   creator,
		})
		for _, entry := range creator.Created {
			report.AddEntry(slice, entry)
		}
<end of loop>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems fine. What's a bit awkward is the case above where we have a tight loop around individual file extraction using a new creator every time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change that to the outer possible scope, you are right.

@@ -153,11 +153,13 @@ func Run(options *RunOptions) error {
if reader == nil {
continue
}
creator := fsutil.NewCreator()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@@ -211,7 +213,8 @@ func Run(options *RunOptions) error {
return fmt.Errorf("internal error: cannot extract path of kind %q", pathInfo.Kind)
}

err := fsutil.Create(&fsutil.CreateOptions{
creator := fsutil.NewCreator()
err := creator.Create(&fsutil.CreateOptions{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and here.

Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the replies above as it may be relevant for upcoming PRs, but this seems fine as it is for now. Thanks!

@niemeyer niemeyer merged commit 11a324c into canonical:main Feb 5, 2024
14 checks passed
@letFunny letFunny deleted the chisel-db-proxy branch October 17, 2024 08:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority Look at me first
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants