
feat(toolkit): ensure consistent zip files #2931

Merged
merged 7 commits on Jun 21, 2019

Conversation

Contributor

@jogold jogold commented Jun 19, 2019

Zip files were not consistent across deploys resulting in unnecessary S3 uploads and stack updates.

Ensure consistency by appending files in series (which guarantees file ordering in the zip) and resetting
dates (which guarantees the same hash for the same content).

Closes #1997, Closes #2759


Pull Request Checklist

  • Testing
    • Unit test added (prefer not to modify an existing test, otherwise, it's probably a breaking change)
    • CLI change?: coordinate update of integration tests with team
    • cdk-init template change?: coordinate update of integration tests with team
  • Docs
    • jsdocs: All public APIs documented
    • README: README and/or documentation topic updated
    • Design: For significant features, design document added to design folder
  • Title and Description
    • Change type: title prefixed with fix or feat and the module name in parens, which will appear in the changelog
    • Title: uses lower case and doesn't end with a period
    • Breaking?: last paragraph: "BREAKING CHANGE: <describe what changed + link for details>"
    • Issues: Indicate issues fixed via: "Fixes #xxx" or "Closes #xxx"
  • Sensitive Modules (requires 2 PR approvers)
    • IAM Policy Document (in @aws-cdk/aws-iam)
    • EC2 Security Groups and ACLs (in @aws-cdk/aws-ec2)
    • Grant APIs (only if not based on official documentation with a reference)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license.

@jogold jogold requested a review from a team as a code owner June 19, 2019 14:59
Contributor Author

jogold commented Jun 19, 2019

Not 100% sure how to correctly unit test this... but I can confirm that I now consistently get the (no changes) message from the CLI across deploys for stacks containing Lambda functions.

Contributor

@rix0rrr rix0rrr left a comment

Thanks for taking this on!

output.on('open', async () => {
archive.pipe(output);

const contents = await Promise.all(files.map(async (file) => {
Contributor

This will load all files into memory concurrently. Not sure I like that for very big zips (tens to hundreds of megabytes).

Also, I don't see the benefit of doing this in parallel, since this task should be I/O bound as far as I can estimate.

Can you test and verify whether the speed makes a difference? If it does, then I would prefer some kind of batching (chop the list into batches of 10 and do those in parallel) to keep memory consumption under control.
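One way the suggested batching could look (a sketch under the reviewer's assumptions, not the code that was merged): process the file list in chunks of 10, reading each chunk in parallel but the chunks themselves in series, so at most 10 files are in memory at once.

```javascript
// Sketch of bounded concurrency: map `worker` over `items` with at
// most `batchSize` promises in flight at a time. Results keep the
// input order because Promise.all preserves ordering.
async function mapInBatches(items, batchSize, worker) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // Within a batch: parallel. Across batches: serial.
    results.push(...await Promise.all(batch.map(worker)));
  }
  return results;
}

// Example: "process" 25 items, at most 10 at a time
mapInBatches([...Array(25).keys()], 10, async (n) => n * 2)
  .then((doubled) => console.log(doubled.length)); // 25
```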

Contributor Author

Understood, will have a look at it

@rix0rrr rix0rrr self-assigned this Jun 20, 2019
@@ -2,7 +2,7 @@
   "compilerOptions": {
     "target": "ES2018",
     "module": "commonjs",
-    "lib": ["es2016", "es2017.object", "es2017.string"],
+    "lib": ["es2016", "es2017.object", "es2017.string", "dom"],
Contributor Author

This is needed for @types/jszip

const stream = fs.createReadStream(path.join(directory, file));
archive.append(stream, {
name: file,
date: new Date('1980-01-01T00:00:00.000Z'), // reset dates to get the same hash for the same content
Contributor Author

@jogold jogold Jun 20, 2019

On my system, specifying new Date(0) here gets translated to 1980-01-01T00:00:00.000Z for the files in the zip... this means that I cannot write a unit test for the date reset => setting it to 1980-01-01T00:00:00.000Z
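The 1980 floor comes from the ZIP format itself rather than from the library: entry timestamps are stored as MS-DOS date/time fields, whose epoch is 1980-01-01, so anything earlier (such as new Date(0), i.e. 1970) gets clamped. A rough sketch of the clamping, with a hypothetical helper name:

```javascript
const DOS_EPOCH = Date.parse('1980-01-01T00:00:00.000Z');

// Hypothetical illustration: ZIP entries store MS-DOS timestamps,
// which cannot represent dates before 1980-01-01, so earlier dates
// end up clamped to the DOS epoch when written to the archive.
function clampToDosEpoch(date) {
  return date.getTime() < DOS_EPOCH ? new Date(DOS_EPOCH) : date;
}

console.log(clampToDosEpoch(new Date(0)).toISOString());
// 1980-01-01T00:00:00.000Z
```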


files.forEach(file => { // Append files serially to ensure file order
const stream = fs.createReadStream(path.join(directory, file));
archive.append(stream, {
Contributor

Aren't NodeJS streams async? Are we 100% positive that after this statement, the stream has been fully read and added? Or are we sure that .finalize() will synchronously read and flush all streams? Do we need to await finalize()?

It concerns me that I don't understand the API fully, and the documentation doesn't seem to be very explicit about it.

Contributor Author

@jogold jogold Jun 20, 2019

From the example in the doc: "finalize the archive (ie we are done appending files but streams have to finish yet)".

From the doc: "Finalizes the instance and prevents further appending to the archive structure (queue will continue til drained)."

I've successfully created zips containing multiple files of more than 50MB with this code (resulting in zip files of several hundred MB).

Looks like finalize() is a blocking statement.

zipDirectory returns a promise that is resolved only once we get the close event on output; this is fired only when archiver has been finalized and the output file descriptor has closed (output.once('close', () => ok());)

Contributor Author

It's also possible to call archive.file() and let the library handle the read stream creation (need to check if the sort order is preserved across zips in this case).

Contributor Author

@jogold jogold Jun 21, 2019

Tested archive.file() and file order is not preserved across zips. I think that the only way to go here is to use archive.append() with fs.createReadStream().

Contributor

Cool, yes you are right, I had missed this:

     output.once('close', () => ok());

I'm just very scared of accidental asyncness where you're not expecting it :(

Thanks, will merge when the validation build finishes.
