Binary snapshots #489

lasernoises · 2024-05-13T15:00:54Z

This is a relatively basic implementation of binary snapshots. I decided to adapt my implementation mentioned in #196 (comment) to handle binary files in general and remove the whole thing that prints images to the terminal.

The basic approach is store a file next to the snapshot file with the provided extension appended. The snapshot macro gets passed a closure with an &mut File parameter. Just using a Vec<u8> would be somewhat simpler in implementation, but I would be worried about memory usage if there's a bunch of tests running with large files.

I decided not to return a Result from the closure, because if there's an I/O error inside it probably makes sense just to panic there because we're inside a test anyway. But maybe it would still make sense to return a Result there just so there is no need for .unwrap() inside the closure?

Currently it directly writes the file at the location where it will end up at the beginning of the assert_snapshot function. This means the binary file is created as snap.new.$extension much sooner than the metadata is saved because we need the file to be able to compare it. What is a bit weird is if there is a crash somewhere between overwriting the binary file and storing the .snap file it will leave the two files in an inconsistent state. Creating it as .snap.new.$extension.tmp and then moving it when saving the metadata might be a good alternative approach.

There is also a problem if the extension changes between two runs where the old pending binary file will stay around. It doesn't seem to a problem in practice though because when running with cargo insta test because cargo-insta removes the old pending snapshots first. Maybe it would make sense to also do that generally at the start of assert_snapshot or alternatively cleanup all .snap.new.* somewhere.

The review TUI looks like this:

The open action opens both files (or just one if there's no existing snapshot, or the existing one is text) in an external application using the open crate.

The file paths are using the OSC-8 escape sequence (described here) to link to the files. Here it says that file:/// without a hostname should be avoided, but at least my terminal (WezTerm) seems to support it without a hostname and so I left that out for now since it would mean an extra dependency on a crate to get the hostname. As an alternative we could also just print the path including file:// to the terminal since many terminals seem to support clicking on those as well.

This fixes the trailing_crlf_inline test.

mitsuhiko · 2024-05-15T06:54:22Z

Creating it as .snap.new.$extension.tmp and then moving it when saving the metadata might be a good alternative approach.

I like this in theory. What might make this whole thing more robust would be to say that the following has to be true:

foo.snap.ANY_EXT has to be unique. On review/write if more than one extension is found, other files are deleted. A not yet fully persisted file is temporarily stored as foo.snap._ to mark it as temporary. If it's left over after a run, it would be cleaned up like any other duplicate extension.

The open action opens both files (or just one if there's no existing snapshot, or the existing one is text) in an external application using the open crate

It would be interesting to also add a p for preview on macos which would open quick-look on supported files. Not sure if an equivalent exists on windows. But this could be done in a followup.

lasernoises · 2024-06-05T16:43:17Z

I've made some adjustments:

As you suggested I've changed it so the contents are first written to a temporary file with the .snap._ extension.

At the start of the assert_snapshot function and after accepting and rejecting in review any extra files are now cleaned up.

I had to change the Snapshot::save* methods to change &mut self so they can change the path to the newly saved path. Which makes it a little bit weird because it's not necessarily clear from the name that those functions will also move a file now.

This problem could also be solved in different ways:
Instead of moving the file we could copy it and then always regenerate the binary file path when needed. But that would probably mean an extra parameter so it knows where to copy the contents file from.
Or maybe save_new could make the assumption that there's a temporary file in .snap._.

Also the .new and ._ extensions and extensions starting with .new.* are now forbidden so there can't be conflicts.

Maybe I can add the quick-look thing for macos soon, but I'll have to borrow a mac from a colleague to test it. Also I'm not sure if opening both files in quick-look makes sense. Maybe two different shortcuts makes more sense.

insta/src/runtime.rs

max-sixty · 2024-06-05T18:27:57Z

insta/src/snapshot.rs

+                let mut this = File::open(this).unwrap();
+                let mut other = File::open(other).unwrap();
+
+                file_eq(&mut this, &mut other).unwrap()


Historically we've had issues when running multiple tests at the same time; files can disappear under insta. Can we handle that a bit better here rather than blindly panicking? For example, at least with .expect with a message so we can debug?

I've changed it to return Result's with Box<dyn Error> to be more consistent with the rest of the code. It's not quite optimal for debugging because the origin of the error is missing, but I thought it would be better to be consistent.

Variants in a new error enum might be an alternative. Those could then also be returned as dyn Error's. Or just using expect as you suggested. I'm not quite sure what is best here.

Yes agree there are tradeoffs, and just ignoring the errors might not be ideal.

A more detailed approach would be to think through when errors are really errors. For example, if a pending snapshot goes missing, that's fine — another process might have removed it. In contrast, if a target snapshot isn't available that's probably unexpected.

But so far ignoring errors with reading files seems to have been OK in practice; possibly we could print a message to protect against confusing results (though maybe others will disagree, ofc welcome).

Also fixes missing call to memoize_snapshot_file in the binary case.

… the crate

insta/src/snapshot.rs

max-sixty · 2024-06-07T18:26:13Z

While we wait for a more informed review, one thought (not confident): I notice we add new enums — would there be any case for collapsing "Inline | Named" enums and the new "String | Binary" enums into a single enum? Every Binary snapshot must be Named. But I'm not sure if that makes the structure simpler...

Edit: one low-hanging fruit as an example — the proposed code has SnapshotType with String | Binary. We currently don't encode whether it's an Inline snapshot in that struct, we just check whether it has a name — we could instead add Inline as an option there.

insta/src/output.rs

insta/src/runtime.rs

lasernoises · 2024-06-14T15:01:42Z

While we wait for a more informed review, one thought (not confident): I notice we add new enums — would there be any case for collapsing "Inline | Named" enums and the new "String | Binary" enums into a single enum? Every Binary snapshot must be Named. But I'm not sure if that makes the structure simpler...

Edit: one low-hanging fruit as an example — the proposed code has SnapshotType with String | Binary. We currently don't encode whether it's an Inline snapshot in that struct, we just check whether it has a name — we could instead add Inline as an option there.

I suppose the refval thing is one of those cases. I could change the SnapshotContent enum to something like this:

type SnapshotName<'a> = Option<Cow<'a, str>>;

pub enum SnapshotValue<'a> {
    String {
        name: SnapshotName<'a>,
        content: &'a str,
    },

    InlineString {
        reference_content: &'a str,
        content: &'a str,
    },

    Binary {
        name: SnapshotName<'a>,
        write: &'a mut dyn FnMut(&mut File),
        extension: &'a str,
    },
}

or maybe

type SnapshotName<'a> = Option<Cow<'a, str>>;

pub enum ReferenceValue<'a> {
    Named(SnapshotName<'a>),
    Inline(&'a str),
}

pub enum SnapshotValue<'a> {
    String {
        refval: ReferenceValue<'a>,
        content: &'a str,
    },

    Binary {
        name: SnapshotName<'a>,
        write: &'a mut dyn FnMut(&mut File),
        extension: &'a str,
    },
}

I'm a bit less sure about SnapshotType. I don't think we need to represent the inline case there because the metadata isn't being stored in that case if I understand correctly.

max-sixty · 2024-06-14T20:54:13Z

Yes those both seem reasonable, though on reflection not obvious changes either.

(I note we're handling the contents differently between named string and named binary snapshots — string snapshots we hold the value in memory while binary snapshots we hold only the path, which leads to some of the split. Maybe that's because we expect the binary snapshots to be larger?)

I'm a bit less sure about SnapshotType. I don't think we need to represent the inline case there because the metadata isn't being stored in that case if I understand correctly.

(edit) — I tried creating a SnapshotType on the existing master; it doesn't pull its weight since the Snapshot is largely used similarly across types.

lasernoises · 2024-06-17T09:47:19Z

(I note we're handling the contents differently between named string and named binary snapshots — string snapshots we hold the value in memory while binary snapshots we hold only the path, which leads to some of the split. Maybe that's because we expect the binary snapshots to be larger?)

Yes, the reason for storing them directly in files is the size. I was concerned about large files, especially when multiple tests are running concurrently. But this is definitely a tradeoff since passing around a Vec<u8> would make the code simpler and more uniform. I also think and it's unclear whether creating huge binary snapshots is a realistic usecase especially if you want to store them in git.

max-sixty · 2024-06-17T16:57:31Z

But this is definitely a tradeoff since passing around a Vec<u8> would make the code simpler and more uniform. I also think and it's unclear whether creating huge binary snapshots is a realistic usecase especially if you want to store them in git.

Yeah. And IIUC we are storing them in memory when writing them at the moment. (+ agree given they're in git, the likelihood of multi-GB values seems very remote...)

max-sixty · 2024-07-30T18:37:37Z

@lasernoises I guess we're still waiting on a review.

Not a verdict, but on further reflection, I do think simplifying the code to load snapshots into memory would make sense. The prospect of multi-GB sized snapshots seems quite unlikely — they're stored in git, the computational cost of diffing snapshots of that size will likely bind before the memory size becomes an issue. Even before your changes, the code handling different types of snapshots is a bit complicated.

(But ofc no need to make changes based on my opinion)

lasernoises · 2024-08-06T10:24:56Z

Sorry for replying so late.

Yeah. And IIUC we are storing them in memory when writing them at the moment. (+ agree given they're in git, the likelihood of multi-GB values seems very remote...)

Because it's a callback that get's an &mut File it doesn't have to ever be in memory as a whole thing. For example in my usecase, which is for PDFs the PDF as an array of bytes is never fully in memory, only an intermediate datastructure that can then be written out as bytes.

@lasernoises I guess we're still waiting on a review.

Not a verdict, but on further reflection, I do think simplifying the code to load snapshots into memory would make sense. The prospect of multi-GB sized snapshots seems quite unlikely — they're stored in git, the computational cost of diffing snapshots of that size will likely bind before the memory size becomes an issue. Even before your changes, the code handling different types of snapshots is a bit complicated.

(But ofc no need to make changes based on my opinion)

I think you're right and that for real usecases you'll want to minimize size anyways. In my PDF case I avoid embedding fonts for example and use the builtin fonts instead.

I think I'll try to take some time next week to try it with the binaries in memory.

max-sixty · 2024-09-16T17:15:20Z

Thanks for resolving the merge conflicts! That must have been a decent amount of work...

One note — I've recently spent lots of time simplifying the internals, and I'd be very hesitant to merge the approach that uses files rather than in-memory values — I think that brings another layer of complication without much benefit. (at least — I want to understand whether extremely large snapshots are really being used, such that it's worth maintaining the code for two different approaches)...

lasernoises · 2024-09-18T08:54:04Z

Thanks for resolving the merge conflicts! That must have been a decent amount of work...

One note — I've recently spent lots of time simplifying the internals, and I'd be very hesitant to merge the approach that uses files rather than in-memory values — I think that brings another layer of complication without much benefit. (at least — I want to understand whether extremely large snapshots are really being used, such that it's worth maintaining the code for two different approaches)...

Yes, my plan is still to try it with the binary in memory (it just took me way longer to get to it than I thought..). I just decided to still try basing it on the previous work on this branch because after looking over the changes again it seems that a bunch of them will still be needed.

Although, maybe for a clean history and preventing unnecessary changes I now think I'll try to start clean. (Also because there's already more conflicts...)

max-sixty · 2024-09-18T17:16:17Z

OK great! Hopefully it could use at least some of the existing code...

One note is that a couple of outstanding PRs change the SnapshotContents struct slightly, adding a kind field: https://github.com/mitsuhiko/insta/pull/581/files#diff-2666420e8389f1d842ce108018b05f3f68fa4b920f8c1793a3d4d01f21443544R549-R553.

We could adjust that to make SnapshotContents an Enum itself if the behavior becomes more different per kind, like you've done here, or delegate more behavior to that kind.

I'd like to merge that ASAP, though waiting on feedback from @mitsuhiko

This is the somewhat simplified version of mitsuhiko#489 where we keep the full files in-memory during the assert and when reviewing.

lasernoises · 2024-09-18T20:42:11Z

I ended up getting relatively far today. I currently have the new version at: https://github.com/lasernoises/insta/tree/in-memory-binary-snapshots. There's still some smaller stuff I need to take care of and then I'll probably have some questions about some of the details. (I'll also need to take a more detailed look at #581 and see how compatible that one is with my changes.)

But first: should I create a new PR and close this one or reset this branch to the other one?

max-sixty · 2024-09-18T20:49:51Z

But first: should I create a new PR and close this one or reset this branch to the other one?

Totally as you wish!

lasernoises added 7 commits April 26, 2024 18:30

Add support for binary snapshots

6e66262

Move file_eq function to utils

0f9a08d

Fix snapshot comparison whitespace handling

0f08023

This fixes the trailing_crlf_inline test.

Add basic test for binary snapshots

f193ca8

Add more tests for file extension of binary snapshots

5f8de35

Fix file extension change behavior for binary snapshots

6b9884a

Move &mut for binary snapshot closure into macro

51cb04b

lasernoises added 7 commits May 27, 2024 18:26

Merge branch 'master' into binary-snapshots

13fea4a

Use try_removing_snapshot closure for binary deletion

c4aad15

Switch to a temporary .snap._ for binary snapshots before saving

6ec72dc

Fix handling of binary snapshot paths

831427a

Add cleanup for binary snapshots

fdb9eca

Change file content to ascii for binary snapshot tests

4865e6d

Forbid binary snapshot extensions that could cause conflicts

36b1cc0

max-sixty reviewed Jun 5, 2024

View reviewed changes

insta/src/runtime.rs Outdated Show resolved Hide resolved

max-sixty reviewed Jun 5, 2024

View reviewed changes

lasernoises added 2 commits June 6, 2024 18:13

Reduce binary snapshot code duplication

87306da

Also fixes missing call to memoize_snapshot_file in the binary case.

Make binary snapshot error handling more constistent with the rest of…

ebbd335

… the crate

max-sixty reviewed Jun 7, 2024

View reviewed changes

insta/src/snapshot.rs Show resolved Hide resolved

max-sixty reviewed Jun 7, 2024

View reviewed changes

insta/src/output.rs Show resolved Hide resolved

max-sixty reviewed Jun 7, 2024

View reviewed changes

insta/src/runtime.rs Show resolved Hide resolved

Add doc comment for encode_file_link_escape

edfabf3

max-sixty mentioned this pull request Jun 17, 2024

Align inine snapshot handling with file snapshots #510

Open

lasernoises added 2 commits September 16, 2024 15:10

Merge remote-tracking branch 'upstream/master' into binary-snapshots

86a56d7

Fix some issues after merge

a27eed1

lasernoises added a commit to lasernoises/insta that referenced this pull request Sep 18, 2024

Add binary snapshots

e0612ae

This is the somewhat simplified version of mitsuhiko#489 where we keep the full files in-memory during the assert and when reviewing.

max-sixty mentioned this pull request Sep 19, 2024

Adjust force-update on inline snapshots to only view within the string #581

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Binary snapshots #489

Binary snapshots #489

lasernoises commented May 13, 2024

mitsuhiko commented May 15, 2024

lasernoises commented Jun 5, 2024

max-sixty Jun 5, 2024

lasernoises Jun 7, 2024

max-sixty Jun 7, 2024

max-sixty commented Jun 7, 2024 •

edited

Loading

lasernoises commented Jun 14, 2024

max-sixty commented Jun 14, 2024 •

edited

Loading

lasernoises commented Jun 17, 2024

max-sixty commented Jun 17, 2024

max-sixty commented Jul 30, 2024

lasernoises commented Aug 6, 2024

max-sixty commented Sep 16, 2024

lasernoises commented Sep 18, 2024

max-sixty commented Sep 18, 2024

lasernoises commented Sep 18, 2024

max-sixty commented Sep 18, 2024

Binary snapshots #489

Are you sure you want to change the base?

Binary snapshots #489

Conversation

lasernoises commented May 13, 2024

mitsuhiko commented May 15, 2024

lasernoises commented Jun 5, 2024

max-sixty Jun 5, 2024

Choose a reason for hiding this comment

lasernoises Jun 7, 2024

Choose a reason for hiding this comment

max-sixty Jun 7, 2024

Choose a reason for hiding this comment

max-sixty commented Jun 7, 2024 • edited Loading

lasernoises commented Jun 14, 2024

max-sixty commented Jun 14, 2024 • edited Loading

lasernoises commented Jun 17, 2024

max-sixty commented Jun 17, 2024

max-sixty commented Jul 30, 2024

lasernoises commented Aug 6, 2024

max-sixty commented Sep 16, 2024

lasernoises commented Sep 18, 2024

max-sixty commented Sep 18, 2024

lasernoises commented Sep 18, 2024

max-sixty commented Sep 18, 2024

max-sixty commented Jun 7, 2024 •

edited

Loading

max-sixty commented Jun 14, 2024 •

edited

Loading