
Add image-to-image task w/ Swin2SR (for super-resolution) #381

Merged 21 commits into main on Nov 9, 2023
Conversation

xenova (Collaborator) commented on Nov 8, 2023

This PR adds support for image-to-image translation, starting with the Swin2SR family of models for super-resolution. See here for the list of already-converted models, including 2x and 4x upscalers.

Closes #138

Example usage

Pipeline API

Example code adapted from here.

import { pipeline } from '@xenova/transformers';

// Upscale a 256x256 butterfly image to 512x512 with a 2x Swin2SR model
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/butterfly.jpg';
let upscaler = await pipeline('image-to-image', 'Xenova/swin2SR-classical-sr-x2-64');
let output = await upscaler(url);
// RawImage {
//   data: Uint8Array(786432) [ 41, 31, 24,  43, ... ],
//   width: 512,
//   height: 512,
//   channels: 3
// }
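
To save the upscaled result to disk in Node.js, the raw pixel data can be handed to an image encoder such as sharp. This is a minimal sketch, assuming sharp is installed separately (it is not a dependency of this PR) and that `output` is the RawImage returned by the pipeline above:

import sharp from 'sharp'; // assumed extra dependency: npm i sharp

// Encode the raw RGB pixel buffer returned by the upscaler as a PNG file.
await sharp(output.data, {
  raw: { width: output.width, height: output.height, channels: output.channels },
})
  .png()
  .toFile('butterfly_upscaled.png');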

AutoClasses

Example code adapted from here.

import { AutoProcessor, Swin2SRForImageSuperResolution, RawImage } from '@xenova/transformers';

// Load processor and model
const model_id = 'Xenova/swin2SR-classical-sr-x2-64';
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await Swin2SRForImageSuperResolution.from_pretrained(model_id);

// Prepare model inputs
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/butterfly.jpg';
const image = await RawImage.fromURL(url);
const inputs = await processor(image);

// Run model
const outputs = await model(inputs);

// Convert Tensor to RawImage
const output = outputs.reconstruction.squeeze().clamp_(0, 1).mul_(255).round_().to('uint8');
const outputImage = RawImage.fromTensor(output);
// RawImage {
//   data: Uint8Array(786432) [ 41, 31, 24, ...],
//   width: 512,
//   height: 512,
//   channels: 3
// }

Example output

  • input (256x256)

  • output w/ unquantized model (512x512)
    note: produces the exact same output as the Python implementation (within floating-point precision errors, of course).

  • output w/ quantized model (512x512)

  • side-by-side (input vs. unquantized output)

xenova (Collaborator, Author) commented on Nov 8, 2023

cc @josephrocca :)

I also intend to replicate/showcase the results from their README.

xenova (Collaborator, Author) commented on Nov 8, 2023

Example using https://huggingface.co/Xenova/swin2SR-compressed-sr-x4-48:

import { pipeline } from '@xenova/transformers';

let url = 'https://huggingface.co/spaces/jjourney1125/swin2sr/resolve/main/testsets/real-inputs/shanghai.jpg';
let upscaler = await pipeline('image-to-image', 'Xenova/swin2SR-compressed-sr-x4-48');
let output = await upscaler(url);

Input: shanghai.jpg (real-world test image)

Output: 4x upscaled result (unquantized model)

josephrocca (Contributor) commented:
Awesome!! Seems to take quite a while to load the model - about 40 seconds, not including the download. I'm guessing it's a similar problem to this: microsoft/onnxruntime#11217, since Netron also complains that there are lots of nodes and takes a very long time to load.

The actual inference is about 40 seconds on 8 threads - not bad! WebGPU will get this to a very usable inference time. Exciting!
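
For reference, the number of threads used by the WASM backend can be configured before constructing the pipeline. A minimal sketch, assuming the default onnxruntime-web backend of @xenova/transformers (and, in the browser, that cross-origin isolation is enabled so multi-threading is available):

import { env, pipeline } from '@xenova/transformers';

// Configure the onnxruntime-web WASM backend before any model is loaded.
env.backends.onnx.wasm.numThreads = 8;

const upscaler = await pipeline('image-to-image', 'Xenova/swin2SR-classical-sr-x2-64');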

xenova merged commit 73a99ba into main on Nov 9, 2023 (3 of 4 checks passed)
Linked issue: #138 [Feature request] Image-to-image (super-resolution)