Segment Anything? Segment A Vesuvius Scroll!

A data-processing pipeline forked from Meta's Segment Anything model that reduces Vesuvius Scroll .tif file sizes by segmenting the images and masking the superfluous data with black pixels. The resulting uniform background allows lossless LZW compression to shrink each .tif by ~70% (120MB -> ~40MB) while preserving all the useful data.

TLDR Overview

Since the output of this script is smaller .tif image files with no information loss, the optimal path forward is for me to work with the Vesuvius Scroll Challenge organizers to process all the data and upload it to the Vesuvius download server for distribution. For the average participant this repo therefore has no direct use, but I have documented the results of the pipeline, the experiments I ran to support my claims, a high-level overview of what the pipeline does, and a section on the current limitations of the technique.

Update

The .tif files for the publicly available scroll 1 and scroll 2 slices were produced over the course of a week and uploaded to the Vesuvius scroll server for general use, but a major limiting factor was also uncovered. Since the masked .tif files use LZW compression to reduce file size without loss of information, they are incompatible with memory mapping (mmap), which speeds up I/O by treating parts of the SSD or HDD as 'virtual memory' and requires the images to be stored uncompressed on disk. This makes the compressed masked data worse than the uncompressed masked data, or the original data, for use cases such as Volume Cartographer that rely on mmap. You can get both the smaller file size and mmap compatibility by downloading the masked files, uncompressing them, and then applying filesystem compression, but the time involved and the fact that filesystem compression is OS dependent make this an unwieldy solution compared to just downloading the raw unmasked data, or uncompressing the downloaded masked data.
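For example, with the tifffile Python library, memory mapping only works on uncompressed, contiguous TIFF data; attempting it on an LZW-compressed masked slice raises an error (the file name below is just an example):

```python
import tifffile

try:
    # Only possible for uncompressed, contiguous TIFF data.
    data = tifffile.memmap("c00000.tif")
except ValueError:
    # LZW-compressed masked slices land here and must be decoded into RAM instead.
    data = tifffile.imread("c00000.tif")
```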

The masked files are still useful if your use case does not rely on mmap, if you plan to leverage the masking in some unique way, or if your download bandwidth is slow, metered, or otherwise restricted. The masked .tif files thus still increase the accessibility of the data for participants who lack fast, unlimited download speeds or are heavily restricted on disk space.

Messages related to the masked data I have received:
[Screenshots of two messages: one from a user limited by download speed, one from a user not using mmap]

Decompression for mmap compatibility

If you have downloaded the masked data and do want to use it with Volume Cartographer or any other use case that relies on memory mapping (or to prepare for filesystem compression), you can use tiffDecompression.py in this repo to accomplish that task. Run it with 'python tiffDecompression.py <input_folder> <output_folder> [delete_original]', specifying where the current files are relative to tiffDecompression.py (or an absolute path), and a path to an output folder (it will be created if it doesn't exist). Pass True for delete_original if you want the script to delete each original file as it goes, which helps avoid running out of disk space. Note that uncompressed files will be 122MB each for scroll 1 and 232MB each for scroll 2.
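For reference, the core of such a decompression pass is simple; the sketch below illustrates the idea using the tifffile library (it is an illustration of the approach, not the actual tiffDecompression.py):

```python
import os
import sys
import tifffile

def decompress_folder(input_folder, output_folder, delete_original=False):
    os.makedirs(output_folder, exist_ok=True)
    for name in sorted(os.listdir(input_folder)):
        if not name.lower().endswith((".tif", ".tiff")):
            continue
        src = os.path.join(input_folder, name)
        dst = os.path.join(output_folder, name)
        data = tifffile.imread(src)   # decodes the LZW-compressed pixels
        tifffile.imwrite(dst, data)   # writes an uncompressed TIFF by default
        if delete_original:
            os.remove(src)            # free disk space as we go

if __name__ == "__main__":
    input_folder, output_folder = sys.argv[1], sys.argv[2]
    delete_original = len(sys.argv) > 3 and sys.argv[3] == "True"
    decompress_folder(input_folder, output_folder, delete_original)
```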

Setup

This repo isn't particularly friendly if you want to set up the pipeline yourself, as you'll need all of the dependencies for https://github.com/facebookresearch/segment-anything, along with the folder structure and file names I used to organize the Vesuvius Scroll .tif files. It would be possible to set it up with a bit of fiddling with the file path variables, though. I may make this easier in the future, but the goal of reducing the .tif file sizes while maintaining all useful information only requires the data to be processed once. That said, this technique may be useful for other, unrelated applications.

Results

I claim that this process can reduce the .tif file size by ~70% while maintaining all useful information. The images below show the easy part to demonstrate: the reduced file size of the first file, 00000.tif (121MB) -> c00000.tif (31.4MB), while maintaining the exact same image dimensions and file type.
[Screenshots: file properties of 00000.tif and c00000.tif showing the size reduction]

I have made the assumption that the surrounding area [green], the case [yellow], and the detached 'wrap' [red] do not contain useful information for reading the scrolls.

[Image: slice annotated with the surrounding area (green), case (yellow), and detached wrap (red)]

Using that assumption, I segmented the image and set every pixel not part of the scroll to black (0,0,0).
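At its core, the masking step is a single array operation; a minimal sketch (with hypothetical names, not the repo's exact code) looks like this:

```python
import numpy as np

def apply_scroll_mask(image: np.ndarray, scroll_mask: np.ndarray) -> np.ndarray:
    """Return a copy of `image` with every pixel outside `scroll_mask` set to black."""
    masked = image.copy()
    masked[~scroll_mask] = 0
    return masked
```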

Original image:
[Image: original slice]

Result of the image processing pipeline:
[Image: masked slice]

*Note: these images are not the originals, but smaller files that are under GitHub's upload size limit. Also note the mask over the scroll has been enlarged by 0.5% to help ensure no pixels containing papyrus are covered; this margin can be increased if adversarial examples are produced.

To convince myself that the lossless compression was in fact lossless, I carried out a series of experiments to ensure none of the pixels in the scroll had been changed. I did this with the computeImageDiff files. The basic idea is to compare the original image and the new compressed image by taking the absolute difference of each pair of pixels and producing an image with that difference as each pixel's value. This should produce an image where the scroll portion is completely black, since identical pixels contribute 0, while the background shows the original background values (the masked image is black there). That verification technique produced the following image:

[Image: absolute difference of original and masked slice; the scroll region is solid black]
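A minimal version of that difference check, assuming the tifffile library and the file names used above (the output name is arbitrary), could look like this; it is a sketch of the idea, not the exact computeImageDiff script:

```python
import numpy as np
import tifffile

original = tifffile.imread("00000.tif").astype(np.int32)
masked = tifffile.imread("c00000.tif").astype(np.int32)

# Absolute per-pixel difference: identical scroll pixels come out as 0 (black),
# while the blacked-out background shows the original background values.
diff = np.abs(original - masked).astype(np.uint16)
tifffile.imwrite("diff_00000.tif", diff)
```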

In addition, I produced a version that multiplies the difference by 100, so even a difference in value of 1 is noticeable. This makes the background noisy, but the uninterrupted black shape of the scroll is what is important. This version produced the following image:

[Image: 100x-amplified difference; the scroll region remains solid black]
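The amplified variant is the same idea with a multiplier and a clip, sketched below under the same assumptions as the previous block:

```python
import numpy as np
import tifffile

original = tifffile.imread("00000.tif").astype(np.int64)
masked = tifffile.imread("c00000.tif").astype(np.int64)

# Multiply the absolute difference by 100 (clipped to the 16-bit range) so a
# change of even 1 in any scroll pixel shows up clearly.
amplified = np.clip(np.abs(original - masked) * 100, 0, 65535).astype(np.uint16)
tifffile.imwrite("diff_x100_00000.tif", amplified)
```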

I also produced a version that changes the color of any differing pixel to red (computeImageDiffColor.py), and an updated version that adds a radius around each differing pixel so that even a single pixel changed by a single value becomes obvious (computeImageDiffColorRadius.py). The radius version produced the following result:

[Image: color-radius difference result]
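A sketch of the color + radius idea (not the exact computeImageDiffColorRadius.py; it assumes 16-bit grayscale slices, example file names, and uses scipy for the dilation):

```python
import numpy as np
import tifffile
from scipy.ndimage import binary_dilation

original = tifffile.imread("00000.tif")
masked = tifffile.imread("c00000.tif")

# Mark every pixel whose value differs, then grow each mark by a radius so
# even a single changed pixel becomes a visible red blob.
changed = original.astype(np.int64) != masked.astype(np.int64)
changed = binary_dilation(changed, iterations=5)

# Build an 8-bit RGB preview of the original and paint the changed regions red.
preview = np.stack([original // 256] * 3, axis=-1).astype(np.uint8)
preview[changed] = [255, 0, 0]
tifffile.imwrite("diff_color_00000.tif", preview)
```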

To verify the scripts did what I thought they did, I edited 06052.tif by drawing on it with a 1-pixel-wide brush in a grey that closely matched the scroll. Here is the 'corrupted' image:

[Image: corrupted 06052.tif]

And the result of the color radius script:

[Image: color-radius diff of the corrupted 06052.tif, highlighting the drawn-on pixels]

As you can see, it clearly brought out any changes in the images' pixel values. Thus I am convinced that this technique does not change the pixel values where the papyrus and ink are located, while reducing the file size by nearly 70%. Please feel free to validate these findings.

Pipeline process

The pipeline carries out a number of steps to produce these results, covered here at a high level. The main file, scrollSegRLESeqRange.py, takes a range of images specified by their number and processes them one at a time:

1. Make a downsized version of the original image so the image (batch) can fit in the system's VRAM in one go during segmentation; I found 10x downsizing fit in my GPU's 8GB of VRAM.
2. Use the downsized image to produce masks with Meta's Segment Anything model.
3. Apply the simple heuristic that the first (largest) mask is the scroll. This has some shortcomings, explained in the next section.
4. Scale the downsized mask back up and reshape it to exactly match the original image's dimensions. The original image is never scaled; only the mask is upscaled.
5. Enlarge the mask by 0.5% to ensure none of the scroll (useful information) is covered as a result of the scaling process.
6. Create a new image that sets all pixels not covered by the mask to black and keeps the original pixel values where the mask covers them.
7. Save this image with LZW compression; the large area of uniform black pixels allows the compression to drastically reduce the .tif file size.

As seen in the results section, the pixels in the scroll keep their original values. A condensed sketch of these steps is shown below.
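The sketch assumes Meta's segment-anything package, OpenCV, and tifffile, uses hypothetical file and variable names, and approximates the 0.5% mask enlargement with a dilation; it is an illustration of the process above, not the actual scrollSegRLESeqRange.py:

```python
import cv2
import numpy as np
import tifffile
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load SAM once and reuse it for every slice in the range.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to("cuda")
mask_generator = SamAutomaticMaskGenerator(sam)

def mask_slice(in_path, out_path, downscale=10, enlarge_frac=0.005):
    original = tifffile.imread(in_path)  # 16-bit grayscale slice assumed

    # 1. Downsize so segmentation fits in VRAM (10x worked for 8GB).
    small = cv2.resize(original, (original.shape[1] // downscale,
                                  original.shape[0] // downscale))
    small_rgb = cv2.cvtColor((small // 256).astype(np.uint8), cv2.COLOR_GRAY2RGB)

    # 2-3. Generate masks with SAM and take the largest one as the scroll.
    masks = mask_generator.generate(small_rgb)
    scroll = max(masks, key=lambda m: m["area"])["segmentation"].astype(np.uint8)

    # 4. Upscale the mask (not the image) back to the original dimensions.
    full_mask = cv2.resize(scroll, (original.shape[1], original.shape[0]),
                           interpolation=cv2.INTER_NEAREST)

    # 5. Enlarge the mask slightly so scaling artifacts cannot clip scroll pixels.
    k = max(1, int(enlarge_frac * max(original.shape)))
    full_mask = cv2.dilate(full_mask, np.ones((k, k), np.uint8))

    # 6-7. Black out everything outside the mask and save with LZW compression.
    out = np.where(full_mask.astype(bool), original, 0)
    tifffile.imwrite(out_path, out, compression="lzw")
```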

Limitations & potential improvements

Due to the simplistic mask selection of choosing the largest mask, some development work or manual intervention is needed for cases where the scroll does not constitute the largest single section/mask of the image. This started to happen somewhere between file 12600.tif and 13000.tif on the first scroll, but the method does work as-is for the majority of the scroll. It also means all the output should be quickly screened by visual inspection to make sure the correct mask was chosen; this can easily be done by viewing the image thumbnails.

Example of model starting to choose incorrect masks:
[Image: slice where the model starts to choose an incorrect mask]
