<!DOCTYPE html>
<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<!--
<script src="./resources/jsapi" type="text/javascript"></script>
<script type="text/javascript" async>google.load("jquery", "1.3.2");</script>
-->
<style type="text/css">
@font-face {
font-family: 'Avenir Book';
src: url("./fonts/Avenir_Book.ttf"); /* File to be stored at your site */
}
body {
font-family: "Avenir Book", "HelveticaNeue-Light", "Helvetica Neue Light", "Helvetica Neue", Helvetica, Arial, "Lucida Grande", sans-serif;
font-weight:300;
font-size:14px;
margin-left: auto;
margin-right: auto;
width: 800px;
}
h1 {
font-weight:300;
}
h2 {
font-weight:300;
}
p {
font-weight:300;
line-height: 1.4;
}
code {
font-size: 0.8rem;
margin: 0 0.2rem;
padding: 0.5rem 0.8rem;
white-space: nowrap;
background: #efefef;
border: 1px solid #d3d3d3;
color: #000000;
border-radius: 3px;
}
pre > code {
display: block;
white-space: pre;
line-height: 1.5;
padding: 0;
margin: 0;
}
pre.prettyprint > code {
border: none;
}
.container {
display: flex;
align-items: center;
justify-content: center
}
.image {
flex-basis: 40%
}
.text {
padding-left: 20px;
padding-right: 20px;
}
.disclaimerbox {
background-color: #eee;
border: 1px solid #eeeeee;
border-radius: 10px ;
-moz-border-radius: 10px ;
-webkit-border-radius: 10px ;
padding: 20px;
}
video.header-vid {
height: 140px;
border: 1px solid black;
border-radius: 10px ;
-moz-border-radius: 10px ;
-webkit-border-radius: 10px ;
}
img.header-img {
height: 140px;
border: 1px solid black;
border-radius: 10px ;
-moz-border-radius: 10px ;
-webkit-border-radius: 10px ;
}
img.rounded {
border: 0px solid #eeeeee;
border-radius: 10px ;
-moz-border-radius: 10px ;
-webkit-border-radius: 10px ;
}
a:link,a:visited
{
color: #1367a7;
text-decoration: none;
}
a:hover {
color: #208799;
}
td.dl-link {
height: 160px;
text-align: center;
font-size: 22px;
}
.layered-paper-big { /* modified from: http://css-tricks.com/snippets/css/layered-paper/ */
box-shadow:
0px 0px 1px 1px rgba(0,0,0,0.35), /* The top layer shadow */
5px 5px 0 0px #fff, /* The second layer */
5px 5px 1px 1px rgba(0,0,0,0.35), /* The second layer shadow */
10px 10px 0 0px #fff, /* The third layer */
10px 10px 1px 1px rgba(0,0,0,0.35), /* The third layer shadow */
15px 15px 0 0px #fff, /* The fourth layer */
15px 15px 1px 1px rgba(0,0,0,0.35), /* The fourth layer shadow */
20px 20px 0 0px #fff, /* The fifth layer */
20px 20px 1px 1px rgba(0,0,0,0.35), /* The fifth layer shadow */
25px 25px 0 0px #fff, /* The sixth layer */
25px 25px 1px 1px rgba(0,0,0,0.35); /* The sixth layer shadow */
margin-left: 10px;
margin-right: 45px;
}
.layered-paper { /* modified from: http://css-tricks.com/snippets/css/layered-paper/ */
box-shadow:
0px 0px 1px 1px rgba(0,0,0,0.35), /* The top layer shadow */
5px 5px 0 0px #fff, /* The second layer */
5px 5px 1px 1px rgba(0,0,0,0.35), /* The second layer shadow */
10px 10px 0 0px #fff, /* The third layer */
10px 10px 1px 1px rgba(0,0,0,0.35); /* The third layer shadow */
margin-top: 5px;
margin-left: 10px;
margin-right: 30px;
margin-bottom: 5px;
}
.vert-cent {
position: relative;
top: 50%;
transform: translateY(-50%);
}
hr
{
border: 0;
height: 1px;
background-image: linear-gradient(to right, rgba(0, 0, 0, 0), rgba(0, 0, 0, 0.75), rgba(0, 0, 0, 0));
}
</style>
<title>In-Context Matting</title>
</head>
<body>
<br>
<center>
<span style="font-size:36px">In-Context Matting</span><br><br><br>
</center>
<table align="center" width="800px">
<tbody><tr>
<td align="center" width="160px">
<center>
<span style="font-size:16px">He Guo</a><sup>1</sup></span>
</center>
</td>
<td align="center" width="160px">
<center>
<span style="font-size:16px">Zixuan Ye<sup>1</sup></span>
</center>
</td>
<td align="center" width="160px">
<center>
<span style="font-size:16px">Zhiguo Cao<sup>1</sup></span>
</center>
</td>
<td align="center" width="160px">
<center>
<span style="font-size:16px">Hao Lu<sup>1</sup></span>
</center>
</td>
</tr>
</tbody></table><br>
<table align="center" width="700px">
<tbody><tr>
<td align="center" width="50px">
<center>
<span style="font-size:16px"></span>
</center>
</td>
<td align="center" width="300px">
<center>
<span style="font-size:16px"><sup>1</sup>Huazhong University of Science and Technology</span>
</center>
</td>
</tr></tbody></table>
<table align="center" width="700px">
<tbody><tr>
<td align="center" width="200px">
<center>
<br>
<span style="font-size:20px">Code
<a href="https://github.com/tiny-smart/in-context-matting"> [GitHub]</a>
</span>
</center>
</td>
<td align="center" width="200px">
<center>
<br>
<span style="font-size:20px">
Paper <a href="https://arxiv.org/pdf/2403.15789.pdf"> [arXiv]</a>
</span>
</center>
</td>
<td align="center" width="200px">
<center>
<br>
<span style="font-size:20px">
Cite <a href="./resources/noen"> [BibTeX]</a>
</span>
</center>
</td>
</tr></tbody>
</table>
<br><hr>
<br>
<center>
<img src="./resources/fig1.png" alt="alt text" style="width: 100%; object-fit: cover; max-width:100%;"></a>
</center>
<p style="text-align:justify; text-justify:inter-ideograph;"><left>
In-context matting enables automatic natural image matting of target images of a certain object category, conditioned on a reference image of the same category, with user-provided priors such as masks and scribbles on the reference image only. Note that our approach also exhibits remarkable cross-domain matting quality.
</left></p>
<br>
<center><h2> Abstract </h2> </center>
<p style="text-align:justify; text-justify:inter-ideograph;">
</p><div class="container">
<div class="text" width="400px">
<p style="text-align:justify; text-justify:inter-ideograph;">
<left>
We introduce in-context matting, a novel task setting of image matting. Given a reference image of a certain foreground and guidance priors such as points, scribbles, and masks, in-context matting enables automatic alpha estimation on a batch of target images of the same foreground category, without additional auxiliary input.
This setting marries the good performance of auxiliary input-based matting with the ease of use of automatic matting, striking a good trade-off between customization and automation.
To overcome the key challenge of accurate foreground matching, we introduce DiffusionMatte, an in-context matting model built upon a pre-trained text-to-image diffusion model. Conditioned on inter- and intra-similarity matching, DiffusionMatte can make full use of the reference context to generate accurate alpha mattes for the target images. To benchmark the task, we also introduce ICM-57, a novel testing dataset covering 57 groups of real-world images.
Quantitative and qualitative results on the ICM-57 testing set show that DiffusionMatte rivals the accuracy of trimap-based matting while retaining a level of automation akin to automatic matting.
</left></p>
</div>
</div>
<br>
<center><h2>Comparison between in-context matting and existing image matting paradigms</h2></center>
<center>
<img src="./resources/paradigm_compare.png" alt="alt text" style="width: 65%; object-fit: cover; max-width:65%;"></a>
</center>
<p style="text-align:justify; text-justify:inter-ideograph;"><left>
"Aux" and "Auto" are abbreviations for automatic matting and auxiliary input-based matting, respectively. In-context matting uniquely requires only a single reference input to achieve the automation of automatic matting and the generalizability of auxiliary input-based matting.
</left></p>
<!-- <div class="container">-->
<!-- <div class="image" width="300px">-->
<!-- <center><p><img class="center" src="./resources/1691760445080.jpg" width="300px"></p></center>-->
<!-- </div>-->
<!-- <div class="text" width="250px">-->
<!-- <p> A ‘good’ mask annotation satisfy two conditions:-->
<!-- 1) class-discriminative. 2) high-resolution, precise mask.-->
<!-- -->
<!-- The average map shows the possibility for us to use for semantic segmentation,-->
<!-- where it is class-discriminative and fine-grained.-->
<!-- </p>-->
<!-- </div>-->
<!-- </div>-->
<!-- <p><img class="center" src="./resources/fig7.png" width="800px"></p>-->
<hr>
<br>
<center> <h2> How it works (pipeline) </h2> </center>
<p><img class="left" src="./resources/pipeline.png" width="800px"></p>
<p style="text-align:justify; text-justify:inter-ideograph;"><left>
DiffusionMatte integrates a Stable Diffusion-derived feature extractor, an in-context similarity module, and a matting head. It takes a target image, a reference image, and an RoI map as input. The feature extractor produces features for both the reference and target images, along with self-attention maps of the target image. The in-context similarity module uses the in-context query from the reference image to create a guidance map, which, combined with the self-attention maps, helps locate the target object. The matting head then generates the alpha matte for the target object.
</left></p>
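<p style="text-align:justify; text-justify:inter-ideograph;"><left>
For illustration only, the following minimal PyTorch-style sketch outlines this data flow. The class and submodule interfaces are placeholders, not the released implementation; please refer to the GitHub repository for the actual code.
</left></p>
<pre><code>import torch

class InContextMattingSketch(torch.nn.Module):
    """Illustrative sketch of the pipeline described above; the submodule
    interfaces are placeholders, not the released implementation."""

    def __init__(self, feature_extractor, in_context_similarity, matting_head):
        super().__init__()
        self.feature_extractor = feature_extractor          # Stable Diffusion-derived
        self.in_context_similarity = in_context_similarity  # inter- + intra-similarity
        self.matting_head = matting_head                     # decodes the alpha matte

    def forward(self, target_img, ref_img, ref_roi):
        # Extract features for both images; keep the target's self-attention maps.
        tgt_feat, tgt_attn = self.feature_extractor(target_img, return_attention=True)
        ref_feat, _ = self.feature_extractor(ref_img, return_attention=True)

        # Build a guidance map from the reference context and the target's
        # self-attention, locating the in-context foreground.
        guidance = self.in_context_similarity(tgt_feat, tgt_attn, ref_feat, ref_roi)

        # Decode the alpha matte for the located target object.
        return self.matting_head(tgt_feat, guidance)
</code></pre>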
<br>
<center> <h2> In-Context Similarity </h2> </center>
<p><img class="left" src="./resources/similarity.png" width="800px"></p>
<p style="text-align:justify; text-justify:inter-ideograph;"><left>
Illustration of the inter- and intra-similarity modules. For simplicity, the resize operation is omitted, only the computation of one element of the in-context query is depicted, and the fusion of self-attention maps from a single scale is shown. The inter-similarity module computes the similarity between features extracted from the target image and the in-context query derived from the reference image, generating an averaged similarity map. The intra-similarity module then combines the self-attention maps, which capture intra-image similarities within the target image, with the similarity map obtained from the inter-similarity module.
</left></p>
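<p style="text-align:justify; text-justify:inter-ideograph;"><left>
The two computations can be sketched as follows for a single scale, with cosine similarity and average pooling of the reference foreground assumed for illustration; the tensor shapes and function names are hypothetical rather than those of the released code.
</left></p>
<pre><code>import torch
import torch.nn.functional as F

def inter_similarity(tgt_feat, ref_feat, ref_roi):
    """Similarity between target features and an in-context query pooled
    from reference features inside the RoI (illustrative, single scale).

    tgt_feat: (C, H, W) target-image features
    ref_feat: (C, H, W) reference-image features
    ref_roi:  (H, W)    binary RoI map on the reference image
    returns:  (H, W)    similarity (guidance) map
    """
    c, h, w = tgt_feat.shape
    fg = ref_roi.flatten().bool()                    # reference foreground positions
    query = ref_feat.flatten(1)[:, fg].mean(dim=1)   # (C,) averaged in-context query
    query = F.normalize(query, dim=0)
    tgt = F.normalize(tgt_feat.flatten(1), dim=0)    # (C, H*W)
    return (query @ tgt).view(h, w)                  # cosine similarity per position

def intra_similarity(tgt_attn, sim_map):
    """Propagate the inter-similarity map through the target image's own
    self-attention, i.e. fuse intra-image similarity (illustrative).

    tgt_attn: (H*W, H*W) averaged self-attention of the target image
    sim_map:  (H, W)     output of inter_similarity
    """
    h, w = sim_map.shape
    return (tgt_attn @ sim_map.flatten()).view(h, w)
</code></pre>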
<br>
<center> <h2> Test dataset: ICM-57 </h2> </center>
<center>
<p><img class="left" src="./resources/dataset.png" width="650px"></p></center>
<p style="text-align:justify; text-justify:inter-ideograph;"><left>
To assess the performance of our model, we constructed the first testing dataset for in-context matting, named ICM-57, which comprises 57 image groups that form various real-world contexts. ICM-57 covers foregrounds of both the same category and the same instance, matching the essence of in-context matting.
</left></p>
<br>
<hr>
<center><h2>Experiment-1: Comparison with in-context segmentation models</h2></center>
<center>
<p><img class="center" src="./resources/seg.png" width="550px"></p>
</center>
<center><h2>Experiment-2: Comparison with automatic and auxiliary input-based matting models</h2></center>
<div class="container">
<div class="image" width="850px">
<center><p><img class="center" src="./resources/matting.png" width="800px"></p></center>
</div>
</div>
<center><h2>Experiment-3: Comparison with interactive matting models</h2></center>
<div class="container">
<div class="image" width="400px">
<center><p><img class="center" src="./resources/interactive.png" width="500px"></p></center>
<p style="text-align:justify; text-justify:inter-ideograph;"><left>
In the
penultimate row, our method is provided with guidance information for every image, reducing to an auxiliary input-based method.
Our method outperforms automatic methods and some of the auxiliary input-based methods, and its performance is comparable to
that of the trimap-based method, VitMatte.
</left></p>
</div>
</div>
<center>
<img src="./resources/quality.png" alt="alt text" style="width: 100%; object-fit: cover; max-width:100%;"></a>
</center>
<p style="text-align:justify; text-justify:inter-ideograph;"><left>
Qualitative results of different image matting methods.
</left></p>
<br>
<center>
<img src="./resources/more_quality.png" alt="alt text" style="width: 100%; object-fit: cover; max-width:100%;"></a>
</center>
<p style="text-align:justify; text-justify:inter-ideograph;"><left>
Qualitative results of DiffusionMatte. The first column shows the reference input, while the remaining columns display target images and their predicted alpha mattes. Given a single reference input, our method automatically mattes foregrounds of the same instance or category.
</left></p>
<br>
<center>
<img src="./resources/video.png" alt="alt text" style="width: 65%; object-fit: cover; max-width:65%;"></a>
</center>
<p style="text-align:justify; text-justify:inter-ideograph;"><left>
In-context matting extends readily to video object matting: the key is to use one frame of the video as the reference.
</left></p>
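<p style="text-align:justify; text-justify:inter-ideograph;"><left>
In practice, this can amount to annotating a single frame and reusing it as the reference for every other frame. A hypothetical helper is sketched below; the model interface is an assumption carried over from the pipeline sketch above, not the released API.
</left></p>
<pre><code>def video_matting(model, frames, ref_index=0, ref_roi=None):
    """Hypothetical helper: matte every frame of a video by reusing one
    annotated frame as the in-context reference for all the others.

    model:     an in-context matting model taking (target_img, ref_img, ref_roi)
    frames:    a list of video frames, e.g. (3, H, W) tensors
    ref_index: index of the frame that carries the user annotation
    ref_roi:   user-provided mask or scribbles on frames[ref_index]
    """
    ref_frame = frames[ref_index]
    # Every frame is matted against the same annotated reference frame.
    return [model(frame, ref_frame, ref_roi) for frame in frames]
</code></pre>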
<br>
<br>
<hr>
<center> <h2> Acknowledgements </h2> </center>
<p>
Based on a template by <a href="https://lipurple.github.io/">
Ziyi Li</a> and <a href="http://richzhang.github.io/">Richard Zhang</a>.
</p>
<br>
<br>
</body></html>