<!DOCTYPE html>
<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<!--
<script src="./resources/jsapi" type="text/javascript"></script>
<script type="text/javascript" async>google.load("jquery", "1.3.2");</script>
-->
<style type="text/css">
@font-face {
font-family: 'Avenir Book';
src: url("./fonts/Avenir_Book.ttf"); /* File to be stored at your site */
}
body {
font-family: "Avenir Book", "HelveticaNeue-Light", "Helvetica Neue Light", "Helvetica Neue", Helvetica, Arial, "Lucida Grande", sans-serif;
font-weight:300;
font-size:14px;
margin-left: auto;
margin-right: auto;
width: 800px;
}
h1 {
font-weight:300;
}
h2 {
font-weight:300;
}
p {
font-weight:300;
line-height: 1.4;
}
code {
font-size: 0.8rem;
margin: 0 0.2rem;
padding: 0.5rem 0.8rem;
white-space: nowrap;
background: #efefef;
border: 1px solid #d3d3d3;
color: #000000;
border-radius: 3px;
}
pre > code {
display: block;
white-space: pre;
line-height: 1.5;
padding: 0;
margin: 0;
}
pre.prettyprint > code {
border: none;
}
.container {
display: flex;
align-items: center;
justify-content: center
}
.image {
flex-basis: 40%
}
.text {
padding-left: 20px;
padding-right: 20px;
}
.disclaimerbox {
background-color: #eee;
border: 1px solid #eeeeee;
border-radius: 10px ;
-moz-border-radius: 10px ;
-webkit-border-radius: 10px ;
padding: 20px;
}
video.header-vid {
height: 140px;
border: 1px solid black;
border-radius: 10px ;
-moz-border-radius: 10px ;
-webkit-border-radius: 10px ;
}
img.header-img {
height: 140px;
border: 1px solid black;
border-radius: 10px ;
-moz-border-radius: 10px ;
-webkit-border-radius: 10px ;
}
img.rounded {
border: 0px solid #eeeeee;
border-radius: 10px ;
-moz-border-radius: 10px ;
-webkit-border-radius: 10px ;
}
a:link,a:visited
{
color: #1367a7;
text-decoration: none;
}
a:hover {
color: #208799;
}
td.dl-link {
height: 160px;
text-align: center;
font-size: 22px;
}
.layered-paper-big { /* modified from: http://css-tricks.com/snippets/css/layered-paper/ */
box-shadow:
0px 0px 1px 1px rgba(0,0,0,0.35), /* The top layer shadow */
5px 5px 0 0px #fff, /* The second layer */
5px 5px 1px 1px rgba(0,0,0,0.35), /* The second layer shadow */
10px 10px 0 0px #fff, /* The third layer */
10px 10px 1px 1px rgba(0,0,0,0.35), /* The third layer shadow */
15px 15px 0 0px #fff, /* The fourth layer */
15px 15px 1px 1px rgba(0,0,0,0.35), /* The fourth layer shadow */
20px 20px 0 0px #fff, /* The fifth layer */
20px 20px 1px 1px rgba(0,0,0,0.35), /* The fifth layer shadow */
25px 25px 0 0px #fff, /* The sixth layer */
25px 25px 1px 1px rgba(0,0,0,0.35); /* The sixth layer shadow */
margin-left: 10px;
margin-right: 45px;
}
.layered-paper { /* modified from: http://css-tricks.com/snippets/css/layered-paper/ */
box-shadow:
0px 0px 1px 1px rgba(0,0,0,0.35), /* The top layer shadow */
5px 5px 0 0px #fff, /* The second layer */
5px 5px 1px 1px rgba(0,0,0,0.35), /* The second layer shadow */
10px 10px 0 0px #fff, /* The third layer */
10px 10px 1px 1px rgba(0,0,0,0.35); /* The third layer shadow */
margin-top: 5px;
margin-left: 10px;
margin-right: 30px;
margin-bottom: 5px;
}
.vert-cent {
position: relative;
top: 50%;
transform: translateY(-50%);
}
hr
{
border: 0;
height: 1px;
background-image: linear-gradient(to right, rgba(0, 0, 0, 0), rgba(0, 0, 0, 0.75), rgba(0, 0, 0, 0));
}
</style>
<title>In-Context Matting</title>
</head>
<body>
<br>
<center>
<span style="font-size:36px">In-Context Matting</span><br><br><br>
</center>
<table align="center" width="800px">
<tbody><tr>
<td align="center" width="160px">
<center>
<span style="font-size:16px">He Guo</a><sup>1</sup></span>
</center>
</td>
<td align="center" width="160px">
<center>
<span style="font-size:16px">Zixuan Ye<sup>1</sup></span>
</center>
</td>
<td align="center" width="160px">
<center>
<span style="font-size:16px">Zhiguo Cao<sup>1</sup></span>
</center>
</td>
<td align="center" width="160px">
<center>
<span style="font-size:16px">Hao Lu<sup>1</sup></span>
</center>
</td>
</tr>
</tbody></table><br>
<table align="center" width="700px">
<tbody><tr>
<td align="center" width="50px">
<center>
<span style="font-size:16px"></span>
</center>
</td>
<td align="center" width="300px">
<center>
<span style="font-size:16px"><sup>1</sup>Huazhong University of Science and Technology</span>
</center>
</td>
</tr></tbody></table>
<table align="center" width="700px">
<tbody><tr>
<td align="center" width="200px">
<center>
<br>
<span style="font-size:20px">Code
<a href="https://github.com/tiny-smart/in-context-matting"> [GitHub]</a>
</span>
</center>
</td>
<td align="center" width="200px">
<center>
<br>
<span style="font-size:20px">
Paper <a href="https://arxiv.org/pdf/2403.15789.pdf"> [arXiv]</a>
</span>
</center>
</td>
<td align="center" width="200px">
<center>
<br>
<span style="font-size:20px">
Cite <a href="./resources/noen"> [BibTeX]</a>
</span>
</center>
</td>
</tr></tbody>
</table>
<br><hr>
<br>
<center>
<img src="./resources/fig1.png" alt="alt text" style="width: 100%; object-fit: cover; max-width:100%;"></a>
</center>
<p style="text-align:justify; text-justify:inter-ideograph;"><left>
In-context matting enables automatic natural image matting of target images of a certain object category, conditioned on a reference image of the same category, with user-provided priors such as masks and scribbles on the reference image only. Note that our approach also exhibits remarkable cross-domain matting quality.
</left></p>
<br>
<center><h2> Abstract </h2> </center>
<p style="text-align:justify; text-justify:inter-ideograph;">
</p><div class="container">
<div class="text" width="400px">
<p style="text-align:justify; text-justify:inter-ideograph;">
<left>
We introduce in-context matting, a novel task setting of image matting. Given a reference image of a certain foreground and guidance priors such as points, scribbles, and masks, in-context matting enables automatic alpha estimation on a batch of target images of the same foreground category, without additional auxiliary input.
This setting marries the good performance of auxiliary input-based matting with the ease of use of automatic matting, striking a good trade-off between customization and automation.
To overcome the key challenge of accurate foreground matching, we introduce DiffusionMatte, an in-context matting model built upon a pre-trained text-to-image diffusion model. Conditioned on inter- and intra-similarity matching, DiffusionMatte can make full use of the reference context to generate accurate alpha mattes for the target images. To benchmark the task, we also introduce ICM-57, a novel testing dataset covering 57 groups of real-world images.
Quantitative and qualitative results on the ICM-57 testing set show that DiffusionMatte rivals the accuracy of trimap-based matting while retaining a level of automation akin to automatic matting.
</left></p>
</div>
</div>
<br>
<center><h2>Comparison between in-context matting and existing image matting paradigms</h2></center>
<center>
<img src="./resources/paradigm_compare.png" alt="alt text" style="width: 65%; object-fit: cover; max-width:65%;"></a>
</center>
<p style="text-align:justify; text-justify:inter-ideograph;"><left>
"Aux" and "Auto" are abbreviations for automatic matting and auxiliary input-based matting, respectively. In-context matting uniquely requires only a single reference input to achieve the automation of automatic matting and the generalizability of auxiliary input-based matting.
</left></p>
<!-- <div class="container">-->
<!-- <div class="image" width="300px">-->
<!-- <center><p><img class="center" src="./resources/1691760445080.jpg" width="300px"></p></center>-->
<!-- </div>-->
<!-- <div class="text" width="250px">-->
<!-- <p> A ‘good’ mask annotation satisfy two conditions:-->
<!-- 1) class-discriminative. 2) high-resolution, precise mask.-->
<!-- -->
<!-- The average map shows the possibility for us to use for semantic segmentation,-->
<!-- where it is class-discriminative and fine-grained.-->
<!-- </p>-->
<!-- </div>-->
<!-- </div>-->
<!-- <p><img class="center" src="./resources/fig7.png" width="800px"></p>-->
<hr>
<br>
<center> <h2> How it works (pipeline) </h2> </center>
<p><img class="left" src="./resources/pipeline.png" width="800px"></p>
<p style="text-align:justify; text-justify:inter-ideograph;"><left>
DiffusionMatte integrates a Stable Diffusion-derived feature extractor, an in-context similarity module, and a matting head. It takes a target image, a reference image, and an RoI map as input. The feature extractor produces features for both the reference and target images, along with self-attention maps of the target image. The in-context similarity module uses the in-context query from the reference image to create a guidance map, which, combined with the self-attention maps, helps locate the target object. The matting head then generates the alpha matte for the target object.
</left></p>
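<p style="text-align:justify; text-justify:inter-ideograph;"><left>
For illustration only, the following minimal PyTorch-style sketch outlines this data flow. The class and submodule interfaces are placeholders, not the released implementation; please refer to the GitHub repository for the actual code.
</left></p>
<pre><code>import torch

class InContextMattingSketch(torch.nn.Module):
    """Illustrative sketch of the pipeline described above; the submodule
    interfaces are placeholders, not the released implementation."""

    def __init__(self, feature_extractor, in_context_similarity, matting_head):
        super().__init__()
        self.feature_extractor = feature_extractor          # Stable Diffusion-derived
        self.in_context_similarity = in_context_similarity  # inter- + intra-similarity
        self.matting_head = matting_head                     # decodes the alpha matte

    def forward(self, target_img, ref_img, ref_roi):
        # Extract features for both images; keep the target's self-attention maps.
        tgt_feat, tgt_attn = self.feature_extractor(target_img, return_attention=True)
        ref_feat, _ = self.feature_extractor(ref_img, return_attention=True)

        # Build a guidance map from the reference context and the target's
        # self-attention, locating the in-context foreground.
        guidance = self.in_context_similarity(tgt_feat, tgt_attn, ref_feat, ref_roi)

        # Decode the alpha matte for the located target object.
        return self.matting_head(tgt_feat, guidance)
</code></pre>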
<br>
<center> <h2> In-Context Similarity </h2> </center>
<p><img class="left" src="./resources/similarity.png" width="800px"></p>
<p style="text-align:justify; text-justify:inter-ideograph;"><left>
Illustration of the inter- and intra-similarity modules. For simplicity, the resize operation is omitted, only the computation of one element of the in-context query is depicted, and the fusion of self-attention maps from a single scale is shown. The inter-similarity module computes the similarity between features extracted from the target image and the in-context query derived from the reference image, generating an averaged similarity map. The intra-similarity module then combines the self-attention maps, which capture intra-image similarities within the target image, with the similarity map obtained from the inter-similarity module.
</left></p>
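<p style="text-align:justify; text-justify:inter-ideograph;"><left>
The two computations can be sketched as follows for a single scale, with cosine similarity and average pooling of the reference foreground assumed for illustration; the tensor shapes and function names are hypothetical rather than those of the released code.
</left></p>
<pre><code>import torch
import torch.nn.functional as F

def inter_similarity(tgt_feat, ref_feat, ref_roi):
    """Similarity between target features and an in-context query pooled
    from reference features inside the RoI (illustrative, single scale).

    tgt_feat: (C, H, W) target-image features
    ref_feat: (C, H, W) reference-image features
    ref_roi:  (H, W)    binary RoI map on the reference image
    returns:  (H, W)    similarity (guidance) map
    """
    c, h, w = tgt_feat.shape
    fg = ref_roi.flatten().bool()                    # reference foreground positions
    query = ref_feat.flatten(1)[:, fg].mean(dim=1)   # (C,) averaged in-context query
    query = F.normalize(query, dim=0)
    tgt = F.normalize(tgt_feat.flatten(1), dim=0)    # (C, H*W)
    return (query @ tgt).view(h, w)                  # cosine similarity per position

def intra_similarity(tgt_attn, sim_map):
    """Propagate the inter-similarity map through the target image's own
    self-attention, i.e. fuse intra-image similarity (illustrative).

    tgt_attn: (H*W, H*W) averaged self-attention of the target image
    sim_map:  (H, W)     output of inter_similarity
    """
    h, w = sim_map.shape
    return (tgt_attn @ sim_map.flatten()).view(h, w)
</code></pre>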
<br>
<center> <h2> Test dataset: ICM-57 </h2> </center>
<center>
<p><img class="left" src="./resources/dataset.png" width="650px"></p></center>
<p style="text-align:justify; text-justify:inter-ideograph;"><left>
To assess the performance of our model, we constructed the first testing dataset for in-context matting, named ICM-57, which comprises 57 image groups that form various real-world contexts. ICM-57 covers foregrounds of both the same category and the same instance, matching the essence of in-context matting.
</left></p>
<br>
<hr>
<center><h2>Experiment-1: Comparison with in-context segmentation models</h2></center>
<center>
<p><img class="center" src="./resources/seg.png" width="550px"></p>
</center>
<center><h2>Experiment-2: Comparison with automatic and auxiliary input-based matting models</h2></center>
<div class="container">
<div class="image" width="850px">
<center><p><img class="center" src="./resources/matting.png" width="800px"></p></center>
</div>
</div>
<center><h2>Experiment-3: Comparison with interactive matting models</h2></center>
<div class="container">
<div class="image" width="400px">
<center><p><img class="center" src="./resources/interactive.png" width="500px"></p></center>
<p style="text-align:justify; text-justify:inter-ideograph;"><left>
In the
penultimate row, our method is provided with guidance information for every image, reducing to an auxiliary input-based method.
Our method outperforms automatic methods and some of the auxiliary input-based methods, and its performance is comparable to
that of the trimap-based method, VitMatte.
</left></p>
</div>
</div>
<center>
<img src="./resources/quality.png" alt="alt text" style="width: 100%; object-fit: cover; max-width:100%;"></a>
</center>
<p style="text-align:justify; text-justify:inter-ideograph;"><left>
Qualitative results of different image matting methods.
</left></p>
<br>
<center>
<img src="./resources/more_quality.png" alt="alt text" style="width: 100%; object-fit: cover; max-width:100%;"></a>
</center>
<p style="text-align:justify; text-justify:inter-ideograph;"><left>
Qualitative results of DiffusionMatte. The first column shows the reference input, while the remaining columns display target images and their predicted alpha mattes. Given a single reference input, our method automatically mattes foregrounds of the same instance or category.
</left></p>
<br>
<center>
<img src="./resources/video.png" alt="alt text" style="width: 65%; object-fit: cover; max-width:65%;"></a>
</center>
<p style="text-align:justify; text-justify:inter-ideograph;"><left>
In-context matting extends readily to video object matting: the key is to use one frame of the video as the reference.
</left></p>
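<p style="text-align:justify; text-justify:inter-ideograph;"><left>
In practice, this can amount to annotating a single frame and reusing it as the reference for every other frame. A hypothetical helper is sketched below; the model interface is an assumption carried over from the pipeline sketch above, not the released API.
</left></p>
<pre><code>def video_matting(model, frames, ref_index=0, ref_roi=None):
    """Hypothetical helper: matte every frame of a video by reusing one
    annotated frame as the in-context reference for all the others.

    model:     an in-context matting model taking (target_img, ref_img, ref_roi)
    frames:    a list of video frames, e.g. (3, H, W) tensors
    ref_index: index of the frame that carries the user annotation
    ref_roi:   user-provided mask or scribbles on frames[ref_index]
    """
    ref_frame = frames[ref_index]
    # Every frame is matted against the same annotated reference frame.
    return [model(frame, ref_frame, ref_roi) for frame in frames]
</code></pre>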
<br>
<br>
<hr>
<center> <h2> Acknowledgements </h2> </center>
<p>
Based on a template by <a href="https://lipurple.github.io/">
Ziyi Li</a> and <a href="http://richzhang.github.io/">Richard Zhang</a>.
</p>
<br>
<br>
</body></html>