CITATION.cff
cff-version: 1.2.0
title: >-
  HEAD: HEtero-Assists Distillation for Heterogeneous
  Object Detectors
message: Please cite this project using these metadata.
type: software
authors:
  - given-names: Luting
    family-names: Wang
    email: [email protected]
    orcid: 'https://orcid.org/0000-0001-8317-226X'
    affiliation: Beihang University
  - given-names: Xiaojie
    family-names: Li
    email: [email protected]
    affiliation: SenseTime
  - given-names: Yue
    family-names: Liao
    email: [email protected]
    affiliation: Beihang University
  - given-names: Zeren
    family-names: Jiang
    email: [email protected]
    affiliation: ETH Zurich
  - given-names: Jianlong
    family-names: Wu
    email: [email protected]
    affiliation: Shandong University
  - given-names: Fei
    family-names: Wang
    email: [email protected]
    affiliation: University of Science and Technology of China
  - given-names: Chen
    family-names: Qian
    email: [email protected]
    affiliation: SenseTime
  - given-names: Si
    family-names: Liu
    email: [email protected]
    affiliation: Beihang University
identifiers:
  - type: doi
    value: 10.48550/arXiv.2207.05345
    description: arXiv
repository-code: 'https://github.com/LutingWang/HEAD'
abstract: >-
  Conventional knowledge distillation (KD) methods
  for object detection mainly concentrate on
  homogeneous teacher-student detectors. However, the
  design of a lightweight detector for deployment is
  often significantly different from a high-capacity
  detector. Thus, we investigate KD among
  heterogeneous teacher-student pairs for a wide
  application. We observe that the core difficulty
  for heterogeneous KD (hetero-KD) is the significant
  semantic gap between the backbone features of
  heterogeneous detectors due to the different
  optimization manners. Conventional homogeneous KD
  (homo-KD) methods suffer from such a gap and are
  hard to directly obtain satisfactory performance
  for hetero-KD. In this paper, we propose the
  HEtero-Assists Distillation (HEAD) framework,
  leveraging heterogeneous detection heads as
  assistants to guide the optimization of the student
  detector to reduce this gap. In HEAD, the assistant
  is an additional detection head with the
  architecture homogeneous to the teacher head
  attached to the student backbone. Thus, a hetero-KD
  is transformed into a homo-KD, allowing efficient
  knowledge transfer from the teacher to the student.
  Moreover, we extend HEAD into a Teacher-Free HEAD
  (TF-HEAD) framework when a well-trained teacher
  detector is unavailable. Our method has achieved
  significant improvement compared to current
  detection KD methods. For example, on the MS-COCO
  dataset, TF-HEAD helps R18 RetinaNet achieve 33.9
  mAP (+2.2), while HEAD further pushes the limit to
  36.2 mAP (+4.5).
keywords:
  - Knowledge Distillation
  - Object Detection
  - Heterogeneous
license: Apache-2.0