Fix RT-DETR weights initialization #31724

qubvel · 2024-07-01T08:50:03Z

What does this PR do?

Fix RT-DETR bbox and class head weight initialization.

In_init_weight method bbox and class heads are not reachable for initialization. This sometimes leads to unstable training and lower results (see experiments below).
Added config parameter for bias initialization, in the original code biases of the head are initialized with prior_prob=0.01 which is OK for training with 80 classes, however, while fine-tuning this value should be adjusted.

Results of the fine-tuning on main vs fix branches on CPPE-5 dataset (averaged for 6 runs each):

better convergence: +0.1-0.15 mAP50 on average on eval and test sets
lower dispersion of results

Who can review?

@amyeroberts

cc @SangbumChoi @NielsRogge

HuggingFaceDocBuilderDev · 2024-07-01T09:09:29Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

NielsRogge

Thanks for improving this! Does the model converge as fast as the original implementation?

qubvel · 2024-07-01T09:52:35Z

I didn't have a chance to run the fine-tuning with the original code, maybe @SangbumChoi has a fine-tuning script to compare. However, I would say that from my previous experiments with other detection models in transformers RT-DTER with this fix is the best in terms of compute, convergence speed, and results achieved on cppe-5 dataset.

merveenoyan · 2024-07-01T10:07:26Z

@qubvel is there anything else that needs to be done?

qubvel · 2024-07-01T10:11:30Z

@merve there is one more PR for generating anchors cache fix, but it's not that critical
#31671

I also implemented sdpa attention, but didn't observe any speed-up in inference speed

SangbumChoi · 2024-07-01T10:35:29Z

@qubvel Isn't SDPA is default operation in MDHA?

This function has already been incorporated into torch.nn.MultiheadAttention and torch.nn.TransformerEncoderLayer.

Since there are many FLOPS in encoder (which is not related to Attention module) I guess speed-up with applying attention friendly library such as SDPA, xformers might be marginal.

@qubvel @NielsRogge Thanks for this PR. (Good to here that this is the best result by far) Unfortunately I don't have any results of finetuning raw RTDETR repo. (I have some test result in Transformers RTDETR).

qubvel · 2024-07-01T14:19:58Z

@SangbumChoi I'm talking about RTDetrMultiheadAttention here, I added support for sdpa for this class, but didn't observe any speed-up, I will open a separate PR to discuss it :)

amyeroberts

Thanks for fixing!

qubvel added 5 commits June 28, 2024 16:51

Fix init for rt-detr heads

648fabf

Fixup

adbe0e6

Add separate prior_prob value to config for initialization

0a70c1d

Add bbox init

a6df227

Change to 1 / num_labels init

5962a73

NielsRogge approved these changes Jul 1, 2024

View reviewed changes

qubvel added 2 commits July 1, 2024 09:38

Adjust weights init test

c6e50f8

Fix style for test

8d200d4

qubvel requested a review from amyeroberts July 1, 2024 14:36

qubvel changed the title ~~Fix R-DETR weights initialization~~ Fix RT-DETR weights initialization Jul 1, 2024

amyeroberts approved these changes Jul 1, 2024

View reviewed changes

qubvel merged commit 048f599 into huggingface:main Jul 3, 2024
23 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix RT-DETR weights initialization #31724

Fix RT-DETR weights initialization #31724

qubvel commented Jul 1, 2024 •

edited

Loading

HuggingFaceDocBuilderDev commented Jul 1, 2024

NielsRogge left a comment

qubvel commented Jul 1, 2024

merveenoyan commented Jul 1, 2024

qubvel commented Jul 1, 2024

SangbumChoi commented Jul 1, 2024

qubvel commented Jul 1, 2024 •

edited

Loading

amyeroberts left a comment

Fix RT-DETR weights initialization #31724

Fix RT-DETR weights initialization #31724

Conversation

qubvel commented Jul 1, 2024 • edited Loading

What does this PR do?

Who can review?

HuggingFaceDocBuilderDev commented Jul 1, 2024

NielsRogge left a comment

Choose a reason for hiding this comment

qubvel commented Jul 1, 2024

merveenoyan commented Jul 1, 2024

qubvel commented Jul 1, 2024

SangbumChoi commented Jul 1, 2024

qubvel commented Jul 1, 2024 • edited Loading

amyeroberts left a comment

Choose a reason for hiding this comment

qubvel commented Jul 1, 2024 •

edited

Loading

qubvel commented Jul 1, 2024 •

edited

Loading