
Add TensorFlow implementation of EfficientFormer #22620

Merged
32 commits merged into huggingface:main from add_tf_efficientformer on May 31, 2023

Conversation

D-Roberts
Contributor

@D-Roberts commented Apr 6, 2023

What does this PR do?

  • Adds a TensorFlow port of the EfficientFormer computer vision model (not an LLM port).
  • Fixes some minor typos and a couple of differences in the PyTorch model code: 1) the tuple (non-dict) return was not returning the last hidden state but the state before the last stage; the dict and tuple returns of the encoder should be equivalent, as in other models (see the sketch after this list). 2) Two layernorms were not using the config eps (assuming that the config is the ground truth). Let me know what you think about this.
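
A minimal sketch of fix (1), assuming the standard transformers encoder return pattern; the names here are illustrative, not the exact code in this PR:

if not return_dict:
    # the tuple return must expose the final hidden state, matching the dict return
    return tuple(v for v in [last_hidden_state, all_hidden_states] if v is not None)
return BaseModelOutput(
    last_hidden_state=last_hidden_state,
    hidden_states=all_hidden_states,
)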

Ran tests (CPU-only, all pass) with:
NVIDIA_TF32_OVERRIDE=1 RUN_SLOW=1 RUN_PT_TF_CROSS_TESTS=1 py.test -vv -rA tests/models/efficientformer/test_modeling_tf_efficientformer.py

Double-checked the PyTorch and TensorFlow architecture code against the "EfficientFormer: Vision Transformers at MobileNet Speed" paper.

Verified shapes and max absolute diffs of the hidden states on an example image:

import numpy as np
import torch
from PIL import Image

from transformers import EfficientFormerImageProcessor
from src.transformers.models.efficientformer.modeling_tf_efficientformer import TFEfficientFormerModel
from src.transformers.models.efficientformer.modeling_efficientformer import EfficientFormerModel

model_tf = TFEfficientFormerModel.from_pretrained("snap-research/efficientformer-l1-300", from_pt=True)
model_pt = EfficientFormerModel.from_pretrained("snap-research/efficientformer-l1-300")

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
proc = EfficientFormerImageProcessor.from_pretrained("snap-research/efficientformer-l1-300")
inputstf = proc(images=image, return_tensors="tf")
inputspt = proc(images=image, return_tensors="pt")
       
outtf = model_tf(**inputstf, output_hidden_states=True, training=False)
with torch.no_grad():
    outpt = model_pt(**inputspt, output_hidden_states=True)

max_diff = np.amax(np.abs(outtf[0].numpy() - outpt[0].numpy()))
print(f"last hidden diff shape: {outtf[0].shape}, last hidden diff: {max_diff}, last hidden <= 1e-4, {max_diff <= 1e-4}")
for i in range(7):
    max_diff = np.amax(np.abs(outtf[1][i].numpy() - outpt[1][i].numpy()))
    print(f"hidden state {i} shape: {outtf[1][i].shape}, diff: {max_diff}, max_diff <= 1e-4: {max_diff <= 1e-4}")

which gives:

last hidden diff shape: (1, 49, 448), last hidden diff: 2.1457672119140625e-05, last hidden <= 1e-4, True
hidden state 0 shape: (1, 48, 56, 56), diff: 7.271766662597656e-06, max_diff <= 1e-4: True
hidden state 1 shape: (1, 48, 56, 56), diff: 5.054473876953125e-05, max_diff <= 1e-4: True
hidden state 2 shape: (1, 96, 28, 28), diff: 2.9087066650390625e-05, max_diff <= 1e-4: True
hidden state 3 shape: (1, 96, 28, 28), diff: 2.3603439331054688e-05, max_diff <= 1e-4: True
hidden state 4 shape: (1, 224, 14, 14), diff: 1.6689300537109375e-05, max_diff <= 1e-4: True
hidden state 5 shape: (1, 224, 14, 14), diff: 4.1961669921875e-05, max_diff <= 1e-4: True
hidden state 6 shape: (1, 448, 7, 7), diff: 1.9550323486328125e-05, max_diff <= 1e-4: True

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@D-Roberts marked this pull request as draft on April 6, 2023
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@sgugger
Collaborator

sgugger commented Apr 6, 2023

cc @Rocketknight1

@Rocketknight1
Member

Hi @D-Roberts, just letting you know the TF team at Hugging Face is aware of this and definitely interested in the port! Please ping me or @gante whenever it's ready for review, or if you run into any issues while porting.

@D-Roberts force-pushed the add_tf_efficientformer branch 2 times, most recently from 500de37 to 1d3c82d on April 27, 2023
@D-Roberts force-pushed the add_tf_efficientformer branch 2 times, most recently from 4a64767 to 6f8c787 on May 7, 2023
@D-Roberts changed the title from "[WIP] Add Tensorflow implementation of Efficientformer" to "[WIP] Add TensorFlow implementation of EfficientFormer" on May 11, 2023
@D-Roberts force-pushed the add_tf_efficientformer branch 2 times, most recently from b2ac9bb to 2cc5a5e on May 13, 2023
@D-Roberts marked this pull request as ready for review on May 13, 2023
@D-Roberts changed the title from "[WIP] Add TensorFlow implementation of EfficientFormer" to "Add TensorFlow implementation of EfficientFormer" on May 13, 2023
@D-Roberts
Contributor Author

@Rocketknight1 @gante This PR is now ready for review.

Member

@Rocketknight1 left a comment


Overall this looks like an incredibly solid port! I think this might be the best handling of a complex PR like this that I've ever seen from someone not on the Hugging Face payroll. Most issues I raised are just comments or nits, but manipulating self.ab during the forward pass and the layer names in the encoder are two that could potentially be breaking. Let me know what you think of the proposed solutions there, but I think this PR should be ready to merge very soon.

@Rocketknight1
Member

cc @amyeroberts for core maintainer review as well

Collaborator

@amyeroberts left a comment


Thanks for adding this model!

Overall a really nice, clean PR, super easy to review 🤗 There are a few places where the architecture implementation deviates from the standard pattern, but this seems to come from the PT model.

In general, just a few comments before we're good to merge:

  • As @Rocketknight1 highlighted, the logic for self.ab is very non-canonical and potentially breaking for TF, so let's go for the local var ab = tf.gather(...) logic (see the sketch after this list)
  • The serving_output logic should be updated so that hidden_states and attentions are conditionally returned based on the config settings, with a comment added about why they're not converted to tensors (different shapes) as in other vision models
  • Switching to NHWC format should happen just once in the main layer
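
A minimal sketch of the first two points, assuming standard TF/Keras patterns; the method bodies are shown out of their class context, and the names (attention_biases, attention_bias_idxs) are illustrative, not the exact code in this PR:

import tensorflow as tf
from transformers.modeling_tf_outputs import TFBaseModelOutput

def call(self, scores, training=False):
    # gather the biases into a local variable instead of mutating self.ab
    # during the forward pass (attribute assignment inside call() can break tracing)
    ab = tf.gather(self.attention_biases, self.attention_bias_idxs, axis=1)
    return scores + ab

def serving_output(self, output):
    # hidden_states / attentions have stage-dependent shapes, so they are not
    # converted with tf.convert_to_tensor as in other vision models
    hidden_states = output.hidden_states if self.config.output_hidden_states else None
    attentions = output.attentions if self.config.output_attentions else None
    return TFBaseModelOutput(
        last_hidden_state=output.last_hidden_state,
        hidden_states=hidden_states,
        attentions=attentions,
    )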

@D-Roberts
Contributor Author

@Rocketknight1 @amyeroberts I addressed your comments and also submitted two PRs for the l1 and l3 weights (and tagged Rocketknight1). Let me know what's next!

@amyeroberts
Collaborator

@D-Roberts - that's great!

For the CI - it seems there is an issue with your CircleCI permissions, as the tests won't run.
Could you try refreshing your permissions as shown here? Once all the tests are green, we'll be ready for final reviews :)

@D-Roberts
Contributor Author

D-Roberts commented May 23, 2023

@amyeroberts Thanks for pointing out the CircleCI fix. One doc test, which (rightly) can't find TF weights, is failing for now. I added back from_pt in the model tests for the sake of the CI until the TF weights get merged.

@Rocketknight1
Member

@D-Roberts Just to let you know, we've reached out to the team at Snap to ask them to merge your PRs on the EfficientFormer checkpoints. Sorry for the delay!

@Rocketknight1
Member

@D-Roberts the checkpoint PRs should be merged now. Thank you to @alanspike for the quick response!

@D-Roberts
Contributor Author

@amyeroberts @Rocketknight1 All local tests pass with the new TF weights. The CI has one documentation test failing; the PT version also predicts 281, which maps to label_281 in the config.

@Rocketknight1
Member

@D-Roberts I think it's fine to swap those tests to just check the actual argmax index rather than the id2label string value. Obviously the repository config doesn't actually have the id2label values set, so fixing that would require another PR to the repos.
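
A hedged sketch of that swap (outputs and the expected index 281 follow the discussion above; variable names are illustrative):

import tensorflow as tf

# check the raw argmax index instead of looking the string up in id2label
predicted_class_idx = int(tf.math.argmax(outputs.logits, axis=-1)[0])
assert predicted_class_idx == 281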

@D-Roberts
Contributor Author

@Rocketknight1 All green again. :)

@amyeroberts merged commit 88f50a1 into huggingface:main on May 31, 2023
sheonhan pushed a commit to sheonhan/transformers that referenced this pull request Jun 1, 2023
* Add tf code for efficientformer

* Fix return dict bug - return last hidden state after last stage

* Fix corresponding return dict bug

* Override test tol

* Change default values of training to False

* Set training to default False X3

* Rm axis from ln

* Set init in dense projection

* Rm debug stuff

* Make style; all tests pass.

* Modify year to 2023

* Fix attention biases codes

* Update the shape list logic

* Add a batch norm eps config

* Remove extract comments in test files

* Add conditional attn and hidden states return for serving output

* Change channel dim checking logic

* Add exception for withteacher model in training mode

* Revert layer count for now

* Add layer count for conditional layer naming

* Transpose for conv happens only in main layer

* Make tests smaller

* Make style

* Update doc

* Rm from_pt

* Change to actual expect image class label

* Remove stray print in tests

* Update image processor test

* Remove the old serving output logic

* Make style

* Make style

* Complete test
gojiteji pushed a commit to gojiteji/transformers that referenced this pull request Jun 5, 2023
@D-Roberts
Contributor Author

@sgugger @amyeroberts @Rocketknight1 I was wondering - when do you plan a transformers release that includes this code?

@amyeroberts
Collaborator

@D-Roberts We release roughly once a month and are planning to release 4.30 later this week. If you need it right now, you can install from source to get the main version.
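
For reference, installing from source is the standard pip-from-git invocation:

pip install git+https://github.com/huggingface/transformers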

novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023