
Add TensorFlow implementation of EfficientFormer #22620

Merged
32 commits merged into huggingface:main from add_tf_efficientformer on May 31, 2023

Conversation

D-Roberts
Contributor

@D-Roberts commented Apr 6, 2023

What does this PR do?

  • Adds a TensorFlow port of the EfficientFormer computer vision model (not an LLM port).
  • Fixes some minor typos and a couple of differences in the PyTorch model code: 1) the tuple (non-dict) return was not returning the last hidden state but the state before the last stage; the dict and tuple returns of the encoder should be equivalent, as in other models (see the sketch after this list). 2) Two layernorms were not using the config eps (assuming that the config is the ground truth). Let me know what you think about this.
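
A minimal sketch of fix (1), assuming the standard transformers encoder return pattern; the names here are illustrative, not the exact code in this PR:

if not return_dict:
    # the tuple return must expose the final hidden state, matching the dict return
    return tuple(v for v in [last_hidden_state, all_hidden_states] if v is not None)
return BaseModelOutput(
    last_hidden_state=last_hidden_state,
    hidden_states=all_hidden_states,
)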

Ran tests (CPU-only, all pass) with:
NVIDIA_TF32_OVERRIDE=1 RUN_SLOW=1 RUN_PT_TF_CROSS_TESTS=1 py.test -vv -rA tests/models/efficientformer/test_modeling_tf_efficientformer.py

Double-checked the PyTorch and TensorFlow architecture code against the "EfficientFormer: Vision Transformers at MobileNet Speed" paper.

Verified shapes and max absolute diffs of the hidden states on an example image:

import numpy as np
import torch
from PIL import Image

from transformers import EfficientFormerImageProcessor
from src.transformers.models.efficientformer.modeling_tf_efficientformer import TFEfficientFormerModel
from src.transformers.models.efficientformer.modeling_efficientformer import EfficientFormerModel

model_tf = TFEfficientFormerModel.from_pretrained("snap-research/efficientformer-l1-300", from_pt=True)
model_pt = EfficientFormerModel.from_pretrained("snap-research/efficientformer-l1-300")

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
proc = EfficientFormerImageProcessor.from_pretrained("snap-research/efficientformer-l1-300")
inputstf = proc(images=image, return_tensors="tf")
inputspt = proc(images=image, return_tensors="pt")
       
outtf = model_tf(**inputstf, output_hidden_states=True, training=False)
with torch.no_grad():
    outpt = model_pt(**inputspt, output_hidden_states=True)

max_diff = np.amax(np.abs(outtf[0].numpy() - outpt[0].numpy()))
print(f"last hidden diff shape: {outtf[0].shape}, last hidden diff: {max_diff}, last hidden <= 1e-4, {max_diff <= 1e-4}")
for i in range(7):
    max_diff = np.amax(np.abs(outtf[1][i].numpy() - outpt[1][i].numpy()))
    print(f"hidden state {i} shape: {outtf[1][i].shape}, diff: {max_diff}, max_diff <= 1e-4: {max_diff <= 1e-4}")

which gives:

last hidden diff shape: (1, 49, 448), last hidden diff: 2.1457672119140625e-05, last hidden <= 1e-4, True
hidden state 0 shape: (1, 48, 56, 56), diff: 7.271766662597656e-06, max_diff <= 1e-4: True
hidden state 1 shape: (1, 48, 56, 56), diff: 5.054473876953125e-05, max_diff <= 1e-4: True
hidden state 2 shape: (1, 96, 28, 28), diff: 2.9087066650390625e-05, max_diff <= 1e-4: True
hidden state 3 shape: (1, 96, 28, 28), diff: 2.3603439331054688e-05, max_diff <= 1e-4: True
hidden state 4 shape: (1, 224, 14, 14), diff: 1.6689300537109375e-05, max_diff <= 1e-4: True
hidden state 5 shape: (1, 224, 14, 14), diff: 4.1961669921875e-05, max_diff <= 1e-4: True
hidden state 6 shape: (1, 448, 7, 7), diff: 1.9550323486328125e-05, max_diff <= 1e-4: True

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@D-Roberts marked this pull request as draft on April 6, 2023
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@sgugger
Collaborator

sgugger commented Apr 6, 2023

cc @Rocketknight1

@Rocketknight1
Member

Hi @D-Roberts, just letting you know the TF team at Hugging Face is aware of this and definitely interested in the port! Please ping me or @gante whenever it's ready for review, or if you run into any issues while porting.

@D-Roberts force-pushed the add_tf_efficientformer branch 2 times, most recently from 500de37 to 1d3c82d on April 27, 2023
@D-Roberts force-pushed the add_tf_efficientformer branch 2 times, most recently from 4a64767 to 6f8c787 on May 7, 2023
@D-Roberts changed the title from "[WIP] Add Tensorflow implementation of Efficientformer" to "[WIP] Add TensorFlow implementation of EfficientFormer" on May 11, 2023
@D-Roberts force-pushed the add_tf_efficientformer branch 2 times, most recently from b2ac9bb to 2cc5a5e on May 13, 2023
@D-Roberts marked this pull request as ready for review on May 13, 2023
@D-Roberts changed the title from "[WIP] Add TensorFlow implementation of EfficientFormer" to "Add TensorFlow implementation of EfficientFormer" on May 13, 2023
@D-Roberts
Contributor Author

@Rocketknight1 @gante This PR is now ready for review.

Member

@Rocketknight1 left a comment


Overall this looks like an incredibly solid port! I think this might be the best handling of a complex PR like this that I've ever seen from someone not on the Hugging Face payroll. Most issues I raised are just comments or nits, but manipulating self.ab during the forward pass and the layer names in the encoder are two that could potentially be breaking. Let me know what you think of the proposed solutions there, but I think this PR should be ready to merge very soon.

@Rocketknight1
Member

cc @amyeroberts for core maintainer review as well

Collaborator

@amyeroberts left a comment


Thanks for adding this model!

Overall a really nice, clean PR, super easy to review 🤗 There are a few places where the architecture implementation deviates from the standard pattern, but this seems to come from the PT model.

In general, just a few comments before we're good to merge:

  • As @Rocketknight1 highlighted, the logic for self.ab is very non-canonical and potentially breaking for TF, so let's go for the local var ab = tf.gather(...) logic (see the sketch after this list)
  • The serving_output logic should be updated so that hidden_states and attentions are conditionally returned based on the config settings, with a comment added about why they're not converted to tensors (different shapes) as in other vision models
  • Switching to NHWC format should happen just once in the main layer
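
A minimal sketch of the first two points, assuming standard TF/Keras patterns; the method bodies are shown out of their class context, and the names (attention_biases, attention_bias_idxs) are illustrative, not the exact code in this PR:

import tensorflow as tf
from transformers.modeling_tf_outputs import TFBaseModelOutput

def call(self, scores, training=False):
    # gather the biases into a local variable instead of mutating self.ab
    # during the forward pass (attribute assignment inside call() can break tracing)
    ab = tf.gather(self.attention_biases, self.attention_bias_idxs, axis=1)
    return scores + ab

def serving_output(self, output):
    # hidden_states / attentions have stage-dependent shapes, so they are not
    # converted with tf.convert_to_tensor as in other vision models
    hidden_states = output.hidden_states if self.config.output_hidden_states else None
    attentions = output.attentions if self.config.output_attentions else None
    return TFBaseModelOutput(
        last_hidden_state=output.last_hidden_state,
        hidden_states=hidden_states,
        attentions=attentions,
    )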

@D-Roberts
Contributor Author

@Rocketknight1 @amyeroberts I addressed your comments and also submitted two PRs for the l1 and l3 weights (and tagged Rocketknight1). Let me know what's next!

@amyeroberts
Collaborator

@D-Roberts - that's great!

For the CI - it seems there is an issue with your CircleCI permissions, as the tests won't run.
Could you try refreshing your permissions as shown here? Once all the tests are green, we'll be ready for final reviews :)

@D-Roberts
Contributor Author

D-Roberts commented May 23, 2023

@amyeroberts Thanks for pointing out the CircleCI fix. One doc test, which (rightly) can't find TF weights, is failing for now. I added back from_pt in the model tests for the sake of the CI until the TF weights get merged.

@Rocketknight1
Member

@D-Roberts Just to let you know, we've reached out to the team at Snap to ask them to merge your PRs on the EfficientFormer checkpoints. Sorry for the delay!

@Rocketknight1
Member

@D-Roberts the checkpoint PRs should be merged now. Thank you to @alanspike for the quick response!

@D-Roberts
Contributor Author

@amyeroberts @Rocketknight1 All local tests pass with the new TF weights. The CI has one documentation test failing; the PT version also predicts 281, which maps to label_281 in the config.

@Rocketknight1
Member

@D-Roberts I think it's fine to swap those tests to just check the actual argmax index rather than the id2label string value. Obviously the repository config doesn't actually have the id2label values set, so fixing that would require another PR to the repos.
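
A hedged sketch of that swap (outputs and the expected index 281 follow the discussion above; variable names are illustrative):

import tensorflow as tf

# check the raw argmax index instead of looking the string up in id2label
predicted_class_idx = int(tf.math.argmax(outputs.logits, axis=-1)[0])
assert predicted_class_idx == 281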

@D-Roberts
Contributor Author

@Rocketknight1 All green again. :)

@amyeroberts merged commit 88f50a1 into huggingface:main on May 31, 2023
sheonhan pushed a commit to sheonhan/transformers that referenced this pull request Jun 1, 2023
* Add tf code for efficientformer

* Fix return dict bug - return last hidden state after last stage

* Fix corresponding return dict bug

* Override test tol

* Change default values of training to False

* Set training to default False X3

* Rm axis from ln

* Set init in dense projection

* Rm debug stuff

* Make style; all tests pass.

* Modify year to 2023

* Fix attention biases codes

* Update the shape list logic

* Add a batch norm eps config

* Remove extract comments in test files

* Add conditional attn and hidden states return for serving output

* Change channel dim checking logic

* Add exception for withteacher model in training mode

* Revert layer count for now

* Add layer count for conditional layer naming

* Transpose for conv happens only in main layer

* Make tests smaller

* Make style

* Update doc

* Rm from_pt

* Change to actual expect image class label

* Remove stray print in tests

* Update image processor test

* Remove the old serving output logic

* Make style

* Make style

* Complete test
gojiteji pushed a commit to gojiteji/transformers that referenced this pull request Jun 5, 2023
@D-Roberts
Contributor Author

@sgugger @amyeroberts @Rocketknight1 I was wondering - when do you plan a transformers release that includes this code?

@amyeroberts
Collaborator

@D-Roberts We release roughly once a month and are planning to release 4.30 later this week. If you need it right now, you can install from source to get the main version.
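
For reference, installing from source is the standard pip-from-git invocation:

pip install git+https://github.com/huggingface/transformers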

novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023