Add flash_attn support #306
Conversation
I’ll not approve flash-attn support as mentioned.
Why not? This is not intended to replace the existing build procedure, just an option for those who need it. My company can afford the extra dollars for the GPUs/CPUs that support flash attention etc. so we can run the latest models. I believe a lot of other startups are in the same position.
Understood, but it might be challenging to maintain. There is a TensorRT stage in the main Dockerfile. Are you able to contribute it in a similar way? Not sure if you are aware, but I am actively using flash-attention-2 via torch. https://pytorch.org/blog/pytorch2-2/
Can do, I will get that done by the end of the week.
I am not active in the pytorch/sentence-transformers scene, so I am not sure if I'm the right person to open an issue. However, if you do, I would be delighted to show all the horrible things going on with the flash_attn repo right now 😂
Sounds exciting. I think the flash-attn repo should be installable with pip in the main Docker image. An extra stage should be minimal maintenance.
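For illustration, here is a minimal sketch of what such an extra stage could look like. The stage names below are assumptions (mirroring the idea of the existing TensorRT stage), not the actual contents of the main Dockerfile:

```dockerfile
# Hypothetical extra build stage; "production" as the base stage name is an
# assumption, not the real stage name in infinity's main Dockerfile.
FROM production AS production-with-flash-attn

# flash-attn is compiled against the torch already present in the base stage,
# so build isolation has to be disabled.
RUN pip install --no-cache-dir packaging ninja && \
    pip install --no-cache-dir flash-attn --no-build-isolation
```

Keeping it as a separate stage leaves the default target unchanged; users would opt in with `--target production-with-flash-attn` at build time.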
Unfortunately I have to delay this into next week. We are very close to launching our beta at my company, so I am absolutely swamped right now. I want to test it properly.
Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #306   +/-   ##
=======================================
  Coverage   77.69%   77.69%
=======================================
  Files          35       35
  Lines        2511     2511
=======================================
  Hits         1951     1951
  Misses        560      560
* Add flash_attn support (#306)
  * add dockerfile for flash_attn setup
  * remove test.py
  * parametrize model name and engine
  * Update Dockerfile
  ---------
  Co-authored-by: Michael Feil <[email protected]>
* Delete libs/infinity_emb/Dockerfile.flash
---------
Co-authored-by: Göktürk <[email protected]>
Related to #304 #17
After A LOT of digging, I found that the Dockerfile setup below allows the use of flash_attn.
The only trouble I see with maintaining this long-term is package compatibility with infinity.
The base image uses `python 3.10.12` with `torch 2.4.1`. flash_attn is extremely picky about dependencies, so I am not sure whether other versions would work, but as long as infinity continues supporting `torch>=2.2` and `python>=3.9`, I think it's fine.

Main Points
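The Dockerfile itself is not reproduced in this excerpt. As a rough sketch of what such a setup could look like under the versions mentioned above (the base image, version pins, build args, and CLI flags are assumptions, not the file that was actually merged):

```dockerfile
# Rough sketch, not the merged Dockerfile. Ubuntu 22.04 ships Python 3.10.12,
# matching the version noted in the description; the devel image provides nvcc.
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04

RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-dev python3-pip git && \
    rm -rf /var/lib/apt/lists/*

# Install torch first; flash-attn compiles against the installed torch.
RUN pip3 install --no-cache-dir torch==2.4.1

# flash-attn needs packaging/ninja at build time and no build isolation,
# so that it can see the already-installed torch.
RUN pip3 install --no-cache-dir packaging ninja && \
    pip3 install --no-cache-dir flash-attn --no-build-isolation

RUN pip3 install --no-cache-dir "infinity-emb[all]"

# Illustrative build args for the "parametrize model name and engine" commit;
# the exact CLI flags should be checked against the infinity docs.
ARG MODEL_ID=BAAI/bge-small-en-v1.5
ARG ENGINE=torch
ENV MODEL_ID=${MODEL_ID} ENGINE=${ENGINE}

ENTRYPOINT infinity_emb v2 --model-id "${MODEL_ID}" --engine "${ENGINE}"
```

In practice it is also worth pinning the flash-attn version, since its wheels are built against specific torch/CUDA/Python combinations.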