With the same 1n1g (1 node, 1 GPU) machine resources, why does tensor model parallel get lower samples/s even though its batch size is larger?
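It depends on how much of the total compute the forward pass takes up. In the acc scenario the share is somewhat larger, roughly 1/3 = forward / (forward + backward), since for a typical network the backward pass costs about twice as much compute as the forward pass.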
The tensor model parallel run uses ac (activation checkpointing), which is why it can fit a batch size as large as 128. The price is one extra forward pass.
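To make the arithmetic concrete, here is a rough sketch of the cost model described above. It assumes the backward pass costs twice the forward pass and that activation checkpointing recomputes the whole forward pass once; the function name and parameters are made up for illustration and are not taken from any library. Real numbers will differ with kernel efficiency, communication, and how much of the network is actually checkpointed.

```python
def relative_throughput(forward_cost=1.0, backward_ratio=2.0, recompute_fraction=1.0):
    """Estimate throughput with activation checkpointing relative to without.

    Hypothetical cost model (an assumption, not a measurement):
      - backward costs `backward_ratio` times the forward pass
      - checkpointing re-runs `recompute_fraction` of the forward pass
    """
    backward_cost = backward_ratio * forward_cost
    baseline = forward_cost + backward_cost                  # forward + backward
    with_ckpt = baseline + recompute_fraction * forward_cost  # plus recomputed forward
    return baseline / with_ckpt


# Forward share of a normal step: 1 / (1 + 2) = 1/3
forward_share = 1.0 / (1.0 + 2.0)

# Full recomputation: 3 units of work become 4, so throughput is about 3/4
ratio = relative_throughput()
print(f"forward share: {forward_share:.2f}, relative throughput: {ratio:.2f}")
```

Under these assumptions, per-sample compute grows by about a third, so a larger batch size alone would not be expected to recover the lost samples/s.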
Oh, I see. So it looks like tensor parallel brings no benefit for BERT.