feat: functionalities for supporting NeuralOperators.jl #217
Reactant.jl Benchmarks
Benchmark suite | Current: 13bed11 | Previous: a17315c | Ratio |
---|---|---|---|
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1237836714 ns | 1263749401 ns | 0.98 |
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1181078628 ns | 1254668396 ns | 0.94 |
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1244666440 ns | 1218277318 ns | 1.02 |
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2520995691 ns | 2376495016 ns | 1.06 |
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Lux | 225448920 ns | 217726580 ns | 1.04 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) | 5433039540 ns | 7226166416 ns | 0.75 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant | 5262336712 ns | 5511150207 ns | 0.95 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) | 6121805326 ns | 5102020848 ns | 1.20 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) | 6654536818 ns | 6993217459 ns | 0.95 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Lux | 35399732577 ns | 38085761917 ns | 0.93 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1320508413 ns | 1208392095 ns | 1.09 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1256171167.5 ns | 1331979590 ns | 0.94 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1289987445.5 ns | 1228565001 ns | 1.05 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2459802462 ns | 2452231772 ns | 1.00 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Lux | 8834390 ns | 8748209 ns | 1.01 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) | 1594719649 ns | 1578057500 ns | 1.01 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1572563209 ns | 1557311922 ns | 1.01 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) | 1627555250 ns | 1557684126 ns | 1.04 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) | 2764021139 ns | 2769517816 ns | 1.00 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Lux | 2682591748 ns | 3303048898.5 ns | 0.81 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1305988072 ns | 1303432996 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1199774741.5 ns | 1292627349.5 ns | 0.93 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1317524235 ns | 1312140581.5 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2876325912 ns | 2608146101 ns | 1.10 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Lux | 22713088.5 ns | 22645472 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) | 2145367408 ns | 2183323759 ns | 0.98 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant | 2138354014 ns | 2161824787 ns | 0.99 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) | 2146049510 ns | 2150773246 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) | 3401752026 ns | 3353554606 ns | 1.01 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Lux | 5530337191 ns | 6032060527 ns | 0.92 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1319398209.5 ns | 1315388210 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1255618293 ns | 1313576758.5 ns | 0.96 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1268766827 ns | 1308732662.5 ns | 0.97 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2568271454 ns | 2435356858 ns | 1.05 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Lux | 7053841 ns | 6572926 ns | 1.07 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) | 1416284689 ns | 1416310529 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1415647618 ns | 1409069455 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) | 1417522917 ns | 1410196431 ns | 1.01 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) | 2625238876 ns | 2620146990 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Lux | 1340766208 ns | 1384443752 ns | 0.97 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1275439730 ns | 1325713657.5 ns | 0.96 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1299474659.5 ns | 1268777827.5 ns | 1.02 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1271826170.5 ns | 1294207842.5 ns | 0.98 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2572968803 ns | 2374603722 ns | 1.08 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Lux | 12314527 ns | 12110782.5 ns | 1.02 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) | 1706047630 ns | 1711411728 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant | 1715179909 ns | 1707811998 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) | 1705348191 ns | 1709512803 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) | 2898523358 ns | 2924854567 ns | 0.99 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Lux | 3166568622.5 ns | 2927891069 ns | 1.08 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1271100172 ns | 1270178508 ns | 1.00 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1285015972 ns | 1317660758 ns | 0.98 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1294704559.5 ns | 1263311709 ns | 1.02 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2499033817 ns | 2584191843 ns | 0.97 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Lux | 27316886 ns | 27307540.5 ns | 1.00 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) | 2168515158 ns | 2190938487 ns | 0.99 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant | 2163505498 ns | 2166284687 ns | 1.00 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) | 2184023752 ns | 2137987987 ns | 1.02 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) | 3368049564 ns | 3415666738 ns | 0.99 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Lux | 5649485923.5 ns | 6038343271.5 ns | 0.94 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1301274508 ns | 1233854317 ns | 1.05 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1287620835 ns | 1299829181.5 ns | 0.99 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1231210695 ns | 1226243251 ns | 1.00 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2549235201 ns | 2393640923 ns | 1.07 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Lux | 52701734.5 ns | 52646968 ns | 1.00 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) | 2929489222 ns | 3006477320 ns | 0.97 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant | 2957495067 ns | 2989128551 ns | 0.99 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) | 2959664638 ns | 3003357676 ns | 0.99 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) | 4286836180 ns | 4443262702 ns | 0.96 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Lux | 10616408466 ns | 24545735518 ns | 0.43 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1215633676 ns | 1288108103 ns | 0.94 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1284427454 ns | 1247053980 ns | 1.03 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1253724058 ns | 1260403416 ns | 0.99 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2452415638 ns | 2513765600 ns | 0.98 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Lux | 70797645 ns | 70692019 ns | 1.00 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) | 3157439544 ns | 3164689242 ns | 1.00 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant | 3161185461 ns | 3166667974 ns | 1.00 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) | 3177237100 ns | 3168332239 ns | 1.00 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) | 4641316240 ns | 4510953172 ns | 1.03 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Lux | 11032831735 ns | 12354970629 ns | 0.89 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1291666179 ns | 1242550154 ns | 1.04 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1260145686 ns | 1270011702 ns | 0.99 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1285175045.5 ns | 1308184956.5 ns | 0.98 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2381519418 ns | 2564144412 ns | 0.93 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Lux | 20605754 ns | 20737061 ns | 0.99 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) | 1839920458 ns | 1846241603 ns | 1.00 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1844208126 ns | 1845891211 ns | 1.00 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) | 1845581818 ns | 1838778303 ns | 1.00 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) | 3092362331 ns | 3067201183 ns | 1.01 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Lux | 2998446872 ns | 3142722042.5 ns | 0.95 |
This comment was automatically generated by workflow using github-action-benchmark.
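For reading the table: the Ratio column appears to be Current divided by Previous, rounded to two decimals, so values below 1.00 would mean the current commit ran faster. The sketch below checks that reading against three rows copied from the table. The helper name `ratio` is made up for illustration and is not part of github-action-benchmark.

```python
def ratio(current_ns: float, previous_ns: float) -> str:
    """Format current/previous to two decimals (assumed meaning of the Ratio column)."""
    return f"{current_ns / previous_ns:.2f}"


# (current ns, previous ns, ratio as reported) for three rows of the table
rows = [
    (5433039540, 7226166416, "0.75"),    # ViT base x32, CPU, Reactant :after_enzyme
    (6121805326, 5102020848, "1.20"),    # ViT base x32, CPU, Reactant :before_enzyme
    (10616408466, 24545735518, "0.43"),  # ViT small x32, CPU, Lux
]

for current, previous, reported in rows:
    print(ratio(current, previous), "reported:", reported)
```

Ratios this close to 1.00 on most rows are within what run-to-run noise could plausibly produce, so single outliers are worth re-running before drawing conclusions.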
Also, should we optimize this RFFT followed by IRFFT to a no-op? Module:
module attributes {transform.with_named_sequence} {
  func.func @main(%arg0: tensor<2x4x3xf64>) -> tensor<2x4x3xf64> {
    %0 = stablehlo.fft %arg0, type = RFFT, length = [2, 4, 3] : (tensor<2x4x3xf64>) -> tensor<2x4x2xcomplex<f64>>
    %1 = stablehlo.fft %0, type = IRFFT, length = [2, 4, 3] : (tensor<2x4x3xf64>) -> tensor<2x4x3xf64>
    return %1 : tensor<2x4x3xf64>
  }
}
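To illustrate why this pair could fold away, here is a minimal sketch in plain Python (standard library only, not Reactant or StableHLO) of a naive one-dimensional real FFT and its inverse. It assumes the usual convention that the forward transform is unscaled and the inverse divides by the length. The function names are illustrative; the module above transforms all three dimensions, while this sketch covers only a single axis of length 3.

```python
import cmath


def rfft(x):
    """Naive real-to-complex DFT keeping only the first n//2 + 1 coefficients."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def irfft(coeffs, n):
    """Naive inverse: rebuild the full spectrum by Hermitian symmetry, then invert."""
    full = list(coeffs) + [0j] * (n - len(coeffs))
    for k in range(len(coeffs), n):
        full[k] = coeffs[n - k].conjugate()
    return [
        (sum(full[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def roundtrip_error(x):
    """Largest absolute difference between x and irfft(rfft(x), len(x))."""
    y = irfft(rfft(x), len(x))
    return max(abs(a - b) for a, b in zip(x, y))


x = [1.0, -2.5, 4.0]       # length 3, like the last dimension in the module
print(len(rfft(x)))        # 2 complex coefficients, matching tensor<...x2xcomplex<f64>>
print(roundtrip_error(x))  # tiny, on the order of floating-point rounding
```

The round trip is the identity only up to rounding error, and only when the IRFFT length equals the original input length, so a rewrite to a no-op would need to check that the two `length` attributes match.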
fft and variants
Yeah, I think this needs optimization and differentiation rules in the Enzyme-JAX repo.
Benchmark Results
Benchmark Plots
A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
stablehlo.fft
NNlib.pad_constant