Name
KHR_shader_subgroup
Name Strings
GL_KHR_shader_subgroup
GL_KHR_shader_subgroup_basic
GL_KHR_shader_subgroup_vote
GL_KHR_shader_subgroup_arithmetic
GL_KHR_shader_subgroup_ballot
GL_KHR_shader_subgroup_shuffle
GL_KHR_shader_subgroup_shuffle_relative
GL_KHR_shader_subgroup_clustered
GL_KHR_shader_subgroup_quad
Contact
Neil Henning (neil 'at' codeplay.com), Codeplay
Contributors
Jeff Bolz, NVIDIA
Matthaeus Chajdas, AMD
Jan-Harald Fredriksen, ARM
Alexander Galazin, ARM
Aaron Greig, Codeplay
Aaron Hagan, AMD
Tobias Hector, Imagination Technologies
Neil Henning, Codeplay
John Kessenich, Google
Daniel Koch, NVIDIA
Graeme Leese, Broadcom
Timothy Lottes, AMD
David Neto, Google
Kevin Petit, ARM
Ralph Potter, Codeplay
Colin Riley, AMD
Robert Simpson, Qualcomm
Notice
Copyright (c) 2018 The Khronos Group Inc. Copyright terms at
http://www.khronos.org/registry/speccopyright.html
Status
Approved by Vulkan working group 12-Sep-2017.
Ratified by the Khronos Board of Promoters 27-Oct-2017.
Version
Last Modified Date: 14-Jul-2019
Revision: 8
Number
TBD.
Dependencies
This extension can be applied to OpenGL GLSL versions 1.40
(#version 140) and higher.
This extension can be applied to OpenGL ES ESSL versions 3.10
(#version 310) and higher.
This extension is written against revision 6 of the OpenGL Shading Language
version 4.50, dated April 14, 2016.
This extension interacts with revision 36 of the GL_KHR_vulkan_glsl
extension, dated February 13, 2017.
Overview
This extension document modifies GLSL to add subgroup functionality.
Invocations are partitioned into subgroups, where invocations within a
subgroup can synchronize and share data with each other efficiently. This
extension introduces a set of built-in functions to synchronize and share
data between invocations within a subgroup, as well as a common set of
arithmetic operations for reductions and scans.
This extension document adds support for the following extensions to be used
within GLSL:
- GL_KHR_shader_subgroup_basic - enables basic subgroup operations.
- GL_KHR_shader_subgroup_vote - enables subgroup vote operations.
- GL_KHR_shader_subgroup_arithmetic - enables subgroup arithmetic
operations.
- GL_KHR_shader_subgroup_ballot - enables subgroup ballot operations.
- GL_KHR_shader_subgroup_shuffle - enables subgroup shuffle operations.
- GL_KHR_shader_subgroup_shuffle_relative - enables subgroup shuffle
relative operations.
- GL_KHR_shader_subgroup_clustered - enables subgroup clustered operations.
- GL_KHR_shader_subgroup_quad - enables subgroup quad operations.
Mapping to SPIR-V
-----------------
For informational purposes (non-specification), the following is an
expected way for an implementation to map GLSL constructs to SPIR-V
constructs:
gl_NumSubgroups -> NumSubgroups decorated OpVariable
gl_SubgroupID -> SubgroupId decorated OpVariable
gl_SubgroupSize -> SubgroupSize decorated OpVariable
gl_SubgroupInvocationID -> SubgroupLocalInvocationId decorated OpVariable
gl_SubgroupEqMask -> SubgroupEqMask decorated OpVariable
gl_SubgroupGeMask -> SubgroupGeMask decorated OpVariable
gl_SubgroupGtMask -> SubgroupGtMask decorated OpVariable
gl_SubgroupLeMask -> SubgroupLeMask decorated OpVariable
gl_SubgroupLtMask -> SubgroupLtMask decorated OpVariable
subgroupBarrier() -> OpControlBarrier(
/*Execution*/Subgroup,
/*Memory*/Subgroup,
/*Semantics*/AcquireRelease | UniformMemory | WorkgroupMemory | ImageMemory)
subgroupMemoryBarrier() -> OpMemoryBarrier(
/*Memory*/Subgroup,
/*Semantics*/AcquireRelease | UniformMemory | WorkgroupMemory | ImageMemory)
subgroupMemoryBarrierBuffer() -> OpMemoryBarrier(
/*Memory*/Subgroup,
/*Semantics*/AcquireRelease | UniformMemory)
subgroupMemoryBarrierShared() -> OpMemoryBarrier(
/*Memory*/Subgroup,
/*Semantics*/AcquireRelease | WorkgroupMemory)
subgroupMemoryBarrierImage() -> OpMemoryBarrier(
/*Memory*/Subgroup,
/*Semantics*/AcquireRelease | ImageMemory)
subgroupElect() -> OpGroupNonUniformElect(
/*Execution*/Subgroup)
subgroupAll(value) -> OpGroupNonUniformAll(
/*Execution*/Subgroup,
/*Predicate*/value)
subgroupAny(value) -> OpGroupNonUniformAny(
/*Execution*/Subgroup,
/*Predicate*/value)
subgroupAllEqual(value) -> OpGroupNonUniformAllEqual(
/*Execution*/Subgroup,
/*Value*/value)
subgroupBroadcast(value, id) -> OpGroupNonUniformBroadcast(
/*Execution*/Subgroup,
/*Value*/value,
/*Id*/id)
subgroupBroadcastFirst(value) -> OpGroupNonUniformBroadcastFirst(
/*Execution*/Subgroup,
/*Value*/value)
subgroupBallot(value) -> OpGroupNonUniformBallot(
/*Execution*/Subgroup,
/*Predicate*/value)
subgroupInverseBallot(value) -> OpGroupNonUniformInverseBallot(
/*Execution*/Subgroup,
/*Value*/value)
subgroupBallotBitExtract(value, id) -> OpGroupNonUniformBallotBitExtract(
/*Execution*/Subgroup,
/*Value*/value,
/*Index*/id)
subgroupBallotBitCount(value) -> OpGroupNonUniformBallotBitCount(
/*Execution*/Subgroup,
/*Operation*/Reduce,
/*Value*/value)
subgroupBallotInclusiveBitCount(value) -> OpGroupNonUniformBallotBitCount(
/*Execution*/Subgroup,
/*Operation*/InclusiveScan,
/*Value*/value)
subgroupBallotExclusiveBitCount(value) -> OpGroupNonUniformBallotBitCount(
/*Execution*/Subgroup,
/*Operation*/ExclusiveScan,
/*Value*/value)
subgroupBallotFindLSB(value) -> OpGroupNonUniformBallotFindLSB(
/*Execution*/Subgroup,
/*Value*/value)
subgroupBallotFindMSB(value) -> OpGroupNonUniformBallotFindMSB(
/*Execution*/Subgroup,
/*Value*/value)
subgroupShuffle(value, id) -> OpGroupNonUniformShuffle(
/*Execution*/Subgroup,
/*Value*/value,
/*Id*/id)
subgroupShuffleXor(value, mask) -> OpGroupNonUniformShuffleXor(
/*Execution*/Subgroup,
/*Value*/value,
/*Mask*/mask)
subgroupShuffleUp(value, delta) -> OpGroupNonUniformShuffleUp(
/*Execution*/Subgroup,
/*Value*/value,
/*Delta*/delta)
subgroupShuffleDown(value, delta) -> OpGroupNonUniformShuffleDown(
/*Execution*/Subgroup,
/*Value*/value,
/*Delta*/delta)
subgroupAdd(value) -> OpGroupNonUniformIAdd | OpGroupNonUniformFAdd(
/*Execution*/Subgroup,
/*Operation*/Reduce,
/*Value*/value)
subgroupMul(value) -> OpGroupNonUniformIMul | OpGroupNonUniformFMul(
/*Execution*/Subgroup,
/*Operation*/Reduce,
/*Value*/value)
subgroupMin(value) -> OpGroupNonUniformSMin | OpGroupNonUniformUMin | OpGroupNonUniformFMin(
/*Execution*/Subgroup,
/*Operation*/Reduce,
/*Value*/value)
subgroupMax(value) -> OpGroupNonUniformSMax | OpGroupNonUniformUMax | OpGroupNonUniformFMax(
/*Execution*/Subgroup,
/*Operation*/Reduce,
/*Value*/value)
subgroupAnd(value) -> OpGroupNonUniformBitwiseAnd | OpGroupNonUniformLogicalAnd(
/*Execution*/Subgroup,
/*Operation*/Reduce,
/*Value*/value)
subgroupOr(value) -> OpGroupNonUniformBitwiseOr | OpGroupNonUniformLogicalOr(
/*Execution*/Subgroup,
/*Operation*/Reduce,
/*Value*/value)
subgroupXor(value) -> OpGroupNonUniformBitwiseXor | OpGroupNonUniformLogicalXor(
/*Execution*/Subgroup,
/*Operation*/Reduce,
/*Value*/value)
subgroupInclusiveAdd(value) -> OpGroupNonUniformIAdd | OpGroupNonUniformFAdd(
/*Execution*/Subgroup,
/*Operation*/InclusiveScan,
/*Value*/value)
subgroupInclusiveMul(value) -> OpGroupNonUniformIMul | OpGroupNonUniformFMul(
/*Execution*/Subgroup,
/*Operation*/InclusiveScan,
/*Value*/value)
subgroupInclusiveMin(value) -> OpGroupNonUniformSMin | OpGroupNonUniformUMin | OpGroupNonUniformFMin(
/*Execution*/Subgroup,
/*Operation*/InclusiveScan,
/*Value*/value)
subgroupInclusiveMax(value) -> OpGroupNonUniformSMax | OpGroupNonUniformUMax | OpGroupNonUniformFMax(
/*Execution*/Subgroup,
/*Operation*/InclusiveScan,
/*Value*/value)
subgroupInclusiveAnd(value) -> OpGroupNonUniformBitwiseAnd | OpGroupNonUniformLogicalAnd(
/*Execution*/Subgroup,
/*Operation*/InclusiveScan,
/*Value*/value)
subgroupInclusiveOr(value) -> OpGroupNonUniformBitwiseOr | OpGroupNonUniformLogicalOr(
/*Execution*/Subgroup,
/*Operation*/InclusiveScan,
/*Value*/value)
subgroupInclusiveXor(value) -> OpGroupNonUniformBitwiseXor | OpGroupNonUniformLogicalXor(
/*Execution*/Subgroup,
/*Operation*/InclusiveScan,
/*Value*/value)
subgroupExclusiveAdd(value) -> OpGroupNonUniformIAdd | OpGroupNonUniformFAdd(
/*Execution*/Subgroup,
/*Operation*/ExclusiveScan,
/*Value*/value)
subgroupExclusiveMul(value) -> OpGroupNonUniformIMul | OpGroupNonUniformFMul(
/*Execution*/Subgroup,
/*Operation*/ExclusiveScan,
/*Value*/value)
subgroupExclusiveMin(value) -> OpGroupNonUniformSMin | OpGroupNonUniformUMin | OpGroupNonUniformFMin(
/*Execution*/Subgroup,
/*Operation*/ExclusiveScan,
/*Value*/value)
subgroupExclusiveMax(value) -> OpGroupNonUniformSMax | OpGroupNonUniformUMax | OpGroupNonUniformFMax(
/*Execution*/Subgroup,
/*Operation*/ExclusiveScan,
/*Value*/value)
subgroupExclusiveAnd(value) -> OpGroupNonUniformBitwiseAnd | OpGroupNonUniformLogicalAnd(
/*Execution*/Subgroup,
/*Operation*/ExclusiveScan,
/*Value*/value)
subgroupExclusiveOr(value) -> OpGroupNonUniformBitwiseOr | OpGroupNonUniformLogicalOr(
/*Execution*/Subgroup,
/*Operation*/ExclusiveScan,
/*Value*/value)
subgroupExclusiveXor(value) -> OpGroupNonUniformBitwiseXor | OpGroupNonUniformLogicalXor(
/*Execution*/Subgroup,
/*Operation*/ExclusiveScan,
/*Value*/value)
subgroupClusteredAdd(value, clusterSize) -> OpGroupNonUniformIAdd | OpGroupNonUniformFAdd(
/*Execution*/Subgroup,
/*Operation*/ClusteredReduce,
/*Value*/value,
/*ClusterSize*/clusterSize)
subgroupClusteredMul(value, clusterSize) -> OpGroupNonUniformIMul | OpGroupNonUniformFMul(
/*Execution*/Subgroup,
/*Operation*/ClusteredReduce,
/*Value*/value,
/*ClusterSize*/clusterSize)
subgroupClusteredMin(value, clusterSize) -> OpGroupNonUniformSMin | OpGroupNonUniformUMin | OpGroupNonUniformFMin(
/*Execution*/Subgroup,
/*Operation*/ClusteredReduce,
/*Value*/value,
/*ClusterSize*/clusterSize)
subgroupClusteredMax(value, clusterSize) -> OpGroupNonUniformSMax | OpGroupNonUniformUMax | OpGroupNonUniformFMax(
/*Execution*/Subgroup,
/*Operation*/ClusteredReduce,
/*Value*/value,
/*ClusterSize*/clusterSize)
subgroupClusteredAnd(value, clusterSize) -> OpGroupNonUniformBitwiseAnd | OpGroupNonUniformLogicalAnd(
/*Execution*/Subgroup,
/*Operation*/ClusteredReduce,
/*Value*/value,
/*ClusterSize*/clusterSize)
subgroupClusteredOr(value, clusterSize) -> OpGroupNonUniformBitwiseOr | OpGroupNonUniformLogicalOr(
/*Execution*/Subgroup,
/*Operation*/ClusteredReduce,
/*Value*/value,
/*ClusterSize*/clusterSize)
subgroupClusteredXor(value, clusterSize) -> OpGroupNonUniformBitwiseXor | OpGroupNonUniformLogicalXor(
/*Execution*/Subgroup,
/*Operation*/ClusteredReduce,
/*Value*/value,
/*ClusterSize*/clusterSize)
subgroupQuadBroadcast(value, id) -> OpGroupNonUniformQuadBroadcast(
/*Execution*/Subgroup,
/*Value*/value,
/*Index*/id)
subgroupQuadSwapHorizontal(value) -> OpGroupNonUniformQuadSwap(
/*Execution*/Subgroup,
/*Value*/value,
/*Direction*/0)
subgroupQuadSwapVertical(value) -> OpGroupNonUniformQuadSwap(
/*Execution*/Subgroup,
/*Value*/value,
/*Direction*/1)
subgroupQuadSwapDiagonal(value) -> OpGroupNonUniformQuadSwap(
/*Execution*/Subgroup,
/*Value*/value,
/*Direction*/2)
Modifications to the OpenGL Shading Language Specification, Version 4.50
Including one or more of the following lines in a shader can be used to
control the language features described in this extension:
#extension GL_KHR_shader_subgroup_basic : <behavior>
#extension GL_KHR_shader_subgroup_vote : <behavior>
#extension GL_KHR_shader_subgroup_arithmetic : <behavior>
#extension GL_KHR_shader_subgroup_ballot : <behavior>
#extension GL_KHR_shader_subgroup_shuffle : <behavior>
#extension GL_KHR_shader_subgroup_shuffle_relative : <behavior>
#extension GL_KHR_shader_subgroup_clustered : <behavior>
#extension GL_KHR_shader_subgroup_quad : <behavior>
where <behavior> is as specified in section 3.3. If any of the
GL_KHR_shader_subgroup_vote, GL_KHR_shader_subgroup_arithmetic,
GL_KHR_shader_subgroup_ballot, GL_KHR_shader_subgroup_shuffle,
GL_KHR_shader_subgroup_shuffle_relative, GL_KHR_shader_subgroup_clustered,
or GL_KHR_shader_subgroup_quad extensions is enabled, the
GL_KHR_shader_subgroup_basic extension is also implicitly enabled.
New preprocessor #defines are added:
#define GL_KHR_shader_subgroup_basic 1
#define GL_KHR_shader_subgroup_vote 1
#define GL_KHR_shader_subgroup_arithmetic 1
#define GL_KHR_shader_subgroup_ballot 1
#define GL_KHR_shader_subgroup_shuffle 1
#define GL_KHR_shader_subgroup_shuffle_relative 1
#define GL_KHR_shader_subgroup_clustered 1
#define GL_KHR_shader_subgroup_quad 1
If a GL_KHR_shader_subgroup_* extension is supported, the
corresponding GL_KHR_shader_subgroup_* #define is defined.
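As an illustrative sketch (not part of the specification), a shader might enable one of these extensions and use the corresponding #define to select a fallback. The workgroup size, the block names InBuf and OutBuf, and their bindings are assumptions made for this example, as is a dispatch sized so that every index is in range.

```glsl
#version 450

// Requesting the arithmetic extension also implicitly enables
// GL_KHR_shader_subgroup_basic.
#extension GL_KHR_shader_subgroup_arithmetic : enable

layout(local_size_x = 64) in;  // assumed workgroup size

// Hypothetical buffers, named for this example only.
layout(std430, binding = 0) readonly buffer InBuf  { float inData[];  };
layout(std430, binding = 1) writeonly buffer OutBuf { float outData[]; };

void main() {
    uint i = gl_GlobalInvocationID.x;
    float v = inData[i];
#ifdef GL_KHR_shader_subgroup_arithmetic
    // Extension supported: sum the value across the active invocations.
    outData[i] = subgroupAdd(v);
#else
    // Fallback: pass the value through unchanged.
    outData[i] = v;
#endif
}
```

With the behavior "enable", an implementation that lacks the extension warns rather than fails, so the #else branch is compiled; "require" would instead make compilation fail.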
Additions to Chapter 3 of the OpenGL Shading Language Specification
(Basics)
Modify Section 3.8, Definitions
(Add a new subsection to the end of this section)
Subgroup
A subgroup is a set of invocations exposed as running concurrently with
the current shader invocation. The number of invocations within a
subgroup (the size of the subgroup) is a fixed property of the device.
In compute shaders, the local workgroup is a superset of the subgroup.
Within any given subgroup, an invocation may be active or inactive.
The following are cases where this state may change:
- For N active invocations within a subgroup that encounter the same
dynamic instance of non-uniform control flow, there will be [0..N]
active invocations within the control flow as some invocations can
diverge. When the corresponding reconvergence of the dynamic instance
of the non-uniform control flow occurs, N active invocations will
reconverge.
- In graphics shaders, invocations may be inactive within a subgroup
if the device was unable to fully populate a subgroup prior to
beginning execution of that group of invocations. Behavior is
implementation dependent. For example, when rendering a
full-viewport triangle, in a viewport which is not aligned and sized
such that the device can maintain fully packed subgroups for the full
draw, invocations within a subgroup could be inactive.
- In a compute shader, invocations may be inactive within a subgroup
if the local workgroup size is not a multiple of the subgroup size.
Helper invocations participate in subgroup operations but, for operations
other than subgroupQuad operations, they may be treated as inactive even
if they would be considered otherwise active.
For each active invocation within a subgroup that reaches the same
dynamic instance of a subgroup built-in function, all active invocations
within a subgroup must execute the dynamic instance of the function
before any invocation can proceed.
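The effect of non-uniform control flow on the active set can be sketched as follows. This is a minimal example under assumed names (the Counts block, its binding, and the workgroup size are hypothetical); it uses the ballot built-ins to count how many invocations of the subgroup entered a branch.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_ballot : enable

layout(local_size_x = 64) in;  // assumed workgroup size

// Hypothetical output buffer, one entry per invocation.
layout(std430, binding = 0) writeonly buffer Counts { uint branchCount[]; };

void main() {
    uint i = gl_GlobalInvocationID.x;
    uint count = 0u;  // value kept by invocations that skip the branch
    if ((i & 1u) == 0u) {
        // Only invocations that took this branch are active here, so the
        // ballot has exactly one bit set per invocation inside the branch.
        uvec4 insideMask = subgroupBallot(true);
        count = subgroupBallotBitCount(insideMask);
    }
    // After reconvergence all previously active invocations run again.
    branchCount[i] = count;
}
```

How global invocation indices map onto subgroup invocations is implementation dependent, so the count observed inside the branch can be anywhere from 1 to the number of invocations that were active before it.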
The subgroup memory barrier built-in functions can be used to order
reads and writes to variables stored in memory accessible to other
shader invocations within a subgroup. When called, these functions will
wait for the completion of all reads and writes previously performed by
the caller that access selected variable types, and then return with no
other effect. The built-in functions subgroupMemoryBarrierBuffer(),
subgroupMemoryBarrierShared(), and subgroupMemoryBarrierImage() wait for
the completion of accesses to buffer, shared, and image variables,
respectively. The built-in functions subgroupBarrier() and
subgroupMemoryBarrier() wait for the completion of accesses to all of
the above variable types. The function subgroupMemoryBarrierShared() is
available only in compute shaders; the other functions are available in
all shader types.
When the subgroup memory barrier built-in functions return, the results
of any memory stores to coherent variables performed prior to the call
will be visible to any future coherent access to the same memory
performed by any other shader invocation within the same subgroup.
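A minimal sketch of this ordering, assuming a compute shader with a workgroup size of 64 and hypothetical buffer names: one invocation per subgroup stores to a shared variable, and subgroupBarrier() orders that store before the reads performed by the rest of its subgroup.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic : enable

layout(local_size_x = 64) in;  // assumed workgroup size

// Hypothetical buffers, named for this example only.
layout(std430, binding = 0) readonly buffer InBuf  { uint inData[];  };
layout(std430, binding = 1) writeonly buffer OutBuf { uint outData[]; };

// One slot per subgroup. Sized for the worst case of 64 subgroups, assuming
// every subgroup holds at least one invocation of the workgroup.
shared uint slot[64];

void main() {
    uint i = gl_GlobalInvocationID.x;
    if (subgroupElect()) {
        // Exactly one active invocation per subgroup performs the store.
        slot[gl_SubgroupID] = inData[i];
    }
    // All active invocations of the subgroup wait here; the store above is
    // complete before any of them continues.
    subgroupBarrier();
    outData[i] = slot[gl_SubgroupID];
}
```

Because subgroupBarrier() waits for accesses to all of the variable types listed above, a separate call to subgroupMemoryBarrierShared() is not needed in this sketch.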
There are two classes of subgroup built-in functions that have common
properties - subgroupInclusive<op>() and subgroupExclusive<op>() where
<op> is one of: Add, Mul, Min, Max, And, Or, Xor.
These operations perform a scan operation across the active invocations
within a subgroup in linear order starting at the active invocation
with the lowest <gl_SubgroupInvocationID>, increasing to the active
invocation with the highest <gl_SubgroupInvocationID>.
genType subgroupInclusive<op>(genType value);
genIType subgroupInclusive<op>(genIType value);
genUType subgroupInclusive<op>(genUType value);
The inclusive scan operations are defined, over the set of n active
invocations within a subgroup, to return [x(0), x(0) <op> x(1), ...,
x(0) <op> x(1) <op> ... <op> x(n-1)], where x(i) is the <value> in the
i'th active invocation.
genType subgroupExclusive<op>(genType value);
genIType subgroupExclusive<op>(genIType value);
genUType subgroupExclusive<op>(genUType value);
The exclusive scan operations are defined, over the set of n active
invocations within a subgroup, to return [I(), x(0), x(0) <op> x(1),
..., x(0) <op> x(1) <op> ... <op> x(n-2)], where x(i) is the <value> in
the i'th active invocation. I() is an identity function taken from the
following table:
<op> | type | I()
--------------------------
Add | genType | +0.0
Add | genDType | +0.0
Add | genIType | 0
Add | genUType | 0
Mul | genType | 1.0
Mul | genDType | 1.0
Mul | genIType | 1
Mul | genUType | 1
Min | genType | +INF
Min | genDType | +INF
Min | genIType | INT_MAX
Min | genUType | UINT_MAX
Max | genType | -INF
Max | genDType | -INF
Max | genIType | INT_MIN
Max | genUType | 0
And | genIType | ~0
And | genUType | ~0
And | genBType | true
Or | genIType | 0
Or | genUType | 0
Or | genBType | false
Xor | genIType | 0
Xor | genUType | 0
Xor | genBType | false
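The difference between the two scan classes can be sketched with Add on unsigned integers. The buffer names, bindings, and workgroup size are assumptions for this example; the values in the comments are a hand-worked case for four active invocations.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_arithmetic : enable

layout(local_size_x = 64) in;  // assumed workgroup size

// Hypothetical buffers, named for this example only.
layout(std430, binding = 0) readonly buffer InBuf  { uint inData[]; };
layout(std430, binding = 1) writeonly buffer OutBuf { uvec2 scans[]; };

void main() {
    uint i = gl_GlobalInvocationID.x;
    uint v = inData[i];

    // For values [3, 1, 4, 1] held by four active invocations, in order of
    // increasing gl_SubgroupInvocationID:
    //   inclusive scan: [3, 4, 8, 9]
    //   exclusive scan: [0, 3, 4, 8]   (0 is I() for Add on genUType)
    uint inclusive = subgroupInclusiveAdd(v);
    uint exclusive = subgroupExclusiveAdd(v);

    scans[i] = uvec2(inclusive, exclusive);
}
```

For Add on integers, each invocation's inclusive result equals its exclusive result plus its own <value>.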
For the uvec4 as used in subgroupBallot(), subgroupInverseBallot(),
subgroupBallotBitExtract(), subgroupBallotBitCount(),
subgroupBallotInclusiveBitCount(), subgroupBallotExclusiveBitCount(),
subgroupBallotFindLSB(), and subgroupBallotFindMSB() the following
properties hold:
- Bits are packed such that the first invocation is represented in bit
0 of the first vector component, and the last (up to
<gl_SubgroupSize>) is the highest bit number in the last vector
component needed to represent all bits for the total number of
subgroup invocations.
- Bits that are beyond the highest bit number in the last vector
component needed to represent all bits for the total number of
subgroup invocations are ignored.
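The packing rule above implies that invocation n is found in vector component n / 32 at bit n % 32. The following sketch tests one bit by hand and alongside subgroupBallotBitExtract(); the helper ballotBit(), the buffer names, and the workgroup size are hypothetical.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_ballot : enable

layout(local_size_x = 64) in;  // assumed workgroup size

// Hypothetical buffers, named for this example only.
layout(std430, binding = 0) readonly buffer InBuf  { int inData[]; };
layout(std430, binding = 1) writeonly buffer OutBuf { uvec2 bits[]; };

// Hand-written bit test following the packing rule; valid only for
// index < gl_SubgroupSize (at most 128, so the component is 0..3).
bool ballotBit(uvec4 ballot, uint index) {
    uint component = index >> 5u;  // index / 32
    uint bit = index & 31u;        // index % 32
    return ((ballot[component] >> bit) & 1u) != 0u;
}

void main() {
    uint i = gl_GlobalInvocationID.x;
    uvec4 ballot = subgroupBallot(inData[i] > 0);

    bool manual  = ballotBit(ballot, gl_SubgroupInvocationID);
    bool builtin = subgroupBallotBitExtract(ballot, gl_SubgroupInvocationID);

    // Both entries are expected to agree for every active invocation.
    bits[i] = uvec2(manual ? 1u : 0u, builtin ? 1u : 0u);
}
```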
There is a class of subgroup built-in operations of the form
subgroupClustered<op>(), where <op> is one of: Add, Mul, Min, Max, And,
Or, Xor. These built-in operations perform a clustered reduction
operation on the invocations within a subgroup, such that the <op> is
calculated on N clusters of invocations within a subgroup. For example,
assume we have a shader such that gl_SubgroupSize is 8, and uses the
following GLSL:
float value = ...; // unique for each subgroup invocation
float result = subgroupClusteredAdd(value, 2);
Where the cluster size (the second parameter to subgroupClusteredAdd())
is 2, and each of our 8 invocations is active within the subgroup.
For each subgroup invocation in the set
[x(0), x(1), x(2), x(3), x(4), x(5), x(6), x(7)], the float <value> is
[42.0, 13.0, -56.0, 0.0, 128.0, -1.0, 7.0, 3.5]. The
subgroupClusteredAdd() operation will produce the float <result>
[55.0, 55.0, -56.0, -56.0, 127.0, 127.0, 10.5, 10.5].
A cluster as used by a clustered operation is defined such that for all
invocations within the cluster, their <gl_SubgroupInvocationID> is in
[x, x+1, x+2, ..., x+n-1], where n is the cluster size, and x is a
multiple of n.
The <clusterSize> as used in the subgroupClustered<op>() operations must
be:
- An integral constant expression.
- At least 1.
- A power of 2.
Undefined behavior will occur if a subgroupClustered<op>() operation is
executed with a <clusterSize> that is greater than <gl_SubgroupSize>.
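The constraints on <clusterSize> can be sketched in a complete compute shader. The constant name CLUSTER_SIZE, the buffer names, and the workgroup size are assumptions for this example, which also assumes that gl_SubgroupSize is at least 2.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_clustered : enable

layout(local_size_x = 64) in;  // assumed workgroup size

// Hypothetical buffers, named for this example only.
layout(std430, binding = 0) readonly buffer InBuf  { float inData[];  };
layout(std430, binding = 1) writeonly buffer OutBuf { float outData[]; };

// An integral constant expression, at least 1, and a power of 2.
const uint CLUSTER_SIZE = 2u;

void main() {
    uint i = gl_GlobalInvocationID.x;
    float v = inData[i];
    // Invocation IDs {0,1}, {2,3}, {4,5}, ... each form one cluster, and
    // every invocation of a cluster receives that cluster's sum.
    outData[i] = subgroupClusteredAdd(v, CLUSTER_SIZE);
}
```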
The subgroup built-in operations subgroupQuadBroadcast(),
subgroupQuadSwapHorizontal(), subgroupQuadSwapVertical(), and
subgroupQuadSwapDiagonal() operate on clusters of 4 invocations called
a quad. These built-in operations allow for sharing of data efficiently
within each quad.
In fragment shaders, this quad corresponds to 4 pixels arranged in a 2x2
grid:
0 | 1
--|--
2 | 3
such that:
- 0th index corresponds to a pixel with a coordinate of (x, y)
- 1st index corresponds to a pixel with a coordinate of (x + 1, y)
- 2nd index corresponds to a pixel with a coordinate of (x, y + 1)
- 3rd index corresponds to a pixel with a coordinate of (x + 1, y + 1)
If a primitive covers a fragment at (x, y), its fragment shader
invocation will be in a quad with fragment shader invocations
corresponding to the three neighboring pixels at (x + 1, y), (x, y + 1),
and (x + 1, y + 1). These four invocations, arranged in a 2x2 grid,
make up the quad. If the neighbors of a fragment are not covered
by the primitive, helper fragment shader invocations will still be
generated.
Note: in non-fragment shaders, the quad has no defined mapping to
non-subgroup shader stage state.
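As a sketch of data sharing within a quad, a fragment shader can read the value held by its horizontal and vertical neighbors and form differences by hand. The input name shade, the output name fragColor, and their locations are hypothetical.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_quad : enable

layout(location = 0) in float shade;      // hypothetical interpolated input
layout(location = 0) out vec4 fragColor;  // hypothetical color output

void main() {
    // Values held by the horizontal and vertical neighbors in the 2x2 quad.
    float across = subgroupQuadSwapHorizontal(shade);
    float down   = subgroupQuadSwapVertical(shade);

    // Absolute differences between this pixel and its neighbors.
    float dx = abs(across - shade);
    float dy = abs(down - shade);

    // Value from quad index 0, the pixel at (x, y) in the layout above.
    float corner = subgroupQuadBroadcast(shade, 0u);

    fragColor = vec4(dx, dy, corner, 1.0);
}
```

The index passed to subgroupQuadBroadcast() is written as a literal here, which keeps the sketch valid where a constant index is required.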
Subgroup built-in operations that perform minimum or maximum operations
have the following properties:
- If <value> is of a vector type, the operation is performed
component-wise across the vector on the <value>s provided by active
invocations within a subgroup.
- From the set of <value>s provided by active invocations within a
subgroup, if for any two <value>s one of them is a NaN, the other is
chosen. If all <value>s that are used by the current invocation are
NaN, then the result is undefined.
Additions to Chapter 7 of the OpenGL Shading Language Specification
(Built-in Variables)
Modify Section 7.1, Built-In Language Variables
(Add to the list of built-in variables for the compute languages)
highp in uint gl_NumSubgroups;
highp in uint gl_SubgroupID;
(Add to the list of built-in variables for the compute, vertex, geometry,
tessellation control, tessellation evaluation, and fragment languages)
mediump in uint gl_SubgroupSize;
mediump in uint gl_SubgroupInvocationID;
highp in uvec4 gl_SubgroupEqMask;
highp in uvec4 gl_SubgroupGeMask;
highp in uvec4 gl_SubgroupGtMask;
highp in uvec4 gl_SubgroupLeMask;
highp in uvec4 gl_SubgroupLtMask;
(Add these paragraphs at the end of this section)
If the extension GL_KHR_shader_subgroup_basic is enabled, the variable
<gl_NumSubgroups> is a compute-shader built-in containing the number of
subgroups within the local workgroup. The value of this variable is at
least 1, and is uniform across the invocation group.
If the extension GL_KHR_shader_subgroup_basic is enabled, the variable
<gl_SubgroupID> is a compute-shader built-in containing the index of the
subgroup within the local workgroup. The value of this variable is in the
range 0 to <gl_NumSubgroups>-1.
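One possible use of <gl_NumSubgroups> and <gl_SubgroupID> is a two-level reduction over a local workgroup, sketched below. The buffer names, the shared array partialSums, and the workgroup size of 256 are assumptions for this example; it also relies on subgroupAdd() from GL_KHR_shader_subgroup_arithmetic and on subgroupElect().

```glsl
#version 450
#extension GL_KHR_shader_subgroup_arithmetic : enable

layout(local_size_x = 256) in;  // assumed workgroup size

// Hypothetical buffers, named for this example only.
layout(std430, binding = 0) readonly buffer InBuf  { uint inData[];    };
layout(std430, binding = 1) writeonly buffer OutBuf { uint groupSums[]; };

// One slot per subgroup; 256 covers the worst case of one invocation
// per subgroup.
shared uint partialSums[256];

void main() {
    uint v = inData[gl_GlobalInvocationID.x];

    // Level 1: reduce within each subgroup.
    uint subgroupSum = subgroupAdd(v);
    if (subgroupElect()) {
        partialSums[gl_SubgroupID] = subgroupSum;
    }

    // Make the per-subgroup results visible to the whole workgroup.
    memoryBarrierShared();
    barrier();

    // Level 2: one invocation combines the gl_NumSubgroups partial sums.
    if (gl_LocalInvocationIndex == 0u) {
        uint total = 0u;
        for (uint s = 0u; s < gl_NumSubgroups; ++s) {
            total += partialSums[s];
        }
        groupSums[gl_WorkGroupID.x] = total;
    }
}
```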
If the extension GL_KHR_shader_subgroup_basic is enabled, the variable
<gl_SubgroupSize> is the number of invocations within a subgroup, and its
value is always a power of 2. The maximum <gl_SubgroupSize> supported by
the GL_KHR_shader_subgroup_basic extension is 128.
If the extension GL_KHR_shader_subgroup_basic is enabled, the variable
<gl_SubgroupInvocationID> is a built-in containing the index of an
invocation within a subgroup. The value of this variable is in the range
0 to <gl_SubgroupSize>-1.
If the extension GL_KHR_shader_subgroup_ballot is enabled, the
<gl_Subgroup??Mask> variables are built-ins that provide a bitmask of all
invocations, with one bit per invocation. Bit 0 of the first vector
component represents the first invocation, higher-order bits within a
component and higher component numbers both represent, in order, higher
invocations, and the last invocation is the highest-order bit needed, in the
last component needed, to contiguously represent all bits of the invocations
in a subgroup. These variables are defined according to the following
table:
variable | equation for bit values
------------------|-------------------------------------
gl_SubgroupEqMask | bit index == gl_SubgroupInvocationID
gl_SubgroupGeMask | bit index >= gl_SubgroupInvocationID
gl_SubgroupGtMask | bit index > gl_SubgroupInvocationID
gl_SubgroupLeMask | bit index <= gl_SubgroupInvocationID
gl_SubgroupLtMask | bit index < gl_SubgroupInvocationID
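The mask variables combine naturally with a ballot. As a sketch under assumed names (the helper countBits(), the buffers, and the workgroup size are hypothetical), ANDing a ballot with gl_SubgroupLtMask keeps only the bits of lower-numbered invocations, which gives each invocation its rank among those that voted true.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_ballot : enable

layout(local_size_x = 64) in;  // assumed workgroup size

// Hypothetical buffers, named for this example only.
layout(std430, binding = 0) readonly buffer InBuf  { int inData[]; };
layout(std430, binding = 1) writeonly buffer OutBuf { uvec2 ranks[]; };

// Total number of set bits across all four components.
uint countBits(uvec4 mask) {
    ivec4 c = bitCount(mask);
    return uint(c.x + c.y + c.z + c.w);
}

void main() {
    uint i = gl_GlobalInvocationID.x;
    uvec4 ballot = subgroupBallot(inData[i] > 0);

    // Bits of invocations whose ID is lower than this invocation's ID.
    uint manual = countBits(ballot & gl_SubgroupLtMask);

    // The built-in exclusive bit count is expected to give the same rank.
    uint builtin = subgroupBallotExclusiveBitCount(ballot);

    ranks[i] = uvec2(manual, builtin);
}
```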
Additions to Chapter 8 of the OpenGL Shading Language Specification
(Built-in Functions)
Add Section 8.18, Shader Invocation Group Functions
Syntax:
void subgroupBarrier(void);
Only usable if the extension GL_KHR_shader_subgroup_basic is enabled.
The function subgroupBarrier() enforces that all active invocations within a
subgroup must execute this function before any are allowed to continue their
execution, and that the results of any memory stores to coherent variables
performed prior to the call will be visible to any future coherent access
to the same memory performed by any other shader invocation within the same
subgroup.
Syntax:
void subgroupMemoryBarrier(void);
Only usable if the extension GL_KHR_shader_subgroup_basic is enabled.
The function subgroupMemoryBarrier() enforces the ordering of all memory
transactions issued within a single shader invocation, as viewed by other
invocations in the same subgroup.
Syntax:
void subgroupMemoryBarrierBuffer(void);
Only usable if the extension GL_KHR_shader_subgroup_basic is enabled.
The function subgroupMemoryBarrierBuffer() enforces the ordering of all
memory transactions to buffer variables issued within a single shader
invocation, as viewed by other invocations in the same subgroup.
Syntax:
void subgroupMemoryBarrierShared(void);
Only usable if the extension GL_KHR_shader_subgroup_basic is enabled.
The function subgroupMemoryBarrierShared() enforces the ordering of all
memory transactions to shared variables issued within a single shader
invocation, as viewed by other invocations in the same subgroup.
Only available in compute shaders.
Syntax:
void subgroupMemoryBarrierImage(void);
Only usable if the extension GL_KHR_shader_subgroup_basic is enabled.
The function subgroupMemoryBarrierImage() enforces the ordering of all
memory transactions to images issued within a single shader invocation, as
viewed by other invocations in the same subgroup.
Syntax:
bool subgroupElect(void);
Only usable if the extension GL_KHR_shader_subgroup_basic is enabled.
The function subgroupElect() returns true for exactly one invocation out of
the set of active invocations that execute a dynamic instance of this
instruction. All other active invocations will return false. The
invocation chosen is the active invocation with the lowest
<gl_SubgroupInvocationID>.
Syntax:
bool subgroupAll(bool value);
Only usable if the extension GL_KHR_shader_subgroup_vote is enabled.
The function subgroupAll() returns true if for all active invocations
<value> evaluates to true.
Syntax:
bool subgroupAny(bool value);
Only usable if the extension GL_KHR_shader_subgroup_vote is enabled.
The function subgroupAny() returns true if for any active invocation its
<value> evaluates to true.
Syntax:
bool subgroupAllEqual(genType value);
bool subgroupAllEqual(genIType value);
bool subgroupAllEqual(genUType value);
bool subgroupAllEqual(genBType value);
bool subgroupAllEqual(genDType value);
Only usable if the extension GL_KHR_shader_subgroup_vote is enabled.
The function subgroupAllEqual() returns true if <value> for all active
invocations is equal across the subgroup.
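The three vote functions can be sketched together as follows. The buffer names, the workgroup size, and the encoding of the result are assumptions for this example.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_vote : enable

layout(local_size_x = 64) in;  // assumed workgroup size

// Hypothetical buffers, named for this example only.
layout(std430, binding = 0) readonly buffer InBuf  { float inData[]; };
layout(std430, binding = 1) writeonly buffer OutBuf { uint outData[]; };

void main() {
    uint i = gl_GlobalInvocationID.x;
    float v = inData[i];
    bool positive = v > 0.0;

    // The vote result is the same for every active invocation, so the
    // whole subgroup takes the same branch below.
    uint code;
    if (subgroupAll(positive)) {
        code = 2u;  // every active invocation holds a positive value
    } else if (subgroupAny(positive)) {
        code = 1u;  // some, but not all, hold a positive value
    } else {
        code = 0u;  // none holds a positive value
    }

    // Set bit 2 when every active invocation loaded the same value.
    if (subgroupAllEqual(v)) {
        code |= 4u;
    }
    outData[i] = code;
}
```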
Syntax:
genType subgroupBroadcast(genType value, uint id);
genIType subgroupBroadcast(genIType value, uint id);
genUType subgroupBroadcast(genUType value, uint id);
genBType subgroupBroadcast(genBType value, uint id);
genDType subgroupBroadcast(genDType value, uint id);
Only usable if the extension GL_KHR_shader_subgroup_ballot is enabled.
The function subgroupBroadcast() returns the <value> from the invocation
whose <gl_SubgroupInvocationID> is equal to <id>. <id> must be an integral
constant expression when targeting SPIR-V 1.4 and below, otherwise it must
be dynamically uniform within the subgroup. If <id> refers to an inactive
invocation or is greater than or equal to <gl_SubgroupSize>, an undefined
value is returned.
Syntax:
genType subgroupBroadcastFirst(genType value);
genIType subgroupBroadcastFirst(genIType value);
genUType subgroupBroadcastFirst(genUType value);
genBType subgroupBroadcastFirst(genBType value);
genDType subgroupBroadcastFirst(genDType value);
Only usable if the extension GL_KHR_shader_subgroup_ballot is enabled.
The function subgroupBroadcastFirst() returns the <value> from the active
invocation with the lowest <gl_SubgroupInvocationID>.
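One possible use, shown here as a sketch, is to let a single invocation
fetch a value and then share it with the rest of the subgroup. The block
names WorkQueue and Assignment and their members are hypothetical. The
example relies on subgroupElect() and subgroupBroadcastFirst() both
selecting the active invocation with the lowest <gl_SubgroupInvocationID>,
which holds when the set of active invocations is the same at both calls.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic : require
#extension GL_KHR_shader_subgroup_ballot : require

layout(local_size_x = 64) in;

// Hypothetical work queue counter, assumed zeroed by the application.
layout(std430, binding = 0) buffer WorkQueue {
    uint nextChunk;
};
// Hypothetical output, one element per invocation.
layout(std430, binding = 1) writeonly buffer Assignment {
    uint chunkOf[];
};

void main() {
    uint chunk = 0u;

    // Only the elected invocation touches the shared counter.
    if (subgroupElect()) {
        chunk = atomicAdd(nextChunk, 1u);
    }

    // All active invocations read the elected invocation's value.
    chunk = subgroupBroadcastFirst(chunk);

    chunkOf[gl_GlobalInvocationID.x] = chunk;
}
```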
Syntax:
uvec4 subgroupBallot(bool value);
Only usable if the extension GL_KHR_shader_subgroup_ballot is enabled.
The function subgroupBallot() returns a set of bitfields containing the
result of evaluating the expression <value> in all active invocations in the
subgroup. If <value> evaluates to true for an active invocation then the
bit corresponding to the <gl_SubgroupInvocationID> for the invocation is
set to one in the result, otherwise the bit is set to zero. Bits
corresponding to inactive invocations are set to zero. The following
assumptions can be made:
- a call to subgroupBallot() where <value> evaluates to true for all
active invocations will return a set of bitfields in which only the
bits corresponding to the active invocations in the subgroup are
set.
- a call to subgroupBallot() where <value> evaluates to false for all
active invocations will return zero in each component of the
result.
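The behavior described above can be sketched as follows. Calling
subgroupBallot(true) yields the mask of active invocations, and a ballot on
a predicate yields a subset of that mask. The block names Input and Masks,
the threshold and the one-slot-per-subgroup output layout are assumptions
made for the example.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic : require
#extension GL_KHR_shader_subgroup_ballot : require

layout(local_size_x = 64) in;

// Hypothetical input, one element per invocation.
layout(std430, binding = 0) readonly buffer Input {
    float inValues[];
};
// Hypothetical output, two masks per subgroup.
layout(std430, binding = 1) writeonly buffer Masks {
    uvec4 masks[];
};

void main() {
    float v = inValues[gl_GlobalInvocationID.x];

    // Bits are set only for the active invocations.
    uvec4 activeMask = subgroupBallot(true);

    // Bits are set for active invocations whose predicate is true, so
    // this is always a subset of activeMask.
    uvec4 passed = subgroupBallot(v > 0.5);

    // The ballot is identical in all active invocations, so one write
    // per subgroup is enough.
    if (subgroupElect()) {
        uint slot = gl_WorkGroupID.x * gl_NumSubgroups + gl_SubgroupID;
        masks[2u * slot] = activeMask;
        masks[2u * slot + 1u] = passed;
    }
}
```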
Syntax:
bool subgroupInverseBallot(uvec4 value);
Only usable if the extension GL_KHR_shader_subgroup_ballot is enabled.
The function subgroupInverseBallot() returns a bool that is true if the bit
in <value> that corresponds to the current invocation's
<gl_SubgroupInvocationID> is 1, and false otherwise. All active invocations
must call subgroupInverseBallot() with the same <value>.
Syntax:
bool subgroupBallotBitExtract(uvec4 value, uint index);
Only usable if the extension GL_KHR_shader_subgroup_ballot is enabled.
The function subgroupBallotBitExtract() returns a bool that is true if the
bit in <value> that corresponds to <index> (where <index> begins at bit 0 of
the first vector component) is 1, and false otherwise. If <index> is
greater than or equal to <gl_SubgroupSize>, an undefined result is returned.
This is useful in conjunction with subgroupBallot().
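As a sketch, subgroupBallotBitExtract() can be used to look up how another
invocation voted. Here each invocation inspects the vote of its successor,
wrapping around so that <index> always stays below <gl_SubgroupSize>. The
block names Input and Output and the predicate are hypothetical.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic : require
#extension GL_KHR_shader_subgroup_ballot : require

layout(local_size_x = 64) in;

// Hypothetical buffers, one element per invocation.
layout(std430, binding = 0) readonly buffer Input {
    float inValues[];
};
layout(std430, binding = 1) writeonly buffer Output {
    uint neighborPassed[];
};

void main() {
    uint i = gl_GlobalInvocationID.x;
    uvec4 ballot = subgroupBallot(inValues[i] > 0.5);

    // Index of the next invocation, wrapped into [0, gl_SubgroupSize).
    uint next = (gl_SubgroupInvocationID + 1u) % gl_SubgroupSize;

    // An inactive neighbor reads as false, because subgroupBallot()
    // sets the bits of inactive invocations to zero.
    bool nextPassed = subgroupBallotBitExtract(ballot, next);

    neighborPassed[i] = nextPassed ? 1u : 0u;
}
```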
Syntax:
uint subgroupBallotBitCount(uvec4 value);
Only usable if the extension GL_KHR_shader_subgroup_ballot is enabled.
The function subgroupBallotBitCount() returns the number of bits that are
set to 1 in the bits used to hold the subgroup invocations of <value>.
The bits are counted across the components of <value>. This is useful in
conjunction with subgroupBallot() to get the number of active invocations
that contributed a true value.
Syntax:
uint subgroupBallotInclusiveBitCount(uvec4 value);
Only usable if the extension GL_KHR_shader_subgroup_ballot is enabled.
The function subgroupBallotInclusiveBitCount() returns the number of bits
that are set to 1 in the ballot value for subgroup invocations whose
<gl_SubgroupInvocationID> is lower than or equal to that of the current
invocation. The bits are inclusively counted across the components of
<value>. This is useful in conjunction with subgroupBallot().
Syntax:
uint subgroupBallotExclusiveBitCount(uvec4 value);
Only usable if the extension GL_KHR_shader_subgroup_ballot is enabled.
The function subgroupBallotExclusiveBitCount() returns the number of bits
that are set to 1 in the ballot value for subgroup invocations whose
<gl_SubgroupInvocationID> is lower than that of the current invocation.
The bits are exclusively counted across the components of <value>. This is
useful in conjunction with subgroupBallot().
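The ballot counting functions are commonly combined to compact a sparse set
of results into a dense array. The following is a sketch under stated
assumptions rather than a prescribed method: the block names Input and
Output, their members and the predicate are hypothetical, and outCount is
assumed to be zeroed by the application.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic : require
#extension GL_KHR_shader_subgroup_ballot : require

layout(local_size_x = 64) in;

// Hypothetical input, one element per invocation.
layout(std430, binding = 0) readonly buffer Input {
    float inValues[];
};
// Hypothetical output: a counter followed by the compacted values.
layout(std430, binding = 1) buffer Output {
    uint outCount;
    float outValues[];
};

void main() {
    float v = inValues[gl_GlobalInvocationID.x];
    bool keep = v > 0.0;

    uvec4 ballot = subgroupBallot(keep);

    // Number of active invocations in the subgroup that keep a value.
    uint total = subgroupBallotBitCount(ballot);

    // Number of keeping invocations with a lower invocation ID, which
    // is this invocation's slot within the subgroup's output range.
    uint rank = subgroupBallotExclusiveBitCount(ballot);

    // One atomic per subgroup reserves the output range.
    uint base = 0u;
    if (subgroupElect()) {
        base = atomicAdd(outCount, total);
    }
    base = subgroupBroadcastFirst(base);

    if (keep) {
        outValues[base + rank] = v;
    }
}
```

Using subgroupBallotInclusiveBitCount() instead would give rank + 1 for each
invocation that keeps its value.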
Syntax:
uint subgroupBallotFindLSB(uvec4 value);
Only usable if the extension GL_KHR_shader_subgroup_ballot is enabled.
The function subgroupBallotFindLSB() returns the bit number of the least
significant bit set to 1 in the bits used to hold the subgroup invocations
of <value>. If none of these bits is set to 1, an undefined value is
returned. This is useful in conjunction with subgroupBallot().
Syntax:
uint subgroupBallotFindMSB(uvec4 value);
Only usable if the extension GL_KHR_shader_subgroup_ballot is enabled.
The function subgroupBallotFindMSB() returns the bit number of the most
significant bit set to 1 in the bits used to hold the subgroup invocations
of <value>. If none of these bits is set to 1, an undefined value is
returned. This is useful in conjunction with subgroupBallot().
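A sketch of how subgroupBallotFindLSB() and subgroupBallotFindMSB() might be
used together to find the first and last invocation that voted true, with a
guard for the case where no bit is set. The block names Input and Range,
the sentinel value and the one-slot-per-subgroup layout are hypothetical.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic : require
#extension GL_KHR_shader_subgroup_ballot : require

layout(local_size_x = 64) in;

// Hypothetical input, one element per invocation.
layout(std430, binding = 0) readonly buffer Input {
    float inValues[];
};
// Hypothetical output, one (first, last) pair per subgroup.
layout(std430, binding = 1) writeonly buffer Range {
    uvec2 hitRange[];
};

void main() {
    bool hit = inValues[gl_GlobalInvocationID.x] > 0.5;
    uvec4 ballot = subgroupBallot(hit);

    // Sentinel used when no invocation voted true.
    uvec2 range = uvec2(0xFFFFFFFFu);

    // Only query the bit positions when at least one bit is set,
    // because the result is undefined otherwise.
    if (ballot != uvec4(0u)) {
        range = uvec2(subgroupBallotFindLSB(ballot),
                      subgroupBallotFindMSB(ballot));
    }

    if (subgroupElect()) {
        uint slot = gl_WorkGroupID.x * gl_NumSubgroups + gl_SubgroupID;
        hitRange[slot] = range;
    }
}
```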
Syntax:
genType subgroupShuffle(genType value, uint id);
genIType subgroupShuffle(genIType value, uint id);
genUType subgroupShuffle(genUType value, uint id);
genBType subgroupShuffle(genBType value, uint id);
genDType subgroupShuffle(genDType value, uint id);
Only usable if the extension GL_KHR_shader_subgroup_shuffle is enabled.
The function subgroupShuffle() returns the <value> from the invocation
whose <gl_SubgroupInvocationID> is equal to <id>. If <id> refers to an
inactive invocation or is greater than or equal to <gl_SubgroupSize>, an
undefined value is returned.
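Unlike subgroupBroadcast(), <id> may differ between invocations. As a
sketch, the following reverses the order of values within each subgroup.
The block name Data is hypothetical, and the example assumes all invocations
of the subgroup are active; otherwise some results are undefined as
described above.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic : require
#extension GL_KHR_shader_subgroup_shuffle : require

layout(local_size_x = 64) in;

// Hypothetical data buffer, one element per invocation.
layout(std430, binding = 0) buffer Data {
    float values[];
};

void main() {
    uint i = gl_GlobalInvocationID.x;
    float v = values[i];

    // Mirror index within the subgroup; always below gl_SubgroupSize.
    uint source = gl_SubgroupSize - 1u - gl_SubgroupInvocationID;

    // Each invocation reads from a different source invocation.
    values[i] = subgroupShuffle(v, source);
}
```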
Syntax:
genType subgroupShuffleXor(genType value, uint mask);
genIType subgroupShuffleXor(genIType value, uint mask);
genUType subgroupShuffleXor(genUType value, uint mask);
genBType subgroupShuffleXor(genBType value, uint mask);
genDType subgroupShuffleXor(genDType value, uint mask);
Only usable if the extension GL_KHR_shader_subgroup_shuffle is enabled.
The function subgroupShuffleXor() returns the <value> from the invocation
whose <gl_SubgroupInvocationID> is equal to the current invocation's
<gl_SubgroupInvocationID> xored with <mask>. If the calculated index
refers to an inactive invocation or is greater than or equal to
<gl_SubgroupSize>, an undefined value is returned.
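A common use of an xor shuffle is a butterfly reduction, sketched below,
after which every invocation holds the sum over the subgroup. The sketch
assumes that <gl_SubgroupSize> is a power of two and that all invocations
of the subgroup are active; the block name Data is hypothetical.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic : require
#extension GL_KHR_shader_subgroup_shuffle : require

layout(local_size_x = 64) in;

// Hypothetical data buffer, one element per invocation.
layout(std430, binding = 0) buffer Data {
    float values[];
};

void main() {
    uint i = gl_GlobalInvocationID.x;
    float sum = values[i];

    // Each step exchanges partial sums between invocations whose IDs
    // differ in exactly one bit, halving the distance every time.
    for (uint mask = gl_SubgroupSize >> 1u; mask > 0u; mask >>= 1u) {
        sum += subgroupShuffleXor(sum, mask);
    }

    // Every invocation now holds the sum over the whole subgroup.
    values[i] = sum;
}
```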
Syntax:
genType subgroupShuffleUp(genType value, uint delta);
genIType subgroupShuffleUp(genIType value, uint delta);
genUType subgroupShuffleUp(genUType value, uint delta);
genBType subgroupShuffleUp(genBType value, uint delta);
genDType subgroupShuffleUp(genDType value, uint delta);
Only usable if the extension GL_KHR_shader_subgroup_shuffle_relative is
enabled.
The function subgroupShuffleUp() returns the <value> from the invocation
whose <gl_SubgroupInvocationID> is equal to this invocation's
<gl_SubgroupInvocationID> minus <delta>. If <gl_SubgroupInvocationID> minus
<delta> refers to an inactive invocation or is less than zero, an undefined
value is returned.
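As a sketch, subgroupShuffleUp() with a <delta> of 1 gives each invocation
the value of its predecessor, which can be used to compute adjacent
differences. Invocation 0 has no predecessor, so its undefined result is
discarded. The block name Data is hypothetical, and all invocations of the
subgroup are assumed to be active.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic : require
#extension GL_KHR_shader_subgroup_shuffle_relative : require

layout(local_size_x = 64) in;

// Hypothetical data buffer, one element per invocation.
layout(std430, binding = 0) buffer Data {
    float values[];
};

void main() {
    uint i = gl_GlobalInvocationID.x;
    float v = values[i];

    // Value held by the invocation one position lower in the subgroup.
    float prev = subgroupShuffleUp(v, 1u);

    // For invocation 0 the shuffled value is undefined, so ignore it.
    float diff = (gl_SubgroupInvocationID == 0u) ? 0.0 : v - prev;

    values[i] = diff;
}
```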
Syntax:
genType subgroupShuffleDown(genType value, uint delta);
genIType subgroupShuffleDown(genIType value, uint delta);
genUType subgroupShuffleDown(genUType value, uint delta);
genBType subgroupShuffleDown(genBType value, uint delta);
genDType subgroupShuffleDown(genDType value, uint delta);
Only usable if the extension GL_KHR_shader_subgroup_shuffle_relative is
enabled.
The function subgroupShuffleDown() returns the <value> from the invocation
whose <gl_SubgroupInvocationID> is equal to this invocation's
<gl_SubgroupInvocationID> plus <delta>. If <gl_SubgroupInvocationID> plus
<delta> refers to an inactive invocation or is greater than or equal to
<gl_SubgroupSize>, an undefined value is returned.
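A sketch of a tree reduction built on subgroupShuffleDown(), after which
invocation 0 holds the sum over the subgroup. Results shuffled from beyond
the end of the subgroup are undefined and are therefore not accumulated.
The sketch assumes that <gl_SubgroupSize> is a power of two and that all
invocations are active; the block names Input and Sums are hypothetical.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic : require
#extension GL_KHR_shader_subgroup_shuffle_relative : require

layout(local_size_x = 64) in;

// Hypothetical input, one element per invocation.
layout(std430, binding = 0) readonly buffer Input {
    float inValues[];
};
// Hypothetical output, one sum per subgroup.
layout(std430, binding = 1) writeonly buffer Sums {
    float subgroupSum[];
};

void main() {
    float sum = inValues[gl_GlobalInvocationID.x];

    for (uint delta = gl_SubgroupSize >> 1u; delta > 0u; delta >>= 1u) {
        float other = subgroupShuffleDown(sum, delta);

        // Skip values that would come from past the end of the
        // subgroup, since those are undefined.
        if (gl_SubgroupInvocationID + delta < gl_SubgroupSize) {
            sum += other;
        }
    }

    // Only invocation 0 is guaranteed to hold the complete sum.
    if (gl_SubgroupInvocationID == 0u) {
        uint slot = gl_WorkGroupID.x * gl_NumSubgroups + gl_SubgroupID;
        subgroupSum[slot] = sum;
    }
}
```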
Syntax:
genType subgroupAdd(genType value);
genIType subgroupAdd(genIType value);
genUType subgroupAdd(genUType value);
genDType subgroupAdd(genDType value);
Only usable if the extension GL_KHR_shader_subgroup_arithmetic is enabled.
The function subgroupAdd() returns the summation of all active invocation