-
Notifications
You must be signed in to change notification settings - Fork 1
/
transform.html
152 lines (128 loc) · 4.32 KB
/
transform.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
<!doctype html>
<html>
<meta charset='utf-8'>
<title>transform results</title>
<script src='https://cdn.plot.ly/plotly-latest.min.js'></script>
<script type='text/javascript' src='scripts/bench.js'></script>
</html>
<body>
<h1> transform </h1>
So far only have an inplace version. <br />
My benchmark: double every element `x = x + x`. <br />
`std::transform` is consistently vectorized by clang (<a
href=https://gcc.godbolt.org/z/MrKGEG>godbolt</a>) so no purely scalar
baseline. <br />
Clang peels loop while I don't.<br />
Clang also uses unaligned loads and stores, even for an inplace version.
<br />
I experminented with different more fancy iteration patterns - didn't have
much success: <a
href="https://stackoverflow.com/a/62492369/5021064">StackOverflow</a>.<br />
<h2> 40 bytes summary </h2>
For 40 bytes of data cannot beat loop peeling for int. <br />
2 times win for char, nothing for short loose about 30% for int. <br />
Code alignment is a pain for both me and standard but for different types,
<br />
chars for me - 1.6 times, shorts - 2 times for standard. <br />
<h3> 40 bytes, data</h3>
<div id='40_bytes_data'></div>
<script>dynamicEntryPoint('40_bytes_data', {
name: 'inplace transform',
size: 40,
algorithm: 'selection',
type: 'x',
time: 'y',
padding: 'min',
group: 'intel_9700K_avx2_bmi',
percentage: 100,
}, ['transform<256, 4>', 'std::transform']);
</script>
<h3> 40 bytes, code alignment</h3>
<div id='40_bytes_data_code_alignment'></div>
<script>dynamicEntryPoint('40_bytes_data_code_alignment', {
name: 'inplace transform',
size: 40,
algorithm: 'selection',
type: 'selection',
time: 'y',
padding: 'minmax',
group: 'intel_9700K_avx2_bmi',
percentage: 100,
}, ['transform<256, 4>', 'std::transform']);
</script>
<h2> 1000 bytes summary </h2>
Not peeling is a good win on a 1000 bytes. 4 times chars, 2 times shorts, 1.5
ints. <br />
Code alignment for unsq_eve is about 20% for chars and shorts. <br />
For std ints misbehavae, showing abou 1.5 swings. <br />
<h3> 1000 bytes, data</h3>
<div id='1000_bytes_data'></div>
<script>dynamicEntryPoint('1000_bytes_data', {
name: 'inplace transform',
size: 1000,
algorithm: 'selection',
type: 'x',
time: 'y',
padding: 'min',
group: 'intel_9700K_avx2_bmi',
percentage: 100,
}, ['transform<256, 4>', 'std::transform']);
</script>
<h3> 1000 bytes, code alignment</h3>
<div id='1000_bytes_data_code_alignment'></div>
<script>dynamicEntryPoint('1000_bytes_data_code_alignment', {
name: 'inplace transform',
size: 1000,
algorithm: 'selection',
type: 'selection',
time: 'y',
padding: 'minmax',
group: 'intel_9700K_avx2_bmi',
percentage: 100,
}, ['transform<256, 4>', 'std::transform']);
</script>
<h2> 10'000 bytes summary </h2>
10'000 bytes behaves roughly identical. <br />
std suffers from code alignment issues - 1.7 times swings. <br />
I suspect unaligned loads/stores (see again <a
href="https://stackoverflow.com/a/62492369/5021064">StackOverflow</a>).<br />
<h3> 10'000 bytes, data</h3>
<div id='10000_bytes_data'></div>
<script>dynamicEntryPoint('10000_bytes_data', {
name: 'inplace transform',
size: 10000,
algorithm: 'selection',
type: 'x',
time: 'y',
padding: 'min',
group: 'intel_9700K_avx2_bmi',
percentage: 100,
}, ['transform<256, 4>', 'std::transform']);
</script>
<h3> 10'000 bytes, code alignment</h3>
<div id='10000_bytes_data_code_alignment'></div>
<script>dynamicEntryPoint('10000_bytes_data_code_alignment', {
name: 'inplace transform',
size: 10000,
algorithm: 'selection',
type: 'selection',
time: 'y',
padding: 'minmax',
group: 'intel_9700K_avx2_bmi',
percentage: 100,
}, ['transform<256, 4>', 'std::transform']);
</script>
<h2> Total benchmark </h2>
<div id='total_benchmark'></div>
<script>dynamicEntryPoint('total_benchmark', {
name: 'inplace transform',
size: 'x',
algorithm: 'selection',
type: 'selection',
time: 'y',
padding: 'min',
group: 'intel_9700K_avx2_bmi',
percentage: 100,
}, ['transform']);
</script>
</body>