OpenOnload-201502-u1 Release Notes
==================================
This is a minor update release that fixes a number of issues in
OpenOnload-201502.
Below is a brief summary of any limitations that we are aware of. See
the ChangeLog for further details and a list of bugs fixed.
The Release Notes for OpenOnload-201502 are included at the end of
this file.
Performance Regression
----------------------
Final testing of the OpenOnload-201502 release revealed a performance
regression, highlighted in the release notes at the time, of around
50ns.
This update release includes changes that recover this, but during
investigation we discovered a number of other relevant factors, some
of which can potentially be used to demonstrate a significant
performance improvement:
1) We saw variation between different Linux distributions, the
majority of which can be attributed to the different compiler versions
used on these. For example, the gcc-4.8.2 based compiler included with
RHEL7 showed in some cases a latency improvement (tested with
sfnt-pingpong) of around 50ns compared to the gcc-4.4 based compiler
distributed with RHEL6.
2) Possibly related to (1) we found that small code changes in the
history of OpenOnload that should have no effect on the
performance (well away from the critical path) could be shown to have
an effect, both positive and negative. This is thought to be due to
changes in code and data locality and cache performance. We have
ongoing work to understand this in more detail and develop it into
further performance improvements.
3) For each of the above the magnitude of the change was test and
traffic dependent. For example, some historic code changes would show
a TCP improvement but a UDP regression, and vice versa, and the extent
of these would depend on the compiler and OS used. This suggests that
our benchmarks may not accurately predict the performance of your
applications (this is hopefully not surprising, as benchmarks can never
be perfect), and that your mileage may vary.
4) Some new firmware features can add noticeable latency when using
the "full featured" firmware variant. This applies in particular to
the use of the vswitch in the NIC (which is enabled when using multiple
VFs, and can be forced off using the num_vfs=0 sfc module parameter) and
to multicast chaining filters. For absolute best latency we continue to
recommend using the "low latency" firmware variant.
Scalable epoll mode
-------------------
The OpenOnload-201502 release added a new scalable epoll mode
(EF_UL_EPOLL=3) - for further details see the 201502 release notes
below. An issue in the way this new mode was enabled in that release
meant that it was not being used even when selected. This has been
fixed in the update release.
OpenOnload-201502 Release Notes
===============================
This is a major update release that adds new features to OpenOnload.
Below is a brief summary of the new features and any limitations that
we are aware of. See the ChangeLog for further details and bugs fixed.
Ubuntu/Debian Support
---------------------
This release adds Ubuntu and Debian to the list of officially
supported operating systems. The currently supported versions are:
Debian: 6 and 7
Ubuntu: 12.04 LTS, 14.04 LTS and 14.10
As new releases are made, Onload will aim to support the most recent
two major releases of each.
KVM
---
This release adds support for running Onload in KVM guests which have
either had a VF or PF passed through to them. The Onload user guide
contains further details on how to configure and use Onload in a KVM
guest.
Docker containers
-----------------
This release adds support for running Onload in a docker container.
Please see the Onload user guide for details of how to install and use
Onload in a container. Onload does not support network namespaces.
VF and Multi-PF Onload
----------------------
Onload can also be configured to run with VFs or multiple PFs on a
single port in a non-virtualised environment.
Socket Caching
--------------
This release adds support for caching of passively opened TCP sockets,
allowing improved connection accept rate. This feature can be
controlled using three new environment variables
(EF_SOCKET_CACHE_PORTS, EF_SOCKET_CACHE_MAX and
EF_PER_SOCKET_CACHE_MAX) that limit which ports and how many sockets
can be cached.
There are some restrictions:
- socket caching is only supported if EF_UL_EPOLL=3 (see below);
- socket caching is not supported after fork();
- currently caching does not offer a benefit if a single socket accepts
connections on multiple local addresses;
- socket caching is only supported if EF_FDS_MT_SAFE=1;
- sockets that have been dup()ed will not be cached;
- sockets that use the O_ASYNC or O_APPEND modes will not be cached;
- for full benefit sockets that use O_NONBLOCK or O_CLOEXEC should be
accept()ed with those flags, rather than setting them later;
- allowing more sockets to be cached (via EF_SOCKET_CACHE_MAX) than
there are file descriptors available can result in drastically reduced
performance. It should be considered that the socket cache limit
applies per stack, unlike the per-process file descriptor limit.
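As an illustration of the accept()-time flags restriction above, the
following is a minimal sketch using the standard Linux accept4() call,
which applies O_NONBLOCK and O_CLOEXEC as the socket is accepted instead
of through later fcntl() calls. The helper names
(open_loopback_listener, accept_nonblock_cloexec, has_nonblock_cloexec)
are hypothetical and are not part of Onload; nothing here is specific
to Onload, and whether a given socket is actually cached still depends
on the other restrictions listed above.

```c
#define _GNU_SOURCE             /* accept4() is a Linux/glibc extension */
#include <assert.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP listener on 127.0.0.1:port
 * (port in host byte order, 0 lets the kernel choose).
 * Returns the listening descriptor, or -1 on error. */
int open_loopback_listener(unsigned short port)
{
  struct sockaddr_in addr;
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0)
    return -1;
  memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(port);
  if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
      listen(fd, 16) < 0) {
    close(fd);
    return -1;
  }
  return fd;
}

/* Accept one connection with O_NONBLOCK and O_CLOEXEC set at accept
 * time, rather than accept() followed by fcntl().
 * Returns the new descriptor, or -1 with errno set. */
int accept_nonblock_cloexec(int listen_fd)
{
  return accept4(listen_fd, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
}

/* Returns 1 if fd has both O_NONBLOCK and FD_CLOEXEC set, 0 if either
 * is missing, -1 if the flags could not be read. */
int has_nonblock_cloexec(int fd)
{
  int status_flags, fd_flags;
  assert(fd >= 0);
  status_flags = fcntl(fd, F_GETFL);
  fd_flags = fcntl(fd, F_GETFD);
  if (status_flags < 0 || fd_flags < 0)
    return -1;
  return (status_flags & O_NONBLOCK) && (fd_flags & FD_CLOEXEC);
}
```

A server would call accept_nonblock_cloexec() wherever it previously
called accept() and then set the flags separately.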
Black/White listing of interfaces
---------------------------------
New module options and their associated sysfs nodes allow the user to
control through a whitelist and a blacklist which network interfaces
will be accelerated by Onload.
- Onload will ignore any network interfaces mentioned in the
intf_black_list module option.
- If the intf_white_list module option is non-empty, Onload will only
accelerate the network interfaces mentioned.
These can be updated at runtime, but the changes will only affect new
stacks. The configuration is a global setting and can not be modified
for individual stacks.
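The selection rule described above can be sketched as a small
predicate. This is only an illustration of the rule as we read it from
these notes, not the code used by the Onload module: the function names
are hypothetical, the space-separated list format is an assumption, and
giving the blacklist precedence when an interface appears in both lists
is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears as a whole token in the
 * space-separated `list` (list format is an assumption). */
static bool list_contains(const char *list, const char *name)
{
  size_t name_len = strlen(name);
  const char *p = list;
  while (*p != '\0') {
    size_t tok_len;
    while (*p == ' ')                 /* skip separators */
      p++;
    tok_len = strcspn(p, " ");        /* length of the next token */
    if (name_len > 0 && tok_len == name_len &&
        strncmp(p, name, name_len) == 0)
      return true;
    p += tok_len;
  }
  return false;
}

/* Hypothetical model of the rule: blacklisted interfaces are ignored;
 * if the whitelist is non-empty, only listed interfaces are
 * accelerated; otherwise every remaining interface is a candidate. */
bool intf_is_accelerated(const char *name, const char *white_list,
                         const char *black_list)
{
  assert(name != NULL && white_list != NULL && black_list != NULL);
  if (list_contains(black_list, name))
    return false;
  if (white_list[strspn(white_list, " ")] != '\0')  /* non-empty */
    return list_contains(white_list, name);
  return true;
}
```

For example, with an empty whitelist and a blacklist of "eth0 eth1",
eth2 would be a candidate for acceleration while eth0 would not.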
Scalable epoll mode
-------------------
This release adds a new epoll mode EF_UL_EPOLL=3. This mode is
accelerated, supports socket caching (see above), and is scalable in
the sense that the cost of the epoll_wait() is independent of the
number of accelerated FDs in the set and depends only on the number of
FDs that become ready.
EF_UL_EPOLL=1 remains the default as it has fewer restrictions, but
EF_UL_EPOLL=3 should be considered for your application as it can
offer better performance.
The restrictions of EF_UL_EPOLL=3 which mean it's not suitable in all
cases are as follows:
- It does not support monitoring the readiness of the epoll FD via a
second poll/epoll/select.
- It does not support epoll sets which exist across a fork().
For applications that require this functionality another epoll mode
should be selected.
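The following minimal sketch shows an epoll usage pattern that stays
within the two restrictions above: the application waits on the epoll
set directly with epoll_wait(), never passing the epoll FD itself to
another poll/epoll/select, and the set is created and used within one
process rather than across a fork(). It uses only the standard Linux
epoll calls; the helper names are hypothetical, and selecting
EF_UL_EPOLL=3 is done through the environment, not in this code.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: create an epoll set that watches `fd` for
 * readability. Returns the epoll descriptor, or -1 on error. */
int epoll_watch_readable(int fd)
{
  struct epoll_event ev = { .events = EPOLLIN, .data = { .fd = fd } };
  int epfd;
  assert(fd >= 0);
  epfd = epoll_create1(0);
  if (epfd < 0)
    return -1;
  if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
    close(epfd);
    return -1;
  }
  return epfd;
}

/* Wait directly on the epoll set for up to timeout_ms milliseconds.
 * Returns the descriptor that became ready, or -1 on timeout or error. */
int epoll_next_ready(int epfd, int timeout_ms)
{
  struct epoll_event ev;
  int n = epoll_wait(epfd, &ev, 1, timeout_ms);
  return n == 1 ? ev.data.fd : -1;
}
```

An application that needs to monitor the epoll FD from a second
poll/epoll/select, or to share the set across fork(), does not fit this
pattern and should use another epoll mode.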
Remote monitoring of Onload
---------------------------
As a preview of a future feature, we have included a server process
that will provide stackdump-like statistics and details of Onload's
internal state via JSON to remote hosts that connect to it.
The server is in <onload_install_dir>/src/tools/onload_remote_monitor/
There is an example client to demonstrate how to connect and parse the
JSON returned in <onload_install_dir>/src/tests/onload/onload_remote_monitor/
The current data format is for experimentation only and will change,
but we are very interested in how this may be useful to you, what data
you need to access, and how to integrate it into your existing
monitoring tools. Please contact [email protected] with
suggestions and comments.
Delegated Sends API
-------------------
This release adds a new API that allows the user to have Onload handle
the TCP socket state machine and perform critical sends through
another mechanism (such as ef_vi) to achieve lower latency.
Sockets are created through the sockets API exactly as normal. The
user can then request that Onload delegates sending to their
application. Once the application has completed a send through (for
example) ef_vi, it updates Onload and Onload will handle the TCP state
machinery, retransmissions, and so on.
There is a pair of example applications in
<onload_install_dir>/src/tests/ef_vi/efdelegated_[client,server].c
This API is intended to be used by servers that make sporadic TCP
sends on a socket rather than large amounts of bi-directional traffic.
It should be used carefully as there are small windows of time (while
the send has been delegated to the application) where either the
application or Onload could be using out of date sequence or
acknowledgement numbers. It has been designed such that this should
be harmless, but may still have the potential to confuse other TCP
implementations.
Per-queue drop counters API
---------------------------
ef_vi users can now access details of drops and their causes for each
RX queue using the new ef_vi_stats_query() function. The efsink
example application has been updated to do this.
Configuration options
---------------------
The following configuration options have been added. Those already
mentioned above have not been included here.
EF_TCP_SEND_NONBLOCK_NO_PACKETS_MODE
- This option controls how a non-blocking TCP send() call should
behave if it is unable to allocate sufficient packet buffers. By
default Onload will mimic Linux kernel stack behaviour and block
until packet buffers are available. If set to 1, this option will
cause Onload to return the error ENOBUFS. Note this option can cause
some applications (that assume that a socket that is writeable is
able to send without error) to malfunction.
EF_TCP_RCVBUF_STRICT
- This option protects against TCP small segment attacks. With this
option set, Onload limits the number of packets inside the TCP receive
queue and TCP reorder buffer. In some cases, this option causes a
performance penalty. You probably want this option if your application
is connecting to an untrusted partner or over an untrusted network.
Off by default.
EF_TCP_SNDBUF_ESTABLISHED_DEFAULT
- Default value for SO_SNDBUF for TCP sockets in the ESTABLISHED state.
This value is used when the TCP connection transitions to ESTABLISHED
state, to avoid confusion of some applications like netperf.
If the OS default SO_SNDBUF value is less than this, then this
value is used.
If the OS default SO_SNDBUF value is more than 4 * this, then
4 * this value is used.
This variable overrides OS default SO_SNDBUF value only, it does not
change SO_SNDBUF if the application explicitly sets it
(see EF_TCP_SNDBUF variable which overrides application-supplied value).
EF_TCP_RCVBUF_ESTABLISHED_DEFAULT
- Default value for SO_RCVBUF for TCP sockets in the ESTABLISHED state.
This value is used when the TCP connection transitions to ESTABLISHED
state, to avoid confusion of some applications like netperf.
If the OS default SO_RCVBUF value is less than this, then this
value is used.
If the OS default SO_RCVBUF value is more than 4 * this, then
4 * this value is used.
This variable overrides OS default SO_RCVBUF value only, it does not
change SO_RCVBUF if the application explicitly sets it
(see EF_TCP_RCVBUF variable which overrides application-supplied value).
EF_OFE_ENGINE_SIZE
- Size (in bytes) of Onload Filter Engine to be allocated when a new
stack is created.
EF_SO_BUSY_POLL_SPIN
- Spin poll, select and epoll in a Linux-like way: enable spinning only if
a spinning socket is present in the poll/select/epoll set. See Linux
documentation on SO_BUSY_POLL socket option for details.
You should also enable spinning via the EF_{POLL,SELECT,EPOLL}_SPIN
variables if you'd like to spin in poll, select or epoll respectively.
The spin duration is set via EF_SPIN_USEC, which is equivalent
to the Linux net.core.busy_poll sysctl value. EF_POLL_USEC is an
all-in-one variable that sets all 4 variables mentioned here.
Linux never spins in epoll, but Onload does. This variable does not
affect epoll behaviour if EF_UL_EPOLL=2.
EF_SELECT_FAST_USEC
- When spinning in a select() call, causes accelerated sockets to be
polled for N usecs before unaccelerated sockets are polled. This
reduces latency for accelerated sockets, possibly at the expense of
latency on unaccelerated sockets. Since accelerated sockets are
typically the parts of the application which are most
performance-sensitive this is typically a good tradeoff.
EF_SELECT_NONBLOCK_FAST_USEC
- When invoking select() with timeout==0 (non-blocking), this option
causes non-accelerated sockets to be polled only every N usecs. This
reduces latency for accelerated sockets, possibly at the expense of
latency on unaccelerated sockets. Since accelerated sockets are
typically the parts of the application which are most
performance-sensitive this is often a good tradeoff.
Set this option to zero to disable, or to a higher value to further
improve latency for accelerated sockets.
This option changes the behaviour of select() calls, so could
potentially cause an application to misbehave.
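For the EF_TCP_SNDBUF_ESTABLISHED_DEFAULT and
EF_TCP_RCVBUF_ESTABLISHED_DEFAULT options described above, the rule for
choosing the buffer size amounts to clamping the OS default into the
range [value, 4 * value]. The sketch below models that arithmetic only.
The function name is hypothetical, and the middle case (an OS default
already inside the range is left unchanged) is our reading of the
description rather than something it states explicitly. It does not
model the case where the application sets the buffer size itself.

```c
#include <assert.h>

/* Hypothetical model: the default buffer size applied when a TCP
 * connection enters ESTABLISHED, given the OS default and the value of
 * EF_TCP_SNDBUF_ESTABLISHED_DEFAULT (or the RCVBUF equivalent). */
long established_default_bufsize(long os_default, long est_default)
{
  assert(est_default > 0);
  if (os_default < est_default)
    return est_default;             /* raise a small OS default */
  if (os_default > 4 * est_default)
    return 4 * est_default;         /* cap a large OS default */
  return os_default;                /* assumed: already in range */
}
```

For instance, with an option value of 65536, an OS default of 16384 is
raised to 65536, and an OS default of 1000000 is capped at 262144.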
ef_vi Documentation
-------------------
The ef_vi API now has some introductory documentation included in
doxygen format within the source code. To access this:
cd openonload-201502/src/include/etherfabric/doxygen
doxygen doxyfile_ef_vi
This will generate HTML output in the html subdirectory, and an RTF
document in the rtf subdirectory.
Known issues and limitations
----------------------------
- A test late in the release process showed a latency performance
regression of around 50ns, affecting both TCP and UDP and not
dependent on packet size. We are currently working to understand the
cause and will include a fix in the next update release.
- Use of the ef_vi layer on a 32 bit OS is disabled in this release
due to an issue discovered close to the release date. Please let
[email protected] know if this causes you problems.
- The semantics of EF_EPOLL_MT_SAFE have been clarified. This must
only be set to 1 if all operations on the set are concurrency safe.
The documentation now states explicitly that these operations include
both epoll calls, and modifications to the file descriptors contained
in the set (bind(), listen(), connect(), close()).
- AMD IOMMUs are not supported by Onload or the net driver included
in this release.
- RedHat have implemented an alternative to the bonding driver, which
is now included in distributions (not just RedHat based ones) with
recent (> 3.3) kernels. Onload does not recognise interfaces created
with teamd/libteam/teaming driver as being acceleratable, and
connections going over these interfaces will use the Linux kernel
stack rather than Onload. We are currently investigating how to add
support for this new teaming driver to Onload and hope to include it
in a future release. Until then we recommend using the traditional
bonding driver to create Onload accelerated bonds.