Re-initialise Netty worker group on plugin restart #289

praseodym · 2018-01-24T22:06:58Z

This allows the plugin to actually recover from exceptions after a
restart. It also has the side effect of providing nicer error messages
and clearer stack traces to the end user.

Closes #268:

...
[2018-01-24T23:02:17,166][INFO ][org.logstash.beats.Server] Starting server on port: 5044
[2018-01-24T23:02:23,386][ERROR][logstash.pipeline        ] A plugin had an unrecoverable error. Will restart this plugin.
  Pipeline_id:main
  Plugin: <LogStash::Inputs::Beats port=>5044, id=>"930bf638e61e22156ab3a5029e0060c6affa3c8e14b1c217aa4bbfa3c896ec74", enable_metric=>true, codec=><LogStash::Codecs::Plain id=>"plain_a103340e-59bd-4d83-a089-fce6a9404307", enable_metric=>true, charset=>"UTF-8">, host=>"0.0.0.0", ssl=>false, ssl_verify_mode=>"none", include_codec_tag=>true, ssl_handshake_timeout=>10000, tls_min_version=>1, tls_max_version=>1.2, cipher_suites=>["TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384", "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384", "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256", "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256", "TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384", "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384", "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256", "TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256"], client_inactivity_timeout=>60, executor_threads=>64>
  Error: Address already in use
  Exception: Java::JavaNet::BindException
  Stack: sun.nio.ch.Net.bind0(Native Method)
sun.nio.ch.Net.bind(sun/nio/ch/Net.java:433)
sun.nio.ch.Net.bind(sun/nio/ch/Net.java:425)
sun.nio.ch.ServerSocketChannelImpl.bind(sun/nio/ch/ServerSocketChannelImpl.java:223)
io.netty.channel.socket.nio.NioServerSocketChannel.doBind(io/netty/channel/socket/nio/NioServerSocketChannel.java:128)
io.netty.channel.AbstractChannel$AbstractUnsafe.bind(io/netty/channel/AbstractChannel.java:558)
io.netty.channel.DefaultChannelPipeline$HeadContext.bind(io/netty/channel/DefaultChannelPipeline.java:1283)
io.netty.channel.AbstractChannelHandlerContext.invokeBind(io/netty/channel/AbstractChannelHandlerContext.java:501)
io.netty.channel.AbstractChannelHandlerContext.bind(io/netty/channel/AbstractChannelHandlerContext.java:486)
io.netty.channel.DefaultChannelPipeline.bind(io/netty/channel/DefaultChannelPipeline.java:989)
io.netty.channel.AbstractChannel.bind(io/netty/channel/AbstractChannel.java:254)
io.netty.bootstrap.AbstractBootstrap$2.run(io/netty/bootstrap/AbstractBootstrap.java:364)
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(io/netty/util/concurrent/AbstractEventExecutor.java:163)
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(io/netty/util/concurrent/SingleThreadEventExecutor.java:403)
io.netty.channel.nio.NioEventLoop.run(io/netty/channel/nio/NioEventLoop.java:463)
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(io/netty/util/concurrent/SingleThreadEventExecutor.java:858)
io.netty.util.concurrent.FastThreadLocalRunnable.run(io/netty/util/concurrent/FastThreadLocalRunnable.java:30)
java.lang.Thread.run(java/lang/Thread.java:748)
[2018-01-24T23:02:24,389][INFO ][org.logstash.beats.Server] Starting server on port: 5044
...

original-brownbear · 2018-01-31T06:51:45Z

src/main/java/org/logstash/beats/Server.java

    }

    public void enableSSL(SslSimpleBuilder builder) {
        sslBuilder = builder;
    }

    public Server listen() throws InterruptedException {
+        workGroup = new NioEventLoopGroup();


@praseodym maybe we should make sure that if workGroup != null before this line, we shut down the "old" worker group first so we don't leak any threads here? I think it's not impossible to get into that situation, since the reason we get to a reload is effectively always an exception in the code that should shut this down, right?

Good point, I've added a non-null check with worker group shutdown and a null check at plugin shutdown. Reloads will mostly be caused by failing to listen, which leaves the worker group in a dead state (which is why reloading never worked), so I presume that the non-null shutdown will be mostly a no-op.
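For readers following the thread, here is a minimal sketch of the lifecycle discussed above: create a fresh worker group on every `listen()`, shut down any leftover group first, and guard the shutdown path against a group that was never created. It is not the code merged in this PR. The class name `RestartableServer` and the accessor `currentGroup()` are hypothetical, and a plain `ExecutorService` stands in for Netty's `NioEventLoopGroup` so the example runs without Netty on the classpath. With Netty, the corresponding shutdown call would presumably be `shutdownGracefully()`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for the plugin's Server class; not the actual implementation.
final class RestartableServer {
    // Assumption: a plain thread pool stands in for Netty's NioEventLoopGroup.
    private ExecutorService workGroup;

    // Called on every (re)start of the plugin.
    public RestartableServer listen() {
        if (workGroup != null) {
            // A previous start left a group behind; release its threads first
            // so that repeated restarts do not leak them.
            workGroup.shutdownNow();
        }
        // Always start from a fresh group: a group that has been shut down
        // cannot be reused.
        workGroup = Executors.newFixedThreadPool(2);
        return this;
    }

    public void stop() {
        // Null check: stop() may be called before listen() ever ran.
        if (workGroup != null) {
            workGroup.shutdownNow();
            workGroup = null;
        }
    }

    // Exposed only so the sketch can be inspected from outside.
    ExecutorService currentGroup() {
        return workGroup;
    }

    public static void main(String[] args) {
        RestartableServer server = new RestartableServer();
        server.stop();   // safe even though listen() was never called
        server.listen(); // first start
        server.listen(); // simulated restart: old group is shut down, new one created
        server.stop();
    }
}
```

Creating the group inside `listen()` rather than in the constructor is what makes a restart work at all: each call gets a usable group regardless of what state the previous attempt left behind.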

praseodym · 2018-04-02T19:16:53Z

Rebased. Not sure why one of the Travis build jobs is failing.

praseodym · 2018-06-01T19:25:54Z

@original-brownbear could you review or merge this PR?

original-brownbear · 2018-06-02T09:06:24Z

@praseodym looks good, retriggered Travis and will merge if it goes green. Thanks!

original-brownbear · 2018-06-02T09:07:20Z

@robbavey actually now that you're the owner here, can you merge this? :) (don't wanna interfere if I shouldn't :P)

elasticsearch-bot · 2018-06-04T13:12:13Z

Rob Bavey merged this into the following branches!

Branch	Commits
master	`f853ce6`

robbavey · 2018-06-04T13:12:41Z

@praseodym LGTM - thanks for the contribution, and apologies for the delay.

original-brownbear · 2018-06-04T13:12:46Z

@robbavey thanks!

praseodym mentioned this pull request Jan 24, 2018

Add ability to detect pipeline failures elastic/logstash#9030

Open

jakelandis assigned robbavey Jan 30, 2018

original-brownbear self-assigned this Jan 31, 2018

original-brownbear reviewed Jan 31, 2018

praseodym force-pushed the reinitialise-worker-group-on-restart branch 2 times, most recently from 8cfe5bc to 5363d0d Compare January 31, 2018 20:27

praseodym mentioned this pull request Jan 31, 2018

Plugin restart after failure does not re-initialise Netty worker group logstash-plugins/logstash-input-tcp#107

Open

praseodym force-pushed the reinitialise-worker-group-on-restart branch from 5363d0d to f082e6d Compare March 13, 2018 11:35

praseodym force-pushed the reinitialise-worker-group-on-restart branch from f082e6d to deb8225 Compare April 2, 2018 16:43

Re-initialise Netty worker group on plugin restart

336c2f1

This allows the plugin to actually recover from exceptions after a restart. It also has the side effect of providing nicer error messages and clearer stack traces to the end user.

praseodym force-pushed the reinitialise-worker-group-on-restart branch from deb8225 to 336c2f1 Compare April 2, 2018 18:54

elasticsearch-bot closed this in f853ce6 Jun 4, 2018

praseodym deleted the reinitialise-worker-group-on-restart branch June 11, 2018 18:14

praseodym mentioned this pull request Jun 11, 2018

Need better error reporting when a port is already in use. #268

Closed

praseodym mentioned this pull request Jun 18, 2018

Beats plugin does not get restarted after unrecoverable error #170

Open
