
Unify Runner dynamic launching and reloading under RunnerList #7172

Merged 10 commits into elastic:master on May 29, 2018

Conversation


@exekias (Contributor) commented May 24, 2018

This PR unifies all uses of dynamic runner handling under a new RunnerList structure. RunnerList receives a list of configs describing the desired state and performs all the needed changes (starting and stopping runners) to transition to it.

I plan to use this as part of #7028

@exekias added the "in progress" and "libbeat" labels May 24, 2018
```go
return ok
}

// Start the given runner and add it to the list
```


comment on exported method RunnerList.Add should be of the form "Add ..."

```go
return nil
}

// StopAll runners
```


comment on exported method RunnerList.Stop should be of the form "Stop ..."

@exekias force-pushed the unify-reloadable-list branch 6 times, most recently from 26a84a5 to d975687 on May 25, 2018 10:42
@exekias added the "review" label and removed the "in progress" label May 25, 2018
```go
for hash, runner := range stopList {
	debugf("Stopping runner: %s", runner)
	delete(r.runners, hash)
	go runner.Stop()
```
Member:

This is a change in behaviour from the previous implementation, where it was synchronous. This could become a problem in Filebeat, or at least lead to more errors. Before a new prospector can be started on an existing state of a file, the state must be set to Finished. If the shutdown happens in a goroutine, this overlap can become bigger. But I think it should still work.

```go
func (r *RunnerList) Has(hash uint64) bool {
	r.mutex.Lock()
	defer r.mutex.Unlock()
	_, ok := r.runners[hash]
```
Member:

A read write mutex could be used here.

```diff
@@ -1,48 +0,0 @@
package cfgfile
```
Member:

Any replacement for these tests?

Contributor Author:

I forgot to add the file to the commit 🤦‍♂️ pushed


```go
hash, err := hashstructure.Hash(rawCfg, nil)
if err != nil {
	// Make sure the next run also updates because some runners were not properly loaded
```
Member:

Is this case still handled?

Contributor Author:

Nope :/ I'll have a look and find a solution

Contributor Author:

See 4ea21f2

```go
for h, runner := range r.runners {
	debugf("Stopping runner: %s", runner)
	delete(r.runners, h)
	runner.Stop()
```
Member:

Before this change, runners were stopped in parallel. This is important to speed up the shutdown process, especially with a large number of modules or prospectors.

Contributor Author:

makes sense, will update the code to do that


```go
func (r *runnerFactory) Create(x beat.Pipeline, c *common.Config, meta *common.MapStrPointer) (Runner, error) {
	config := struct {
		Id int64 `config:"id"`
```


struct field Id should be ID

@exekias force-pushed the unify-reloadable-list branch 2 times, most recently from b6727b3 to 18c4989 on May 27, 2018 19:57
@exekias (Contributor Author) commented May 27, 2018

Thanks @ruflin for the thorough review, I think this is ready for a 2nd look :)

@exekias (Contributor Author) commented May 28, 2018

I've added 7c8ac78 after our last discussion; it should cover cases where a Filebeat input fails to start due to unclosed states (it will keep retrying every 10s). It also unifies how autodiscover uses cfgfile.List.Reload.

@exekias force-pushed the unify-reloadable-list branch 3 times, most recently from 5f9ef83 to 7c8ac78 on May 28, 2018 17:37
@ruflin (Member) commented May 28, 2018

The race detector does not seem to be happy on Jenkins.

This change adds some overhead to autodiscover, as we need to hash all
configs for each new event, but we obtain two benefits:

 - We use the same code path (Reload) for autodiscover and config
 reload.
 - Autodiscover becomes resilient to module/input initialization errors;
 especially in the case of Filebeat, failed starts will be retried.
@exekias (Contributor Author) commented May 29, 2018

fixed


```go
err := a.runners.Reload(configs)

// On error, make sure the next run also updates because some runners were not properly loaded
```
Member:

Should we create a debug log entry for this err?


```go
go runner.Stop()
a.runners.Remove(hash)
if a.runners.Has(hash) {
	delete(a.configs, hash)
```
Member:

Can only one goroutine at a time access a.configs?

Where is the runner stopped now?

Contributor Author:

configs is only modified by the event loop (handleStart and handleStop), so there is only one goroutine using it. Start/stop is now handled by runners.Reload: https://github.com/elastic/beats/pull/7172/files/63c350a7ea841fd6f2b5bea7bea4debe1c0afdf5#diff-3b0ae03a54b2e31c58f8dcf308a6a7b8R127. This is the same code path used by the configuration reload mechanism.

@ruflin ruflin merged commit aed5529 into elastic:master May 29, 2018
exekias pushed a commit to exekias/beats that referenced this pull request Jun 1, 2018
Introduced by elastic#7172: some inputs/modules may modify the passed
configuration, which results in an effective change to the config's
hash. Before this change, the reload process was treating a running
config as a not-running config, resulting in a stop/start after the
first run.

Example (with some added debugging):

First run (add 1 config):
```
2018-06-01T02:08:20.184+0200    DEBUG   [autodiscover] cfgfile/list.go:53      Starting reload procedure, current runners: 0
2018-06-01T02:08:20.185+0200    INFO    cfgfile/list.go:143 map[type:docker containers:map[ids:[56322b70c8a3712494e559381c7b8a6ce62a6495e33630628e6624b75b5a7505]]]
2018-06-01T02:08:20.185+0200    INFO    cfgfile/list.go:144     Hash: %!s(uint64=12172188027786936243)
2018-06-01T02:08:20.185+0200    DEBUG   [autodiscover] cfgfile/list.go:71      Start list: 1, Stop list: 0
```

Second run (add another config, the first one gets restarted):
```
2018-06-01T02:08:20.185+0200    DEBUG   [autodiscover] cfgfile/list.go:53      Starting reload procedure, current runners: 1
2018-06-01T02:08:20.185+0200    INFO    cfgfile/list.go:143 map[paths:[/var/lib/docker/containers/56322b70c8a3712494e559381c7b8a6ce62a6495e33630628e6624b75b5a7505/*.log] docker-json:map[stream:all partial:true] type:docker containers:map[ids:[56322b70c8a3712494e559381c7b8a6ce62a6495e33630628e6624b75b5a7505]]]
2018-06-01T02:08:20.185+0200    INFO    cfgfile/list.go:144     Hash: %!s(uint64=12741034856879725532)
2018-06-01T02:08:20.185+0200    INFO    cfgfile/list.go:143 map[type:docker containers:map[ids:[a626e25679abd2b9af161277f1beee96c1bba6b9412771d17da7ebfacca640a7]]]
2018-06-01T02:08:20.185+0200    INFO    cfgfile/list.go:144     Hash: %!s(uint64=7080456881540055745)
2018-06-01T02:08:20.185+0200    DEBUG   [autodiscover] cfgfile/list.go:71      Start list: 2, Stop list: 1
```
ruflin pushed a commit that referenced this pull request Jun 1, 2018 (same commit message as above).
3 participants