Improve simulation of bulk-indexing conflicts #477

danielmitterdorfer · 2018-04-20T10:17:00Z

With this commit we introduce a new property `conflict-probability` for
the bulk-indexing parameter source. Previously the probability was
hard-coded to 25%; now the user can control it.

We also now use `update` as the bulk-indexing action when simulating a
conflict (previously the action was always `index`).

Closes #422
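To illustrate the mechanism, here is a minimal sketch of a generator that emits action-and-metadata pairs and, with a configurable probability, reuses an already emitted id as an `update`. It is a hypothetical illustration, not Rally's actual `GenerateActionMetaData`: the function name `action_meta_data`, its parameters, and the use of `<` for the probability check are assumptions. Following the tests in this PR (`conflict_probability=25`), the probability is given in percent and compared against a `rand()` value in [0, 1).

```python
import random
from typing import Callable, Iterator, Sequence, Tuple

# Same shape as the action-and-metadata lines used in the tests of this PR.
TEMPLATE = '{"%s": {"_index": "%s", "_type": "%s", "_id": "%s"}}'


def action_meta_data(index: str, doc_type: str, ids: Sequence[str],
                     conflict_probability: float = 25.0,
                     rand: Callable[[], float] = random.random,
                     randint: Callable[[int, int], int] = random.randint
                     ) -> Iterator[Tuple[str, str]]:
    """Hypothetical generator of (action, meta-data line) pairs.

    conflict_probability is a percentage in [0, 100]; rand() must return a
    float in [0, 1) like random.random(). Stops once every id was indexed.
    """
    threshold = conflict_probability / 100.0
    used = 0  # number of fresh ids from `ids` emitted so far
    while used < len(ids):
        # A conflict is only possible once at least one id has been indexed.
        if used > 0 and rand() < threshold:
            # Simulated conflict: reuse an already emitted id as an update.
            doc_id = ids[randint(0, used - 1)]
            action = "update"
        else:
            # No conflict: emit the next fresh id as a plain index operation.
            doc_id = ids[used]
            used += 1
            action = "index"
        yield action, TEMPLATE % (action, index, doc_type, doc_id)
```

Injecting `rand` and `randint` keeps the generator deterministic in tests. Note that with `conflict_probability=100` this sketch would emit updates forever, so real code needs an explicit bound.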

dliappis

LGTM

dliappis · 2018-04-22T10:45:57Z

tests/track/params_test.py

+        am_handler = params.GenerateActionMetaData("test_index", "test_type",
+                                                   conflicting_ids=[100, 200, 300, 400],
+                                                   conflict_probability=25,
+                                                   rand=lambda: next(pseudo_random_conflicts),


Clever use of iter() and lambda 👍

Hehe, thanks :)
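The trick praised here can be sketched in isolation: wrap a fixed list in `iter()` and pass `lambda: next(...)` wherever code expects a random-number function, so each call returns the next predetermined value. The helper `should_conflict` is hypothetical and only mirrors the percent-based `conflict_probability=25` from the test; the `<` comparison is an assumption.

```python
import random
from typing import Callable


def should_conflict(conflict_probability: float,
                    rand: Callable[[], float] = random.random) -> bool:
    """Hypothetical helper: True with a `conflict_probability` percent chance."""
    return rand() < conflict_probability / 100.0


# In a test, swap the randomness for a fixed sequence. Each call of the
# lambda advances the iterator by exactly one element.
pseudo_random_conflicts = iter([0.1, 0.9, 0.2, 0.7])
decisions = [should_conflict(25, rand=lambda: next(pseudo_random_conflicts))
             for _ in range(4)]
print(decisions)  # [True, False, True, False]
```

Once the iterator is exhausted, `next()` raises `StopIteration`, so the sequence must hold at least as many values as the code under test draws.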

danielmitterdorfer · 2018-04-25T06:05:14Z

@dliappis I pushed another commit since your LGTM which adds a new parameter `on-conflict` that lets the user decide whether to use `update` or `index` in the action-and-metadata line. The reason for this parameter is that otherwise we would change the entire approach to handling id conflicts, which would lead to completely different results.

Can you please review that commit as well?
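A minimal sketch of how an `on-conflict` setting could select the action, modelled on the test helpers `idx(id)` and `conflict(action, id)` in this PR. The helper name `meta_data_line`, its validation, and the default of `index` (assumed here to preserve the previous behaviour, in line with the reasoning above) are assumptions, not Rally's code.

```python
from typing import Tuple

TEMPLATE = '{"%s": {"_index": "%s", "_type": "%s", "_id": "%s"}}'
VALID_ON_CONFLICT = ("index", "update")


def meta_data_line(index: str, doc_type: str, doc_id: str,
                   is_conflict: bool, on_conflict: str = "index") -> Tuple[str, str]:
    """Hypothetical helper: pick the bulk action for one document.

    Fresh ids are always indexed; only simulated id conflicts use on_conflict.
    """
    if on_conflict not in VALID_ON_CONFLICT:
        raise ValueError("on-conflict must be one of [%s] but was [%s]"
                         % (", ".join(VALID_ON_CONFLICT), on_conflict))
    action = on_conflict if is_conflict else "index"
    return action, TEMPLATE % (action, index, doc_type, doc_id)
```

For example, `meta_data_line("test_index", "test_type", "100", is_conflict=True, on_conflict="update")` returns an `update` line for id 100, while a non-conflicting id always yields `index`.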

dliappis

LGTM

dliappis · 2018-04-25T06:50:20Z

tests/track/params_test.py

@@ -104,6 +105,12 @@ def test_generate_action_meta_data_without_id_conflicts(self):
                         next(params.GenerateActionMetaData("test_index", "test_type")))

    def test_generate_action_meta_data_with_id_conflicts(self):
+        def idx(id):
+            return "index", '{"index": {"_index": "test_index", "_type": "test_type", "_id": "%s"}}' % id


Very minor comment: could use `.format()` for consistency going forward.

This is on the hot code path, where we decided against `.format()`.

dliappis · 2018-04-25T06:50:37Z

tests/track/params_test.py

+            return "index", '{"index": {"_index": "test_index", "_type": "test_type", "_id": "%s"}}' % id
+
+        def conflict(action, id):
+            return action, '{"%s": {"_index": "test_index", "_type": "test_type", "_id": "%s"}}' % (action, id)


Likewise, could use `.format()` here too.

This is on the hot code path, where we decided against `.format()`.
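For anyone weighing the same trade-off, a small harness sketch: it builds the same action-and-metadata line with `%`-interpolation and with `str.format()`, checks that both match, and times them with `timeit`. The function names are hypothetical and no result is implied; timings depend on the interpreter version and machine. Note that `str.format()` also requires every literal brace in the JSON template to be doubled.

```python
import timeit

# printf-style template: literal braces need no escaping.
TEMPLATE_PERCENT = '{"%s": {"_index": "%s", "_type": "%s", "_id": "%s"}}'
# str.format() template: literal braces must be doubled.
TEMPLATE_FORMAT = '{{"{}": {{"_index": "{}", "_type": "{}", "_id": "{}"}}}}'


def with_percent(action, doc_id):
    return TEMPLATE_PERCENT % (action, "test_index", "test_type", doc_id)


def with_format(action, doc_id):
    return TEMPLATE_FORMAT.format(action, "test_index", "test_type", doc_id)


if __name__ == "__main__":
    # Both variants must produce byte-identical lines before timing them.
    print(with_percent("update", 100) == with_format("update", 100))
    for fn in (with_percent, with_format):
        seconds = timeit.timeit(lambda: fn("index", 42), number=100_000)
        print(f"{fn.__name__}: {seconds:.3f}s")
```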

danielmitterdorfer · 2018-04-25T07:04:51Z

Thanks for the review @dliappis!

danielmitterdorfer added the enhancement, :Track Management, and :Load Driver labels Apr 20, 2018

danielmitterdorfer added this to the 0.10.2 milestone Apr 20, 2018

danielmitterdorfer requested a review from dliappis April 20, 2018 10:17

Merge branch 'master' into simulate-updates

2ce790f

dliappis approved these changes Apr 22, 2018

Allow user to define action on id conflicts

ee6fc45

With this commit we add a new parameter `on-conflict` which allows users to define whether the action-and-metadata line should use "index" or "update" on simulated id conflicts.

dliappis approved these changes Apr 25, 2018

danielmitterdorfer merged commit 581b7b1 into elastic:master Apr 25, 2018

danielmitterdorfer deleted the simulate-updates branch April 25, 2018 07:05
