Improve simulation of bulk-indexing conflicts #477

Merged

danielmitterdorfer
Member

This commit introduces a new property `conflict-probability` for
the bulk-indexing parameter source. Previously the probability was
hard-coded to 25%; now the user can control it.

We now also use `update` as the bulk-indexing action when simulating a
conflict (previously the action was always `index`).

Closes #422
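
To make the description above concrete, here is a minimal sketch of a generator that emits action-and-metadata lines and simulates id conflicts with a configurable probability. It is not the code from this PR: the function name `action_meta_data`, its signature, and the choice to re-use a random already-emitted id are assumptions made for illustration, and `conflict_probability` is assumed to be a percentage between 0 and 100.

```python
import random


def action_meta_data(index, doc_type, ids, conflict_probability=25, rng=None):
    """Yield (action, action-and-metadata line) tuples.

    Hypothetical sketch, not Rally's implementation.
    conflict_probability is assumed to be a percentage in [0, 100].
    """
    rng = rng or random.Random()
    fresh = iter(ids)
    seen = []
    while True:
        # A conflict is only possible once at least one id has been emitted.
        if seen and rng.random() * 100 < conflict_probability:
            doc_id = rng.choice(seen)
            action = "update"
        else:
            try:
                doc_id = next(fresh)
            except StopIteration:
                # No fresh ids left: stop generating.
                return
            seen.append(doc_id)
            action = "index"
        yield action, '{"%s": {"_index": "%s", "_type": "%s", "_id": "%s"}}' % (
            action, index, doc_type, doc_id)
```

With `conflict_probability=0` every line uses `index`. With `conflict_probability=100` the first line uses `index` and every later line is an `update` on an id that was already emitted.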

@danielmitterdorfer danielmitterdorfer added enhancement Improves the status quo :Track Management New operations, changes in the track format, track download changes and the like :Load Driver Changes that affect the core of the load driver such as scheduling, the measurement approach etc. labels Apr 20, 2018
@danielmitterdorfer danielmitterdorfer added this to the 0.10.2 milestone Apr 20, 2018
Contributor

@dliappis dliappis left a comment

LGTM

am_handler = params.GenerateActionMetaData("test_index", "test_type",
                                           conflicting_ids=[100, 200, 300, 400],
                                           conflict_probability=25,
                                           rand=lambda: next(pseudo_random_conflicts),
Contributor

Clever use of iter() and lambda 👍

Member Author

Hehe, thanks :)
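
The trick praised in this thread can be sketched in isolation: the random source is injected as a callable, and the test replaces it with `lambda: next(...)` over a fixed iterator so that every "random" draw is known in advance. The helper `pick_conflicts` below is hypothetical and only stands in for the code under test. The assumption that `rand` returns a value in [0, 1) that is scaled to a percentage is also mine, since the snippet above does not show how `rand` is consumed.

```python
def pick_conflicts(count, conflict_probability, rand):
    """Return one boolean per document: True where a conflict would be simulated.

    Hypothetical helper; rand is assumed to return a float in [0, 1).
    """
    return [rand() * 100 < conflict_probability for _ in range(count)]


# In a test, swap the random source for a fixed, fully predictable sequence.
pseudo_random_conflicts = iter([0.10, 0.90, 0.20, 0.50])
decisions = pick_conflicts(4, 25, rand=lambda: next(pseudo_random_conflicts))
print(decisions)  # [True, False, True, False]
```

Because the iterator is consumed one value per call, the test fails loudly with `StopIteration` if the code under test draws more random numbers than expected.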

With this commit we add a new parameter `on-conflict` which allows users
to define whether the action-and-metadata line should use "index" or
"update" on simulated id conflicts.
@danielmitterdorfer
Member Author

@dliappis I pushed another commit since your LGTM. It adds a new parameter `on-conflict` that lets the user decide whether to use `update` or `index` in the action-and-metadata line. Without this parameter we would change the entire approach to handling id conflicts, which leads to completely different results.

Can you please review that commit as well?
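
As a rough illustration of what such a switch controls, the sketch below builds the two lines of a bulk item and only uses the `on_conflict` action when a conflict is being simulated. The function `bulk_lines` and its arguments are hypothetical and not taken from this PR. The one external fact it relies on is that the Elasticsearch bulk API expects the partial document of an `update` action under a `doc` key, whereas `index` takes the document source directly. Whether the load driver wraps the source this way is not stated in this conversation.

```python
import json


def bulk_lines(doc, doc_id, is_conflict, on_conflict="index"):
    """Return (action, action-and-metadata line, body line) for one bulk item.

    Hypothetical sketch: names and signature are illustrative only.
    """
    if on_conflict not in ("index", "update"):
        raise ValueError("on_conflict must be 'index' or 'update', got %r" % on_conflict)
    # Non-conflicting documents are always indexed.
    action = on_conflict if is_conflict else "index"
    meta = '{"%s": {"_index": "test_index", "_type": "test_type", "_id": "%s"}}' % (action, doc_id)
    # Bulk update actions carry the partial document under "doc".
    body = json.dumps({"doc": doc}) if action == "update" else json.dumps(doc)
    return action, meta, body
```

Keeping `index` as the default in this sketch mirrors the concern raised above: switching every simulated conflict to `update` would measure a different code path in Elasticsearch and make results incomparable with earlier runs.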

Contributor

@dliappis dliappis left a comment

LGTM

@@ -104,6 +105,12 @@ def test_generate_action_meta_data_without_id_conflicts(self):
                         next(params.GenerateActionMetaData("test_index", "test_type")))

    def test_generate_action_meta_data_with_id_conflicts(self):
        def idx(id):
            return "index", '{"index": {"_index": "test_index", "_type": "test_type", "_id": "%s"}}' % id
Contributor

Very minor comment, could use .format() for consistency going forwards.

Member Author

This is on the hot code path, where we decided against `.format()`.

Contributor

D'oh yes!
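
For readers who want to check the trade-off discussed in this thread on their own interpreter, here is a small micro-benchmark sketch comparing `%` formatting with `str.format()` for the same metadata line. The thread gives no numbers, so none are claimed here. The function names and the iteration count are arbitrary choices for illustration.

```python
import timeit

TEMPLATE_PERCENT = '{"index": {"_index": "test_index", "_type": "test_type", "_id": "%s"}}'
# str.format() requires literal braces to be doubled, which hurts readability for JSON.
TEMPLATE_FORMAT = '{{"index": {{"_index": "test_index", "_type": "test_type", "_id": "{}"}}}}'


def with_percent(doc_id):
    return TEMPLATE_PERCENT % doc_id


def with_format(doc_id):
    return TEMPLATE_FORMAT.format(doc_id)


if __name__ == "__main__":
    # Timings vary by machine and Python version; compare the two lines relative to each other.
    for fn in (with_percent, with_format):
        seconds = timeit.timeit(lambda: fn(100), number=200_000)
        print("%s: %.3fs" % (fn.__name__, seconds))
```

Both functions produce the same string, so the choice between them is about per-call cost and template readability rather than output.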

            return "index", '{"index": {"_index": "test_index", "_type": "test_type", "_id": "%s"}}' % id

        def conflict(action, id):
            return action, '{"%s": {"_index": "test_index", "_type": "test_type", "_id": "%s"}}' % (action, id)
Contributor

Likewise, could use .format() here too.

Member Author

This is on the hot code path, where we decided against `.format()`.

Contributor

D'oh yes!

@danielmitterdorfer
Member Author

Thanks for the review @dliappis!

@danielmitterdorfer danielmitterdorfer merged commit 581b7b1 into elastic:master Apr 25, 2018
@danielmitterdorfer danielmitterdorfer deleted the simulate-updates branch April 25, 2018 07:05