Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change default KafkaEx partitioner to correctly match the Java client #399

Merged
merged 3 commits into from
Apr 3, 2020

Conversation

mjparrott
Copy link
Contributor

One way of fixing #398

@dantswain
Copy link
Collaborator

This makes sense to me. @barthez did the original murmur implementation - do you have any sense if the original may have just been a typo?

I am a little concerned about breaking implementation for anyone who is already using the existing algorithm. Maybe we can provide a legacy partitioner?

If nothing else @mjparrott you'll need to update the test cases, which should be relatively straightforward.

@kennethito
Copy link
Contributor

I think providing a legacy partitioner and some breaking change documentation is a great idea.

@joshuawscott
Copy link
Member

joshuawscott commented Mar 5, 2020

I thought these tests had been copied from the Java tests, but it doesn't look like that's actually the case. Perhaps we should remove these test cases, and use the same ones that the Java implementation uses:
https://github.com/apache/kafka/blob/8ab0994919752cd4870e771221ba934a6a539a67/clients/src/test/java/org/apache/kafka/common/utils/UtilsTest.java#L66-L78

    @Test
    public void testMurmur2() {
        Map<byte[], Integer> cases = new java.util.HashMap<>();
        cases.put("21".getBytes(), -973932308);
        cases.put("foobar".getBytes(), -790332482);
        cases.put("a-little-bit-long-string".getBytes(), -985981536);
        cases.put("a-little-bit-longer-string".getBytes(), -1486304829);
        cases.put("lkjh234lh9fiuh90y23oiuhsafujhadof229phr9h19h89h8".getBytes(), -58897971);
        cases.put(new byte[]{'a', 'b', 'c'}, 479470107);

        for (Map.Entry<byte[], Integer> c : cases.entrySet()) {
            assertEquals(c.getValue().intValue(), murmur2(c.getKey()));
        }
    }

@jbruggem
Copy link
Collaborator

jbruggem commented Mar 9, 2020

I am a little concerned about breaking implementation for anyone who is already using the existing algorithm. Maybe we can provide a legacy partitioner?

Indeed, it's touchy. With kayrock we'll have a major release, so having a breaking change is OK, but we need:

  • clear documentation on the matter, in the documentation itself and in the release notes
  • a way to activate the old behaviour for those in need of it

@joshuawscott
Copy link
Member

Breaking changes are allowed in semver before 1.0, so I have no issue with breaking this. This is a bug in any case, the intention and documentation is that the partitioning matches the java client, so if it isn't doing that now, it's broken, and changing it to match is fine, IMO

@mjparrott
Copy link
Contributor Author

I added a LegacyPartitioner module which can be used in place of the DefaultPartitioner which implements the old behaviour. This modules are mostly just a copy-paste of each other. The only difference is the umurmur2 function which is called.

It looks like the one of the builds is failing due to the code copy-paste - any thoughts?

@joshuawscott
Copy link
Member

joshuawscott commented Mar 19, 2020

I'm ok if you put in a comment to disable that check in credo for that line or function
http://rrrene.org/2017/06/01/credo-config-comments/

@joshuawscott joshuawscott merged commit 9712e8e into kafkaex:master Apr 3, 2020
@joshuawscott joshuawscott mentioned this pull request Jul 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants