Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ClusterMembership Partition Pin/AutoIncrement Removal #583

Merged
merged 6 commits into from
Jul 22, 2020
Merged

ClusterMembership Partition Pin/AutoIncrement Removal #583

merged 6 commits into from
Jul 22, 2020

Conversation

shawnhathaway
Copy link
Contributor

Separated from Previous Schema Changes focused on Vitess Support

Changes:

  • Remove DB insertion order and page using HostIds. This allows the table to be sharded in an easier manner.
  • For now pin ClusterMembership to MembershipPartition to 0 to allow either pinning to a single shard or choosing another column to shard by.
    • This allows alignment with ClusterMetadata partition as well.


if nextPageToken == nil {
break
}
}

s.Equal(hostCount, 100)
s.Equal(100, hostCount)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Nit]. You can also just create another actual map (actualIds), populate that using ++ and then use s.Equal to compare with expectedIds

hostID := []byte{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1}
// Expire previous records
// Todo: MetaMgr should provide api to clear all members
time.Sleep(time.Second * 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Nit] Can this be part of the Suite test setup/teardown function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once that functionality exists, yes definitely. We have a similar problem in other areas where tests in the same suites rely on cleanup of the tests rather than enforcing it themselves.

pageToken = binary.LittleEndian.Uint64(request.NextPageToken)
var lastSeenHostId []byte
if len(request.NextPageToken) == 16 {
lastSeenHostId = request.NextPageToken
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these UUIDs are going to be completely random, right? Technically that means that during paging, any newly added hosts that show up might have a UUID before the current page token and not be returned. Not really a big deal (new hosts can show up even after I finish the read), but just want to make sure this is intended and there is no weirdness that this behavior is random.

If we are actually worried about this, then we can also include the "last updated time" in the pageToken and ignore entries on subsequent page retrievals that were added after the time at which the first page was queried

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do lose the guarantee that we had previously that we would include all newly added hosts even if added in between pages because of the global ordering we had. This is nice, but shouldn't be required, especially given the retry logic you added recently.

Even though it's random, this is an optimization in a way at this point (given above though its okay if we left them out as well). A better model could be changing the sorting order to paging by Session_start as this would roughly model the previous behaviors. However this introduces similar missing results scenarios during paging due to introducing clock skew in ordering or if there is enough time collisions due to poor time granularities in the sql store. I went down this path a bit and just the different time granularities are a headache.

var lastSeenHostId []byte
if len(request.NextPageToken) == 16 {
lastSeenHostId = request.NextPageToken
} else if len(request.NextPageToken) > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a test case that exercises this else if error branch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is a follow-up that will be addressed with some more work done around UUID parsing. This should really never occur though given the database size is 16 bytes and so is our UUID type, so it would require client error or database misconfig/corruption.

templateWithLimitSuffix = ` LIMIT ?`
templateWithOrderByInsertionSuffix = ` ORDER BY insertion_order ASC`
templateWithLimitSuffix = ` LIMIT ?`
templateWithOrderBySessionStartSuffix = ` ORDER BY membership_partition ASC, host_id ASC`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no point ordering by membership_partition at this point unless the page token also includes the partition itself, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know, I wasn't 100% if its helpful to list that in advance for future-proof sake and since the PrimaryKey is ordered like this. I found some conflicting opinions and need to dig a bit more, but its no-harm to keep afaik.

@@ -245,19 +245,19 @@ CREATE TABLE cluster_metadata (

CREATE TABLE cluster_membership
(
membership_partition INT NOT NULL,
host_id BINARY(16) NOT NULL,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just make this (host_id) a string? that seems like it would be easier to debug / look at the database? or is selecting a binary column via the mysql shell easy to read? Also gives more flexibility in the future so you aren't shoe-horned into a 16-byte identifier

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context was 16-bytes is UUID size and this is almost more a session_id column as its provided on each restart. Id be a fan of having this change to session_id and have another field called host_id that could be user provided, but thatd be a more comprehensive change. Thoughts?

@shawnhathaway shawnhathaway merged commit c72053f into temporalio:master Jul 22, 2020
@Alex-Tideman Alex-Tideman mentioned this pull request Jun 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants