TaskQueue Schema Collapse #500

Merged
merged 16 commits into from
Jul 17, 2020

@shawnhathaway shawnhathaway commented Jul 6, 2020

TaskQueue previously had a schema keyed on (shard_id, namespace_id, name, task_type). The key is now (range_hash, task_queue_id).

  • NumShards was previously hardcoded at the config level and could never be changed, because the raw hash was reduced modulo this value before being stored in the DB as shard_id.
    • shard_id becomes range_hash. It is no longer reduced modulo NumShards; the raw hash (in the range 0 to MaxUint32) is stored instead.
    • The config knob is renamed from NumShards to TaskScanPartitions.
      This option now only affects reads. It controls the minimum number of pages in a ListTaskQueue query, so that shards are read sequentially in sharded SQL databases where a large number of TaskQueue pages exist. If this value is set to 4, then 4 select queries are issued, each over a sequential, contiguous block of ShardIds.
  • namespace_id, name, and task_type are combined into a single database column, task_queue_id, to provide a simpler (and correct) paging experience.
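
To make the difference between the old shard_id and the new range_hash concrete, here is a minimal Go sketch. It is not the code from this PR: FNV-1a stands in for whatever hash function the implementation actually uses, and rawHash, oldShardID, hashRange, and scanRanges are hypothetical names. The even split of the hash space is also an assumption about how TaskScanPartitions might map to contiguous blocks.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
)

// rawHash returns a 32-bit hash of key. FNV-1a is only a stand-in;
// the PR description does not name the hash function.
func rawHash(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return h.Sum32()
}

// oldShardID mimics the previous scheme: the stored value depends on
// numShards, so changing numShards would invalidate every stored row.
func oldShardID(key string, numShards uint32) uint32 {
	return rawHash(key) % numShards
}

// hashRange is an inclusive [Lo, Hi] slice of the uint32 hash space.
type hashRange struct{ Lo, Hi uint32 }

// scanRanges splits [0, MaxUint32] into n contiguous inclusive ranges,
// one per read query (hypothetical mapping for TaskScanPartitions).
func scanRanges(n uint32) []hashRange {
	if n == 0 {
		n = 1
	}
	const space = uint64(math.MaxUint32) + 1
	width := space / uint64(n)
	ranges := make([]hashRange, 0, n)
	for i := uint32(0); i < n; i++ {
		lo := uint64(i) * width
		hi := lo + width - 1
		if i == n-1 {
			hi = math.MaxUint32 // last range absorbs any remainder
		}
		ranges = append(ranges, hashRange{Lo: uint32(lo), Hi: uint32(hi)})
	}
	return ranges
}

func main() {
	key := "namespace/queue/activity" // hypothetical task queue key
	fmt.Println("old shard_id (NumShards=16):", oldShardID(key, 16))
	fmt.Println("new range_hash:", rawHash(key))
	for _, r := range scanRanges(4) {
		fmt.Printf("scan range_hash in [%d, %d]\n", r.Lo, r.Hi)
	}
}
```

Because the stored value no longer depends on the partition count, the number of scan ranges can change between reads without rewriting any rows.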

This was tested with local unit tests and manual test cases, and validated using this Buildkite PR step.

Note: This fixes previously broken ListTaskList/ListTaskQueue functionality in SQL based databases when results exceed the specified page size.
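
As an illustration of why a two-column key simplifies paging, the following Go sketch pages through rows ordered by (range_hash, task_queue_id) using the last row seen as the cursor. It is an in-memory stand-in under stated assumptions, not the SQL from this PR: the row type, less, and listPage are hypothetical names, and task_queue_id is modeled as a string for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// row models the new primary key only (hypothetical type).
type row struct {
	RangeHash   uint32
	TaskQueueID string
}

// less orders rows by (RangeHash, TaskQueueID), mirroring the key order.
func less(a, b row) bool {
	if a.RangeHash != b.RangeHash {
		return a.RangeHash < b.RangeHash
	}
	return a.TaskQueueID < b.TaskQueueID
}

// listPage returns up to pageSize rows strictly after cursor (nil means
// start from the beginning) and the cursor for the next page, which is
// nil once the data is exhausted. sorted must be ordered by less.
func listPage(sorted []row, cursor *row, pageSize int) ([]row, *row) {
	if pageSize < 1 {
		pageSize = 1
	}
	start := 0
	if cursor != nil {
		// First index whose row sorts strictly after the cursor.
		start = sort.Search(len(sorted), func(i int) bool {
			return less(*cursor, sorted[i])
		})
	}
	end := start + pageSize
	if end >= len(sorted) {
		return sorted[start:], nil
	}
	page := sorted[start:end]
	next := page[len(page)-1]
	return page, &next
}

func main() {
	rows := []row{{1, "a"}, {1, "b"}, {7, "a"}, {9, "c"}, {9, "d"}}
	var cursor *row
	for {
		page, next := listPage(rows, cursor, 2)
		fmt.Println(page)
		if next == nil {
			break
		}
		cursor = next
	}
}
```

With a single opaque task_queue_id as the tiebreaker, the cursor comparison needs only two fields, and rows that share a range_hash are neither skipped nor repeated when a page boundary falls between them.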

Future Todos:

  • rangeSelectFromTaskQueues should be split into two functions.
  • A follow-up PR after this one should apply the same changes to the Task table schema.
  • Unique Indexing on TaskQueueID
    • Do we really need range_hash as part of the PK?

- Rename NumShards to TaskShardPartitions
@shawnhathaway shawnhathaway changed the title from "Raw TaskQueue ShardId" to "TaskQueue Schema Collapse" Jul 16, 2020

@samarabbas samarabbas left a comment


Looks much better now. Great work.


3 participants