Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Pagy in favour of cursor-based pagination #371

Merged
merged 3 commits into from
Mar 17, 2021

Conversation

adrianna-chang-shopify
Copy link
Contributor

Closes: #362

As mentioned in the issue, Pagy uses offset-based pagination, which decreases in performance as the collection of elements being paginated grows. It doesn't scale very well, and queries that use OFFSET are outright banned in Core for performance reasons. A cursor based pagination strategy queries based on a specific point in the dataset (the cursor), and works well for
sequential datasets / datasets with ordered attributes (maintenance_tasks_runs table is guaranteed to be sortable by created_at / id).

This PR introduces a RunsPage model that handles generating a "page" of Runs based on a cursor (id). Instead of calling TaskData#previous_runs and passing this to pagy to be paginated, the TasksController#show now initializes a new RunsPage with this relation, and passes a cursor if it receives one from the params. RunsPage has three methods - #records is the runs to be rendered on the current page, #next_cursor fetches the cursor to be used in order to move to the next page, and last? is used by the views to prevent showing the Next page link if we're on the last page.

I opted to keep these in a separate model instead of in TaskData because it was feeling like a clump of related data / behaviour and I wanted to be able to store the relation / cursor in ivars instead of having to pass them around as params in TaskData methods.

I also added a system test for pagination, but maybe we don't need this, let me know.

@adrianna-chang-shopify
Copy link
Contributor Author

@etiennebarrie could use your 👀 on this please!

@adrianna-chang-shopify
Copy link
Contributor Author

@etiennebarrie I've tightened up the queries on this. Now they look something like this:

Processing by MaintenanceTasks::TasksController#show as HTML
  Parameters: {"cursor"=>"16", "id"=>"Maintenance::UpdatePostsTask"}
  MaintenanceTasks::Run Load (0.2ms)  SELECT "maintenance_tasks_runs".* FROM "maintenance_tasks_runs" WHERE "maintenance_tasks_runs"."task_name" = ? ORDER BY "maintenance_tasks_runs"."created_at" DESC LIMIT ?  [["task_name", "Maintenance::UpdatePostsTask"], ["LIMIT", 1]]
  ActiveStorage::Attachment Load (0.1ms)  SELECT "active_storage_attachments".* FROM "active_storage_attachments" WHERE "active_storage_attachments"."record_type" = ? AND "active_storage_attachments"."name" = ? AND "active_storage_attachments"."record_id" = ?  [["record_type", "MaintenanceTasks::Run"], ["name", "csv_file"], ["record_id", 90]]
  MaintenanceTasks::Run Exists? (0.2ms)  SELECT 1 AS one FROM "maintenance_tasks_runs" WHERE "maintenance_tasks_runs"."task_name" = ? AND "maintenance_tasks_runs"."id" != ? AND (id < '16') LIMIT ?  [["task_name", "Maintenance::UpdatePostsTask"], ["id", 90], ["LIMIT", 1]]
  MaintenanceTasks::Run Load (0.3ms)  SELECT "maintenance_tasks_runs".* FROM "maintenance_tasks_runs" WHERE "maintenance_tasks_runs"."task_name" = ? AND "maintenance_tasks_runs"."id" != ? AND (id < '16') ORDER BY "maintenance_tasks_runs"."created_at" DESC LIMIT ?  [["task_name", "Maintenance::UpdatePostsTask"], ["id", 90], ["LIMIT", 20]]
  ActiveStorage::Attachment Load (0.2ms)  SELECT "active_storage_attachments".* FROM "active_storage_attachments" WHERE "active_storage_attachments"."record_type" = ? AND "active_storage_attachments"."name" = ? AND "active_storage_attachments"."record_id" IN 
    (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)  [["record_type", "MaintenanceTasks::Run"], ["name", "csv_file"], [nil, 15], [nil, 14], [nil, 13], [nil, 12], [nil, 11], [nil, 10], [nil, 9], [nil, 8], [nil, 7], [nil, 6], [nil, 5], [nil, 4], [nil, 3], [nil, 2], [nil, 1]]
  (0.2ms)  SELECT "maintenance_tasks_runs"."id" FROM "maintenance_tasks_runs" WHERE "maintenance_tasks_runs"."task_name" = ? AND "maintenance_tasks_runs"."id" != ? ORDER BY "maintenance_tasks_runs"."created_at" DESC  [["task_name", "Maintenance::UpdatePostsTask"], ["id", 90]]

Compared to before, which was:

Processing by MaintenanceTasks::TasksController#show as HTML
  Parameters: {"cursor"=>"16", "id"=>"Maintenance::UpdatePostsTask"}
  MaintenanceTasks::Run Load (0.2ms)  SELECT "maintenance_tasks_runs".* FROM "maintenance_tasks_runs" WHERE "maintenance_tasks_runs"."task_name" = ? ORDER BY "maintenance_tasks_runs"."created_at" DESC LIMIT ?  [["task_name", "Maintenance::UpdatePostsTask"], ["LIMIT", 1]]
  ActiveStorage::Attachment Load (0.1ms)  SELECT "active_storage_attachments".* FROM "active_storage_attachments" WHERE "active_storage_attachments"."record_type" = ? AND "active_storage_attachments"."name" = ? AND "active_storage_attachments"."record_id" = ?  [["record_type", "MaintenanceTasks::Run"], ["name", "csv_file"], ["record_id", 90]]
  MaintenanceTasks::Run Exists? (0.2ms)  SELECT 1 AS one FROM "maintenance_tasks_runs" WHERE "maintenance_tasks_runs"."task_name" = ? AND "maintenance_tasks_runs"."id" != ? AND (id < '16') LIMIT ?  [["task_name", "Maintenance::UpdatePostsTask"], ["id", 90], ["LIMIT", 1]]
  MaintenanceTasks::Run Load (0.3ms)  SELECT "maintenance_tasks_runs".* FROM "maintenance_tasks_runs" WHERE "maintenance_tasks_runs"."task_name" = ? AND "maintenance_tasks_runs"."id" != ? AND (id < '16') ORDER BY "maintenance_tasks_runs"."created_at" DESC LIMIT ?  [["task_name", "Maintenance::UpdatePostsTask"], ["id", 90], ["LIMIT", 20]]
  ActiveStorage::Attachment Load (0.2ms)  SELECT "active_storage_attachments".* FROM "active_storage_attachments" WHERE "active_storage_attachments"."record_type" = ? AND "active_storage_attachments"."name" = ? AND "active_storage_attachments"."record_id" IN 
    (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)  [["record_type", "MaintenanceTasks::Run"], ["name", "csv_file"], [nil, 15], [nil, 14], [nil, 13], [nil, 12], [nil, 11], [nil, 10], [nil, 9], [nil, 8], [nil, 7], [nil, 6], [nil, 5], [nil, 4], [nil, 3], [nil, 2], [nil, 1]]
  (0.2ms)  MaintenanceTasks::Run Load (0.2ms)  SELECT "maintenance_tasks_runs".* FROM "maintenance_tasks_runs" WHERE "maintenance_tasks_runs"."task_name" = ? AND "maintenance_tasks_runs"."id" != ? ORDER BY "maintenance_tasks_runs"."created_at" ASC LIMIT ?  [["task_name", "Maintenance::UpdatePostsTask"], ["id", 90], ["LIMIT", 1]]
  ActiveStorage::Attachment Load (0.1ms)  SELECT "active_storage_attachments".* FROM "active_storage_attachments" WHERE "active_storage_attachments"."record_type" = ? AND "active_storage_attachments"."name" = ? AND "active_storage_attachments"."record_id" = ?  [["record_type", "MaintenanceTasks::Run"], ["name", "csv_file"], ["record_id", 1]]
  CACHE MaintenanceTasks::Run Load (0.0ms)  SELECT "maintenance_tasks_runs".* FROM "maintenance_tasks_runs" WHERE "maintenance_tasks_runs"."task_name" = ? AND "maintenance_tasks_runs"."id" != ? AND (id < '16') ORDER BY "maintenance_tasks_runs"."created_at" DESC LIMIT ?  [["task_name", "Maintenance::UpdatePostsTask"], ["id", 90], ["LIMIT", 20]]
  CACHE ActiveStorage::Attachment Load (0.0ms)  SELECT "active_storage_attachments".* FROM "active_storage_attachments" WHERE "active_storage_attachments"."record_type" = ? AND "active_storage_attachments"."name" = ? AND "active_storage_attachments"."record_id" IN 
    (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)  [["record_type", "MaintenanceTasks::Run"], ["name", "csv_file"], [nil, 15], [nil, 14], [nil, 13], [nil, 12], [nil, 11], [nil, 10], [nil, 9], [nil, 8], [nil, 7], [nil, 6], [nil, 5], [nil, 4], [nil, 3], [nil, 2], [nil, 1]]

This was achieved by:

  • memoizing records, which removes the extra queries in next_cursor and last?
  • using @runs.unscope(:includes).pluck(:id).last for #last? in order to simplify our last query. By unscoping includes, we avoid having to do a LEFT OUTER JOIN on the active storage attachments table, and then we can use pluck to only load the id instead of the whole record

@etiennebarrie
Copy link
Member

There's still an Exists? query due to

which calls exists? if the relation isn't loaded
https://github.com/rails/rails/blob/v6.1.3/activerecord/lib/active_record/relation.rb#L266
via any?
https://github.com/rails/rails/blob/v6.1.3/activerecord/lib/active_record/relation.rb#L278

We could use present? in the view like I had mentioned previously in #187 (comment), or we could change records to add load (to keep a relation, but have it loaded already) or to_a (to get an array of records).

@adrianna-chang-shopify
Copy link
Contributor Author

adrianna-chang-shopify commented Mar 17, 2021

Thanks, forgot we could avoid the EXIST by using .present? instead 👌 Updated! Queries are now:

Processing by MaintenanceTasks::TasksController#show as HTML
  Parameters: {"cursor"=>"16", "id"=>"Maintenance::UpdatePostsTask"}
  MaintenanceTasks::Run Load (0.2ms)  SELECT "maintenance_tasks_runs".* FROM "maintenance_tasks_runs" WHERE "maintenance_tasks_runs"."task_name" = ? ORDER BY "maintenance_tasks_runs"."created_at" DESC LIMIT ?  [["task_name", "Maintenance::UpdatePostsTask"], ["LIMIT", 1]]
  ActiveStorage::Attachment Load (0.1ms)  SELECT "active_storage_attachments".* FROM "active_storage_attachments" WHERE "active_storage_attachments"."record_type" = ? AND "active_storage_attachments"."name" = ? AND "active_storage_attachments"."record_id" = ?  [["record_type", "MaintenanceTasks::Run"], ["name", "csv_file"], ["record_id", 90]]
  MaintenanceTasks::Run Load (0.3ms)  SELECT "maintenance_tasks_runs".* FROM "maintenance_tasks_runs" WHERE "maintenance_tasks_runs"."task_name" = ? AND "maintenance_tasks_runs"."id" != ? AND (id < '16') ORDER BY "maintenance_tasks_runs"."created_at" DESC LIMIT ?  [["task_name", "Maintenance::UpdatePostsTask"], ["id", 90], ["LIMIT", 20]]
  ActiveStorage::Attachment Load (0.2ms)  SELECT "active_storage_attachments".* FROM "active_storage_attachments" WHERE "active_storage_attachments"."record_type" = ? AND "active_storage_attachments"."name" = ? AND "active_storage_attachments"."record_id" IN 
    (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)  [["record_type", "MaintenanceTasks::Run"], ["name", "csv_file"], [nil, 15], [nil, 14], [nil, 13], [nil, 12], [nil, 11], [nil, 10], [nil, 9], [nil, 8], [nil, 7], [nil, 6], [nil, 5], [nil, 4], [nil, 3], [nil, 2], [nil, 1]]
  (0.2ms)  SELECT "maintenance_tasks_runs"."id" FROM "maintenance_tasks_runs" WHERE "maintenance_tasks_runs"."task_name" = ? AND "maintenance_tasks_runs"."id" != ? ORDER BY "maintenance_tasks_runs"."created_at" DESC  [["task_name", "Maintenance::UpdatePostsTask"], ["id", 90]]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove dependency on Pagy
2 participants