arrive-server use cases #325

adrianna-chang-shopify · 2021-02-09T15:00:00Z

Moving the conversation from https://github.com/Shopify/super_maintenance/issues/8#issuecomment-775932361 over to our existing repo

From @tgwizard :

👋 I want to describe a use-case that I'm not sure fits well with the current CSV Task structure. In (app), we often run migrations and backfills on a large dataset. When we write such migrations, we often want to try them out on a subset of data. This can be all orders for a specific user, or the first 10k orders with id > X. We may often want to run them in parallel too (at least the process part), which could potentially be done by launching concurrent maintenance tasks on separate subsets of the data

Is that possible to do with a CSV Task? Seems a bit hard to do, since for CSV tasks only the process method seems like it should be implemented, whereas I imagine the parameters being applicable to the collection method.

(I imagine we'd write 2 separate maintenance tasks for the two flows above, one for single user, one for id + limit).

Is there some other way we could write these kind of migrations using the maintenance tasks gem?

adrianna-chang-shopify · 2021-02-09T15:13:18Z

Hey @tgwizard , can you tell me a bit more about why you want to use a CSV for these use cases? You can run a Task with a subset of a large db table by specifying an ActiveRecord relation for collection:

module Maintenance
  class SubsetOrderTask < MaintenanceTasks::Task
    def collection
        Order.where(user_id: 123)
        # or, Order.where(id: 10000..20000)
    end

    def count
      collection.count
    end

    def process(order)
      # Do something
    end
  end
end

As for concurrency, it's definitely something that's on our radar, but the gem isn't able to handle that yet. You can indeed write separate tasks to handle different subsets of the data and run them concurrently. I'm realizing now that CSV Tasks might have been appealing because you can customize which collection the Task is processing? So yes, if you wanted to, you could export a CSV with ids for all the orders you wanted to process, and then upload this to a CSV-processing Task. This way you could reuse a single Task across multiple datasets BUT the drawback is that you can only run a single instance of a Task at a time, so you wouldn't be able to achieve this concurrently. (So if concurrency is important, your best bet right now would be to just write separate Tasks for different datasets).

Let me know if this answers your question(s) or if I've misunderstood anything 😄

tgwizard · 2021-02-09T16:42:24Z

Hmm, right, that SubsetOrderTask conceptually exactly what I want to do, but I don't want to write a specific task for each user we'd run it for.

Similarly, the extract-order-ids-to-csv and run that as a CSV task kinda works, but that doesn't guarantee that all a user's orders have been migrated (if there are new orders added after the CSV has been generated).

A CSV that contains the user IDs to run for would be a good middleground, but then the maintenance task would still need to adjust the collection implementation on top of the CSV task's version, to do the Order.where(user_id: id_from_csv) part.

adrianna-chang-shopify · 2021-02-09T21:08:34Z

Ah, gotcha. So instead of having it be "CSV as a way to define a collection of rows to process", it would be "CSV as a way to define parameters that can be used in conjunction with another type of collection, like an ActiveRecord relation".

I think this would warrant a new type of task entirely. The CSV tasks are really intended to provide a collection for iteration, as opposed to allowing users to build new collections off of the contents of the CSV. (I realize that https://github.com/Shopify/super_maintenance/issues/8 is misleading because we talk about CSVs within the context of writing recurring tasks that take params, and is different from the usage of the CSV Task that is currently being built into the gem as it stands today).

As a workaround, to achieve:

A CSV that contains the user IDs to run for would be a good middleground, but then the maintenance task would still need to adjust the collection implementation on top of the CSV task's version, to do the Order.where(user_id: id_from_csv) part.

You could commit a CSV to your filesystem, and write a Task that does Order.where(user_id: id_from_csv) and manually read the CSV yourself. I realize that this isn't ideal, however, and I will keep note of this as a use case we want to explore further and potentially offer support for. 😄

tgwizard · 2021-02-10T09:54:13Z

Right, ideally we wouldn't need to deploy any changes to maintenance tasks or CSV files to run the task for a different subset of data. Ideally we wouldn't need to edit a CSV and upload that to target a new subset either - a UI with some kind of parameter input would be the ideal for us.

etiennebarrie · 2021-02-10T19:52:46Z

What if you run the task and then cancel it manually after some time? That works for some variation of subset of records.

Also we're not making this a public API just yet, but the way CsvCollection works may give you a seam to get into. It basically just defines a collection method for you, and also does what needs to be done to bring the CSV from the file input field to the collection. So maybe this could work:

module Maintenance
  class SubsetTask < MaintenanceTasks::Task
    csv_collection

    def collection
      csv = super
      SomeActiveRecord.where(user_id: csv.map {|row| row['user_id'] })
    end

    def process(record)
      #
    end
  end
end

csv_collection will give you the file input on the task page, and calling super will get you the CSV object.

Not sure how likely we are to break this API, so you may want to add a test that checks that your maintenance task works with Runner.run("Maintenance::SubsetTask", csv_file: { io: File.open('test/fixtures/files/sample.csv'), filename: 'sample.csv' }), because that is a public API, so that should always work.

tgwizard · 2021-02-11T11:18:33Z

Oh, that actually sounds like a good option for us @etiennebarrie! Slightly tedious to create a CSV when adjusting which users to target, but that is likely still less tedious than the other ways we have of doing this.

adrianna-chang-shopify · 2021-02-11T15:48:25Z

That's a super interesting point @etiennebarrie - obviously it's not-so-nice to hack with how CsvCollection works, but our set up with ActiveStorage gives us a nice extension point for developing a new sort of Task that accepts additional parameters via a CSV.

I wonder if we could develop an API that looks something like this:

module Maintenance
  class SubsetTask < MaintenanceTasks::Task
    with_params

    def collection
      SomeActiveRecord.where(user_id: params[:user_ids])
    end

    def process(record)
      #
    end
  end
end

And then in the UI, you would be expected to attach a CSV like this:

user_ids
1
2
3

And then Task would need to:

Define an attr_accessor, :params, for Tasks that have a with_params method specified. Defaults to {}
Allow CSVs to be uploaded for these types of Tasks
Parse each column of the CSV and store it in params[<header_name>] on the Task

adrianna-chang-shopify · 2021-02-11T16:01:49Z

Although arguably for ^ if you wanted to support multiple parameters, you'd have to do something like this:

user_ids, product_ids
1, 100
2, 101
3, 102

which is stretching the design of CSVs since each row is not actually a cohesive element. Maybe a different input type would work even better, like JSON?

{
  "shop_ids": [1, 2, 3],
  "product_ids": [100, 101, 102]
}

adrianna-chang-shopify · 2021-09-08T18:29:21Z

Closing this, custom parameter support has been addressed.

This was referenced May 10, 2021

Explore using ActiveModel to power Tasks with parameters #406

Closed

Support Tasks with custom parameters #413

Merged

joshbeckman mentioned this issue May 26, 2021

Concurrent task iteration support #422

Open

adrianna-chang-shopify closed this as completed Sep 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

arrive-server use cases #325

arrive-server use cases #325

adrianna-chang-shopify commented Feb 9, 2021 •

edited

Loading

adrianna-chang-shopify commented Feb 9, 2021

tgwizard commented Feb 9, 2021

adrianna-chang-shopify commented Feb 9, 2021

tgwizard commented Feb 10, 2021

etiennebarrie commented Feb 10, 2021

tgwizard commented Feb 11, 2021

adrianna-chang-shopify commented Feb 11, 2021

adrianna-chang-shopify commented Feb 11, 2021

adrianna-chang-shopify commented Sep 8, 2021 •

edited

Loading

arrive-server use cases #325

arrive-server use cases #325

Comments

adrianna-chang-shopify commented Feb 9, 2021 • edited Loading

adrianna-chang-shopify commented Feb 9, 2021

tgwizard commented Feb 9, 2021

adrianna-chang-shopify commented Feb 9, 2021

tgwizard commented Feb 10, 2021

etiennebarrie commented Feb 10, 2021

tgwizard commented Feb 11, 2021

adrianna-chang-shopify commented Feb 11, 2021

adrianna-chang-shopify commented Feb 11, 2021

adrianna-chang-shopify commented Sep 8, 2021 • edited Loading

adrianna-chang-shopify commented Feb 9, 2021 •

edited

Loading

adrianna-chang-shopify commented Sep 8, 2021 •

edited

Loading