Explore module based refactor #310

sambostock · 2021-01-22T20:35:53Z

@adrianna-chang-shopify and I paired on exploring the idea of a module based refactor to see if it would make it easier to provide a simple API to consumers for unsupported enumerator types. See discussion in #307.

This would be a breaking change, albeit one with a fairly straightforward migration path:

 class ProductTask < Task
+ include MaintenanceTasks::Adapters::ActiveRecord
   def collection Product.all end
   def process(product) puts("Processed #{product}") end
 end
 class NameTask < Task
+ include MaintenanceTasks::Adapters::Array
   def collection %w[Alice Bob Charlotte] end
   def process(name) puts("Processed #{name}") end
 end
 class DBTask < Task
- include MaintenanceTasks::CsvTask
+ include MaintenanceTasks::Adapters::CSV
   def process(row) puts("Processed #{row}") end
 end

(In fact, if we just reused the existing CSV module, that change wouldn't be breaking)

The tradeoff this makes is that the consumer needs to choose the appropriate module to include, instead of us inferring the collection type.

If this approach makes sense, it can be fleshed out.

rafaelfranca

We should not use inheritance when what we want is to change strategy. Use the strategy pattern please.

Also, the system can know which strategy needs to be used in the system, why are we asking users to specify it?

sambostock · 2021-01-22T23:13:24Z

We should not use inheritance when what we want is to change strategy. Use the strategy pattern please.

Interesting point @rafaelfranca. One difficulty I see is that the interface for each enumeration strategy isn't always the same:

The ActiveRecord::Relation and Array strategies want a collection
The CSV strategy wants a csv_content
The "custom enumerator" strategy is open ended
- Consumers might define an enumerator which has everything it needs
- Consumers might define an enumerator which expects some input (e.g. which resource to query from some API)

Is the strategy pattern still a good choice if each strategy has different requirements?

It seems to me like this would lead to two possible implementations:

Creating the strategy with all context

def strategy
  ActiveRecordStrategy.new(Product.all)
end

Having the strategy delegate to the task for context

def strategy
  ActiveRecordStrategy.new(self)
end

def collection
  Product.all
end

Is one of those what you had in mind? To me, they both seem less simple for consumers than the mixin approach.

Also, the system can know which strategy needs to be used in the system, why are we asking users to specify it?

To automatically select the strategy, we need the collection (so we can check its type), which leads the the awkwardness discussed in #307, where we either need to return a special collection, or override the method that calls collection. Did you have another alternative in mind?

adrianna-chang-shopify · 2021-01-25T14:13:09Z

Based on @rafaelfranca 's feedback here and the followup on Slack about changing CsvTask to be an abstract class instead of a mixin, I'm thinking we might want to revisit the solution you suggested with inheritance @sambostock .

EnumeratorTask would be the top-level class in the hierarchy, with CollectionTask and CsvTask both inheriting from it and supplying their own behaviour. We'll reintroduce the concept of abstract class (see our original implementation with abstract_class) to deal with the DescendantsTracker and only show concrete Tasks in the UI.

This would involve a change to our API (Task -> CollectionTask for existing Tasks), unless we alias Task to CollectionTask as Sam mentioned - but tweaking our API a bit this early is not a huge concern to me. What do you think @rafaelfranca ?

rafaelfranca · 2021-01-26T00:18:30Z

I think I'm leaning towards exposing our own enumerator_builder method but without exposing the job-iteration builder or the cursor.

The enumerator builder should be an object that follows some API and that return an enumerator when asked to based on the cursor that it receives as argument.

Something like:

class MyEnumeratorBuilder
  def initialize(collection)
    @collection = collecion
  end

  def enumerator(cursor:)
    # do what needs to be done and return an object that responds to `each` and uses the cursor
  end
end

Then in the task:

class Task
  def enumerator_builder
    MyEnumeratorBuilder.new(collection)
  end
end

In the Task superclass:

class TaskSuperclass
  def enumerator_builder
      collection = self.collection

      case collection
      when ActiveRecord::Relation
        SomethingThatWillKnowHowToReturnTheActiveRecordEnumerator.new(collection)
      when Array
        SomethingThatWillKnowHowToReturnTheArrayEnumerator.new(collection)
      when CSV
        CsvEnumeratorBuilder.new(collection)
      else
        raise ArgumentError, "#{self.class.name}#collection must be either "\
          'an Active Record Relation, an Array, or a CSV.'
      end
  end

It is similar to what you got in #307, but instead of passing any context to the task about the job-iteration enumerator builder and the curse we have our own API for enumerator builders.

sambostock · 2021-01-26T01:03:30Z

@rafaelfranca your idea is indeed pretty similar to #307, though I had chosen .call so that simple cases could use a proc or lambda. It's not much work for consumers to define their own class though.

Do you foresee a case in which we would have to pass enumerator more than just the cursor? The only one I can think of is passing it JobIteration's enumerator_builder (which we can get around for everything except ThrottleEnumerator by building a new one with nil job), but the reason I ask is that it would be tricky to add an additional keyword argument down the road without breaking backwards compatibility.

However, if we passed in an object which responds to .cursor, then we could easily add more to it in the future while maintaining backwards compatibility. Basically the EnumeratorContext from #307:

    EnumeratorContext = Struct.new(:cursor, keyword_init: true)

rafaelfranca · 2021-01-26T19:44:57Z

I also have strong opinions that we should not pass the Job Iteration's enumerator_builder. I see no reason for that object to exist. If we have a well-defined API for enumerators, we don't need to have a special object that know how to build that API.

The other argument that we may need in the future is the job so we can build the enumerators. I think passing the context object can make easy to add that argument later when we need to.

sambostock

@adrianna-chang-shopify, @etiennebarrie as discussed in our meeting earlier today, I have altered the implementation to use classes instead of modules. I have not fully fleshed it out, as the purpose of this PR is just to discuss alternatives.

I have added 3 commits, each making one of the discussed changes:

Use sub-classes instead of modules:
This adopts the class hierarchy we discussed:
```
class Task; end
class ActiveRecordTask < Task; end
class ArrayTask < Task; end
class CsvTask < Task; end
```
I didn't bother implementing any of the abstract_class code, but it would obviously be needed.
Use EnumerationContext instead of cursor: keyword
This passes context: into enumerator instead of cursor directly, allowing for forwards compatibility without breaking the API. I used a Struct for simplicity, but this could obviously change.
Add EnumeratorBuilder abstraction
This implements @rafaelfranca's suggestion from Explore module based refactor #310 (comment), changing the API slightly:
```
-def enumerator(context:)
+def enumerator_builder
```
The returned object is expected to respond to enumerator(context:), which allows us to more closely guard the concept of a cursor (unlike JobIteration, which requires consumers to know about it, as discussed in Add support for dynamic collections #307 (comment)).

Each commit also updates example.rb so the changes to consumer usage are evident.

Let me know what you think!

sambostock · 2021-01-27T02:01:58Z

app/jobs/maintenance_tasks/task_job.rb

-      case collection
-      when ActiveRecord::Relation
-        enumerator_builder.active_record_on_records(collection, cursor: cursor)
-      when Array
-        enumerator_builder.build_array_enumerator(collection, cursor: cursor)
-      when CSV
-        JobIteration::CsvEnumerator.new(collection).rows(cursor: cursor)
-      else
-        raise ArgumentError, "#{@task.class.name}#collection must be either "\
-          'an Active Record Relation, Array, or CSV.'
-      end


One of the things I like about this split implementation is that it adheres to the "open-closed" principle: it is "open to extension" (by subclassing) but "closed to modification" (we don't need to add further whens to add more collections types).

sambostock · 2021-01-27T02:05:29Z

app/tasks/maintenance_tasks/active_record_task.rb

+# frozen_string_literal: true
+
+module MaintenanceTasks
+  class ActiveRecordTask < Task


This implementation makes the tradeoff that consumers need to know about the various subclasses, but we can make that fairly seamless with good generators and error messages.

We might even allow consumers to add their own generators (e.g. in an app with many custom tasks using the same custom enumerator builder). 🤷‍♂️

app/tasks/maintenance_tasks/array_task.rb

sambostock · 2021-01-27T02:08:35Z

app/tasks/maintenance_tasks/array_task.rb

+      EnumeratorBuilder.new(collection)
+    end
+
+    def collection


With separate sub-classes, maybe it makes sense to rename the methods:

ActiveRecordTask#collection -> ActiveRecordTask#relation

ArrayTask#collection -> ArrayTask#array

sambostock · 2021-01-27T02:09:50Z

app/tasks/maintenance_tasks/array_task.rb

+    # @return [JobIteration::EnumeratorBuilder] instance of an enumerator
+    #   builder available to tasks.
+    def enumerator_builder
+      JobIteration.enumerator_builder.new(nil)


This is now firmly an implementation detail. We could implement it ourselves.

adrianna-chang-shopify

I have a couple questions / comments around the API (I think there might be some old code that didn't get updated, and just want to make sure I have my head fully wrapped around the Task interfaces, especially with the EnumeratorBuilder abstraction).

Other than that, this is definitely the approach I like the best! As we discussed in our meeting yesterday, it has a lot of benefits, most of which you've mentioned already in this PR, but I'll just recap:

The open-closed principle for Task types is a huge benefit, and also provides a natural extension point for collection-specific features like batches (implementing something like batches with our current model would be awkward with a single generic Task class)
Our framework can do more for our users now that we have Task-specific implementations (for example, handling #count inside the appropriate Task subclasses)
These changes also make it easy for us to detach ourselves completely from JobIteration's enumerator_builder, which we could just build ourselves now if we wanted like Sam mentioned
We've also detangled our API completely from cursor

As Sam noted, the main tradeoff here is forcing users to think about and specify which Task subclass they want to implement. However, I don't really think this is a problem, because:

We already call out the specific types of "collections" a Task can define in the README and with our error messaging, so users already need to be aware of this detail
We're in a weird situation right now where we say that a Task can implement a collection that is an Array, ActiveRecord::Relation, or CSV. But the reality is, a generic Task can only implement an Array or ActiveRecord::Relation, and a CsvTask is what needs to be used for implementing a CSV collection. So we're already highlighting different types of Tasks through inheritance, and IMO it makes a lot of sense to put Tasks that iterate over arrays and Tasks that iterate over ActiveRecords on the same level as Tasks that iterate over CSVs

We discussed yesterday implementing this solution, with the intent of shipping it to a v2.0 of the framework (since our changes result in a new API). We can work on developing a migration strategy for existing users to get to 2.0, but ultimately @etiennebarrie, @sambostock and I are convinced this is the route we want to go down.

@rafaelfranca can you let us know what your thoughts are?

adrianna-chang-shopify · 2021-01-27T14:10:28Z

app/tasks/maintenance_tasks/task.rb

+require 'maintenance_tasks/active_record_task'
+require 'maintenance_tasks/array_task'
+require 'maintenance_tasks/csv_task'
+# TODO: require other adapters, or make this dynamic somehow
+


Yeah, we could just move these to app/models 😁

I think app/tasks would make more sense if we're planning on keeping it around (otherwise, app/models makes sense). I think I'd seen some discussion about not using app/tasks internally at all, but maybe I'm wrong.

example.rb

adrianna-chang-shopify · 2021-01-27T14:20:00Z

app/tasks/maintenance_tasks/task.rb

@@ -71,9 +79,10 @@ def load_constants
    #
    # @raise [NotImplementedError] with a message advising subclasses to
    #   implement an override for this method.
-    def collection
+    def enumerator_builder(context:)


It's also nice now that custom iteration Tasks that don't care about the cursor (ie. are just streaming data from a queue) don't have to care about the cursor at all now!

Yeah! Though I've actually erroneously included context: here 😅

Suggested change

def enumerator_builder(context:)

def enumerator_builder

adrianna-chang-shopify · 2021-01-27T14:24:17Z

app/tasks/maintenance_tasks/task.rb

+      # @return the enumerator.
+      def enumerator(cursor:)
+        new.enumerator(cursor: cursor)


I think this might be old code, just making sure I understand - are we intending this to return new.enumerator_builder now, or new.enumerator_builder.enumerator(context)?

Yeah, I think this is a mistake and should just return new.enumerator_builder. Lots of moving pieces across those commits 😅

This was basically just me trying to put in the equivalent of the Task.collection method which used to be here.

app/tasks/maintenance_tasks/active_record_task.rb

app/tasks/maintenance_tasks/array_task.rb

app/tasks/maintenance_tasks/csv_task.rb

Co-authored-by: Adrianna Chang <[email protected]>

sambostock · 2021-01-27T17:57:53Z

Lots of moving pieces across the refactors. 😅 Thanks for catching the various things I missed @adrianna-chang-shopify!

sambostock

The migration path for consumers should be fairly straightforward:

-class ProductTask < Task
+class ProductTask < ActiveRecordTask
   def collection() Product.all end
   def process(item) puts(item) end
 end

-class NameTask < Task
+class NameTask < ArrayTask
   def collection() Product.all end
   def process(item) puts(item) end
 end

 class ShopIdTask < CsvTask
   def process(item) puts(item) end
 end

Tasks with ActiveRecord::Relation or Array collections would need to change their superclass, but CSV collection tasks already inherit from CsvTask. The support for custom collections would be a net new feature, so there's no migration path needed there.

I don't think there would be anything else, other than if someone had done heavy customization to their TaskJob. As far as "breaking changes" go, it's fairly small, IMO.

sambostock · 2021-01-27T18:06:18Z

app/tasks/maintenance_tasks/task.rb

+        "#{self.class.name} must implement `#{__method__}` or inherit from a class which does"
+      # TODO: Could make error string list available adapters


We could do something fancy here where we detect a legacy task and provide a useful error message, since if this method is reached from a class implementing collection, they're almost certainly using the legacy API.

if respond_to?(:collection) case collection when ActiveRecord::Relation raise NotImplementedError, 'Inherit from new ActiveRecordTask instead' when Array raise NotImplementedError, 'Inherit from new ArrayTask instead' else raise NotImplementedError, 'Migrate to new Task API for your custom collection' end else raise NotImplementedError, "#{self.class.name} must implement `#{__method__}` or inherit from a class which does" end

rafaelfranca · 2021-01-27T19:40:39Z

@rafaelfranca can you let us know what your thoughts are?

My thoughts are still the same. We should not use inheritance to change behavior by default. We can already infer the behavior based on the collection. Why we do need to ask people to tell us something that we can know already?

I'm ok with people that want custom enumerators using polymorphism to change the enumerator in their tasks, but the cases we support in this gem should not use any different superclass or include any new module (which is a different form of inheritance).

What I'm saying that we should have only one "abstract" class (possibly two with CSVTask but I still think we can remove that class) and only concrete classes in the applications.

gmcgibbon · 2024-01-19T20:33:26Z

Superseded by #944, so I am going to close this for now.

rafaelfranca requested changes Jan 22, 2021

View reviewed changes

adrianna-chang-shopify mentioned this pull request Jan 25, 2021

Use superclass for CSV method #313

Merged

sambostock added 4 commits January 26, 2021 16:54

[WIP] Explore module based refactor

4e11f8f

Clarify implementation & API differences

37d067a

Use sub-classes instead of modules

c5365a9

Use EnumerationContext instead of cursor: keyword

27d1404

sambostock force-pushed the modules-instead branch from 9074b99 to 809c749 Compare January 27, 2021 01:43

Add EnumeratorBuilder abstraction

3e19856

sambostock force-pushed the modules-instead branch from 809c749 to 3e19856 Compare January 27, 2021 01:45

sambostock commented Jan 27, 2021

View reviewed changes

adrianna-chang-shopify reviewed Jan 27, 2021

View reviewed changes

Apply suggestions from code review

4afdc7f

Co-authored-by: Adrianna Chang <[email protected]>

sambostock commented Jan 27, 2021

View reviewed changes

etiennebarrie mentioned this pull request Jan 28, 2021

Use a class method to declare a CSV task #317

Merged

sambostock mentioned this pull request Feb 9, 2021

Introduce Task#enumerator_builder #326

Closed

3 tasks

gmcgibbon closed this Jan 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Explore module based refactor #310

Explore module based refactor #310

sambostock commented Jan 22, 2021

rafaelfranca left a comment •

edited

Loading

sambostock commented Jan 22, 2021

adrianna-chang-shopify commented Jan 25, 2021

rafaelfranca commented Jan 26, 2021

sambostock commented Jan 26, 2021

rafaelfranca commented Jan 26, 2021

sambostock left a comment

sambostock Jan 27, 2021

sambostock Jan 27, 2021

sambostock Jan 27, 2021

sambostock Jan 27, 2021

adrianna-chang-shopify left a comment

adrianna-chang-shopify Jan 27, 2021

sambostock Jan 27, 2021

adrianna-chang-shopify Jan 27, 2021

sambostock Jan 27, 2021 •

edited

Loading

adrianna-chang-shopify Jan 27, 2021

sambostock Jan 27, 2021

sambostock commented Jan 27, 2021

sambostock left a comment

sambostock Jan 27, 2021

rafaelfranca commented Jan 27, 2021 •

edited

Loading

gmcgibbon commented Jan 19, 2024

		"#{self.class.name} must implement `#{__method__}` or inherit from a class which does"
		# TODO: Could make error string list available adapters

Explore module based refactor #310

Explore module based refactor #310

Conversation

sambostock commented Jan 22, 2021

rafaelfranca left a comment • edited Loading

Choose a reason for hiding this comment

sambostock commented Jan 22, 2021

Creating the strategy with all context

Having the strategy delegate to the task for context

adrianna-chang-shopify commented Jan 25, 2021

rafaelfranca commented Jan 26, 2021

sambostock commented Jan 26, 2021

rafaelfranca commented Jan 26, 2021

sambostock left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adrianna-chang-shopify left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sambostock Jan 27, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sambostock commented Jan 27, 2021

sambostock left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rafaelfranca commented Jan 27, 2021 • edited Loading

gmcgibbon commented Jan 19, 2024

rafaelfranca left a comment •

edited

Loading

sambostock Jan 27, 2021 •

edited

Loading

rafaelfranca commented Jan 27, 2021 •

edited

Loading