Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore module based refactor #310

Closed
wants to merge 6 commits into from
Closed

Explore module based refactor #310

wants to merge 6 commits into from

Conversation

sambostock
Copy link
Contributor

@adrianna-chang-shopify and I paired on exploring the idea of a module based refactor to see if it would make it easier to provide a simple API to consumers for unsupported enumerator types. See discussion in #307.

This would be a breaking change, albeit one with a fairly straightforward migration path:

 class ProductTask < Task
+ include MaintenanceTasks::Adapters::ActiveRecord
   def collection Product.all end
   def process(product) puts("Processed #{product}") end
 end
 class NameTask < Task
+ include MaintenanceTasks::Adapters::Array
   def collection %w[Alice Bob Charlotte] end
   def process(name) puts("Processed #{name}") end
 end
 class DBTask < Task
- include MaintenanceTasks::CsvTask
+ include MaintenanceTasks::Adapters::CSV
   def process(row) puts("Processed #{row}") end
 end

(In fact, if we just reused the existing CSV module, that change wouldn't be breaking)

The tradeoff this makes is that the consumer needs to choose the appropriate module to include, instead of us inferring the collection type.

If this approach makes sense, it can be fleshed out.

Copy link
Member

@rafaelfranca rafaelfranca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not use inheritance when what we want is to change strategy. Use the strategy pattern please.

Also, the system can know which strategy needs to be used in the system, why are we asking users to specify it?

@sambostock
Copy link
Contributor Author

We should not use inheritance when what we want is to change strategy. Use the strategy pattern please.

Interesting point @rafaelfranca. One difficulty I see is that the interface for each enumeration strategy isn't always the same:

  • The ActiveRecord::Relation and Array strategies want a collection
  • The CSV strategy wants a csv_content
  • The "custom enumerator" strategy is open ended
    • Consumers might define an enumerator which has everything it needs
    • Consumers might define an enumerator which expects some input (e.g. which resource to query from some API)

Is the strategy pattern still a good choice if each strategy has different requirements?

It seems to me like this would lead to two possible implementations:

Creating the strategy with all context

def strategy
  ActiveRecordStrategy.new(Product.all)
end

Having the strategy delegate to the task for context

def strategy
  ActiveRecordStrategy.new(self)
end

def collection
  Product.all
end

Is one of those what you had in mind? To me, they both seem less simple for consumers than the mixin approach.


Also, the system can know which strategy needs to be used in the system, why are we asking users to specify it?

To automatically select the strategy, we need the collection (so we can check its type), which leads the the awkwardness discussed in #307, where we either need to return a special collection, or override the method that calls collection. Did you have another alternative in mind?

@adrianna-chang-shopify
Copy link
Contributor

Based on @rafaelfranca 's feedback here and the followup on Slack about changing CsvTask to be an abstract class instead of a mixin, I'm thinking we might want to revisit the solution you suggested with inheritance @sambostock .

EnumeratorTask would be the top-level class in the hierarchy, with CollectionTask and CsvTask both inheriting from it and supplying their own behaviour. We'll reintroduce the concept of abstract class (see our original implementation with abstract_class) to deal with the DescendantsTracker and only show concrete Tasks in the UI.

This would involve a change to our API (Task -> CollectionTask for existing Tasks), unless we alias Task to CollectionTask as Sam mentioned - but tweaking our API a bit this early is not a huge concern to me. What do you think @rafaelfranca ?

@rafaelfranca
Copy link
Member

I think I'm leaning towards exposing our own enumerator_builder method but without exposing the job-iteration builder or the cursor.

The enumerator builder should be an object that follows some API and that return an enumerator when asked to based on the cursor that it receives as argument.

Something like:

class MyEnumeratorBuilder
  def initialize(collection)
    @collection = collecion
  end

  def enumerator(cursor:)
    # do what needs to be done and return an object that responds to `each` and uses the cursor
  end
end

Then in the task:

class Task
  def enumerator_builder
    MyEnumeratorBuilder.new(collection)
  end
end

In the Task superclass:

class TaskSuperclass
  def enumerator_builder
      collection = self.collection

      case collection
      when ActiveRecord::Relation
        SomethingThatWillKnowHowToReturnTheActiveRecordEnumerator.new(collection)
      when Array
        SomethingThatWillKnowHowToReturnTheArrayEnumerator.new(collection)
      when CSV
        CsvEnumeratorBuilder.new(collection)
      else
        raise ArgumentError, "#{self.class.name}#collection must be either "\
          'an Active Record Relation, an Array, or a CSV.'
      end
  end

It is similar to what you got in #307, but instead of passing any context to the task about the job-iteration enumerator builder and the curse we have our own API for enumerator builders.

@sambostock
Copy link
Contributor Author

@rafaelfranca your idea is indeed pretty similar to #307, though I had chosen .call so that simple cases could use a proc or lambda. It's not much work for consumers to define their own class though.

Do you foresee a case in which we would have to pass enumerator more than just the cursor? The only one I can think of is passing it JobIteration's enumerator_builder (which we can get around for everything except ThrottleEnumerator by building a new one with nil job), but the reason I ask is that it would be tricky to add an additional keyword argument down the road without breaking backwards compatibility.

However, if we passed in an object which responds to .cursor, then we could easily add more to it in the future while maintaining backwards compatibility. Basically the EnumeratorContext from #307:

    EnumeratorContext = Struct.new(:cursor, keyword_init: true)

@rafaelfranca
Copy link
Member

I also have strong opinions that we should not pass the Job Iteration's enumerator_builder. I see no reason for that object to exist. If we have a well-defined API for enumerators, we don't need to have a special object that know how to build that API.

The other argument that we may need in the future is the job so we can build the enumerators. I think passing the context object can make easy to add that argument later when we need to.

Copy link
Contributor Author

@sambostock sambostock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrianna-chang-shopify, @etiennebarrie as discussed in our meeting earlier today, I have altered the implementation to use classes instead of modules. I have not fully fleshed it out, as the purpose of this PR is just to discuss alternatives.

I have added 3 commits, each making one of the discussed changes:

  • Use sub-classes instead of modules:
    This adopts the class hierarchy we discussed:
    class Task; end
    class ActiveRecordTask < Task; end
    class ArrayTask < Task; end
    class CsvTask < Task; end
    I didn't bother implementing any of the abstract_class code, but it would obviously be needed.
  • Use EnumerationContext instead of cursor: keyword
    This passes context: into enumerator instead of cursor directly, allowing for forwards compatibility without breaking the API. I used a Struct for simplicity, but this could obviously change.
  • Add EnumeratorBuilder abstraction
    This implements @rafaelfranca's suggestion from Explore module based refactor #310 (comment), changing the API slightly:
    -def enumerator(context:)
    +def enumerator_builder
    The returned object is expected to respond to enumerator(context:), which allows us to more closely guard the concept of a cursor (unlike JobIteration, which requires consumers to know about it, as discussed in Add support for dynamic collections #307 (comment)).

Each commit also updates example.rb so the changes to consumer usage are evident.

Let me know what you think!

Comment on lines -32 to -42
case collection
when ActiveRecord::Relation
enumerator_builder.active_record_on_records(collection, cursor: cursor)
when Array
enumerator_builder.build_array_enumerator(collection, cursor: cursor)
when CSV
JobIteration::CsvEnumerator.new(collection).rows(cursor: cursor)
else
raise ArgumentError, "#{@task.class.name}#collection must be either "\
'an Active Record Relation, Array, or CSV.'
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the things I like about this split implementation is that it adheres to the "open-closed" principle: it is "open to extension" (by subclassing) but "closed to modification" (we don't need to add further whens to add more collections types).

# frozen_string_literal: true

module MaintenanceTasks
class ActiveRecordTask < Task
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation makes the tradeoff that consumers need to know about the various subclasses, but we can make that fairly seamless with good generators and error messages.

We might even allow consumers to add their own generators (e.g. in an app with many custom tasks using the same custom enumerator builder). 🤷‍♂️

app/tasks/maintenance_tasks/array_task.rb Outdated Show resolved Hide resolved
EnumeratorBuilder.new(collection)
end

def collection
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With separate sub-classes, maybe it makes sense to rename the methods:

  • ActiveRecordTask#collection -> ActiveRecordTask#relation
  • ArrayTask#collection -> ArrayTask#array

# @return [JobIteration::EnumeratorBuilder] instance of an enumerator
# builder available to tasks.
def enumerator_builder
JobIteration.enumerator_builder.new(nil)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now firmly an implementation detail. We could implement it ourselves.

Copy link
Contributor

@adrianna-chang-shopify adrianna-chang-shopify left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple questions / comments around the API (I think there might be some old code that didn't get updated, and just want to make sure I have my head fully wrapped around the Task interfaces, especially with the EnumeratorBuilder abstraction).

Other than that, this is definitely the approach I like the best! As we discussed in our meeting yesterday, it has a lot of benefits, most of which you've mentioned already in this PR, but I'll just recap:

  • The open-closed principle for Task types is a huge benefit, and also provides a natural extension point for collection-specific features like batches (implementing something like batches with our current model would be awkward with a single generic Task class)
  • Our framework can do more for our users now that we have Task-specific implementations (for example, handling #count inside the appropriate Task subclasses)
  • These changes also make it easy for us to detach ourselves completely from JobIteration's enumerator_builder, which we could just build ourselves now if we wanted like Sam mentioned
  • We've also detangled our API completely from cursor

As Sam noted, the main tradeoff here is forcing users to think about and specify which Task subclass they want to implement. However, I don't really think this is a problem, because:

  • We already call out the specific types of "collections" a Task can define in the README and with our error messaging, so users already need to be aware of this detail
  • We're in a weird situation right now where we say that a Task can implement a collection that is an Array, ActiveRecord::Relation, or CSV. But the reality is, a generic Task can only implement an Array or ActiveRecord::Relation, and a CsvTask is what needs to be used for implementing a CSV collection. So we're already highlighting different types of Tasks through inheritance, and IMO it makes a lot of sense to put Tasks that iterate over arrays and Tasks that iterate over ActiveRecords on the same level as Tasks that iterate over CSVs

We discussed yesterday implementing this solution, with the intent of shipping it to a v2.0 of the framework (since our changes result in a new API). We can work on developing a migration strategy for existing users to get to 2.0, but ultimately @etiennebarrie, @sambostock and I are convinced this is the route we want to go down.

@rafaelfranca can you let us know what your thoughts are?

Comment on lines +3 to +7
require 'maintenance_tasks/active_record_task'
require 'maintenance_tasks/array_task'
require 'maintenance_tasks/csv_task'
# TODO: require other adapters, or make this dynamic somehow

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we could just move these to app/models 😁

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think app/tasks would make more sense if we're planning on keeping it around (otherwise, app/models makes sense). I think I'd seen some discussion about not using app/tasks internally at all, but maybe I'm wrong.

example.rb Outdated Show resolved Hide resolved
example.rb Outdated Show resolved Hide resolved
@@ -71,9 +79,10 @@ def load_constants
#
# @raise [NotImplementedError] with a message advising subclasses to
# implement an override for this method.
def collection
def enumerator_builder(context:)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also nice now that custom iteration Tasks that don't care about the cursor (ie. are just streaming data from a queue) don't have to care about the cursor at all now!

Copy link
Contributor Author

@sambostock sambostock Jan 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! Though I've actually erroneously included context: here 😅

Suggested change
def enumerator_builder(context:)
def enumerator_builder

Comment on lines +54 to +56
# @return the enumerator.
def enumerator(cursor:)
new.enumerator(cursor: cursor)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be old code, just making sure I understand - are we intending this to return new.enumerator_builder now, or new.enumerator_builder.enumerator(context)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this is a mistake and should just return new.enumerator_builder. Lots of moving pieces across those commits 😅

This was basically just me trying to put in the equivalent of the Task.collection method which used to be here.

app/tasks/maintenance_tasks/active_record_task.rb Outdated Show resolved Hide resolved
app/tasks/maintenance_tasks/array_task.rb Outdated Show resolved Hide resolved
app/tasks/maintenance_tasks/csv_task.rb Outdated Show resolved Hide resolved
@sambostock
Copy link
Contributor Author

Lots of moving pieces across the refactors. 😅 Thanks for catching the various things I missed @adrianna-chang-shopify!

Copy link
Contributor Author

@sambostock sambostock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The migration path for consumers should be fairly straightforward:

-class ProductTask < Task
+class ProductTask < ActiveRecordTask
   def collection() Product.all end
   def process(item) puts(item) end
 end

-class NameTask < Task
+class NameTask < ArrayTask
   def collection() Product.all end
   def process(item) puts(item) end
 end

 class ShopIdTask < CsvTask
   def process(item) puts(item) end
 end

Tasks with ActiveRecord::Relation or Array collections would need to change their superclass, but CSV collection tasks already inherit from CsvTask. The support for custom collections would be a net new feature, so there's no migration path needed there.

I don't think there would be anything else, other than if someone had done heavy customization to their TaskJob. As far as "breaking changes" go, it's fairly small, IMO.

Comment on lines +84 to +85
"#{self.class.name} must implement `#{__method__}` or inherit from a class which does"
# TODO: Could make error string list available adapters
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do something fancy here where we detect a legacy task and provide a useful error message, since if this method is reached from a class implementing collection, they're almost certainly using the legacy API.

if respond_to?(:collection)
  case collection
  when ActiveRecord::Relation
    raise NotImplementedError, 'Inherit from new ActiveRecordTask instead'
  when Array
    raise NotImplementedError, 'Inherit from new ArrayTask instead'
  else
    raise NotImplementedError, 'Migrate to new Task API for your custom collection'
  end
else
  raise NotImplementedError, "#{self.class.name} must implement `#{__method__}` or inherit from a class which does"
end

@rafaelfranca
Copy link
Member

rafaelfranca commented Jan 27, 2021

@rafaelfranca can you let us know what your thoughts are?

My thoughts are still the same. We should not use inheritance to change behavior by default. We can already infer the behavior based on the collection. Why we do need to ask people to tell us something that we can know already?

I'm ok with people that want custom enumerators using polymorphism to change the enumerator in their tasks, but the cases we support in this gem should not use any different superclass or include any new module (which is a different form of inheritance).

What I'm saying that we should have only one "abstract" class (possibly two with CSVTask but I still think we can remove that class) and only concrete classes in the applications.

@gmcgibbon
Copy link
Member

Superseded by #944, so I am going to close this for now.

@gmcgibbon gmcgibbon closed this Jan 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants