Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ruby phase 1 #93

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Ruby phase 1 #93

wants to merge 4 commits into from

Conversation

cretz
Copy link
Member

@cretz cretz commented Jul 10, 2024

Summary

  • This is the proposal for the first phase of the Ruby SDK. This is the second discussion of this proposal, see Ruby SDK - Phase 1 - Initial Comment Period #92 for past discussion.
  • See it rendered.
  • Much of this is inspired by the current Temporal Ruby SDK (and its proposals which were moved to a sub directory) and our other SDKs.
  • Use this PR for comments/discussion. Don't be afraid of making too many comments; if it gets too noisy, I'll just create another PR.
  • Any points of contention will be discussed in-person by the SDK team and a decision will be made.

For the public, please also join us on #ruby-sdk in Slack.

@cretz cretz requested a review from a team July 10, 2024 18:17
* Custom metrics and optional metrics buffer for custom processing
* Typed search attributes, memos, headers, etc
* API key, RPC metadata, HTTP CONNECT proxy support, etc
* Loggers with contextual information
Copy link
Member Author

@cretz cretz Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will likely be our own logger abstraction since we need a pluggable logger factory and Ruby stdlib doesn't really offer one

but in general Ruby typing right now is a mess and we likely will hit bugs or advanced cases a yard/RBS combination
library won't work with.
* ❓ How hard do we want to work to avoid duplicate yard/RBS typing?
* 💭 Why not Sorbet? Sorbet can come later or provided separately, but RBS seems to be the newer movement.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @cretz , the org where I work is very excited about the official Ruby SDK for temporal!

Just wanted to drop a note that we use Sorbet heavily and it seems like it's the much more popular typing system in the community (despite the introduction of RBS). It won't affect our usage much either way (although first-class sorbet typing would make it slightly easier to integrate), but if you were interested in exploring/chatting more about sorbet, just wanted to offer to jump on a synchronous call to talk through concerns/tradeoffs. I've added sorbet to many gems and applications and would be very happy to chat through things.

Thanks for prioritizing this work!!

Copy link
Member Author

@cretz cretz Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Yes, I definitely believe RBS is weaker than Sorbet, and less adopted, and Steep is not a great tool, but I am hoping to avoid non-stdlib deps where we can. We considered no typing at all and that still may be an option. But during PoC, RBS was acceptable for this limited use case, but I expect most users to treat this as an untyped library, including Sorbet users. So I definitely agree it won't affect most people's usage either way. At this point with such low RBS adoption, it essentially just becomes a check on ourselves (we did this successfully in the last incarnation at https://github.com/temporalio/sdk-ruby/tree/v0.1.1).

We were just wanting to avoid maintaining Sorbet annotations in repo. But we would totally encourage external ones at https://github.com/Shopify/rbi-central or https://github.com/sorbet/sorbet-typed or wherever.

Would definitely like to chat on this further. If easier, can come join us in #ruby-sdk in Slack. The pinned message in that channel also has a link to a survey w/ a request at the end to setup a meeting to discuss Ruby if that's preferred.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filled out the survey, thanks!

I commented in that channel in case you wanted to chat through Sorbet more :)

Thanks again!

* 💭 Why classes instead of methods/functions like every other SDK? There are tradeoffs to both approaches. As
classes, they are more easily referenced, a bit more easily typed in RBS, and metadata can be applied more easily at
the class level. This comes at a cost of not being able to easily share state across classes, but users can provide
something in the constructor multiple activities use. Open to considering activity methods instead.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might've already discussed this internally, but a middle ground here might be supporting any object that responds to #call rather than #execute.

Using #call would potentially allow simple activities to be implemented as a Proc or Lambda, while still allowing people to use custom classes for more complex scenarios.

In the spirit of "there's more than one way to do it", this would give SDK users multiple options for defining activities, and they could choose whichever one is most appropriate for their circumstances:

module Activities
  # Constant procs
  SomeProcActivity = ->(params) {
    # ...
  }

  # Proc factory methods
  def self.another_proc_activity(config)
    ->(params) {
      # can use config to customize proc
    }
  }

  # Maybe some kind of DSL?
  DSLActivity = Temporalio.activity(options) do |params|
    # ...
  end

  # Child classes for full control
  ChildClassActivity < Temporalio::Activity
    def initialize(config)
      # ...
    end

    def call(params)
      # ...
    end
  end
end
# Activity registration using a hash, similar to TS API:
worker1 = Temporalio::Worker.new(
  # ...

  activities: {
    SomeProcActivity: Activities::SomeProcActivity,
    AnotherProcActivity: Activities.another_proc_activity(config),
    DSLActivity: Activities::DSLActivity,
    ChildClassActivity: Activities::ChildClassActivity,

    # Even inline activities, which is almost certainly not a good
    # idea for production code, but could be useful in some situations.
    InlineActivity: ->(params) {
      # ...
    }
  }

  # ...
)

And possibly a block-based API on the worker:

worker.register_activity('MyActivity') do |params|
  # ...
end

As well as anything that responds to #to_proc:

worker.register_activity('MyActivity', &:my_activity)

All of these could of course also be supported using the #execute method, but then there'd be a discrepancy between the SDK's #execute and the proc's #call.


Some similar APIs:

Copy link
Member Author

@cretz cretz Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using #call would potentially allow simple activities to be implemented as a Proc or Lambda, while still allowing people to use custom classes for more complex scenarios.

A couple of important aspects we want for activities are 1) they are easily referenceable by a caller without calling (e.g. Workflow.execute_activity(MyActivityClass, ..., w/ possible generic typing support w/ RBS/Sorbet), and 2) people can use existing things (e.g. mixins) and there is not much magic.

In the spirit of "there's more than one way to do it", this would give SDK users multiple options for defining activities,

We actually try to avoid this in SDKs where we can at declaration time. Sure some SDKs do have a couple of ways, but in general SDKs say "activities are defined as X" without many values for X and without more magic than we have to have. And remember there is the referenceability concern.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of important aspects we want for activities are 1) they are easily referenceable by a caller without calling (e.g. Workflow.execute_activity(MyActivityClass, ..., w/ possible generic typing support w/ RBS/Sorbet), and 2) people can use existing things (e.g. mixins) and there is not much magic.

Got it. If I'm understanding them correctly, those all make sense to me as goals / requirements 👍

For (1), is that referring to the ability to call an activity with something like MyActivity.execute rather than workflow.execute_activity(MyActivity)? If so, then using a method other than #call definitely makes sense to me. IIUC, that'll help prevent users from trying to execute activity classes that haven't been wired up to Temporal.

We actually try to avoid this in SDKs where we can at declaration time. Sure some SDKs do have a couple of ways, but in general SDKs say "activities are defined as X" without many values for X and without more magic than we have to have. And remember there is the referenceability concern.

I totally get this, and I'm personally not a huge fan of how Ruby has adopted Perl's TIMTOWTDI philosophy. But I also think a lot of Rubyists definitely view that as one of Ruby's strengths, so just wanted to put in a plug for that way of thinking 😅

That being said, I don't think anyone will avoid using Temporal just because they can't implement activities as procs, so it's probably not worth the effort to implement (& support!) that level of flexibility.

Copy link
Member Author

@cretz cretz Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that referring to the ability to call an activity with something like MyActivity.execute rather than workflow.execute_activity(MyActivity)?

No, we want the latter. We learned from other SDKs that it is best to not make the remote activity invocation look like a regular local method invocation.

they can't implement activities as procs

Maybe we can have sugar for this, e.g. class ProcActivity < Temporalio::Activity and a class method on Temporalio::Activity that creates it. But it is the same type of sugar a user can write if we make sure that we support activities as classes.


### Workflows in Client

Early thoughts on workflows are that they'll be Ractors. For the purposes of this phase, all that may matter is how

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment about this in the Google form as well, but I'm not sure if Ractors would be the right move here. They're still very experimental and still have some fundamental limitations:

  • https://rubykaigi.org/2023/presentations/ko1.html

    Ractor was introduced in Ruby 3.0 as an experimental feature to enable parallel programming in Ruby. Now Ractor is not widely used for many reasons. For example, from the perspective of a Ractor implementer, Ractor does not have enough performance and quality. In this talk, we will explain which features are not enough, why they are difficult, and how to solve these difficulties.

  • https://rubykaigi.org/2024/presentations/ko1.html

    This talk presents recent updates to Ractor, which enables parallel and concurrent programming on Ruby.

    Ractor still lacks fundamental features. For example, we cannot use “require” method and “timeout” methods on non-main Ractors because of synchronization and implementation issues. We will discuss such problems and how to solve them. From a performance point of view, we have introduced the M:N thread scheduler in Ruby 3.3 and we will show the performance analysis with recent improvements.

They also still output scary warnings, even in Ruby 3.3:

irb(main):001> Ractor.new { puts RUBY_VERSION }
(irb):1: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
3.3.4
=> #<Ractor:#2 (irb):1 terminated>

I definitely think that Ractors could be a good solution in a future world where Ractors are more polished, but IMO for now it would be safer to follow the coinbase/temporal-ruby SDK and use Fibers as the basis for scaling workflow executions.

Copy link
Member Author

@cretz cretz Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the biggest challenges in SDK development is sandboxing for determinism. There are two main impediments to determinism: state isolation and restricted API use (e.g. disk IO). Ignoring the latter, if we can get the former it may be worth it. JS is lucky in that it can spin up a new V8 instance. But it is a nightmare for Go and Java and .NET and other devs to remember that they cannot share state. We expended a lot of effort making a Python sandbox just to get this state isolation so that globals don't pollute.

We will have to evaluate performance to see if this can work, and we can provide a way to opt-out of Ractors (for the most part none of the Ractor stuff would be visible to users unless they wanted to do bad things like expose something shareable).

Fibers as the basis for scaling workflow executions.

We will definitely still be using fibers (we will have our own fiber scheduler since we need deterministic coroutines), but if we can get state isolation by default, it may be worth the performance penalty. This is more of a phase 2 thing though, workflows aren't going to be done now.

Interestingly enough, deterministic isolated Temporal workflows is like one of the only good use cases I have seen for Ractors, heh. But it's subject to testing to make sure the good limitations are good and the bad limitations aren't too bad (and of course we can allow it to be disabled). Most users will never know there are Ractors under the hood powering workflows.

They also still output scary warnings, even in Ruby 3.3:

This is scary. Maybe it won't be ready. We will evaluate in phase 2.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, that makes sense! If it's only for workflow-internal things, I think Ractors sound great.

But if the goal is to isolate global state even between activities, I could see that being very challenging with Ractors. For example, I'm not sure if ActiveRecord would would play nicely with Ractors, because (AFAIK) the connection pool wouldn't be able to be shared between Ractors.

Interestingly enough, deterministic isolated Temporal workflows is like one of the only good use cases I have seen for Ractors, heh.

Same, TBH 😅

I want to use them, but haven't really found a use case yet, especially given their current limitations :(

Copy link
Member Author

@cretz cretz Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if the goal is to isolate global state even between activities, I could see that being very challenging with Ractors

Definitely not the goal. Activities will not have anything to do with Ractors at all. Activities are just normal Ruby code where you can do what you want (we will allow different executors so people can choose whether to run fiber/async vs threaded (the default)). In fact, it is a very common use case to share state between activities. For instance, it is very normal to have multiple activity invocations share a database client. This is why we will allow users to instantiate the activity class, meaning the same instance will be called on each attempt (but if you pass the class in, we will instantiate per attempt).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great! Thanks for helping me understand!

* API key, RPC metadata, HTTP CONNECT proxy support, etc
* Loggers with contextual information
* Full error hierarchy with `Temporalio::Error` as the root
* ❓ Are there any Ruby libraries so widely adopted they might as well be in the standard library and therefore deserve

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ActiveSupport is probably the big contender here, but TBH I don't know how much the Temporal would really benefit from having it available.

Optional support for things like ActiveSupport::Duration might be nice for things like start_delay, but that could be a pain to implement (& maintain) alongside normal Numeric arguments.

Copy link
Member Author

@cretz cretz Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 We definitely plan to have samples showing integrations, but not sure we need an explicit integration for this. Whatever integrations we do have would be optional/additional because it is really important that we have as few dependencies as possible in the primary SDK because our dependency constraints become users' transitive constraints. I believe we can have only 1 dependency (protobuf) at this time. temporalio-opentelemetry is the only additional gem I can think we'll have at first. I expect active support, rails, etc samples to exist though.

* Workflow handles exist with all relevant calls.
* Note that we use the kwargs constructor + options struct pattern so that a user can access duped options, change one,
and reconstruct via `Client.new(**my_new_options.to_h)`
* ❓Is there a better way to allow people to easily change an option? We didn't want a `with` pattern because, at

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One very common pattern in Ruby is to have config methods (including some initializers) accept an optional block that yields the config object for modification:

class Client
  def initialize(**options)
    @config = Configuration.new(**options)
    yield @config if block_given?

    # ...
  end
end
client = Client.new do |config|
  config.namespace = "my-namespace"
end

I think that would feel very natural for many Rubyists, but might be a pretty large departure from how the other SDKs are configured?

Copy link
Member Author

@cretz cretz Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For client connect and initialization (and some others) we are wanting keyword arguments for all but one or two required positional args since the vast majority are optional and we're not really wanting multiple ways to configure the client (e.g. allow both block and kwargs). In this case it's more like Client.connect("localhost:7233", "my-namespace") which is a shortcut for Client.new(Connection.new("localhost:7233"), "my-namespace"). There is a PR coming either here in a couple of hours or early tomorrow that will show this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the intentionally-limited PR: temporalio/sdk-ruby#155. Would love feedback there, I sprinkled some PR comments throughout.

* Activity can choose its executor, default is `:default`.
* See worker section for more information on executors.
* Cancellation is reported to the executor.
* For thread pool executor (the default), this raises a cancellation exception via `Thread.raise`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this means that the executor will raise an exception on the activity by calling e.g. activity_thread.raise CancellationError, that could potentially be really dangerous. Ruby's built-in Timeout module uses this approach, and it can be problematic:

That being said, the SDK would probably be fine >99% of the time by using Thread.handle_interrupt, similar to Sidekiq:

https://github.com/sidekiq/sidekiq/blob/fdd82b797fb07f5c4e797f742b991874734493f7/lib/sidekiq/processor.rb#L181-L182

But even that can be really subtle: sidekiq/sidekiq#6017 😬

Copy link
Member Author

@cretz cretz Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was a topic of discussion in Python too where this kind of thing is even less common (we have to make a C call to https://docs.python.org/3/c-api/init.html#c.PyThreadState_SetAsyncExc, it's not even in the standard library). We can provide a way to opt-out of this (and we will provide a "shield" way to ensure certain code runs), but we've found it's dangerous to let runaway activities continue to run that cannot get interrupted. If there is some other model in the standard library for propagating cancellation to threaded tasks, we'd like to use it.

I will make a note to make sure we use handle_interrupt for our exception only and make sure to have tests demonstrate other forms of interrupt.

Comment on lines +73 to +74
* 💭 Why no `Temporalio::Common`? Ruby expects to have common things where they make sense instead of under a `Common`
module.
Copy link
Member Author

@cretz cretz Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may have to walk this back a bit. It is unreasonable to have Temporalio::WorkflowIDReusePolicy, Temporalio::WorkflowIDConflictPolicy, etc cluttering the top-level (or is it?). However, we don't want RetryPolicy to be in Common. So I guess we may need some kinda hybrid rule that says use the Common module for small or lesser used common classes?

Maybe we should just have common_enums.rb at that defines top-level, common enums, but still under Temporalio module.

Comment on lines +391 to +395
* It has a class method for `current` that returns from thread/fiber-local and class method shortcuts for all instance
methods.
* 💭 Why class methods equivalents? `Temporalio::Activity::Context.heartbeat` is easier to read than
`Temporalio::Activity::Context.current.heartbeat`. In .NET we made them get current instance first because that's
normal there, and in Python we only offered package-level methods because that's normal there.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* It has a class method for `current` that returns from thread/fiber-local and class method shortcuts for all instance
methods.
* 💭 Why class methods equivalents? `Temporalio::Activity::Context.heartbeat` is easier to read than
`Temporalio::Activity::Context.current.heartbeat`. In .NET we made them get current instance first because that's
normal there, and in Python we only offered package-level methods because that's normal there.
* It has a class method for `current` that returns from thread/fiber-local.

I think we are going to walk back this decision for also having class methods. I think it's not really common in Ruby to define class and instance methods of the same name, and Singleton and others show that people are ok calling .current or something.


## Clients

`Temporalio::Client` is a namespace-specific client and `Temporalio::Client::Connection` is a namespace-independent

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temporalio::Client::Connection is a namespace-independent connection

Are you still holding on to this statement, given the semantic change introduced by Cloud's API Keys? I'm just curious, not arguing either way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Connection is the Rust Client and the CloudOperationsClient here (as in other SDKs) will leverage the Connection. Can see this separation in practice already at main of https://github.com/temporalio/sdk-ruby (and in Python SDK and .NET SDK).

Copy link

@mjameswh mjameswh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants