Replies: 1 comment 2 replies
-
What about adding a new and (initially) optional property
This doesn't smell at all to me ☝️ . If you want to have more than one binding, you need to name them. At the same time, the current behavior of Edit to add: what this is buying, is that users can both use auto-discovery and also customize the path of their single discovered binding if they like. |
Beta Was this translation helpful? Give feedback.
2 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
This started out as just me collecting my thoughts related to some
source-http-ingest
changes to bring it in line with the newresource_path
merging. But it was tricky to see how to do this without breaking things, and it ended up leading to what I think is a pretty reasonable conclusion aboutresource_path
s and discovery merging. So I apologize if this is difficult to follow. Please bear with me.The main goal here was to avoid introducing breaking changes in how connectors handle
resource_path
s. Overall, it was very easy to avoid any breaking changes, butsource-http-ingest
proved to be kind of an exeption.The current resource config for
source-http-ingest
is defined like so:The
path
field corresponds to the URL path that the server will accept for requests to add data. Thepath
is certainly the most reasonable thing to use for theresource_path
, but the first-order problem is that it's optional, and not set at all by default. And if it's not specified, then it defaults to the full name of the flow collection that's thetarget
of the binding. So for example, a typical binding would look like:and the resulting
resource_path
would beacmeCo/foo/webhook-data
.This means that discover cannot emit an equivalent
resource_path
, and thus the next discover operation will remove any existing bindings. Luckily, of the 19 livesource-http-ingest
captures, all 19 of them have auto-discover disabled, and also have completely default bindings. So, the next discover (which will have been triggered manually via the UI) will remove their existing bindigs. But it seems like any newly discovered binding would be identical to the ones that are already there.Obviously the way we model the
resource
config is not ideal here. Minimally, we must be able to return an identicalresource_path
from both the discover and validate RPCs, which is impossible given the current behavior ofpath
. We do at least have some ability to change how we determine theresource_path
, since there's no live captures withautoDiscover
enabled. But we also need to take care not to change the URL paths that are used for existing captures.Ideally, we would allow the 19 existing captures to keep using their same URLs, and also allow them to enable
autoDiscover
.Using the
path
as theresource_path
seems like the most reasonable mapping. In other words, the "resource" that the source-http-ingest connector can be said to be the namespace of URL paths.Here's what I think we'd need to do in order to make that work and retain seamless compatibility for all existing tasks:
path
as part of theresource
returned bydiscover
path
that's used in returned from discovers would just bewebhook-data
, not a full collection namepath
to benull
in theresource
and retain the currentvalidate
behavior that falls back to using thecollection
as theresource_path
.validate
to require a non-nullpath
once all existing captures have been updatedpath
or the collection nameThat last point is a little gross, but necessary to prevent breaking existing captures if they re-discover and publish.
Left turn?
This is where we intersect with another recent discussion about
resource_path
s and the merge behavior related to "custom" bindings. Custom bindings here are defined as any bindings that don't appear in the discover output, as determined by their resource path. Forsource-http-ingest
, this effectively means that changing the URL path would make this a "custom" binding, which as things are currently will result in the binding being removed by subsequent discover operations.So what if we were less fussy about the
resource_path
modeling the namespace of URL paths, and instead just said is any old string identifier? The update tosource-http-ingest
could then be to simply hard-code theresource_path
to some constant, and let thepath
continue to work as it does tooday and all existing captures are fully compatible.But this raises the question of what happens if you have two capture bindings that both have the same
resource_path
. This could technically happen today, but it would be guaranteed to occur if we hard-coded everyresource_path
to the same constant insource-http-ingest
.For materializations, this would cause a validation error because we explicitly try to prevent users from materializing two collections into the same database table.
But I would propose that we take a slightly different view of
resource_path
, at least for captures. Why not allow multiple capture bindings to have the sameresource_path
? To allow it only requires a tiny, but subtle change to the way we implement the discover merge. It would provide a simple solution for how to handle resource paths insource-http-ingest
, and would also provide a workable solution for supporting "custom" bindings with discovery in other connectors.The discover merge logic would just need to pull in every live binding that matches each discovered binding, instead of just the first matching binding as it does today. In other words, today we expect and assume that live bindings match 1-1 with discovered bindings, and we choose one arbitrarily if that turns out not to be the case. But if we just allow the merge logic to pull in every matching live binding, then we can gain some useful flexibility. I'm inclined at this point to move forward with allowing multiple capture bindings to have the same
resource_path
, but would like to pause and see if anyone else has thoughts on the matter.Beta Was this translation helpful? Give feedback.
All reactions