Suggested Streams via Actor Definition #21577
airbyte-config/config-persistence/src/main/java/io/airbyte/config/persistence/DbConverter.java
suggestedStreams:
  streams:
    - branches
    - comments
    - issues
    - organizations
    - pull_requests
    - repositories
    - stargazers
    - tags
    - teams
    - users
Here's the new interface for connector developers who only want some streams selected by default
I am a bit surprised that this is in the definition yaml file when the schema itself is not present in that file but requires calling a connector function and the suggestedStreams are definitely tied to the schema.
I'm not advocating moving that to a connector function but that might be a good signal that we would benefit from having the schema show up in the definition for API connectors.
Yeah... that was the original proposal (#21437) complete with protocol change. For posterity, I added a comment in that PR as to why we chose to move this out of the protocol, and you can read more details here.
In the near future, all of "actor definitions" will be replaced with "connector metadata", and things will hopefully begin to make more sense.
airbyte-server/src/main/java/io/airbyte/server/handlers/helpers/CatalogConverter.java
Just to double check (since I haven't run this locally, just looked at the code): this selection only takes place when we create a connection for the first time, not every time we refresh the catalog? I.e., an already saved connection where the user does a "refresh from source" won't get all its stream selected states overwritten by this logic?
Yep!
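To make the create-versus-refresh distinction above concrete, here is a minimal sketch. It is not the code from this PR: the class and method names are hypothetical, and the handling of streams that first appear on a refresh (left deselected here) is an assumption made only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch; the names are illustrative and are not Airbyte's real API. */
final class StreamSelectionSketch {

  private StreamSelectionSketch() {}

  /** New connection: the default selection comes from the connector's suggestions. */
  static Map<String, Boolean> forNewConnection(List<String> discovered,
                                               Optional<List<String>> suggested) {
    Map<String, Boolean> selected = new LinkedHashMap<>();
    for (String stream : discovered) {
      // No suggestedStreams block means every stream is selected by default.
      selected.put(stream, suggested.map(s -> s.contains(stream)).orElse(true));
    }
    return selected;
  }

  /** Refresh of a saved connection: saved choices win; suggestions are not consulted. */
  static Map<String, Boolean> forRefresh(List<String> discovered,
                                         Map<String, Boolean> saved) {
    Map<String, Boolean> selected = new LinkedHashMap<>();
    for (String stream : discovered) {
      // ASSUMPTION: a stream that was not in the saved connection starts deselected.
      selected.put(stream, saved.getOrDefault(stream, false));
    }
    return selected;
  }
}
```

The point of keeping two separate entry points is that the refresh path never receives the suggestions at all, so it cannot overwrite a user's saved selections.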
Just checking this out, looks exciting!
properties:
  streams:
    type: array
    description: An array of streams that this connector suggests the average user will want. SuggestedStreams not being present for the source means that all streams are suggested. An empty list here means that no streams are suggested.
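The three cases in that description (block absent, empty list, populated list) can be sketched as a single predicate. This is only an illustration under stated assumptions: `SuggestedStreamsRule` and `isSuggested` are hypothetical names, and an empty `Optional` stands in for a definition that has no `suggestedStreams` block.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical helper; not the actual CatalogConverter code from this PR. */
final class SuggestedStreamsRule {

  private SuggestedStreamsRule() {}

  /**
   * @param suggestedStreams empty Optional when the definition has no suggestedStreams block
   * @param streamName       name of a stream from the discovered catalog
   */
  static boolean isSuggested(Optional<List<String>> suggestedStreams, String streamName) {
    // Block absent: all streams are suggested (the pre-existing behavior).
    // Block present: only listed streams are suggested, so an empty list suggests none.
    return suggestedStreams
        .map(streams -> streams.contains(streamName))
        .orElse(true);
  }
}
```

Modelling "absent" and "empty" as different values is what lets existing connectors stay unchanged while still allowing a connector to suggest nothing.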
Thanks for thinking this through and laying it out, I'm sure it'll be helpful in some weird issue a year from now
...ommons-server/src/main/java/io/airbyte/commons/server/handlers/helpers/CatalogConverter.java
@@ -0,0 +1,14 @@
---
Can we add more metadata, like a description or a title that would indicate what use case the suggested streams are trying to achieve?
It seems your GitHub example excludes some streams, for example the ones related to workflows. Those can be useful for investigating CI.
(There might be some connectors that would also benefit from multiple suggestedStreams blocks, each with a different use case, but that might be beyond the scope here.)
For sure! I added more context in c025313.
I like the idea of context-aware suggestions for the future! That's why this is an object. @michel-tricot and I brainstormed a streams-matcher array which might one day contain the logic for database sources to deselect tables with foreign keys, or any stream coming from a non-public namespace/schema.
Thanks for that, I think the idea is good. I think we can make additional changes after that to make it even better. One of them is highlighting more clearly in the UI which streams are on, so that people don't have to scroll past dozens of deactivated streams.
Exactly! That's why the API is now returning some additional information which can enable that.
Closes #10872, (inspired by https://github.com/airbytehq/oncall/issues/1303#issuecomment-1381772903 OC issue). Protocol Change Process Document.
What?
This PR allows connectors to suggest which streams are pre-selected for users creating a new connection. This is accomplished by adding a new "suggestedStreams" block to the source actor definitions. This change is backwards-compatible and does not require re-publishing existing connectors.
For many of our API sources, the connector provides a lot of streams. This is great because Airbyte connectors come with "batteries included" to enable the widest range of data-syncing use cases. However, for some sources, most users probably don't want all of the streams. Some of those streams may be extraneous, but worse, they may be slow or expensive, and they often make the syncing experience for the more "primary" streams worse. Perhaps such a stream has a slower API, consumes more API "credits", or harms the global rate limit in a worse-than-normal way.
This PR allows each connector to tune the default stream selections to provide a better first-time sync experience.
This PR comes with an example for source-github, a source with particularly egregious streams that we don't think many people would opt into.
Technical notes:
Connector developers add a suggestedStreams block to source_definitions.yaml if they want to select only some streams. Otherwise, all streams will remain suggested as they are today.