Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggested Streams via Actor Definition #21577

Merged
merged 20 commits into from
Jan 27, 2023

Conversation

evantahler
Copy link
Contributor

@evantahler evantahler commented Jan 19, 2023

Closes #10872, (inspired by https://github.com/airbytehq/oncall/issues/1303#issuecomment-1381772903 OC issue). Protocol Change Process Document.

What?

This PR allows connectors to suggest which streams are pre-selected for users creating a new connection. This is accomplished by adding a new “suggestedStreams" block to the source actor definitions. This change is backwards-compatible and does not require the re-publishing of existing connectors.

For many of our API sources, the connector provides a lot of streams. This is great because Airbyte connectors come with "batteries included" to enable the widest range of data-syncing use-cases. However, for some sources, most users probably don't want all of the streams. Some of those streams may be extraneous, but worse, they may be slow, expensive, and often make the syncing experience for the more "primary" streams worse. Perhaps this stream has a slower API, consumes more API "credits", or harms the global rate limit in a worse-than-normal way.

This PR allows each connector to tune the default stream selections to provide a better first-time sync experience.

Screenshot 2023-01-18 at 9 36 49 PM

The Convenience Principle: People will do what's convenient, and they will avoid what's inconvenient. All else being equal, to increase how many people do something, make it more convenient, and to decrease, make it less convenient

This PR comes with an example for soruce-github, a source with particularly egregious streams, which we don't think many people would opt-into.

Technical notes:

  • No API or webapp changes needed
  • All existing connectors will work the same way - defaulting to selecting all streams for a new connection
  • All that connector developers need to do is supply a list of suggestedStreams in source_definitions.yaml if they want to select only some streams. Otherwise, all streams will remain suggested as they are today.

@octavia-squidington-iv octavia-squidington-iv added area/api Related to the api area/platform issues related to the platform area/server labels Jan 19, 2023
@evantahler evantahler temporarily deployed to more-secrets January 19, 2023 02:32 — with GitHub Actions Inactive
@evantahler evantahler temporarily deployed to more-secrets January 19, 2023 02:32 — with GitHub Actions Inactive
@evantahler evantahler temporarily deployed to more-secrets January 19, 2023 05:13 — with GitHub Actions Inactive
@evantahler evantahler temporarily deployed to more-secrets January 19, 2023 05:13 — with GitHub Actions Inactive
@evantahler evantahler changed the title Suggested Streams in in the Actor Definition Suggested Streams via Actor Definition Jan 19, 2023
@evantahler evantahler marked this pull request as ready for review January 19, 2023 05:23
@evantahler evantahler requested a review from a team as a code owner January 19, 2023 05:23
@evantahler evantahler requested review from sherifnada and a team January 19, 2023 05:23
@evantahler evantahler temporarily deployed to more-secrets January 19, 2023 05:24 — with GitHub Actions Inactive
@evantahler evantahler temporarily deployed to more-secrets January 19, 2023 05:24 — with GitHub Actions Inactive
Comment on lines +607 to +621
suggestedStreams:
streams:
- branches
- comments
- issues
- organizations
- pull_requests
- repositories
- stargazers
- tags
- teams
- users
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the new interface for connector developers who only want some streams selected by default

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit surprised that this is in the definition yaml file when the schema itself is not present in that file but requires calling a connector function and the suggestedStreams are definitely tied to the schema.

I'm not advocating moving that to a connector function but that might be a good signal that we would benefit from having the schema show up in the definition for API connectors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... that was the original proposal (#21437) complete with protocol change. For posterity, I added a comment in that PR as to why we chose to move this out of the protocol, and you can read more details here.

In the near future, all of "actor definitions" will be replaced with "connector metadata", and things will hopefully begin to make more sense.

@evantahler evantahler temporarily deployed to more-secrets January 19, 2023 05:45 — with GitHub Actions Inactive
@evantahler evantahler temporarily deployed to more-secrets January 19, 2023 05:45 — with GitHub Actions Inactive
@evantahler evantahler temporarily deployed to more-secrets January 19, 2023 05:51 — with GitHub Actions Inactive
@evantahler evantahler temporarily deployed to more-secrets January 19, 2023 05:52 — with GitHub Actions Inactive
@evantahler evantahler temporarily deployed to more-secrets January 19, 2023 23:21 — with GitHub Actions Inactive
@evantahler evantahler temporarily deployed to more-secrets January 19, 2023 23:21 — with GitHub Actions Inactive
@github-actions
Copy link
Contributor

github-actions bot commented Jan 19, 2023

Airbyte Code Coverage

File Coverage [84.5%] 🍏
V0_40_28_001__AddSuggestedStreams.java 100% 🍏
ConfigWriter.java 90.13% 🍏
DbConverter.java 79.6% 🍏
Total Project Coverage 24%

@airbytehq airbytehq deleted a comment from github-actions bot Jan 19, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Jan 19, 2023

Platform Test Results

   234 files  ±0     234 suites  ±0   25m 16s ⏱️ +44s
1 612 tests ±0  1 601 ✔️ ±0  11 💤 ±0  0 ±0 
1 632 runs  ±0  1 621 ✔️ ±0  11 💤 ±0  0 ±0 

Results for commit 4ec6ba0. ± Comparison against base commit e39ee94.

♻️ This comment has been updated with latest results.

@evantahler evantahler temporarily deployed to more-secrets January 20, 2023 00:37 — with GitHub Actions Inactive
@evantahler evantahler temporarily deployed to more-secrets January 20, 2023 00:38 — with GitHub Actions Inactive
@evantahler evantahler requested a review from a team January 20, 2023 17:51
@evantahler evantahler temporarily deployed to more-secrets January 25, 2023 00:59 — with GitHub Actions Inactive
@timroes
Copy link
Collaborator

timroes commented Jan 25, 2023

Just to double check (since I haven't ran this locally, just looked at the code): This selection only takes place when we newly create a connection for the first time, not every time we refresh the catalog? I.e. an already saved connection where the user does a "refresh from source" won't get all their stream selected states overwritten by this logic?

@evantahler evantahler temporarily deployed to more-secrets January 25, 2023 16:58 — with GitHub Actions Inactive
@evantahler evantahler temporarily deployed to more-secrets January 25, 2023 16:58 — with GitHub Actions Inactive
@evantahler
Copy link
Contributor Author

Just to double check (since I haven't ran this locally, just looked at the code): This selection only takes place when we newly create a connection for the first time, not every time we refresh the catalog? I.e. an already saved connection where the user does a "refresh from source" won't get all their stream selected states overwritten by this logic?

Yep!
I've also added additional detail to the API response so in the future we can highlight "suggested by not selected" streams if we want

@evantahler evantahler temporarily deployed to more-secrets January 26, 2023 20:17 — with GitHub Actions Inactive
@evantahler evantahler temporarily deployed to more-secrets January 26, 2023 20:17 — with GitHub Actions Inactive
@evantahler evantahler temporarily deployed to more-secrets January 26, 2023 21:12 — with GitHub Actions Inactive
@evantahler evantahler temporarily deployed to more-secrets January 26, 2023 21:12 — with GitHub Actions Inactive
Copy link
Contributor

@erohmensing erohmensing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking this out, looks exciting!

properties:
streams:
type: array
description: An array of streams that this connector suggests the average user will want. SuggestedStreams not being present for the source means that all streams are suggested. An empty list here means that no streams are suggested.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for thinking this through and laying it out, I'm sure it'll be helpful in some weird issue a year from now

@@ -0,0 +1,14 @@
---
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add more metadata?
like a description or a title that would indicate what use case the suggested streams are trying to achieve?
It seems your github example exclude some streams related to workflow for example. Those can be useful to investigate CI.
(There might be some connectors that would also benefit from multiple suggestedStream with different use case each but that might be beyond the scope here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure! I added more context in c025313.

I like the idea of context-aware suggestions for the future! That's why this is an object. @michel-tricot and I brainstormed a streams-matcher array which might one day contain the logic for database sources to deselect tables with foreign keys, or any stream coming from a non-public namespace/scheams.

Copy link
Contributor

@malikdiarra malikdiarra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for that, I think the idea is good. I think we can do additional changes after that to make it even better. One of them being highlighting a bit better in the UI which streams are on and not require people to scroll past dozen of deactivated streams.

@evantahler evantahler temporarily deployed to more-secrets January 27, 2023 01:33 — with GitHub Actions Inactive
@evantahler evantahler temporarily deployed to more-secrets January 27, 2023 01:33 — with GitHub Actions Inactive
@evantahler
Copy link
Contributor Author

Thanks for that, I think the idea is good. I think we can do additional changes after that to make it even better. One of them being highlighting a bit better in the UI which streams are on and not require people to scroll past dozen of deactivated streams.

Exactly! That's why the API is now returning some additional information which can enable that

@evantahler evantahler temporarily deployed to more-secrets January 27, 2023 17:31 — with GitHub Actions Inactive
@evantahler evantahler temporarily deployed to more-secrets January 27, 2023 17:31 — with GitHub Actions Inactive
@evantahler evantahler enabled auto-merge (squash) January 27, 2023 17:36
@evantahler evantahler merged commit b9e3c8a into master Jan 27, 2023
@evantahler evantahler deleted the evan/suggested-streams-actor-def branch January 27, 2023 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api Related to the api area/documentation Improvements or additions to documentation area/platform issues related to the platform
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create a notion of streams which are "selected by default"
9 participants