Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Connectors using an index yaml file, instead of individual json files #1046

Merged
merged 11 commits into from
Nov 30, 2020

Conversation

cgardens
Copy link
Contributor

@cgardens cgardens commented Nov 22, 2020

Note: Number of files in this PR looks scary, but it is really only 5 files to review. The rest is just changes in generated files.

What

  • When adding a new connectors, a contributor needs to manual add files in the airbyte-config:init following an exacting format. This includes them needing to generate a unique uuid themselves. That uuid must be placed in the file and also be the name of the file. This burden leads to bug and is unintuitive to a contributor. It is an artifact of the records being written in that directory being in the format that is used by the database.

How

  • This PR allows a contributor to add connectors to a yaml file. The contributor is now only required to provide information that is intuitive to them (no more random uuids).

Open Questions

  • Not sure that I got this abstraction quite right yet. The is in a sort of in between spot right now RepositorySeed. It pretends that it is just stupidly adding uuids to objects that are passed to it. But it also assumes that a name attribute on those objects. Either it should be configurable how it determines how to generate its UUID, or it should just go ahead and directly reference source and destination. I think we can probably make the former work.

Checklist

  • Update docs
  • Add tests

Recommended reading order

  1. airbyte-config/init/src/main/java/io/airbyte/config/init/RepositorySeed.java
  2. airbyte-config/init/src/main/resources/seed/source_definitions.yaml
  3. airbyte-config/init/src/main/resources/seed/destination_definitions.yaml
  4. airbyte-config/init/build.gradle
  5. the rest (they are just deleting the old seeded files and replacing them with the new ones--the uuids change).


while (elements.hasNext()) {
final JsonNode element = Jsons.clone(elements.next());
final String name = element.get("name").asText();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See PR comment at the top about how much this class should assume about the structure of the input objects.

}
names.add(name);

final UUID uuid = UUID.nameUUIDFromBytes(name.getBytes(Charsets.UTF_8));
Copy link
Contributor Author

@cgardens cgardens Nov 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm on the fence on this. i like the idea of being able to generate the uuids deterministically from the configs in some way. the issue is that if someone ever changes the name of a connector then the uuid will change. right now i don't think that causes any immediate problems, but it could be really confusing in our version migration story. if a connector name ever changes we need to make sure its uuid gets propagated to any already persisted config that uses the old one. a few options... not in love with any of these:

  1. name field is never allowed to change. if a display name needs to change we add that as an optional attribute. (bad because not immediately intuitive to a contributor, easy to mess up.)
  2. generate uuid based off of position in the list. if an integration is ever removed then a placeholder must be left in the list. (bad because not immediately intuitive to a contributor, and cluttering up the list with placeholders seems not great, easy to mess up)
  3. user still needs to generate and hardcode a uuid, but they do it in the yaml and don't need to worry about the file name and the uuid in the object in the file being the same. keeps the yaml which is a bit easier to work with. we still validate that all uuids are unique (still generating uuids is annoying, but ultimately this is the most bullet proof solution, so if we can't think of something better, it is probably the right option).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about we use the image name for the uuid? (airbyte/destination-bigquery) ?

It gives us two things:

  • if someone tweaks the integrations and publishes it on their image repo, it automatically gets a new uuid
  • it is deterministic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with this. It is the same problem as using name though. It means we better never change the underlying image name that we are using! I think the cases where that might be something we do is in cases like the file source, were we have considered exposing it at multiple separate sources but all with the same image. there are things we could do to tweak the name, but it seems like weird incidental complexity to add.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that so then we should have the uuid in the yaml. That gives us the best of both world for the small cost of generating a random uuid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name as more chances to change because it is just a matter of modifying an input whereas the docker repo is a bit more involve. so whichever we choose we should pick the one that is the most stable.
But in the end nothing beats forcing the uuid in the yaml

}
names.add(name);

final UUID uuid = UUID.nameUUIDFromBytes(name.getBytes(Charsets.UTF_8));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about we use the image name for the uuid? (airbyte/destination-bigquery) ?

It gives us two things:

  • if someone tweaks the integrations and publishes it on their image repo, it automatically gets a new uuid
  • it is deterministic

@@ -0,0 +1,16 @@
- name: Local CSV
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this.

@@ -0,0 +1,64 @@
- name: "exchangeratesapi.io"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this.

@@ -34,3 +34,5 @@ dependencies {
implementation files(project(':airbyte-integrations:bases:base-java').airbyteDocker.outputs)
// integrationTestImplementation files(project(':airbyte-integrations:bases:base-java').airbyteDocker.outputs)
}

compileJava.dependsOn(':airbyte-integrations:connectors:source-mysql:build')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like a hack, we should at least have a comment explaining why and how to get rid of it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't be here.

davydov-d added a commit that referenced this pull request Nov 29, 2022
davydov-d added a commit that referenced this pull request Nov 29, 2022
davydov-d added a commit that referenced this pull request Nov 29, 2022
* #1046 source twilio: fix generating date ranges

* #1046 source twilio - upd changelog

* #1046 source twilio: add freezegun to requirements

* #1046 source twilio: fix incremental tests

* #1046 source twilio: fix expected records

* #1046 source twilio - fix expected records

* #1046 source twilio: cut expected records

* #1046 source twilio: cut expected records

* auto-bump connector version

Co-authored-by: Octavia Squidington III <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants