🎉 New level of abstraction for relational database sources #4024

Closed
DoNotPanicUA opened this issue Jun 10, 2021 · 4 comments · Fixed by #4123
Labels
area/connectors Connector related issues lang/java type/enhancement New feature or request

Comments

@DoNotPanicUA
Contributor

DoNotPanicUA commented Jun 10, 2021

Tell us about the problem you're trying to solve

Currently we only provide relational database source connectors that use a JDBC driver. However, there are relational database sources without a public JDBC driver that we want to cover, for example BigQuery.
To develop connectors for such sources, we need to provide a new level of abstraction.

Describe the solution you’d like

Take AbstractJdbcSource.java and move the common logic to a new class, such as AbstractRelationsDatabaseSource.java.

Analysis of what can be reused:

| Group | Name | Usage | Can be reused | Is common | Comment |
|---|---|---|---|---|---|
| Check | check | Implements the basic source operation: runs the check operations for a database source and returns a common status class. | Yes | Yes | |
| Check | getCheckOperations | Returns the list of operations required for the check. The implementation is JDBC-specific. | No | No | |
| Discover | discover | Implements the basic source operation: lists the potential streams for the corresponding source. | Yes | Yes | |
| Discover | discoverPrimaryKeys | Returns the list of primary keys for the listed tables. | No | Yes | |
| Discover | getTables | Returns the list of tables. | Yes | Yes | |
| Discover | getExcludedInternalSchemas | Defines the list of system namespaces to exclude in the spec. | No | Yes | |
| Discover | aggregatePrimateKeys | Utility method. | Yes | No | |
| Discover | assertColumnsWithSameNameAreSame | Utility method. | Yes | No | |
| Read | read | Implements the basic source operation: reads data from the streams. | Yes | Yes | |
| Read | discoverInternal | Collects metadata about tables and columns for reading. | No | Yes | |
| Read | getIncrementalIterators | Proxy method to get all incremental iterators. | Yes | Yes | |
| Read | getFullRefreshIterators | Proxy method to get all full-refresh (FullSync) iterators. | Yes | Yes | |
| Read | getSelectedIterators | Creates the iterators for reading data. | Yes | Yes | |
| Read | createReadIterator | Creates an iterator for a stream (table). | Yes | Yes | |
| Read | getFullRefreshStream | Proxy method to get an iterator. | Yes | Yes | |
| Read | getIncrementalStream | Proxy method to get an iterator. | Yes | Yes | |
| Read | getMessageIterator | Proxy method. | Yes | Yes | |
| Read | queryTableFullRefresh | Queries data from the source and puts it into an iterator. | Yes | Yes | |
| Read | queryTableIncremental | Queries data from the source and puts it into an iterator. | No | Yes | |
| Common | createDatabase | | No | Yes | A new Database implementation must be provided. |
| Common | toJdbcConfig | Returns the configuration in JSON format. | No | Yes | |
| Common | TableInfo | Contains metadata about tables. Used for Discover. | Yes | Yes | |
| Common | TableInfoInternal | Contains metadata about tables for internal use. | Yes | Yes | |
| Common | ColumnInfo | Contains metadata about table columns. | Yes | Yes | A new enum of types must be provided; the class will be parameterized by the type enum. |
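The ColumnInfo row proposes parameterizing column metadata by a per-source type enum. A minimal sketch of that idea, assuming hypothetical ColumnInfo<T> and TableInfo<T> records (not the connector's actual classes) and an illustrative BigQueryType enum holding a subset of BigQuery type names. A JDBC source could reuse java.sql.JDBCType, while a non-JDBC source plugs in its own enum:

```java
import java.sql.JDBCType;
import java.util.List;

// Hypothetical sketch: column metadata parameterized by a source-specific type enum.
record ColumnInfo<T extends Enum<T>>(String name, T type) {}

// Hypothetical table metadata that works for any column type enum.
record TableInfo<T extends Enum<T>>(String nameSpace, String name, List<ColumnInfo<T>> fields) {}

// Illustrative type enum for a non-JDBC source (a subset of BigQuery type names).
enum BigQueryType { STRING, INT64, FLOAT64, BOOL, TIMESTAMP }

class ColumnInfoDemo {
  public static void main(String[] args) {
    // A JDBC source can reuse java.sql.JDBCType as its type enum...
    TableInfo<JDBCType> jdbcTable = new TableInfo<>("public", "users",
        List.of(new ColumnInfo<>("id", JDBCType.BIGINT), new ColumnInfo<>("name", JDBCType.VARCHAR)));
    // ...while a non-JDBC source supplies its own enum without touching the shared records.
    TableInfo<BigQueryType> bqTable = new TableInfo<>("dataset", "events",
        List.of(new ColumnInfo<>("ts", BigQueryType.TIMESTAMP)));
    System.out.println(jdbcTable.fields().get(0).type().name()); // BIGINT
    System.out.println(bqTable.fields().get(0).type().name());   // TIMESTAMP
  }
}
```

The shared code only sees ColumnInfo<T>, so discover and read logic does not need to know whether T came from JDBC.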
@DoNotPanicUA DoNotPanicUA added type/enhancement New feature or request lang/java labels Jun 10, 2021
@DoNotPanicUA DoNotPanicUA self-assigned this Jun 10, 2021
@sherifnada sherifnada added the area/connectors Connector related issues label Jun 10, 2021
@sherifnada
Contributor

@DoNotPanicUA this looks great, very thorough! Thank you for coming up with this. I think it looks pretty good as is. There may be some small details to work out here and there (especially some method signatures that assume JDBC types), but overall I think this is doable.

@subodh1810 can you also review this? I want to make sure this doesn't interact in any weird ways with the CDC refactor you had in mind. Is the division Andrii is proposing here congruent with what you have in mind?

@DoNotPanicUA
Contributor Author

@sherifnada @subodh1810 @yaroslav-hrytsaienko
Small update from my side.
I've started the refactoring and found the biggest "rock": there is no database abstraction, so we need to rethink JdbcDatabase first.
My vision is that we should aim for a SQL database abstraction.
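A minimal sketch of what such a SQL database abstraction could look like, assuming a hypothetical SqlDatabase interface with a single query method that returns rows as column-name-to-value maps. JdbcDatabase and a non-JDBC client (e.g. for BigQuery) could both implement it; InMemorySqlDatabase is a toy stand-in, not a real client:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical abstraction: anything that can run a SQL query and return rows,
// whether it talks JDBC or a vendor-specific API.
interface SqlDatabase extends AutoCloseable {
  // Each row is represented as column name -> value (an assumption of this sketch).
  Stream<Map<String, Object>> query(String sql) throws Exception;

  @Override
  default void close() throws Exception {}
}

// Toy in-memory implementation standing in for a non-JDBC client.
class InMemorySqlDatabase implements SqlDatabase {
  private final List<Map<String, Object>> rows;

  InMemorySqlDatabase(List<Map<String, Object>> rows) {
    this.rows = rows;
  }

  @Override
  public Stream<Map<String, Object>> query(String sql) {
    // A real implementation would send `sql` to the engine; this toy ignores it.
    return rows.stream();
  }
}

class SqlDatabaseDemo {
  // Source logic written against the interface, not against JDBC.
  static List<Object> readColumn(SqlDatabase db, String table, String column) throws Exception {
    try (Stream<Map<String, Object>> rows = db.query("SELECT " + column + " FROM " + table)) {
      return rows.map(row -> row.get(column)).collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws Exception {
    List<Map<String, Object>> rows = List.of(Map.of("id", 1), Map.of("id", 2));
    try (SqlDatabase db = new InMemorySqlDatabase(rows)) {
      System.out.println(readColumn(db, "users", "id")); // [1, 2]
    }
  }
}
```

Because readColumn depends only on the interface, the same source logic would work unchanged with a JDBC-backed implementation.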

@subodh1810
Contributor

We just need to make sure that the current contract between AbstractJdbcSource and MySqlSource/PostgresSource around the incremental iterators stays the same for CDC.

@DoNotPanicUA
Contributor Author

How to implement a SQL relational source (non-JDBC)

Preparations:

  • Implement the source data types (SqlType.java).
  • Build a mapping between Airbyte and source types. For example, JDBC sources use JdbcUtils.getType().
  • Implement the source database (SqlDatabase.java).
  • Add the database creation to the factory class Databases.java.
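A minimal sketch of the first two preparation steps: a source type enum and its mapping to Airbyte types. SourceType (loosely modeled on BigQuery type names), AirbyteType (a local stand-in for Airbyte's JSON schema primitive types) and TypeMapping.toAirbyteType are all hypothetical names, and mapping DATE and TIMESTAMP to strings is an assumption of this sketch:

```java
import java.util.Locale;

// Local stand-in for Airbyte's JSON schema primitive types (assumption for this sketch).
enum AirbyteType { STRING, NUMBER, BOOLEAN, OBJECT, ARRAY }

// Hypothetical source data types for a non-JDBC relational source.
enum SourceType {
  STRING, INT64, FLOAT64, NUMERIC, BOOL, DATE, TIMESTAMP, STRUCT, ARRAY;

  // Parse a type name as reported by the source's metadata API (case-insensitive).
  static SourceType fromName(String name) {
    return valueOf(name.trim().toUpperCase(Locale.ROOT));
  }
}

final class TypeMapping {
  private TypeMapping() {}

  // Source type -> Airbyte type, similar in spirit to JdbcUtils.getType() for JDBC sources.
  static AirbyteType toAirbyteType(SourceType type) {
    return switch (type) {
      case INT64, FLOAT64, NUMERIC -> AirbyteType.NUMBER;
      case BOOL -> AirbyteType.BOOLEAN;
      case STRUCT -> AirbyteType.OBJECT;
      case ARRAY -> AirbyteType.ARRAY;
      // Assumption: dates and timestamps are emitted as strings.
      case STRING, DATE, TIMESTAMP -> AirbyteType.STRING;
    };
  }
}
```

The switch has no default branch, so adding a new SourceType constant makes the compiler flag the mapping as incomplete instead of silently falling through.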

Source implementation:

  • Extend the class AbstractRelationalDbSource.java, using your new SqlDatabase and SqlType implementations as type parameters.

    Source database initialization

    • toDatabaseConfig(sourceConfig) - create a database config from the source config. The source config must conform to the spec.json file.
    • createDatabase(databaseConfig) - instantiate the database using the database config.

    Source specific utilities

    • getExcludedInternalNameSpaces - define the list of system namespaces so that system tables are skipped during the discover operation.
    • getQuoteString - define the identifier quote symbol for the source. A constant value can be used here.
    • getType - delegate to your mapping between Airbyte and source types.

    Check source

    • getCheckOperations - return the list of operations required to validate the source.

    Discover source

    • discoverInternal - fetch the list of source tables.
      Note: the result contains the generic class CommonField<T>.java; use the source data type you created in the Preparations step as T.
    • discoverPrimaryKeys - fetch the primary keys for a list of tables.

    Read source

    • queryTableIncremental - implement incremental stream initialization.
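To tie these steps together, a heavily simplified sketch of the base-class/subclass split. RelationalSourceSketch is a hypothetical stand-in for AbstractRelationalDbSource (the type-enum parameter, getType and discoverPrimaryKeys are omitted, and queryTableIncremental returns SQL text instead of an iterator), and ToySource uses an in-memory list of tables in place of a real SqlDatabase. The project_id config field and the backtick quote string are assumptions:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BooleanSupplier;
import java.util.stream.Collectors;

// Hypothetical, simplified table metadata (column types omitted).
record Table(String nameSpace, String name, List<String> columns) {}

// Hypothetical stand-in for AbstractRelationalDbSource; D is the source's database client.
abstract class RelationalSourceSketch<D> {
  // Source database initialization.
  protected abstract Map<String, String> toDatabaseConfig(Map<String, String> sourceConfig);
  protected abstract D createDatabase(Map<String, String> databaseConfig);

  // Source-specific utilities.
  protected abstract Set<String> getExcludedInternalNameSpaces();
  protected abstract String getQuoteString();

  // Check, discover and read hooks.
  protected abstract List<BooleanSupplier> getCheckOperations(D database);
  protected abstract List<Table> discoverInternal(D database);
  protected abstract String queryTableIncremental(Table table, String cursorField);

  // Common: the source is valid only if every check operation passes.
  public boolean check(Map<String, String> sourceConfig) {
    D database = createDatabase(toDatabaseConfig(sourceConfig));
    return getCheckOperations(database).stream().allMatch(BooleanSupplier::getAsBoolean);
  }

  // Common: list source tables, skipping excluded system namespaces.
  public List<Table> discover(Map<String, String> sourceConfig) {
    D database = createDatabase(toDatabaseConfig(sourceConfig));
    Set<String> excluded = getExcludedInternalNameSpaces();
    return discoverInternal(database).stream()
        .filter(table -> !excluded.contains(table.nameSpace()))
        .collect(Collectors.toList());
  }

  // Common: quote an identifier, doubling any embedded quote characters.
  protected String enquote(String identifier) {
    String quote = getQuoteString();
    return quote + identifier.replace(quote, quote + quote) + quote;
  }
}

// Toy source whose "database" is an in-memory list of tables.
class ToySource extends RelationalSourceSketch<List<Table>> {
  @Override
  protected Map<String, String> toDatabaseConfig(Map<String, String> sourceConfig) {
    // Map spec.json fields to client settings (field names are assumptions).
    return Map.of("project", sourceConfig.getOrDefault("project_id", ""));
  }

  @Override
  protected List<Table> createDatabase(Map<String, String> databaseConfig) {
    return List.of(
        new Table("sales", "orders", List.of("id", "updated_at")),
        new Table("INFORMATION_SCHEMA", "TABLES", List.of("table_name")));
  }

  @Override
  protected Set<String> getExcludedInternalNameSpaces() {
    return Set.of("INFORMATION_SCHEMA");
  }

  @Override
  protected String getQuoteString() {
    return "`";
  }

  @Override
  protected List<BooleanSupplier> getCheckOperations(List<Table> database) {
    return List.of(() -> !database.isEmpty());
  }

  @Override
  protected List<Table> discoverInternal(List<Table> database) {
    return database;
  }

  @Override
  protected String queryTableIncremental(Table table, String cursorField) {
    // Simplified: returns the SQL text for rows past the saved cursor value.
    String columns = table.columns().stream().map(this::enquote).collect(Collectors.joining(", "));
    return "SELECT " + columns + " FROM " + enquote(table.nameSpace()) + "." + enquote(table.name())
        + " WHERE " + enquote(cursorField) + " > ?";
  }
}
```

With this split, check and discover stay in the shared class and a new source only fills in the hooks; for example, new ToySource().discover(Map.of()) drops the INFORMATION_SCHEMA table because that namespace is excluded.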


3 participants