Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎉 New source: TiDB #11283

Merged
merged 12 commits into from
Apr 20, 2022
Merged

🎉 New source: TiDB #11283

merged 12 commits into from
Apr 20, 2022

Conversation

Daemonxiao
Copy link
Contributor

What

Add new source TiDB.

Recommended reading order

  1. docs/integrations/tidb.md
  2. airbyte-integrations/connectors/source-tidb/src/main/java/io/airbyte/integrations/source/tidb/TiDBSource.java
  3. /airbyte-integrations/connectors/source-tidb/src/main/java/io/airbyte/integrations/source/tidb/TiDBSourceOperations.java

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/SUMMARY.md
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Tests

Unit 1647846914(1)
Integration 1647846838(1)
Acceptance 1647846654(1)

@CLAassistant
Copy link

CLAassistant commented Mar 21, 2022

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added area/connectors Connector related issues area/documentation Improvements or additions to documentation labels Mar 21, 2022
@Daemonxiao
Copy link
Contributor Author

refer issue #10891

@marcosmarxm
Copy link
Member

Thanks for the contribution @Daemonxiao I'll review it tomorrow

@marcosmarxm marcosmarxm self-assigned this Mar 23, 2022
@marcosmarxm
Copy link
Member

@Daemonxiao is it not possible to use TiDB using the MySQL connector? The implementation looks is duplicating code from MySQL connector.

import org.slf4j.LoggerFactory;


public class TiDBSource extends AbstractJdbcSource<MysqlType> implements Source {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the implementation of mysql-strict-encryt connector. So far I didn't find any change from Mysql that is specifically applied to TiDB.

@zhangyangyu
Copy link
Contributor

zhangyangyu commented Mar 24, 2022

@Daemonxiao is it not possible to use TiDB using the MySQL connector? The implementation looks is duplicating code from MySQL connector.

Thanks for your review @marcosmarxm. TiDB is a MySQL compatible database, just like CockroachDB vs postgresql. So in most cases using MySQL connector or even JDBC connector works. But the compatibility is not 100%. For example, TiDB has its own internal system databases(present in code), TiDB doesn't support spatial types(present in doc). And the biggest incompatibility is TiDB doesn't support MySQL binlog at all so currently MySQL connector CDC would not function at all(present in both code and doc). And I expect in future the differences might vary. So we'd like to develop a dedicated TiDB source connector although currently it replicate many MySQL connector code.

@marcosmarxm
Copy link
Member

@Daemonxiao is it not possible to use TiDB using the MySQL connector? The implementation looks is duplicating code from MySQL connector.

Thanks for your review @marcosmarxm. TiDB is a MySQL compatible database, just like CockroachDB vs postgresql. So in most cases using MySQL connector or even JDBC connector works. But the compatibility is not 100%. For example, TiDB has its own internal system databases(present in code), TiDB doesn't support spatial types(present in doc). And the biggest incompatibility is TiDB doesn't support MySQL binlog at all so currently MySQL connector CDC would not function at all(present in both code and doc). And I expect in future the differences might vary. So we'd like to develop a dedicated TiDB source connector although currently it replicate many MySQL connector code.

But it's possible to reduce the duplicate code right. The SQL operations can be the same because it's a subset of MySQL. You only need to create the spec and the building spec (see the connector I commented previously). Did you try to use only creating the TiDBSource.java file? Can you point in your code where is different from the MySQL implementation? Why not use the MySQL connector and only create a TiDB documentation explaining you can use the MySQL connector with ONLY the default method?

@zhangyangyu
Copy link
Contributor

zhangyangyu commented Mar 25, 2022

But it's possible to reduce the duplicate code right.

Definitely. We change the implementation from extending JDBC to MySQL, so no need to duplicate much code.

The SQL operations can be the same because it's a subset of MySQL. You only need to create the spec and the building spec (see the connector I commented previously).

Simply extending the spec is not ideal since the difference is not only in configuration but also code behavior. The difference is small for now, but we also want to leave the possibility for future. For example, add CDC ability, directly accessing the storage layer of TiDB.

Can you point in your code where is different from the MySQL implementation?

Please review the new implementation. :-) I think it's more clear now.

import java.sql.ResultSet;
import java.sql.SQLException;

import static io.airbyte.db.jdbc.JdbcConstants.*;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please convert to single import here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for you review. I've changed it.
convert to single import

Copy link
Member

@marcosmarxm marcosmarxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments

Comment on lines +70 to +74
return Set.of(
"information_schema",
"metrics_schema",
"performance_schema",
"mysql");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The internal of TiDB is exactly as MySQL ones?

Copy link
Contributor

@zhangyangyu zhangyangyu Mar 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. TiDB is MySQL compatible but it's implemented from scratch, in golang and rust. So at least for these internal databases, it's definitely not 100% compatible. It doesn't support SYS database https://docs.pingcap.com/tidb/stable/mysql-compatibility#unsupported-features and has its own METRICS_SCHEMA database https://docs.pingcap.com/tidb/stable/metrics-schema#metrics-schema.

Comment on lines 39 to 138
@Override
public MysqlType getFieldType(JsonNode field) {
try {
final MysqlType literalType = MysqlType.getByName(field.get(INTERNAL_COLUMN_TYPE_NAME).asText());
final int columnSize = field.get(INTERNAL_COLUMN_SIZE).asInt();

switch (literalType) {
// TINYINT(1) are interpreted as boolean
case TINYINT, TINYINT_UNSIGNED -> {
if (columnSize == 1) {
return MysqlType.BOOLEAN;
}
}
// When CHAR[N] and VARCHAR[N] columns have binary character set, the returned
// types are BINARY[N] and VARBINARY[N], respectively. So we don't need to
// convert them here. This is verified in MySqlSourceDatatypeTest.
}

return literalType;
} catch (final IllegalArgumentException ex) {
LOGGER.warn(String.format("Could not convert column: %s from table: %s.%s with type: %s (type name: %s). Casting to VARCHAR.",
field.get(INTERNAL_COLUMN_NAME),
field.get(INTERNAL_SCHEMA_NAME),
field.get(INTERNAL_TABLE_NAME),
field.get(INTERNAL_COLUMN_TYPE),
field.get(INTERNAL_COLUMN_TYPE_NAME)));
return MysqlType.VARCHAR;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this function for now, you're not overwritten it. If in the future will be a change let this happens with it, but today the behavior is equal to MySQL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly same. bit(1) is not interpreted as boolean.

Comment on lines 30 to 61
case BIT -> {
putBinary(json, columnName, resultSet, colIndex);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you link to TiDB docs showing BIT works only a binary?

Copy link
Contributor

@zhangyangyu zhangyangyu Mar 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formal doc of BIT is https://docs.pingcap.com/tidb/stable/data-type-numeric#bit-type. It explicitly documents TINYINT(1) as an alias of boolean https://docs.pingcap.com/tidb/stable/data-type-numeric#boolean-type. And for TiDB replication tools, none of its supported protocol interprets BIT(1) as boolean: https://docs.pingcap.com/tidb/stable/ticdc-canal-json#sql-type-field.

TiDB is built from scratch to be compatible with MySQL 5.7 so don't suffer history debts.

@marcosmarxm
Copy link
Member

Thanks @Daemonxiao I requested to the Connector team to give a final review of your contribution.

@marcosmarxm
Copy link
Member

/test connector=connectors/source-tidb

Copy link
Member

@marcosmarxm marcosmarxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @Daemonxiao

@marcosmarxm marcosmarxm temporarily deployed to more-secrets April 20, 2022 00:18 Inactive
@marcosmarxm marcosmarxm temporarily deployed to more-secrets April 20, 2022 00:18 Inactive
@marcosmarxm marcosmarxm temporarily deployed to more-secrets April 20, 2022 00:21 Inactive
@marcosmarxm marcosmarxm temporarily deployed to more-secrets April 20, 2022 00:21 Inactive
@marcosmarxm marcosmarxm merged commit 5164e12 into airbytehq:master Apr 20, 2022
@Daemonxiao
Copy link
Contributor Author

@marcosmarxm Thanks for your work. BTW, do we need to provide the TiDB logo for Airbyte UI?

@marcosmarxm
Copy link
Member

@marcosmarxm Thanks for your work. BTW, do we need to provide the TiDB logo for Airbyte UI?

you can add into airbyte-config/init/src/main/resources/icons
also update the seed file to receive the icon

suhomud pushed a commit that referenced this pull request May 23, 2022
* add new source tidb

* formate java code style and add item in SUMMARY.md

* update doc

* Update airbyte-integrations/connectors/source-tidb/src/main/resources/spec.json

uptdate doc

Co-authored-by: Xiang Zhang <[email protected]>

* Update airbyte-integrations/connectors/source-tidb/README.md

* Update docs/integrations/sources/tidb.md

Co-authored-by: Xiang Zhang <[email protected]>

* Update docs/integrations/sources/tidb.md

Co-authored-by: Xiang Zhang <[email protected]>

* add seed and doc changelog

* run format

* regenerate seed file

Co-authored-by: Xiang Zhang <[email protected]>
Co-authored-by: marcosmarxm <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation community
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants