🎉 New source: TiDB #11283
airbyte-integrations/connectors/source-tidb/src/main/resources/spec.json
…/spec.json uptdate doc Co-authored-by: Xiang Zhang <[email protected]>
Co-authored-by: Xiang Zhang <[email protected]>
Co-authored-by: Xiang Zhang <[email protected]>
Refers to issue #10891.
Thanks for the contribution, @Daemonxiao. I'll review it tomorrow.
…marcos/test-pr-11283
@Daemonxiao is it not possible to use TiDB with the MySQL connector? The implementation looks like it is duplicating code from the MySQL connector.
import org.slf4j.LoggerFactory;

public class TiDBSource extends AbstractJdbcSource<MysqlType> implements Source {
Please check the implementation of the mysql-strict-encrypt connector. So far I haven't found any change from MySQL that applies specifically to TiDB.
Thanks for your review, @marcosmarxm. TiDB is a MySQL-compatible database, much as CockroachDB is PostgreSQL-compatible, so in most cases the MySQL connector or even the JDBC connector works. But the compatibility is not 100%. For example, TiDB has its own internal system databases (reflected in the code) and doesn't support spatial types (noted in the doc). The biggest incompatibility is that TiDB doesn't support the MySQL binlog at all, so the MySQL connector's CDC would not function (reflected in both the code and the doc). I also expect the differences to change in the future. So we'd like to develop a dedicated TiDB source connector, although it currently replicates much of the MySQL connector code.
But it's possible to reduce the duplicated code, right? The SQL operations can be the same because TiDB supports a subset of MySQL. You only need to create the spec and the building spec (see the connector I mentioned previously). Did you try creating only the TiDBSource.java file? Can you point to where your code differs from the MySQL implementation? Why not use the MySQL connector and only add TiDB documentation explaining that the MySQL connector can be used with ONLY the default method?
Definitely. We changed the implementation from extending the JDBC source to extending the MySQL source, so there is no need to duplicate much code.
Simply extending the spec is not ideal, since the difference lies not only in configuration but also in code behavior. The difference is small for now, but we also want to leave room for the future, for example adding CDC support or directly accessing TiDB's storage layer.
Please review the new implementation. :-) I think it's clearer now.
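A minimal sketch of the approach described in this comment, where the TiDB source extends the MySQL source and overrides only the behavior that differs. The names `MySqlLikeSource`, `TiDbLikeSource`, `ReplicationMethod`, `supportedReplicationMethods`, and `canSync` are hypothetical stand-ins chosen for illustration; they are not the Airbyte classes or methods touched by this PR.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-ins for illustration; not the real Airbyte connector classes.
enum ReplicationMethod { STANDARD, CDC }

class MySqlLikeSource {
  // Assumed for this sketch: the MySQL-style source offers binlog-based CDC.
  Set<ReplicationMethod> supportedReplicationMethods() {
    return EnumSet.of(ReplicationMethod.STANDARD, ReplicationMethod.CDC);
  }

  // Shared logic that subclasses inherit unchanged.
  boolean canSync(ReplicationMethod method) {
    return supportedReplicationMethods().contains(method);
  }
}

class TiDbLikeSource extends MySqlLikeSource {
  // TiDB has no MySQL binlog, so this sketch offers only the standard method.
  @Override
  Set<ReplicationMethod> supportedReplicationMethods() {
    return EnumSet.of(ReplicationMethod.STANDARD);
  }
}
```

With this shape, a behavioral difference such as the missing binlog lives in one overridden method, while everything else is inherited rather than copied.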
import java.sql.ResultSet;
import java.sql.SQLException;

import static io.airbyte.db.jdbc.JdbcConstants.*;
Please convert to single import here.
Thanks for your review. I've changed it.
convert to single import
some comments
    return Set.of(
        "information_schema",
        "metrics_schema",
        "performance_schema",
        "mysql");
  }
Are TiDB's internal databases exactly the same as MySQL's?
No. TiDB is MySQL compatible, but it's implemented from scratch in Go and Rust, so at least for these internal databases it's definitely not 100% compatible. It doesn't support the SYS database (https://docs.pingcap.com/tidb/stable/mysql-compatibility#unsupported-features) and has its own METRICS_SCHEMA database (https://docs.pingcap.com/tidb/stable/metrics-schema#metrics-schema).
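To illustrate how such an exclusion set might be applied during discovery, here is a small self-contained sketch. The schema names come from the snippet under review; the class `InternalSchemaFilter`, the method `userSchemas`, and the case-insensitive comparison are assumptions of this sketch, not code from the PR.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of the connector.
class InternalSchemaFilter {
  // Schema names taken from the snippet under review.
  static final Set<String> EXCLUDED = Set.of(
      "information_schema", "metrics_schema", "performance_schema", "mysql");

  // Keeps only user schemas. Lower-casing before lookup is an assumption of this sketch.
  static List<String> userSchemas(List<String> discovered) {
    return discovered.stream()
        .filter(name -> !EXCLUDED.contains(name.toLowerCase(Locale.ROOT)))
        .collect(Collectors.toList());
  }
}
```

For example, passing `List.of("METRICS_SCHEMA", "mysql", "shop")` would leave only `shop`, where `shop` is a made-up user schema name.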
  @Override
  public MysqlType getFieldType(JsonNode field) {
    try {
      final MysqlType literalType = MysqlType.getByName(field.get(INTERNAL_COLUMN_TYPE_NAME).asText());
      final int columnSize = field.get(INTERNAL_COLUMN_SIZE).asInt();

      switch (literalType) {
        // TINYINT(1) are interpreted as boolean
        case TINYINT, TINYINT_UNSIGNED -> {
          if (columnSize == 1) {
            return MysqlType.BOOLEAN;
          }
        }
        // When CHAR[N] and VARCHAR[N] columns have binary character set, the returned
        // types are BINARY[N] and VARBINARY[N], respectively. So we don't need to
        // convert them here. This is verified in MySqlSourceDatatypeTest.
      }

      return literalType;
    } catch (final IllegalArgumentException ex) {
      LOGGER.warn(String.format("Could not convert column: %s from table: %s.%s with type: %s (type name: %s). Casting to VARCHAR.",
          field.get(INTERNAL_COLUMN_NAME),
          field.get(INTERNAL_SCHEMA_NAME),
          field.get(INTERNAL_TABLE_NAME),
          field.get(INTERNAL_COLUMN_TYPE),
          field.get(INTERNAL_COLUMN_TYPE_NAME)));
      return MysqlType.VARCHAR;
    }
  }
You can remove this function for now, since you're not overriding any behavior. If a change is needed in the future, add the override then; today the behavior is the same as MySQL's.
Not exactly the same: bit(1) is not interpreted as boolean.
      case BIT -> {
        putBinary(json, columnName, resultSet, colIndex);
      }
Can you link to TiDB docs showing that BIT works only as binary?
The formal doc for BIT is https://docs.pingcap.com/tidb/stable/data-type-numeric#bit-type. TiDB explicitly documents TINYINT(1) as an alias of boolean: https://docs.pingcap.com/tidb/stable/data-type-numeric#boolean-type. As for the TiDB replication tools, none of their supported protocols interprets BIT(1) as boolean: https://docs.pingcap.com/tidb/stable/ticdc-canal-json#sql-type-field.
TiDB is built from scratch to be compatible with MySQL 5.7, so it doesn't carry historical debt.
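The type rule discussed in this thread (TINYINT(1) maps to boolean, BIT(1) does not, unknown types fall back to VARCHAR) can be sketched in a self-contained form. The enum `ColumnType` and the class `TypeResolver` are hypothetical, trimmed-down stand-ins; the PR itself works with `MysqlType` and JDBC metadata, and the name normalization shown here is an assumption of this sketch.

```java
import java.util.Locale;

// Hypothetical, trimmed-down type enum for illustration; the PR uses MysqlType.
enum ColumnType { BIT, TINYINT, TINYINT_UNSIGNED, BOOLEAN, VARCHAR }

class TypeResolver {
  static ColumnType resolve(String typeName, int columnSize) {
    try {
      // Normalize e.g. "tinyint unsigned" to "TINYINT_UNSIGNED" (an assumption of this sketch).
      final ColumnType literal =
          ColumnType.valueOf(typeName.trim().toUpperCase(Locale.ROOT).replace(' ', '_'));
      switch (literal) {
        case TINYINT:
        case TINYINT_UNSIGNED:
          // TINYINT(1) is treated as boolean.
          return columnSize == 1 ? ColumnType.BOOLEAN : literal;
        default:
          // BIT(1) stays BIT: it is not mapped to boolean.
          return literal;
      }
    } catch (IllegalArgumentException ex) {
      // Unknown type names fall back to VARCHAR, as in the snippet under review.
      return ColumnType.VARCHAR;
    }
  }
}
```

Keeping BIT out of the boolean branch is the one behavioral difference the author points to; the rest mirrors the reviewed `getFieldType` logic in simplified form.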
Thanks, @Daemonxiao. I have asked the Connector team to give your contribution a final review.
/test connector=connectors/source-tidb
thanks @Daemonxiao
@marcosmarxm Thanks for your work. BTW, do we need to provide the TiDB logo for the Airbyte UI?
You can add it to airbyte-config/init/src/main/resources/icons.
* add new source tidb
* formate java code style and add item in SUMMARY.md
* update doc
* Update airbyte-integrations/connectors/source-tidb/src/main/resources/spec.json uptdate doc Co-authored-by: Xiang Zhang <[email protected]>
* Update airbyte-integrations/connectors/source-tidb/README.md
* Update docs/integrations/sources/tidb.md Co-authored-by: Xiang Zhang <[email protected]>
* Update docs/integrations/sources/tidb.md Co-authored-by: Xiang Zhang <[email protected]>
* add seed and doc changelog
* run format
* regenerate seed file
Co-authored-by: Xiang Zhang <[email protected]>
Co-authored-by: marcosmarxm <[email protected]>
What
Add new source TiDB.
Recommended reading order
docs/integrations/tidb.md
airbyte-integrations/connectors/source-tidb/src/main/java/io/airbyte/integrations/source/tidb/TiDBSource.java
/airbyte-integrations/connectors/source-tidb/src/main/java/io/airbyte/integrations/source/tidb/TiDBSourceOperations.java
Pre-merge Checklist
Expand the relevant checklist and delete the others.
New Connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
README.md
bootstrap.md. See description and examples
docs/SUMMARY.md
docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
docs/integrations/README.md
airbyte-integrations/builds.md
Tests
Unit
Integration
Acceptance