Fix the wrong parsing of JVM parameters in JDBC URL #11

Open · wants to merge 2 commits into master
Conversation

@yanghua yanghua commented Apr 14, 2023

### _Why are the changes needed?_

### _How was this patch tested?_

- Add some test cases that check the changes thoroughly, including negative and positive cases if possible

- Add screenshots for manual tests if appropriate

- Run test locally before making a pull request

```
@@ -477,4 +486,4 @@ public static String getCanonicalHostName(String hostName) {
   public static boolean isKyuubiOperationHint(String hint) {
     return KYUUBI_OPERATION_HINT_PATTERN.matcher(hint).matches();
   }
-}
\ No newline at end of file
+}
```


This code mainly parses a Hive JDBC connection string, extracting the relevant pieces and populating them into an object. A few notes:

  • Line 2: imports some packages.
  • Lines 9-10: pulls in the Apache Commons Lang3 library so that the StringUtils utility class is available for blank-string checks.
  • Line 12: imports Thrift-related classes.
  • Line 13: imports logging-related classes.
  • Lines 16-18: this method parses the JDBC URL (Uniform Resource Locator) and returns a JdbcConnectionParams instance.
  • Lines 20-24: uses a regular expression to split the URI (Uniform Resource Identifier) into its parts and adds them to a HashMap, building a connection-parameters object that holds all the properties.
  • Lines 26-32: declares and initializes the confPattern regular expression used to parse Hive configuration settings.
  • Lines 34-38: extracts the Hive configuration settings from the JDBC URI string.
  • Lines 40-46: replaces the earlier pattern regex with the new confPattern regex and adds the parsed parameters to the connection-parameters map.
  • Lines 48-57: processes the connection-parameters string and adds its key-value pairs to the Hive configuration map inside the connection-parameters object.
  • Lines 59-75: defines some helper functions and regular expressions.

Based on this code, here are my review comments:

  • On lines 9-10, StringUtils.isNotBlank and StringUtils.contains are both very practical methods for checking whether a string is blank or contains specific characters.
  • The initialization of confPattern on line 26 looks slightly off: the trailing token should be ";?"; the extra dot is superfluous.
  • Consider renaming some variables for better readability and maintainability.
  • The incoming arguments jdbcURI, info, and logger may be null and trigger a NullPointerException; add null checks before using them.
  • Other NullPointerExceptions may lurk elsewhere; check each HashMap and make sure it contains the required key-value mappings (see the sketch below).
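
To make the regex-driven extraction described above concrete, here is a minimal, hypothetical Scala sketch that pulls `;key=value` session settings out of a simplified JDBC URL and guards against the null arguments the review warns about. The URL shape, the pattern, and all names are illustrative assumptions, not the actual Hive/Kyuubi implementation.

```scala
import scala.collection.mutable

object JdbcUrlConfSketch {
  // Hypothetical conf pattern: matches ";key=value" pairs in the session-conf
  // part of a URL such as "jdbc:hive2://host:10000/db;user=a;ssl=true".
  private val ConfPattern = "([^;=]+)=([^;]*);?".r

  def parseSessionConfs(jdbcUri: String): Map[String, String] = {
    // Null check up front, per the review's NullPointerException warning.
    require(jdbcUri != null, "jdbcUri must not be null")
    val confs = mutable.LinkedHashMap.empty[String, String]
    val sep = jdbcUri.indexOf(';')
    if (sep >= 0) {
      // Scan only the part after the first ';', where session confs live.
      ConfPattern.findAllMatchIn(jdbcUri.substring(sep + 1)).foreach { m =>
        confs += m.group(1) -> m.group(2)
      }
    }
    confs.toMap
  }
}
```

For example, `JdbcUrlConfSketch.parseSessionConfs("jdbc:hive2://host:10000/db;user=a;ssl=true")` yields `Map("user" -> "a", "ssl" -> "true")`.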

### _Why are the changes needed?_

Support GetPrimaryKeys for Trino Fe, close apache#4320

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before making a pull request

Closes apache#4321 from Yikf/primaryKeys.

Closes apache#4320

3690a2c [Yikf] Support GetPrimaryKeys for Trino Fe

Authored-by: Yikf <[email protected]>
Signed-off-by: ulyssesyou <[email protected]>
```
CAST: 'CAST';
AS: 'AS';
KEY_SEQ: 'KEY_SEQ';
PK_NAME: 'PK_NAME';

fragment SEARCH_STRING_ESCAPE: '\'' '\\' '\'';
```


This patch adds a few lexer tokens and a character-escape fragment.

The added tokens are:

  • VARCHAR: string type
  • SMALLINT: short integer type
  • CAST: performs an explicit type conversion
  • AS: assigns an alias
  • KEY_SEQ: primary-key sequence number
  • PK_NAME: primary-key name

These tokens correspond to attributes of the database metadata model and will be used when handling metadata operations.

The code also defines a lexer fragment, SEARCH_STRING_ESCAPE, which matches a backslash wrapped in single quotes, i.e. the literal three-character sequence '\'.

Overall, this patch shows no obvious risks, and I see no improvements to suggest.
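
As a quick illustration of what the SEARCH_STRING_ESCAPE fragment accepts, here is a small Scala check using an equivalent regular expression; the regex is a stand-in for the ANTLR fragment, not the grammar itself.

```scala
object SearchStringEscapeCheck extends App {
  // Equivalent of the ANTLR fragment '\'' '\\' '\'':
  // a single quote, one backslash, a single quote.
  val searchStringEscape = "'\\\\'".r

  // The fragment matches the literal '\' sequence inside a statement.
  assert(searchStringEscape.findFirstIn("""ESCAPE '\'""").contains("""'\'"""))
  println("matched: " + searchStringEscape.findFirstIn("""ESCAPE '\'""").get)
}
```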

```
CAST LEFT_PAREN NULL AS VARCHAR RIGHT_PAREN COLUMN_NAME COMMA
CAST LEFT_PAREN NULL AS SMALLINT RIGHT_PAREN KEY_SEQ COMMA
CAST LEFT_PAREN NULL AS VARCHAR RIGHT_PAREN PK_NAME
WHERE FALSE #getPrimaryKeys
| .*? #passThrough
;
```


This patch appears to modify the SQL grammar; the main changes are:

  1. The "getColumns" rule gains additional filter conditions (tableCatalogFilter, tableSchemaFilter, tableNameFilter, and colNameFilter).
  2. A new "getPrimaryKeys" rule is added; as written it returns no rows, because its WHERE clause is always FALSE.
  3. A wildcard alternative is added to pass any other statement through untouched.

If the query statements themselves are logically sound, this patch should not introduce much risk. However, it is hard to confirm whether the new "getPrimaryKeys" rule hides a latent bug or an unfinished implementation; further review or follow-up is needed.
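
For orientation, the `#getPrimaryKeys` alternative is shaped to match the metadata statement a Trino JDBC client issues for `getPrimaryKeys`. The standard JDBC result columns are TABLE_CAT, TABLE_SCHEM, TABLE_NAME, COLUMN_NAME, KEY_SEQ, and PK_NAME, so the full statement is presumably close to the sketch below; only the tail is visible in the diff above, and the leading lines are reconstructed, so treat this as an assumption.

```scala
// Presumed shape of the Trino JDBC driver's getPrimaryKeys statement;
// the first three columns are reconstructed, the rest follow the grammar tokens.
val presumedGetPrimaryKeysQuery: String =
  """SELECT CAST(NULL AS VARCHAR) TABLE_CAT,
    |       CAST(NULL AS VARCHAR) TABLE_SCHEM,
    |       CAST(NULL AS VARCHAR) TABLE_NAME,
    |       CAST(NULL AS VARCHAR) COLUMN_NAME,
    |       CAST(NULL AS SMALLINT) KEY_SEQ,
    |       CAST(NULL AS VARCHAR) PK_NAME
    |WHERE FALSE""".stripMargin
```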

```scala
val operationHandle = backendService.getPrimaryKeys(sessionHandle, null, null, null)
// The trino implementation always returns empty.
operationHandle.setHasResultSet(false)
operationHandle
case PassThroughNode() =>
  backendService.executeStatement(sessionHandle, statement, configs, runAsync, queryTimeout)
```


After the import changes, this code extends the handling of GetCatalogs, GetColumns, GetSchemas, GetTables, GetTableTypes, and GetTypeInfo from org.apache.kyuubi.sql.plan.trino with the newly added GetPrimaryKeys, meaning the class can translate Trino SQL through these nodes into Kyuubi operations.

In the translatePlan function, a corresponding operationHandle is built for each SQL type. Notably, for the new GetPrimaryKeys operation the Trino implementation always returns an empty result, so for consistency the result set is marked empty here as well.

As for risks and improvements, I do not have enough surrounding context to judge whether any risk exists or to offer suggestions.
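
Putting the visible pieces together, the dispatch described above presumably looks like the outline below; the method signature and surrounding names are hypothetical reconstructions from the truncated snippet, not the actual class.

```scala
// Hypothetical outline; sessionHandle, backendService, statement, configs,
// runAsync and queryTimeout are assumed to be in scope in the real class.
def translatePlan(plan: KyuubiTreeNode): OperationHandle = plan match {
  case GetPrimaryKeys() =>
    val handle = backendService.getPrimaryKeys(sessionHandle, null, null, null)
    // Trino always returns an empty result for primary keys, so no result set.
    handle.setHasResultSet(false)
    handle
  case PassThroughNode() =>
    // Anything else is forwarded as a plain statement.
    backendService.executeStatement(sessionHandle, statement, configs, runAsync, queryTimeout)
}
```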

```scala
override def visitGetPrimaryKeys(ctx: GetPrimaryKeysContext): KyuubiTreeNode = {
  GetPrimaryKeys()
}

override def visitNullCatalog(ctx: NullCatalogContext): AnyRef = {
  null
}
```


This snippet shows no obvious error risk; I simply do not understand its role well enough to give complete improvement suggestions.

In this code the developer adds the ability to fetch primary-key information to the Trino query engine front end (via the GetPrimaryKeys node). There are also some related changes to imports and overridden class methods.


```scala
case class GetPrimaryKeys() extends KyuubiTreeNode {
  override def name(): String = "Get Primary Keys"
}
```


This patch adds a new case class, GetPrimaryKeys, which extends KyuubiTreeNode and implements the name() method to return "Get Primary Keys".

From this snippet alone there is no obvious error or defect. Fully assessing the risks and suggesting improvements would require a deeper look at the language, framework, third-party libraries, and overall system design in play.

yanghua pushed a commit that referenced this pull request Apr 25, 2023
…` and `CacheTable`

close apache#4332
### _Why are the changes needed?_

For the case where the table name has been resolved and an `Expand` logical plan exists
```
InsertIntoHiveTable `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, false, false, [a, b]
+- Aggregate [a#0], [a#0, ansi_cast((count(if ((gid#9 = 1)) spark_catalog.default.t2.`b`#10 else null) * count(if ((gid#9 = 2)) spark_catalog.default.t2.`c`#11 else null)) as string) AS b#8]
   +- Aggregate [a#0, spark_catalog.default.t2.`b`#10, spark_catalog.default.t2.`c`#11, gid#9], [a#0, spark_catalog.default.t2.`b`#10, spark_catalog.default.t2.`c`#11, gid#9]
      +- Expand [ArrayBuffer(a#0, b#1, null, 1), ArrayBuffer(a#0, null, c#2, 2)], [a#0, spark_catalog.default.t2.`b`#10, spark_catalog.default.t2.`c`#11, gid#9]
         +- HiveTableRelation [`default`.`t2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#0, b#1, c#2], Partition Cols: []]
```
For the case `CacheTable` with `window` function
```
InsertIntoHiveTable `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, true, false, [a, b]
+- Project [a#98, b#99]
   +- InMemoryRelation [a#98, b#99, rank#100], StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *(2) Filter (isnotnull(rank#4) AND (rank#4 = 1))
            +- Window [row_number() windowspecdefinition(a#9, b#10 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS rank#4], [a#9], [b#10 ASC NULLS FIRST]
               +- *(1) Sort [a#9 ASC NULLS FIRST, b#10 ASC NULLS FIRST], false, 0
                  +- Exchange hashpartitioning(a#9, 200), ENSURE_REQUIREMENTS, [id=apache#38]
                     +- Scan hive default.t2 [a#9, b#10], HiveTableRelation [`default`.`t2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#9, b#10], Partition Cols: []]

```

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before making a pull request

Closes apache#4393 from iodone/kyuubi-4332.

Closes apache#4393

d2afdab [odone] fix cache table bug
443af79 [odone] fix some bugs with groupby

Authored-by: odone <[email protected]>
Signed-off-by: ulyssesyou <[email protected]>