feat: support insert function in offline mode #3854

Matagits · 2024-04-08T13:59:32Z

What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
Feature: enable the `INSERT` statement in offline mode and add corresponding test cases.
What is the current behavior? (You can also link to an open issue here)
`INSERT` is not supported in offline mode.
What is the new behavior (if this is a feature change)?
`INSERT` can be used in offline mode.

github-actions · 2024-04-08T14:20:22Z

SDK Test Report

102 files +1 102 suites +1 2m 12s ⏱️ -1s
357 tests +8 343 ✅ +8 14 💤 ±0 0 ❌ ±0
483 runs +8 469 ✅ +8 14 💤 ±0 0 ❌ ±0

Results for commit 05dbe85. ± Comparison against base commit 7f758af.

This pull request removes 30 and adds 17 tests. Note that renamed tests count towards both.

com._4paradigm.hybridse.sdk.SqlEngineTest ‑ sqlLastJoinWithMultipleDB[,  SELECT sum(db1.t1.col1) over w1 as sum_t1_col1, db2.t2.str1 as t2_str1
 FROM db1.t1
 last join db2.t2 order by db2.t2.col1
 on db1.t1.col1 = db2.t2.col1 and db1.t1.col2 = db2.t2.col0
 WINDOW w1 AS (
  PARTITION BY db1.t1.col2 ORDER BY db1.t1.col1
  ROWS_RANGE BETWEEN 3 PRECEDING AND CURRENT ROW
 ) limit 10;](2)
com._4paradigm.hybridse.sdk.SqlEngineTest ‑ sqlLastJoinWithMultipleDB[db1,  SELECT sum(t1.col1) over w1 as sum_t1_col1, db2.t2.str1 as t2_str1
 FROM t1
 last join db2.t2 order by db2.t2.col1
 on t1.col1 = db2.t2.col1 and t1.col2 = db2.t2.col0
 WINDOW w1 AS (
  PARTITION BY t1.col2 ORDER BY t1.col1
  ROWS_RANGE BETWEEN 3 PRECEDING AND CURRENT ROW
 ) limit 10;](1)
com._4paradigm.hybridse.sdk.SqlEngineTest ‑ sqlLastJoinWithMultipleDB[null,  SELECT sum(db1.t1.col1) over w1 as sum_t1_col1, db2.t2.str1 as t2_str1
 FROM db1.t1
 last join db2.t2 order by db2.t2.col1
 on db1.t1.col1 = db2.t2.col1 and db1.t1.col2 = db2.t2.col0
 WINDOW w1 AS (
  PARTITION BY db1.t1.col2 ORDER BY db1.t1.col1
  ROWS_RANGE BETWEEN 3 PRECEDING AND CURRENT ROW
 ) limit 10;](3)
com._4paradigm.hybridse.sdk.SqlEngineTest ‑ sqlMultipleDBErrorTest[, SELECT db2.t2.str1 as t2_str1
 FROM t1
 last join db2.t2 order by db2.t2.col1
 on t1.col1 = db2.t2.col1 and t1.col2 = db2.t2.col0;
, SQL parse error: Fail to transform data provider op: table t1 not exists in database []](4)
com._4paradigm.hybridse.sdk.SqlEngineTest ‑ sqlMultipleDBErrorTest[db1, SELECT db1.t2.str1 as t2_str1
 FROM t1
 last join db2.t2 order by db2.t2.col1
 on t1.col1 = db2.t2.col1 and t1.col2 = db2.t2.col0;
, SQL parse error: Column Not found: db1.t2.str1](2)
com._4paradigm.hybridse.sdk.SqlEngineTest ‑ sqlMultipleDBErrorTest[db1, SELECT db2.t2.str1 as t2_str1
 FROM t1
 last join db2.t2 order by db2.t2.col1
 on t1.col1 = t2.col1 and t1.col2 = db2.t2.col0;
, SQL parse error: Column Not found: .t2.col1](3)
com._4paradigm.hybridse.sdk.SqlEngineTest ‑ sqlMultipleDBErrorTest[db1, SELECT t2.str1 as t2_str1
 FROM t1
 last join db2.t2 order by db2.t2.col1
 on t1.col1 = db2.t2.col1 and t1.col2 = db2.t2.col0;
, SQL parse error: Column Not found: .t2.str1](1)
com._4paradigm.hybridse.sdk.SqlEngineTest ‑ sqlMultipleDBErrorTest[null, SELECT db2.t2.str1 as t2_str1
 FROM t1
 last join db2.t2 order by db2.t2.col1
 on t1.col1 = db2.t2.col1 and t1.col2 = db2.t2.col0;
, SQL parse error: Fail to transform data provider op: table t1 not exists in database []](5)
com._4paradigm.hybridse.sdk.SqlEngineTest ‑ sqlWindowLastJoin[ SELECT sum(t1.col1) over w1 as sum_t1_col1, t2.str1 as t2_str1
 FROM t1
 last join t2 order by t2.col1
 on t1.col1 = t2.col1 and t1.col2 = t2.col0
 WINDOW w1 AS (
  PARTITION BY t1.col2 ORDER BY t1.col1
  ROWS_RANGE BETWEEN 3 PRECEDING AND CURRENT ROW
 ) limit 10;](1)
com._4paradigm.openmldb.batch.TestInsertPlan ‑ Test column with default value
…

♻️ This comment has been updated with latest results.

github-actions · 2024-04-08T15:20:25Z

HybridSE Mac Test Report

20 124 tests ±0 20 122 ✅ ±0 7m 36s ⏱️ - 1m 3s
256 suites ±0 2 💤 ±0
68 files ±0 0 ❌ ±0

Results for commit 05dbe85. ± Comparison against base commit 7f758af.

♻️ This comment has been updated with latest results.

github-actions · 2024-04-08T15:20:27Z

HybridSE Linux Test Report

20 124 tests ±0 20 122 ✅ ±0 6m 21s ⏱️ ±0s
256 suites ±0 2 💤 ±0
68 files ±0 0 ❌ ±0

Results for commit 05dbe85. ± Comparison against base commit 7f758af.

♻️ This comment has been updated with latest results.

github-actions · 2024-04-08T15:50:15Z

Linux Test Report

57 files ±0 244 suites ±0 1h 41m 48s ⏱️ + 3m 26s
12 631 tests ±0 12 624 ✅ ±0 7 💤 ±0 0 ❌ ±0
17 908 runs ±0 17 901 ✅ ±0 7 💤 ±0 0 ❌ ±0

Results for commit 05dbe85. ± Comparison against base commit 7f758af.

♻️ This comment has been updated with latest results.

Matagits · 2024-04-09T03:19:01Z

java/openmldb-batch/src/main/scala/com/_4paradigm/openmldb/batch/nodes/InsertPlan.scala

+      val newOfflineInfo = OfflineTableInfo
+        .newBuilder()
+        .setPath(offlineDataPath)
+        .setFormat("csv")


The default format should be parquet rather than csv.

Matagits · 2024-04-09T03:19:23Z

java/openmldb-batch/src/main/scala/com/_4paradigm/openmldb/batch/nodes/InsertPlan.scala

+    val spark = ctx.getSparkSession
+    var insertDf = spark.createDataFrame(spark.sparkContext.parallelize(insertRows), insertSchema)
+    val schemaDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], oriSchema)
+    insertDf = schemaDf.unionByName(insertDf, allowMissingColumns = true)


Handle columns that have a default value.
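
To illustrate the alignment step in the snippet above, here is a minimal sketch, assuming Spark 3.1 or later (where `unionByName` accepts `allowMissingColumns`). The schemas, the sample row, and the `alignToSchema` helper name are made up for illustration and do not come from the PR. Columns missing from the insert come back as null, so a real default value would still have to be applied separately; the `na.fill` call is one hypothetical way to do that and is not necessarily what the PR does.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object UnionByNameSketch {
  // Hypothetical helper: widen insertDf to the full table schema.
  // Columns absent from insertDf are added and filled with null.
  def alignToSchema(spark: SparkSession, insertDf: DataFrame, oriSchema: StructType): DataFrame = {
    val schemaDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], oriSchema)
    schemaDf.unionByName(insertDf, allowMissingColumns = true)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("union-by-name-sketch")
      .getOrCreate()

    // Assumed full table schema with three nullable columns.
    val oriSchema = StructType(Seq(
      StructField("id", IntegerType, nullable = true),
      StructField("name", StringType, nullable = true),
      StructField("city", StringType, nullable = true)))

    // The insert only names two of the three columns.
    val insertSchema = StructType(Seq(
      StructField("id", IntegerType, nullable = true),
      StructField("name", StringType, nullable = true)))
    val insertRows = Seq(Row(1, "a"))
    val insertDf = spark.createDataFrame(spark.sparkContext.parallelize(insertRows), insertSchema)

    val aligned = alignToSchema(spark, insertDf, oriSchema)

    // Assumed default for the missing column. Note that na.fill also
    // replaces nulls that were inserted explicitly into "city".
    val withDefault = aligned.na.fill(Map("city" -> "unknown"))
    withDefault.show()

    spark.stop()
  }
}
```

Building the empty DataFrame from the table schema first keeps the column order of the table, whatever order the insert statement lists its columns in.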

Matagits · 2024-04-09T03:25:40Z

java/openmldb-batch/src/main/scala/com/_4paradigm/openmldb/batch/nodes/InsertPlan.scala

+    val offlineDataPath = getOfflineDataPath(ctx, db, table)
+    val newTableInfoBuilder = tableInfo.toBuilder
+    val hasOfflineTableInfo = tableInfo.hasOfflineTableInfo
+    if (!hasOfflineTableInfo) {


If the table has soft links (symbolic paths), throw an exception and reject the write.

Matagits · 2024-04-09T03:49:48Z

java/openmldb-batch/src/test/scala/com/_4paradigm/openmldb/batch/TestInsertPlan.scala

+
+
+class TestInsertPlan extends SparkTestSuite {
+  var sparkSession: SparkSession = _


Consider the case of inserting into a table that already has data loaded via LOAD DATA.

vagetablechicken · 2024-04-09T04:31:41Z

Please add a description of offline insert to docs/zh/openmldb_sql/dml/INSERT_STATEMENT.md. Offline insert can use the 'yyyy-MM-dd xx' format, but online insert currently cannot.

Matagits · 2024-04-12T06:44:53Z

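The following changes have been made:
1. The default storage format is now parquet.
2. Columns with default values are handled.
3. Offline inserts into tables that have soft links are rejected.
4. Insert modes other than the default (such as INSERT OR IGNORE) are rejected.
5. Corresponding test code was added (for example, an offline insert into a table that already has data loaded).
6. The Chinese and English docs were updated accordingly.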
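
Items 3 and 4 in the list above amount to two guard checks that run before any data is written. The following is a minimal sketch of that idea in plain Scala. `InsertMode`, `OfflineInfo`, and `validate` are hypothetical, simplified stand-ins and not OpenMLDB classes, and the paths are invented. The PR is described as throwing an exception; this sketch returns an `Either` instead so the outcome is easy to inspect.

```scala
object OfflineInsertGuards {
  // Hypothetical insert modes; IgnoreMode models INSERT OR IGNORE.
  sealed trait InsertMode
  case object DefaultMode extends InsertMode
  case object IgnoreMode extends InsertMode

  // Hypothetical, simplified view of a table's offline storage metadata.
  final case class OfflineInfo(path: String, format: String, symbolicPaths: Seq[String])

  // Returns Left(reason) when the offline insert should be rejected, Right(()) otherwise.
  def validate(mode: InsertMode, offlineInfo: Option[OfflineInfo]): Either[String, Unit] = {
    if (mode != DefaultMode) {
      Left("offline insert supports only the default insert mode")
    } else if (offlineInfo.exists(_.symbolicPaths.nonEmpty)) {
      Left("offline insert is not supported for tables with soft links")
    } else {
      Right(())
    }
  }

  def main(args: Array[String]): Unit = {
    val loaded = OfflineInfo("/tmp/offline/db1/t1", "parquet", Seq.empty)
    val linked = loaded.copy(symbolicPaths = Seq("/data/external/t1"))

    println(validate(DefaultMode, None))         // table with no offline data yet
    println(validate(DefaultMode, Some(loaded))) // table with hard-copied data
    println(validate(IgnoreMode, Some(loaded)))  // rejected: non-default mode
    println(validate(DefaultMode, Some(linked))) // rejected: soft link present
  }
}
```

Checking the insert mode first means an unsupported statement is rejected without looking at the table metadata at all.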

vagetablechicken · 2024-04-12T07:21:27Z

docs/zh/openmldb_sql/dml/INSERT_STATEMENT.md

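 - By default, `INSERT` does not deduplicate; `INSERT OR IGNORE` skips data that already exists in the table, so it can be retried repeatedly.
+- Offline mode supports only `INSERT`, not `INSERT OR IGNORE`.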


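There is one more restriction to document: "offline insert cannot be used on tables that have soft links". A table has exactly one format, so if the format is hive or similar, we cannot create a hard-copy path for it and save the inserted data as parquet files under that path. To use insert, the user must first make sure the table has no soft links.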

tobegit3hub

LGTM

Matagits and others added 8 commits March 28, 2024 21:15

native interface for insert function in offline mode

07cf0bb

Merge branch '4paradigm:main' into main

39ac6ad

Merge branch '4paradigm:main' into main

67c7796

native entrance of insert plan

589712e

java entrance of insert plan

7d62fac

add insert plan

f64a3d8

test cases

9f0a2bc

init test env

cfbd64f

Matagits requested review from tobegit3hub and aceforeverd April 8, 2024 13:59

github-actions bot added the batch-engine (openmldb batch(offline) engine), execute-engine (hybridse sql engine), storage-engine (openmldb storage engine. nameserver & tablet), and task-manager (openmldb taskmanager) labels Apr 8, 2024

fix scala style check

ff89c91

Matagits commented Apr 9, 2024

tobegit3hub assigned Matagits Apr 9, 2024

Matagits added 4 commits April 10, 2024 16:43

support column with default value

b05266a

refuse to insert into table with loaded soft copied data

4a39403

only support default insert mode

6216081

update docs

56aa801

github-actions bot added the documentation label (Improvements or additions to documentation) Apr 11, 2024

Matagits and others added 6 commits April 11, 2024 15:55

Merge branch '4paradigm:main' into main

fd87c59

fix test issue

e35f222

fix test issue

09db1c8

fix test issue

64aedf2

fix test issue

ae75b27

fix test issue

bf838d3

vagetablechicken self-requested a review April 12, 2024 07:17

vagetablechicken reviewed Apr 12, 2024

update docs

05dbe85

vagetablechicken approved these changes Apr 15, 2024

tobegit3hub approved these changes Apr 15, 2024

tobegit3hub merged commit 67138ef into 4paradigm:main Apr 15, 2024
29 of 31 checks passed
