
Failed to query fields of type Decimal and data is 0 #265

Closed
eye-gu opened this issue Aug 10, 2023 · 16 comments · Fixed by #266

eye-gu commented Aug 10, 2023

spark sql: select number_col1 from clickhouse.zhouwq.user_test;


Log:

spark-sql> select number_col1 from clickhouse.zhouwq.user_test;
23/08/10 18:11:38 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.NullPointerException
at java.math.BigDecimal.<init>(BigDecimal.java:831)
at scala.math.BigDecimal$.apply(BigDecimal.scala:290)
at xenon.clickhouse.read.format.ClickHouseJsonReader.decodeValue(ClickHouseJsonReader.scala:74)
at xenon.clickhouse.read.format.ClickHouseJsonReader.decode(ClickHouseJsonReader.scala:48)
at xenon.clickhouse.read.format.ClickHouseJsonReader.decode(ClickHouseJsonReader.scala:33)
at xenon.clickhouse.read.ClickHouseReader.get(ClickHouseReader.scala:89)
at xenon.clickhouse.read.ClickHouseReader.get(ClickHouseReader.scala:29)
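
For context, a minimal sketch reproducing the mechanism behind this NPE (my reading of the trace, assuming Jackson's JsonNode API): when the value arrives as a JSON number rather than a string, textValue returns null, and null then hits BigDecimal's String constructor:

import com.fasterxml.jackson.databind.ObjectMapper

val node = new ObjectMapper().readTree("0")  // numeric node (IntNode)
node.textValue                               // null -- textValue is non-null only for textual nodes
scala.math.BigDecimal(node.textValue)        // java.lang.NullPointerException, matching the trace above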


eye-gu commented Aug 10, 2023

[screenshot of the Decimal branch in ClickHouseJsonReader.decodeValue]

Maybe using the asText method is better.


pan3793 commented Aug 10, 2023

I think asText does not help. If ClickHouse omits 0 values in the result, we should set the field to its default value explicitly; asText would just yield an empty string, which BigDecimal also rejects:

scala> BigDecimal("")
java.lang.NumberFormatException
  at java.math.BigDecimal.<init>(BigDecimal.java:599)
  at java.math.BigDecimal.<init>(BigDecimal.java:383)
  at java.math.BigDecimal.<init>(BigDecimal.java:809)
  at scala.math.BigDecimal$.exact(BigDecimal.scala:126)
  at scala.math.BigDecimal$.apply(BigDecimal.scala:284)
  ... 47 elided


pan3793 commented Aug 10, 2023

BTW, thanks for reporting this issue, PR is welcome~


eye-gu commented Aug 10, 2023

I'm not sure what caused it. In my local environment, it works after switching to asText.


pan3793 commented Aug 10, 2023

Could you try this way?

      case d: DecimalType if jsonNode.isBigDecimal =>
        Decimal(jsonNode.decimalValue, d.precision, d.scale)
      case d: DecimalType if jsonNode.isFloat | jsonNode.isDouble =>
        Decimal(BigDecimal(jsonNode.doubleValue, new MathContext(d.precision)), d.precision, d.scale)
+     case d: DecimalType if jsonNode.isInt =>
+       Decimal(BigDecimal(jsonNode.intValue, new MathContext(d.precision)), d.precision, d.scale)
      case d: DecimalType =>
        Decimal(BigDecimal(jsonNode.textValue, new MathContext(d.precision)), d.precision, d.scale)


pan3793 commented Aug 10, 2023

Seems there are other unhandled cases here, like isLong, isBigInteger ...
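
For illustration, the remaining integral cases could be folded in like this (a sketch of the idea, not necessarily the exact change merged in #266); Jackson's decimalValue widens any numeric node losslessly, so the integral node types can share one branch:

      case d: DecimalType if jsonNode.isBigDecimal =>
        Decimal(jsonNode.decimalValue, d.precision, d.scale)
      case d: DecimalType if jsonNode.isFloat || jsonNode.isDouble =>
        Decimal(BigDecimal(jsonNode.doubleValue, new MathContext(d.precision)), d.precision, d.scale)
      case d: DecimalType if jsonNode.isInt || jsonNode.isLong || jsonNode.isBigInteger =>
        // decimalValue converts any integral node to java.math.BigDecimal without loss
        Decimal(jsonNode.decimalValue, d.precision, d.scale)
      case d: DecimalType =>
        Decimal(BigDecimal(jsonNode.textValue, new MathContext(d.precision)), d.precision, d.scale)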


eye-gu commented Aug 10, 2023

isInt works


pan3793 commented Aug 10, 2023

So that's the root cause. Would you like to send a PR to fix it, along with the other potential cases?


eye-gu commented Aug 10, 2023

PTAL


eye-gu commented Aug 10, 2023

I found that the binary format does not have this issue (nor issue #264). Is it more recommended to use the binary format?
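
To experiment with the binary read path, the connector exposes a read format option (assuming the spark.clickhouse.read.format key from the connector docs; verify against your version):

// a minimal sketch: switch the connector's read format from the default "json" to "binary"
spark.conf.set("spark.clickhouse.read.format", "binary")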


paf91 commented Jan 28, 2024

The issue is still present. Any updates? @eye-gu, why is the issue closed? It still occurs in 0.7.2 with the 0.4.5 JDBC driver on Spark 3.4.2.


paf91 commented Jan 28, 2024

[screenshot of the error]


pan3793 commented Jan 28, 2024

@paf91 I need to find some time to publish a new version containing this patch, maybe in a few days. You can try building the master branch until the release is out.


pan3793 commented Feb 6, 2024

@paf91 FYI, 0.7.3 is available now, it includes this patch.


paf91 commented Feb 6, 2024

@pan3793 Could you please tell me, if you know, what the best practice is for saving the resulting DataFrame into ClickHouse? I couldn't find it in the docs https://housepower.github.io/spark-clickhouse-connector/quick_start/02_play_with_spark_sql/ and the best I could do is use clickhouse-jdbc, like df.write.format("jdbc").options(options_map).mode("Overwrite").save


pan3793 commented Feb 6, 2024

@paf91 Well, the "best practices" depend on a lot of stuff. The built-in JDBC data source is maintained by the Apache Spark community as a generic solution for interacting with RDBMS, just keep using it if it works well for your cases.

Instead of providing best practices, I'd like to list some points that I think are worth careful consideration.

Performance. Performance in a distributed system is a big topic; I wrote an article explaining how this connector improves query performance.

Convenience. For example, a data engineer may want to use CREATE/REPLACE TABLE ... AS SELECT ... or the equivalent df.writeTo(table).create() / df.writeTo(table).createOrReplace() to create a ClickHouse table automatically and save data from Spark to ClickHouse; this is impossible without SPARK-43390. (A minimal sketch of this write path follows below.)

Transactions. For a long time, transactions were not strict, or even missing, in the big data world. In a distributed system, the failure of a single node is normal, and without the guarantee of write transactions, the resulting retries may lead to eventual data duplication. I shared some thoughts on this topic previously in #145.
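
For reference, a minimal sketch of the catalog-based write mentioned above (the catalog name, host, credentials, and source table are placeholders; adjust to your deployment):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.sql.catalog.clickhouse", "xenon.clickhouse.ClickHouseCatalog")
  .config("spark.sql.catalog.clickhouse.host", "127.0.0.1")
  .config("spark.sql.catalog.clickhouse.protocol", "http")
  .config("spark.sql.catalog.clickhouse.http_port", "8123")
  .config("spark.sql.catalog.clickhouse.user", "default")
  .config("spark.sql.catalog.clickhouse.password", "")
  .getOrCreate()

val df = spark.table("some_source_table")           // placeholder source
df.writeTo("clickhouse.zhouwq.user_test").append()  // or .createOrReplace() to create the table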
