
Failed to query fields of type Decimal and data is 0 #265

Closed
eye-gu opened this issue Aug 10, 2023 · 16 comments · Fixed by #266

eye-gu commented Aug 10, 2023

spark sql: select number_col1 from clickhouse.zhouwq.user_test;


Log:

spark-sql> select number_col1 from clickhouse.zhouwq.user_test;
23/08/10 18:11:38 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.NullPointerException
at java.math.BigDecimal.<init>(BigDecimal.java:831)
at scala.math.BigDecimal$.apply(BigDecimal.scala:290)
at xenon.clickhouse.read.format.ClickHouseJsonReader.decodeValue(ClickHouseJsonReader.scala:74)
at xenon.clickhouse.read.format.ClickHouseJsonReader.decode(ClickHouseJsonReader.scala:48)
at xenon.clickhouse.read.format.ClickHouseJsonReader.decode(ClickHouseJsonReader.scala:33)
at xenon.clickhouse.read.ClickHouseReader.get(ClickHouseReader.scala:89)
at xenon.clickhouse.read.ClickHouseReader.get(ClickHouseReader.scala:29)
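
For context, a minimal sketch reproducing the mechanism behind this NPE (my reading of the trace, assuming Jackson's JsonNode API): when the value arrives as a JSON number rather than a string, textValue returns null, and null then hits BigDecimal's String constructor:

import com.fasterxml.jackson.databind.ObjectMapper

val node = new ObjectMapper().readTree("0")  // numeric node (IntNode)
node.textValue                               // null -- textValue is non-null only for textual nodes
scala.math.BigDecimal(node.textValue)        // java.lang.NullPointerException, matching the trace above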


eye-gu commented Aug 10, 2023

[screenshot of the Decimal branch in ClickHouseJsonReader.decodeValue]

Maybe using the asText method is better.


pan3793 commented Aug 10, 2023

I think asText does not help. If ClickHouse omits 0 values in the result, we should set the field to its default value explicitly; asText would just yield an empty string, which BigDecimal also rejects:

scala> BigDecimal("")
java.lang.NumberFormatException
  at java.math.BigDecimal.<init>(BigDecimal.java:599)
  at java.math.BigDecimal.<init>(BigDecimal.java:383)
  at java.math.BigDecimal.<init>(BigDecimal.java:809)
  at scala.math.BigDecimal$.exact(BigDecimal.scala:126)
  at scala.math.BigDecimal$.apply(BigDecimal.scala:284)
  ... 47 elided


pan3793 commented Aug 10, 2023

BTW, thanks for reporting this issue, PR is welcome~


eye-gu commented Aug 10, 2023

I'm not sure what caused it. In my local environment, it works after switching to asText.


pan3793 commented Aug 10, 2023

Could you try this way?

      case d: DecimalType if jsonNode.isBigDecimal =>
        Decimal(jsonNode.decimalValue, d.precision, d.scale)
      case d: DecimalType if jsonNode.isFloat | jsonNode.isDouble =>
        Decimal(BigDecimal(jsonNode.doubleValue, new MathContext(d.precision)), d.precision, d.scale)
+     case d: DecimalType if jsonNode.isInt =>
+       Decimal(BigDecimal(jsonNode.intValue, new MathContext(d.precision)), d.precision, d.scale)
      case d: DecimalType =>
        Decimal(BigDecimal(jsonNode.textValue, new MathContext(d.precision)), d.precision, d.scale)


pan3793 commented Aug 10, 2023

Seems there are other unhandled cases here, like isLong, isBigInteger ...
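
For illustration, the remaining integral cases could be folded in like this (a sketch of the idea, not necessarily the exact change merged in #266); Jackson's decimalValue widens any numeric node losslessly, so the integral node types can share one branch:

      case d: DecimalType if jsonNode.isBigDecimal =>
        Decimal(jsonNode.decimalValue, d.precision, d.scale)
      case d: DecimalType if jsonNode.isFloat || jsonNode.isDouble =>
        Decimal(BigDecimal(jsonNode.doubleValue, new MathContext(d.precision)), d.precision, d.scale)
      case d: DecimalType if jsonNode.isInt || jsonNode.isLong || jsonNode.isBigInteger =>
        // decimalValue converts any integral node to java.math.BigDecimal without loss
        Decimal(jsonNode.decimalValue, d.precision, d.scale)
      case d: DecimalType =>
        Decimal(BigDecimal(jsonNode.textValue, new MathContext(d.precision)), d.precision, d.scale)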


eye-gu commented Aug 10, 2023

isInt works


pan3793 commented Aug 10, 2023

So that's the root cause. Would you like to send a PR to fix it, along with the other potential cases?


eye-gu commented Aug 10, 2023

PTAL


eye-gu commented Aug 10, 2023

I found that the binary format does not have this issue (nor issue #264). Is it more recommended to use the binary format?
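
To experiment with the binary read path, the connector exposes a read format option (assuming the spark.clickhouse.read.format key from the connector docs; verify against your version):

// a minimal sketch: switch the connector's read format from the default "json" to "binary"
spark.conf.set("spark.clickhouse.read.format", "binary")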


paf91 commented Jan 28, 2024

The issue is still present. Any updates? @eye-gu, why is the issue closed? It still occurs in 0.7.2 with the 0.4.5 JDBC driver on Spark 3.4.2.


paf91 commented Jan 28, 2024

[screenshot of the error]


pan3793 commented Jan 28, 2024

@paf91 I need to find some time to publish a new version containing this patch, maybe in a few days. You can try building the master branch until the release is out.


pan3793 commented Feb 6, 2024

@paf91 FYI, 0.7.3 is available now, it includes this patch.


paf91 commented Feb 6, 2024

@pan3793 Could you please tell me, if you know, what the best practice is for saving the resulting DataFrame into ClickHouse? I couldn't find it in the docs https://housepower.github.io/spark-clickhouse-connector/quick_start/02_play_with_spark_sql/ and the best I could do is use clickhouse-jdbc, like df.write.format("jdbc").options(options_map).mode("Overwrite").save


pan3793 commented Feb 6, 2024

@paf91 Well, the "best practices" depend on a lot of stuff. The built-in JDBC data source is maintained by the Apache Spark community as a generic solution for interacting with RDBMS, just keep using it if it works well for your cases.

Instead of providing best practices, I'd like to list some points that I think are worth careful consideration.

Performance. Performance in a distributed system is a big topic; I wrote an article explaining how this connector improves query performance.

Convenience. For example, a data engineer may want to use CREATE/REPLACE TABLE ... AS SELECT ... or the equivalent df.writeTo(table).create() / df.writeTo(table).createOrReplace() to create a ClickHouse table automatically and save data from Spark to ClickHouse; this is impossible without SPARK-43390. (A minimal sketch of this write path follows below.)

Transactions. For a long time, transactions were not strict, or even missing, in the big data world. In a distributed system, the failure of a single node is normal, and without the guarantee of write transactions, the resulting retries may lead to eventual data duplication. I shared some thoughts on this topic previously in #145.
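
For reference, a minimal sketch of the catalog-based write mentioned above (the catalog name, host, credentials, and source table are placeholders; adjust to your deployment):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.sql.catalog.clickhouse", "xenon.clickhouse.ClickHouseCatalog")
  .config("spark.sql.catalog.clickhouse.host", "127.0.0.1")
  .config("spark.sql.catalog.clickhouse.protocol", "http")
  .config("spark.sql.catalog.clickhouse.http_port", "8123")
  .config("spark.sql.catalog.clickhouse.user", "default")
  .config("spark.sql.catalog.clickhouse.password", "")
  .getOrCreate()

val df = spark.table("some_source_table")           // placeholder source
df.writeTo("clickhouse.zhouwq.user_test").append()  // or .createOrReplace() to create the table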
