[BUG] Spark reports a decimal error when creating a lit scalar while generating Decimal(34, -5) data. #9404

Closed · res-life opened this issue Oct 9, 2023 · 3 comments · Fixed by #9405
Labels: bug (Something isn't working)

res-life (Collaborator) commented Oct 9, 2023

Describe the bug
Spark reports the following error when creating a lit scalar while generating Decimal(34, -5) data.

pyspark.sql.utils.AnalysisException: decimal can only support precision up to 38

Steps/Code to reproduce bug

Case 1, failed

Update test_greatest to the following, then run it on Spark 3.1.1.

@pytest.mark.parametrize('data_gen', [DecimalGen(34, -5)], ids=idfn)
def test_greatest1(data_gen):
    num_cols = 20
    s1 = gen_scalar(data_gen, force_no_nulls=not isinstance(data_gen, NullGen))
    # we want lots of nulls
    gen = StructGen([('_c' + str(x), data_gen.copy_special_case(None, weight=100.0))
        for x in range(0, num_cols)], nullable=False)
    command_args = [f.col('_c' + str(x)) for x in range(0, num_cols)]
    command_args.append(s1)
    data_type = data_gen.data_type
    assert_gpu_and_cpu_are_equal_collect(
            lambda spark : gen_df(spark, gen).select(
                f.greatest(*command_args)))

Case 2, passed. Identical except that a DecimalGen(7, 7) is added before DecimalGen(34, -5) in the parametrize list.

@pytest.mark.parametrize('data_gen', [DecimalGen(7, 7), DecimalGen(34, -5)], ids=idfn)
def test_greatest2(data_gen):
    num_cols = 20
    s1 = gen_scalar(data_gen, force_no_nulls=not isinstance(data_gen, NullGen))
    # we want lots of nulls
    gen = StructGen([('_c' + str(x), data_gen.copy_special_case(None, weight=100.0))
        for x in range(0, num_cols)], nullable=False)
    command_args = [f.col('_c' + str(x)) for x in range(0, num_cols)]
    command_args.append(s1)
    data_type = data_gen.data_type
    assert_gpu_and_cpu_are_equal_collect(
            lambda spark : gen_df(spark, gen).select(
                f.greatest(*command_args)))

Case 3, failed. The same parameters as case 2, but with everything after the gen_scalar call commented out.

@pytest.mark.parametrize('data_gen', [DecimalGen(7, 7), DecimalGen(34, -5)], ids=idfn)
def test_greatest3(data_gen):
    num_cols = 20
    s1 = gen_scalar(data_gen, force_no_nulls=not isinstance(data_gen, NullGen))
    # we want lots of nulls
    # gen = StructGen([('_c' + str(x), data_gen.copy_special_case(None, weight=100.0))
    #     for x in range(0, num_cols)], nullable=False)
    # command_args = [f.col('_c' + str(x)) for x in range(0, num_cols)]
    # command_args.append(s1)
    # data_type = data_gen.data_type
    # assert_gpu_and_cpu_are_equal_collect(
    #         lambda spark : gen_df(spark, gen).select(
    #             f.greatest(*command_args)))

The error is from:

s1 = gen_scalar(data_gen, force_no_nulls=not isinstance(data_gen, NullGen))
   --  return f.lit(data).cast(data_type) in datagen.py

Expected behavior
Creating the lit scalar for Decimal(34, -5) data should succeed without raising an AnalysisException.

Environment details (please complete the following information)

  • Environment location: Standalone
  • Spark version: 3.1.1

Additional context
The detailed error is:

________________________ test_greatest3[Decimal(36,-5)] ________________________

data_gen = Decimal(36,-5)

    @pytest.mark.parametrize('data_gen', all_basic_gens + _arith_decimal_gens, ids=idfn)
    def test_greatest3(data_gen):
        num_cols = 20
>       s1 = gen_scalar(data_gen, force_no_nulls=not isinstance(data_gen, NullGen))

../../src/main/python/arithmetic_ops_test.py:991: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../src/main/python/data_gen.py:859: in gen_scalar
    v = list(gen_scalars(data_gen, 1, seed=seed, force_no_nulls=force_no_nulls))
../../src/main/python/data_gen.py:855: in <genexpr>
    return (_mark_as_lit(src.gen(force_no_nulls=force_no_nulls), data_type) for i in range(0, count))
../../src/main/python/data_gen.py:833: in _mark_as_lit
    return f.lit(data).cast(data_type)
/home/chongg/progs/sparks/spark-home/python/lib/pyspark.zip/pyspark/sql/functions.py:98: in lit
    return col if isinstance(col, Column) else _invoke_function("lit", col)
/home/chongg/progs/sparks/spark-home/python/lib/pyspark.zip/pyspark/sql/functions.py:58: in _invoke_function
    return Column(jf(*args))
/home/chongg/progs/sparks/spark-home/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1304: in __call__
    return_value = get_return_value(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

a = ('xro889', <py4j.java_gateway.GatewayClient object at 0x7fc007704c10>, 'z:org.apache.spark.sql.functions', 'lit')
kw = {}
converted = AnalysisException('decimal can only support precision up to 38', 'org.apache.spark.sql.AnalysisException: decimal can ...:79)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.lang.Thread.run(Thread.java:750)\n', None)

    def deco(*a, **kw):
        try:
            return f(*a, **kw)
        except py4j.protocol.Py4JJavaError as e:
            converted = convert_exception(e.java_exception)
            if not isinstance(converted, UnknownException):
                # Hide where the exception came from that shows a non-Pythonic
                # JVM exception message.
>               raise converted from None
E               pyspark.sql.utils.AnalysisException: decimal can only support precision up to 38

/home/chongg/progs/sparks/spark-home/python/lib/pyspark.zip/pyspark/sql/utils.py:117: AnalysisException

res-life (Collaborator, Author) commented Oct 9, 2023

One more minimal repro:

$SPARK_HOME/bin/pyspark
Spark 3.1.1
>>> from decimal import Decimal
>>> from pyspark.sql.functions import *
>>> d = Decimal('4.8764759382421948924115781565938778E+39')
>>> lit(d)

Sometimes f.lit(Decimal('4.8764759382421948924115781565938778E+39')) passes, and sometimes it fails.

res-life (Collaborator, Author) commented Oct 9, 2023

It's from #9289 (comment)

res-life (Collaborator, Author) commented Oct 9, 2023

Thanks @pxLi

He found the root cause: in rare cases we did not set spark.sql.legacy.allowNegativeScaleOfDecimal=true when creating a literal scalar.

If the Spark session was already initialized with this config, the cases pass.
If no Spark session has been initialized yet, the config value is false and creating the literal scalar fails.
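To see why the config matters: a value like 4.8764759382421948924115781565938778E+39 fits in a decimal with precision 35 and scale -5, but when negative scales are disallowed the scale is clamped to 0 and the implied trailing zeros become stored digits, so the required precision grows to 40, past Spark's maximum of 38. A rough sketch of that arithmetic (illustrative only, not Spark's actual code):

from decimal import Decimal

# Approximate the precision Spark would need to store a Python Decimal,
# depending on whether negative scales are allowed.
def required_precision(value, allow_negative_scale):
    sign, digits, exponent = value.as_tuple()
    scale = -exponent if allow_negative_scale else max(-exponent, 0)
    # the unscaled integer is coefficient * 10**(exponent + scale)
    return len(digits) + exponent + scale

d = Decimal('4.8764759382421948924115781565938778E+39')
required_precision(d, allow_negative_scale=True)   # 35 -> fits within 38
required_precision(d, allow_negative_scale=False)  # 40 -> exceeds 38

With the config enabled, the same literal is accepted: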

pyspark --conf spark.sql.legacy.allowNegativeScaleOfDecimal=true
>>> spark.sparkContext.getConf().get('spark.sql.legacy.allowNegativeScaleOfDecimal')
'true'
>>> from pyspark.sql.functions import *
>>> from decimal import Decimal
>>> d = Decimal('4.87647593824219489241157815659387781E+39')
>>> lit(d)
Column<'4.87647593824219489241157815659387781E+39'>
>>>
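For code that creates its own session, the equivalent programmatic fix is to set the config on the builder before the first literal is created. A minimal sketch (the config name is from the comment above; the session setup is illustrative):

from decimal import Decimal
from pyspark.sql import SparkSession
import pyspark.sql.functions as f

# Set the legacy config before the session exists, so that literals with
# negative-scale decimals are allowed.
spark = (SparkSession.builder
    .config('spark.sql.legacy.allowNegativeScaleOfDecimal', 'true')
    .getOrCreate())

f.lit(Decimal('4.87647593824219489241157815659387781E+39'))  # no longer raises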
