Bug fix for supported expressions #408

nartal1 · 2023-07-04T00:23:38Z

This PR fixes #381 and fixes #382

For #381 -> Earlier a Map was created for Expression/SQL function name for all supported expressions. Here it was assumed that there is co-relation beween Expression and SQL function name For Example: Expression name : DateAdd and SQL function name: date_add. So we would remove _ from the SQL function and create a Map. But this doesn't apply for few operators. Example: Expression: First , SQL function name: first_value. So we need 2 different keys in this case. So the fix is to include both SQL function name and expression name in the map.

For #382 -> There was a bug where the keys of the map contains space i.e key of mod was mod. So it would be considered as unsupported expressions as mod != mod. Updated the code to remove whitespaces for keys.

Signed-off-by: Niranjan Artal <[email protected]>

amahussein

One general comment not very related to this PR.
I notice that we use toLowerCase frequently; especially in pluginTypeChecker.scala
Everytime we use toLowerCase means that we iterate over the string which can be a significant slow down if it is redundant.

We may need to file an issue to revisit how we process the expressions, so that we can remove redundant.

Targets
    Occurrences of 'toLowerCase' in Project with mask '*.scala'
Found Occurrences in Project with mask '*.scala'  (26 usages found)
    Unclassified  (26 usages found)
        rapids-4-spark-tools_2.12  (26 usages found)
            com.nvidia.spark.rapids.tool.planparser  (3 usages found)
                DataWritingCommandExecParser.scala  (1 usage found)
                    41 new ExecInfo(sqlID, s"${node.name.trim} ${wStub.dataFormat.toLowerCase.trim}", "",
                SQLPlanParser.scala  (2 usages found)
                    356 ignoreExpressions.contains(expr.toLowerCase)
                    704 functionMatches.map(_.group(1)).filter(_.toLowerCase() != "cast")
            com.nvidia.spark.rapids.tool.profiling  (8 usages found)
                AutoTuner.scala  (3 usages found)
                    626 .getOrElse("false").toLowerCase
                    1063 val sizesArr = size.toLowerCase.split("(?=[a-z])")
                    1075 val sizesArr = size.toLowerCase.split("(?=[a-z])")
                ProfileOutputWriter.scala  (1 usage found)
                    63 val suffix = header.replace(" ", "_").toLowerCase
                QualificationInfoUtils.scala  (4 usages found)
                    171 if (logType.toLowerCase.equals("dataset")) {
                    173 } else if (logType.toLowerCase.equals("udfds")) {
                    175 } else if (logType.toLowerCase.equals("udffunc")) {
                    177 } else if (logType.toLowerCase.equals("dsanddf")) {
            com.nvidia.spark.rapids.tool.qualification  (14 usages found)
                PluginTypeChecker.scala  (12 usages found)
                    115 x => (x._1.toLowerCase.replaceAll("\\`", ""), x._2))
                    136 val header = fileContents.head.split(",").map(_.toLowerCase)
                    189 val header = fileContents.head.split(",").map(_.toLowerCase)
                    197 val format = cols(0).toLowerCase
                    198 val direction = cols(1).toLowerCase()
                    203 }.keys.toSeq.map(_.toLowerCase)
                    236 val schemaLower = schema.toLowerCase
                    237 val formatInLower = format.toLowerCase
                    242 val nsFiltered = dtSupMap(NS).filter(t => schemaLower.contains(t.toLowerCase()))
                    261 val format = writeFormat.toLowerCase.trim
                    266 writeFormat.map(x => x.toLowerCase.trim).filterNot(
                    303 expr.toLowerCase.replace("_", "")
                QualificationArgs.scala  (2 usages found)
                    193 order.toLowerCase.startsWith("asc")
                    197 order.toLowerCase.startsWith("desc")
            org.apache.spark.sql.rapids.tool.qualification  (1 usage found)
                QualificationAppInfo.scala  (1 usage found)
                    179 s"${ds.format.toLowerCase()}[${ds.schema}]"

core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala

…portedexprs_bugfix

nartal1 · 2023-07-05T19:16:25Z

I notice that we use toLowerCase frequently; especially in pluginTypeChecker.scala
Everytime we use toLowerCase means that we iterate over the string which can be a significant slow down if it is redundant.

We may need to file an issue to revisit how we process the expressions, so that we can remove redundant.

Filed followon issue - #409

core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala

Bug fix for supported expressions

4f2a034

Signed-off-by: Niranjan Artal <[email protected]>

nartal1 added the bug Something isn't working label Jul 4, 2023

nartal1 self-assigned this Jul 4, 2023

amahussein reviewed Jul 5, 2023

View reviewed changes

core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala Outdated Show resolved Hide resolved

core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala Show resolved Hide resolved

amahussein added the core_tools Scope the core module (scala) label Jul 5, 2023

nartal1 added 2 commits July 5, 2023 11:10

Merge branch 'dev' of github.com:NVIDIA/spark-rapids-tools into unsup…

60c7cf5

…portedexprs_bugfix

Addressed review comments

ab6838f

nartal1 mentioned this pull request Jul 5, 2023

[BUG] Evaluate converting expressions names to lowercase #409

Open

amahussein approved these changes Jul 5, 2023

View reviewed changes

core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala Show resolved Hide resolved

core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala Show resolved Hide resolved

nartal1 merged commit 6f473a8 into NVIDIA:dev Jul 5, 2023

nartal1 mentioned this pull request Jul 6, 2023

[BUG] Audit expression name and SQL function name in supportedExprs.csv #392

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug fix for supported expressions #408

Bug fix for supported expressions #408

nartal1 commented Jul 4, 2023

amahussein left a comment

nartal1 commented Jul 5, 2023

Bug fix for supported expressions #408

Bug fix for supported expressions #408

Conversation

nartal1 commented Jul 4, 2023

amahussein left a comment

Choose a reason for hiding this comment

nartal1 commented Jul 5, 2023