Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix for supported expressions #408

Merged
merged 3 commits into from
Jul 5, 2023

Conversation

nartal1
Copy link
Collaborator

@nartal1 nartal1 commented Jul 4, 2023

This PR fixes #381 and fixes #382

For #381 -> Earlier a Map was created for Expression/SQL function name for all supported expressions. Here it was assumed that there is co-relation beween Expression and SQL function name For Example: Expression name : DateAdd and SQL function name: date_add. So we would remove _ from the SQL function and create a Map. But this doesn't apply for few operators. Example: Expression: First , SQL function name: first_value. So we need 2 different keys in this case. So the fix is to include both SQL function name and expression name in the map.

For #382 -> There was a bug where the keys of the map contains space i.e key of mod was mod. So it would be considered as unsupported expressions as mod != mod. Updated the code to remove whitespaces for keys.

@nartal1 nartal1 added the bug Something isn't working label Jul 4, 2023
@nartal1 nartal1 self-assigned this Jul 4, 2023
Copy link
Collaborator

@amahussein amahussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One general comment not very related to this PR.
I notice that we use toLowerCase frequently; especially in pluginTypeChecker.scala
Everytime we use toLowerCase means that we iterate over the string which can be a significant slow down if it is redundant.

We may need to file an issue to revisit how we process the expressions, so that we can remove redundant.

Targets
    Occurrences of 'toLowerCase' in Project with mask '*.scala'
Found Occurrences in Project with mask '*.scala'  (26 usages found)
    Unclassified  (26 usages found)
        rapids-4-spark-tools_2.12  (26 usages found)
            com.nvidia.spark.rapids.tool.planparser  (3 usages found)
                DataWritingCommandExecParser.scala  (1 usage found)
                    41 new ExecInfo(sqlID, s"${node.name.trim} ${wStub.dataFormat.toLowerCase.trim}", "",
                SQLPlanParser.scala  (2 usages found)
                    356 ignoreExpressions.contains(expr.toLowerCase)
                    704 functionMatches.map(_.group(1)).filter(_.toLowerCase() != "cast")
            com.nvidia.spark.rapids.tool.profiling  (8 usages found)
                AutoTuner.scala  (3 usages found)
                    626 .getOrElse("false").toLowerCase
                    1063 val sizesArr = size.toLowerCase.split("(?=[a-z])")
                    1075 val sizesArr = size.toLowerCase.split("(?=[a-z])")
                ProfileOutputWriter.scala  (1 usage found)
                    63 val suffix = header.replace(" ", "_").toLowerCase
                QualificationInfoUtils.scala  (4 usages found)
                    171 if (logType.toLowerCase.equals("dataset")) {
                    173 } else if (logType.toLowerCase.equals("udfds")) {
                    175 } else if (logType.toLowerCase.equals("udffunc")) {
                    177 } else if (logType.toLowerCase.equals("dsanddf")) {
            com.nvidia.spark.rapids.tool.qualification  (14 usages found)
                PluginTypeChecker.scala  (12 usages found)
                    115 x => (x._1.toLowerCase.replaceAll("\\`", ""), x._2))
                    136 val header = fileContents.head.split(",").map(_.toLowerCase)
                    189 val header = fileContents.head.split(",").map(_.toLowerCase)
                    197 val format = cols(0).toLowerCase
                    198 val direction = cols(1).toLowerCase()
                    203 }.keys.toSeq.map(_.toLowerCase)
                    236 val schemaLower = schema.toLowerCase
                    237 val formatInLower = format.toLowerCase
                    242 val nsFiltered = dtSupMap(NS).filter(t => schemaLower.contains(t.toLowerCase()))
                    261 val format = writeFormat.toLowerCase.trim
                    266 writeFormat.map(x => x.toLowerCase.trim).filterNot(
                    303 expr.toLowerCase.replace("_", "")
                QualificationArgs.scala  (2 usages found)
                    193 order.toLowerCase.startsWith("asc")
                    197 order.toLowerCase.startsWith("desc")
            org.apache.spark.sql.rapids.tool.qualification  (1 usage found)
                QualificationAppInfo.scala  (1 usage found)
                    179 s"${ds.format.toLowerCase()}[${ds.schema}]"

@amahussein amahussein added the core_tools Scope the core module (scala) label Jul 5, 2023
@nartal1
Copy link
Collaborator Author

nartal1 commented Jul 5, 2023

I notice that we use toLowerCase frequently; especially in pluginTypeChecker.scala
Everytime we use toLowerCase means that we iterate over the string which can be a significant slow down if it is redundant.

We may need to file an issue to revisit how we process the expressions, so that we can remove redundant.

Filed followon issue - #409

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working core_tools Scope the core module (scala)
Projects
None yet
2 participants