-
Notifications
You must be signed in to change notification settings - Fork 186
Support Top/Rare Command In PPL #720
Support Top/Rare Command In PPL #720
Conversation
@@ -83,6 +84,16 @@ public static LogicalPlan dedupe( | |||
input, Arrays.asList(fields), allowedDuplication, keepEmpty, consecutive); | |||
} | |||
|
|||
public static LogicalPlan rareTopN(LogicalPlan input, Boolean rareTopFlag, | |||
List<Expression> groupByList, Expression... fields) { | |||
return rareTopN(input, rareTopFlag, 10, groupByList, fields); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some use a constant like DEFAULT_NO_OF_RESULTS and some have 10 hardcoded. Should probably be consistent with the constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used the variable/function to get default value wherever required. hardcoded values are used for the test purpose.
...src/main/java/com/amazon/opendistroforelasticsearch/sql/planner/logical/LogicalRareTopN.java
Show resolved
Hide resolved
|
||
@Override | ||
public List<LogicalPlan> getChild() { | ||
return Collections.singletonList(child); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a different method we use ImmutableList.of() here. Let's be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced ImmutableList.of(child)
in RareTopN.java
with Collections.singletonList(child)
...c/main/java/com/amazon/opendistroforelasticsearch/sql/planner/physical/RareTopNOperator.java
Show resolved
Hide resolved
* Get a list of top values. | ||
*/ | ||
public List<FieldKey> findTop(HashMap<FieldKey, Integer> map) { | ||
PriorityQueue<FieldKey> topQueue = new PriorityQueue<>(new Comparator<FieldKey>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
findTop() and findRare() have the same implementation, except the comparator is different. We could just have a single find() function that takes in a comparator instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replaced with find() function
/** | ||
* Get a list of result. | ||
*/ | ||
public List<FieldKey> getList(HashMap<FieldKey, Integer> map, PriorityQueue<FieldKey> queue, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This algorithm for building this list is too complicated. I think you can model this more easily with streams.
map.entrySet().stream()
.sort(ValueComparator)
.limit(noOfResults)
Pass in different comparator for rare vs. top.
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Added this in find() function.
/** | ||
* Return the Map of field and field value. | ||
*/ | ||
public LinkedHashMap<String, ExprValue> fieldKeyMap() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually you don't want to return the specific type and instead return the interface unless the caller really needs to work with LinkedHashMap in particular.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Changed return type to Map.
@@ -0,0 +1,49 @@ | |||
package com.amazon.opendistroforelasticsearch.sql.ast.tree; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing the license header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
*/ | ||
public FieldKey(ExprValue value) { | ||
this.fieldByValueList = new ArrayList<>(); | ||
for (Expression fieldExpr : fieldExprList) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets use fieldExprList.stream().map(...).collect(Collectors.toList())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to stream
*/ | ||
public Map<String, ExprValue> fieldKeyMap() { | ||
Map<String, ExprValue> map = new LinkedHashMap<>(); | ||
for (int i = 0; i < fieldExprList.size(); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you using guava? Can you use Streams.zip().collect(Collectors.toMap(...))?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced with Streams.zip()...
*/ | ||
public GroupKey(ExprValue value) { | ||
this.groupByValueList = new ArrayList<>(); | ||
for (Expression groupExpr : groupByExprList) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switch to stream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
/** | ||
* Return the Map of group field and group field value. | ||
*/ | ||
public Map<String, ExprValue> groupKeyMap() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switch to stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same logic and return type as for FieldKey#fieldKeyMap. Can they derive from a common base, or use a generic parameter?
The constructor is also pretty much the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used common class for finding group & field key
* Top command. | ||
*/ | ||
@Override | ||
public UnresolvedPlan visitTopCommand(TopCommandContext ctx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code is the same for these two methods, except differing by the flag. Can they be combined into a helper? If the two context classes have the same methods...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added helper methods for getting a list of groups & fields. Can't add other arguments in helper since that requires respective command context object
|
||
PPL query:: | ||
|
||
od> source=accounts | rare age by gender; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how many value for each field will be returned? 10?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
10 values per group for each field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add this to the doc? I think the max value = 10, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the max is 10. Added in the Description section.
============ | ||
top [N] <field-list> [by-clause] | ||
|
||
* N: number of results to return. **Default**: 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the rare command doesn't has thsi paramater?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rare doesn't have N because it's not in the syntax given in this documentation: https://docs.splunk.com/Documentation/Splunk/8.0.2/SearchReference/Rare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
limit
controls the number of results for rare but support for top-options
isn't covered in this PR.
Should I add parameter N for rare?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need now, just confirm,
@@ -197,6 +206,56 @@ public UnresolvedPlan visitEvalCommand(EvalCommandContext ctx) { | |||
); | |||
} | |||
|
|||
private List<UnresolvedExpression> getGroupByList(ByClauseContext ctx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you also change the stats command to use these genric method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
used getGroupByList in stats command
.collect(Collectors.toList()); | ||
} | ||
|
||
private List<Field> getFieldList(FieldListContext ctx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
used getFieldList for dedup command
@RequiredArgsConstructor | ||
public class Group { | ||
|
||
private final Map<Key, List<HashMap<Key, Integer>>> groupListMap = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the value is List instead of just Map?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for noticing. The list is not required here. I will remove it.
I had list<list<ExprValue>>
before I understood every input has values for the entire row and not just for a cell. Looks like I missed this while updating the structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed list
- remove list - use generic getGroupByList function for stats - use generic getFieldList function for dedup
public class RareTopN extends UnresolvedPlan { | ||
|
||
private UnresolvedPlan child; | ||
private final Boolean rareTopFlag; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a boolean instead of Boolean? Would an enum with values RARE or TOP be better? I can't tell if setting this to true means rare or top.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add enum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replaced with enum
@EqualsAndHashCode.Exclude | ||
private Iterator<ExprValue> iterator; | ||
|
||
private static final Integer DEFAULT_NO_OF_RESULTS = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this and noOfResults just be primitive ints?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can replace it with primitive ints. All other operators had an object of Integer so I just followed the same structure. Should I use primitive instead?
|
||
@Override | ||
public boolean hasNext() { | ||
return iterator.hasNext(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a precondition check here and on next() to verify open() has been called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate more? Like what should we return if a condition fails? Maybe Null?
I have simply followed the logic of AggregationOperator here. It might not be required since open() is updating an iterator and if open() isn't called then hasNext() will simply return false since there are no elements. And when hasNext() will return false, next() won't be called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add Precondition.checkNotNull(iterator) here. But really depends on what coding style is preferred.
...c/main/java/com/amazon/opendistroforelasticsearch/sql/planner/physical/RareTopNOperator.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the change.
@@ -80,6 +80,17 @@ public static DedupeOperator dedupe( | |||
input, Arrays.asList(expressions), allowedDuplication, keepEmpty, consecutive); | |||
} | |||
|
|||
public static RareTopNOperator rareTopN(PhysicalPlan input, Boolean rareTopFlag, | |||
List<Expression> groups, Expression... expressions) { | |||
return rareTopN(input, rareTopFlag, 10, groups, expressions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment on constant for 10.
} | ||
|
||
/** | ||
* Get a list of top values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
|
||
/** | ||
* Field Key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This javadoc isn't very useful. Comments should help the developer understand the code and provide value like "what, why, how".
Issue #685
Description of changes:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.