Field Readers and Complex Writers

Paul Rogers edited this page May 3, 2018 · 1 revision

The holder classes discussed thus far are all you need for the vast majority of UDFs. However, there are two special cases that serve more unusual needs: the FieldReader and ComplexWriter classes.

FieldReader

The FieldReader class provides a generic way to read any vector without declaring a type-specific Holder. Drill itself provides the perfect example: the typeOf() function, which takes a value of any type as input and returns the name of that type. You'll find it in org.apache.drill.exec.expr.fn.impl.UnionFunctions. Here is a simplified version:

  @FunctionTemplate(names = {"typeOf"},
          scope = FunctionTemplate.FunctionScope.SIMPLE,
          nulls = NullHandling.INTERNAL)
  public static class GetType implements DrillSimpleFunc {

    @Param FieldReader input;
    @Output VarCharHolder out;
    @Inject DrillBuf buf;

    @Override
    public void eval() {
      String typeName = input.getType().getMinorType().name();
      ...
    }
  }

Here we are concerned with just the (major) type as returned from FieldReader.getType(). The major type includes the actual data type (the so-called "minor type") along with type parameters such as precision and scale for decimal types, nested fields for maps, and so on.

You can also use the FieldReader to read data by casting the reader to the proper subtype; use your IDE to explore the various interfaces that FieldReader implements. If you go this route, however, you will find yourself writing nested case statements covering every data type and cardinality ("mode").
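As a hedged sketch of what that dispatch looks like (the switch cases and variable handling here are illustrative, not the only approach; readObject() is assumed to return a boxed Java value for scalar types, with VarChar values arriving as a Text object):

```java
// Sketch only: an eval() body that dispatches on the input's minor type.
public void eval() {
  org.apache.drill.common.types.TypeProtos.MinorType type =
      input.getType().getMinorType();
  switch (type) {
    case INT:
      // Either cast the reader to a type-specific reader interface,
      // or read a boxed value generically:
      Integer intVal = (Integer) input.readObject();
      // ... handle integer input ...
      break;
    case VARCHAR:
      // VarChar values come back as a Text object; toString() yields a String
      String strVal = input.readObject().toString();
      // ... handle string input ...
      break;
    default:
      // ... one case per supported type, and per mode
      //     (required, nullable, repeated) ...
      break;
  }
}
```

Each additional supported type and mode adds another branch, which is why this approach quickly becomes unwieldy.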

To avoid this dynamic run-time checking, Drill typically uses FreeMarker to generate different versions of a function for each type. See exec/java-exec/src/codegen/templates for a wide range of examples.

ComplexWriter

The @Output holders work for most Drill types, but they do not work for maps. In Drill, a map is a nested tuple: each map has the same structure as the top-level row. A map in Drill is like a struct in Impala or Hive: a collection of columns with a fixed schema.

The ComplexWriter lets you define and write to these fields.

The ComplexWriter also lets you write to an array of maps (that is, a Repeated Map).
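Before looking at the full mappify example that follows, here is a minimal hedged sketch of a function body whose output is a single map (the method names follow the BaseWriter interfaces used in the mappify code; the field names and values are purely illustrative):

```java
// Sketch only: writing a map-valued result through a ComplexWriter.
public void eval() {
  // Treat the output value as a map (rootAsList() would make it an array)
  org.apache.drill.exec.vector.complex.writer.BaseWriter.MapWriter mapWriter =
      writer.rootAsMap();
  mapWriter.start();

  // Write a BIGINT field named "count" into the map
  mapWriter.bigInt("count").writeBigInt(42L);

  // Write a FLOAT8 field named "ratio" into the map
  mapWriter.float8("ratio").writeFloat8(0.5);

  mapWriter.end();
}
```

Calling a typed accessor such as bigInt("count") both defines the column (on first use) and returns a writer for it, which is how the ComplexWriter lets a UDF build its output schema on the fly.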

A good example is the Drill mappify() (AKA kvgen) function defined in org.apache.drill.exec.expr.fn.impl.Mappify:

  @FunctionTemplate(names = {"mappify", "kvgen"},
          scope = FunctionTemplate.FunctionScope.SIMPLE,
          nulls = FunctionTemplate.NullHandling.NULL_IF_NULL,
          isRandom = true)
  public static class ConvertMapToKeyValuePairs implements DrillSimpleFunc {
    @Param  FieldReader reader;
    @Inject DrillBuf buffer;
    @Output ComplexWriter writer;

    public void setup() {
    }

    public void eval() {
      buffer = org.apache.drill.exec.expr.fn.impl.MappifyUtility.mappify(reader, writer, buffer);
    }
  }

If we examine the Java implementation of mappify, we see the code that creates columns and populates them:

  public static DrillBuf mappify(FieldReader reader, BaseWriter.ComplexWriter writer, DrillBuf buffer) {
    ...
    BaseWriter.ListWriter listWriter = writer.rootAsList();
    listWriter.startList();
    BaseWriter.MapWriter mapWriter = listWriter.map();

    // Iterate over the fields in the map
    Iterator<String> fieldIterator = reader.iterator();
    while (fieldIterator.hasNext()) {
      String str = fieldIterator.next();
      FieldReader fieldReader = reader.reader(str);

      ...

      // writing a new field, start a new map
      mapWriter.start();

      // write "key":"columnname" into the map
      VarCharHolder vh = new VarCharHolder();
      ...
      mapWriter.varChar(fieldKey).write(vh);

      // Write the value to the map
      MapUtility.writeToMapFromReader(fieldReader, mapWriter);

      mapWriter.end();
    }
    listWriter.endList();

    return buffer;
  }

The use of the ComplexWriter is quite advanced, but it is the only option if your function must emit a map (or an array of maps) as its output value.