Thrift IDL parser and code generator for the compact protocol

This is an alternative implementation of the official Apache Thrift code generator, with a focus on the compact protocol.

The initial goal of this project is to develop a more efficient Rust parser for the metadata embedded in Apache Parquet files.

Higher performance is achieved by the following design decisions:

  • Fewer abstractions by focusing on the compact protocol.
    • The generated code, for example, inlines the reading of field headers, which avoids method calls and the passing of slightly larger structures such as TFieldIdentifier.
    • The field id and field delta can be tracked inside the generated code, and boolean fields (whose value the compact protocol encodes in the field header) can be handled the same way, which makes the actual protocol code much simpler.
  • The Rust target avoids moving structures from optional local variables into fields of the returned struct by filling the struct directly. Unfortunately, this requires all generated structs to implement the Default trait.
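
The design decisions above can be illustrated with a minimal sketch. It is not the output of this generator: the struct Example, its fields, and the helpers read_varint, zigzag_decode and read_example are hypothetical names chosen for illustration. Only the wire format details (header byte with a 4-bit field delta and a 4-bit type, zigzag varints, booleans encoded as type 1 or 2) come from the Thrift compact protocol. Skipping of unknown fields is left out to keep the example short.

```rust
/// Hypothetical struct standing in for a generated Thrift struct.
/// Deriving `Default` lets the reader fill it in place.
#[derive(Debug, Default, PartialEq)]
struct Example {
    id: i32,       // field 1, compact type 5 (i32)
    enabled: bool, // field 2, compact type 1 (true) or 2 (false)
}

/// Reads an unsigned varint (7 bits per byte, low bits first), advancing `pos`.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut result = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        result |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
        if shift >= 64 {
            return None; // varint too long
        }
    }
}

/// Undoes zigzag encoding: 0, 1, 2, 3 map to 0, -1, 1, -2.
fn zigzag_decode(n: u64) -> i64 {
    ((n >> 1) as i64) ^ -((n & 1) as i64)
}

/// Sketch of what an inlined reader could look like (assumption, not generator output).
fn read_example(buf: &[u8]) -> Option<Example> {
    // The result is filled directly instead of collecting Option locals first.
    let mut result = Example::default();
    let mut pos = 0usize;
    // The previous field id is a plain local; no TFieldIdentifier-like value is built.
    let mut last_field_id: i16 = 0;
    loop {
        let header = *buf.get(pos)?;
        pos += 1;
        if header == 0 {
            break; // STOP marks the end of the struct
        }
        let field_type = header & 0x0f;
        let delta = header >> 4;
        let field_id = if delta != 0 {
            // Short form: the id is the previous id plus a 4-bit delta.
            last_field_id.checked_add(i16::from(delta))?
        } else {
            // Long form: the id follows as a zigzag varint.
            zigzag_decode(read_varint(buf, &mut pos)?) as i16
        };
        last_field_id = field_id;
        match (field_id, field_type) {
            (1, 5) => result.id = zigzag_decode(read_varint(buf, &mut pos)?) as i32,
            // Boolean values live in the header's type nibble, no payload follows.
            (2, 1) => result.enabled = true,
            (2, 2) => result.enabled = false,
            _ => return None, // unknown fields are not skipped in this sketch
        }
    }
    Some(result)
}
```

Calling read_example on the bytes 0x15 0xAC 0x02 0x11 0x00 decodes field 1 as the i32 value 150 and field 2 as true. Because the struct starts out as Example::default(), fields that are absent from the input simply keep their default values, which is why the approach depends on the Default trait.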

Even though the initial target language for the code generator is Rust, the code generator itself is written in Kotlin. The reasons for this choice are:

  • Using a JVM-based language gives access to ANTLR, one of the most developer-friendly parser generators. (There is a Rust implementation of ANTLR, but it is mostly unmaintained at the moment.)
  • Kotlin's sealed and data classes are very powerful for modeling domain objects (similar to Rust enums).
  • Kotlin has built-in support for string templates, which are checked at compile time.

The runtime support for the generated Rust code can be found in the src/main/rust folder.

How to run

To run this code generator you need a Java distribution such as Amazon Corretto, and Apache Maven as the build tool. Once both are installed and their bin folders are on the PATH, the definitions for the included parquet.thrift can be generated by running:

$ ./generate-parquet.sh