Add fail-fast function validation support for Presto C++ #23000

Open
tdcmeehan opened this issue Jun 13, 2024 · 0 comments · May be fixed by #23358
Labels
feature request prestissimo Presto Native Execution

tdcmeehan commented Jun 13, 2024

Background

Unfenced functions are functions which execute within the same process as the underlying evaluation engine. Typically, this term is used in the context of user-defined functions, and it is meant to contrast with fenced functions, which execute in a separate process. Unfenced functions are more efficient than fenced functions, but they are also more dangerous, as they can crash the process in which they are running. Presto has supported unfenced functions for a very long time, and they are a convenient way to efficiently add new functions in Presto.

Currently, the Presto analyzer only validates functions that are registered through the Presto functions SPI and built-in functions implemented in Java. C++ functions are not validated at query planning time, which can lead to runtime errors if the function is not implemented correctly or if the function only exists in C++.

Expected Behavior or Use Case

To support RFC-0003, and to enable an equivalent SPI that allows for the registration of built-in functions written in C++, we need to enhance Presto's SPI to allow for the registration of functions which are not present in-memory in the Java runtime. This issue is to create a new SPI which will allow out-of-process, yet built-in, functions to be registered with the Presto analyzer, so that these functions can be planned in the same way as existing Java functions and incorrect usage of them can be caught quickly.

Presto Component, Service, or Connector

Presto SPI, Presto Sidecar, Native execution module

Possible Implementation

#22829 added a new sidecar process type to Presto. This sidecar is a separate process which shares the same code as the Presto C++ worker.

A new endpoint will be added to the Presto sidecar which will return the function mapping for all built-in and externally registered functions which are implemented in C++. This will allow the Presto analyzer to validate the function signatures of these functions at query planning time.
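To illustrate the fail-fast check, here is a minimal Python sketch of how the analyzer could use the mapping returned by the sidecar. The JSON shape, the Signature and PlanningError types, and the load_function_mapping and resolve_call helpers are all hypothetical: this issue does not define the endpoint's response format, and the real implementation would live in the Java SPI.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical response body from the sidecar's function-mapping endpoint.
# The real format is not specified in this issue.
SIDECAR_RESPONSE = """
{
  "abs": [
    {"argumentTypes": ["bigint"], "returnType": "bigint"},
    {"argumentTypes": ["double"], "returnType": "double"}
  ],
  "length": [
    {"argumentTypes": ["varchar"], "returnType": "bigint"}
  ]
}
"""


@dataclass(frozen=True)
class Signature:
    argument_types: Tuple[str, ...]
    return_type: str


class PlanningError(Exception):
    """Raised when a call cannot be resolved while the query is being planned."""


def load_function_mapping(payload: str) -> Dict[str, List[Signature]]:
    """Turn the sidecar payload into a name -> signatures lookup table."""
    raw = json.loads(payload)
    return {
        name: [Signature(tuple(s["argumentTypes"]), s["returnType"]) for s in sigs]
        for name, sigs in raw.items()
    }


def resolve_call(mapping: Dict[str, List[Signature]], name: str,
                 arg_types: Sequence[str]) -> str:
    """Return the result type of name(arg_types), or fail before execution starts."""
    signatures = mapping.get(name)
    if signatures is None:
        raise PlanningError(f"Function {name} not registered")
    for sig in signatures:
        if sig.argument_types == tuple(arg_types):
            return sig.return_type
    raise PlanningError(
        f"Unexpected parameters ({', '.join(arg_types)}) for function {name}")
```

With this mapping, resolve_call(mapping, "abs", ["double"]) returns "double", while resolve_call(mapping, "length", ["bigint"]) raises PlanningError during planning instead of failing on a worker partway through the query.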

(Currently, there is no way to add new functions to the C++ engine without forking Prestissimo. However, a separate issue will be created to enable registering such functions from an externally loaded shared library. Once this feature is enabled, we can consider this to be a new SPI which allows for the registration of functions which are neither present in-memory in the Java runtime nor built into the Presto C++ engine.)

An enhancement to the SPI will be added in the Java codebase which will allow for the registration of functions which are not present in-memory in the Java runtime. This SPI will be used to register the functions which are returned by the sidecar process. This SPI is the same as the existing FunctionNamespaceManager SPI, with some important additions:

  • Currently, there is no support in this SPI for registering functions which take in parametric types. This support will be added.
  • Currently, there is no support in this SPI for registering functions which take in a variable number of arguments. This support will be added.
  • Currently, built-in functions are hardcoded to refer to an in-memory list of Java-implemented functions. Built-in functions are just like other functions, except that they don't require a namespace to refer to them (e.g., instead of typing presto.default.sum(x), where presto.default is the namespace of the function sum, built-in functions can simply be referred to as sum(x)). To address this, the FunctionNamespaceManager will be enhanced so that a namespace manager can mark itself as the default namespace. Only one namespace may be marked as the default (built-in) namespace. Once a namespace is marked as the default, built-in functions can be redirected to it (e.g., if presto.native is the default instead of presto.default, presto.native.sum(x) may be referenced as sum(x)).
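The first two additions amount to signature binding, and the third to name resolution. The Python sketch below is purely illustrative: it binds type variables (such as E in array(E)) against concrete argument types, lets the last argument of a variable-arity signature repeat, and resolves unqualified names against whichever namespace is marked as the default. The helper names (parse_type, bind, match, qualify, resolve), the type-string syntax, the rule that a variadic parameter absorbs one or more trailing arguments, and the presto.native default are assumptions for illustration, not the SPI's actual design.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple

# A parsed type is (base_name, parameters), e.g. array(bigint) -> ("array", (("bigint", ()),)).
Type = Tuple[str, tuple]

def parse_type(text: str) -> Type:
    """Parse a type string such as 'map(varchar,array(T))' into nested tuples."""
    text = text.replace(" ", "")
    pos = 0

    def parse() -> Type:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        name, params = text[start:pos], []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip '('
            params.append(parse())
            while text[pos] == ",":
                pos += 1  # skip ','
                params.append(parse())
            pos += 1  # skip ')'
        return (name, tuple(params))

    return parse()

def format_type(t: Type) -> str:
    name, params = t
    return name if not params else f"{name}({','.join(format_type(p) for p in params)})"

def bind(formal: Type, actual: Type, type_vars: FrozenSet[str], bindings: Dict[str, Type]) -> bool:
    """Match formal against actual, recording or checking type-variable bindings."""
    name, params = formal
    if name in type_vars and not params:
        # A type variable binds on first use and must agree on every later use.
        return bindings.setdefault(name, actual) == actual
    actual_name, actual_params = actual
    if name != actual_name or len(params) != len(actual_params):
        return False
    return all(bind(f, a, type_vars, bindings) for f, a in zip(params, actual_params))

def substitute(t: Type, bindings: Dict[str, Type]) -> Type:
    """Replace bound type variables in t with their concrete types."""
    name, params = t
    if not params and name in bindings:
        return bindings[name]
    return (name, tuple(substitute(p, bindings) for p in params))

@dataclass(frozen=True)
class Signature:
    type_variables: FrozenSet[str]
    argument_types: Tuple[str, ...]
    return_type: str
    variable_arity: bool = False

def match(sig: Signature, arg_types: Sequence[str]) -> Optional[str]:
    """Return the concrete return type if arg_types fit sig, else None."""
    formals = [parse_type(t) for t in sig.argument_types]
    actuals = [parse_type(t) for t in arg_types]
    if sig.variable_arity and formals:
        # Assumption: the last parameter repeats to absorb any extra trailing arguments.
        if len(actuals) < len(formals):
            return None
        formals += [formals[-1]] * (len(actuals) - len(formals))
    elif len(actuals) != len(formals):
        return None
    bindings: Dict[str, Type] = {}
    if all(bind(f, a, sig.type_variables, bindings) for f, a in zip(formals, actuals)):
        return format_type(substitute(parse_type(sig.return_type), bindings))
    return None

DEFAULT_NAMESPACE = "presto.native"  # hypothetical: the namespace marked as default

def qualify(name: str, default_namespace: str = DEFAULT_NAMESPACE) -> str:
    """Unqualified names (e.g. sum) resolve against the default namespace."""
    return name if "." in name else f"{default_namespace}.{name}"

# Illustrative registry; signatures modeled on Presto's element_at and array concat.
REGISTRY: Dict[str, List[Signature]] = {
    "presto.native.element_at": [
        Signature(frozenset({"E"}), ("array(E)", "bigint"), "E")],
    "presto.native.concat": [
        Signature(frozenset({"E"}), ("array(E)",), "array(E)", variable_arity=True)],
}

def resolve(name: str, arg_types: Sequence[str]) -> Optional[str]:
    for sig in REGISTRY.get(qualify(name), []):
        result = match(sig, arg_types)
        if result is not None:
            return result
    return None
```

For example, resolve("concat", ["array(varchar)", "array(bigint)"]) returns None because E cannot bind to both varchar and bigint, which is the kind of misuse the analyzer could reject before the query is scheduled.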

A new module will be developed which has the sole purpose of retrieving information from the Presto sidecar process. A new FunctionNamespaceManager will be added there which will retrieve the function mapping from the sidecar process and cache it in-memory. This module will be responsible for registering the functions which are returned by the sidecar process.
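A minimal sketch of the caching behavior, assuming a time-based refresh policy. The issue only says the mapping is cached in-memory; the TTL, the class name CachingFunctionRegistry, and the injected fetch callable that stands in for the HTTP call to the sidecar are all hypothetical.

```python
import threading
import time
from typing import Callable, Dict, List, Optional


class CachingFunctionRegistry:
    """Caches the function mapping fetched from the sidecar (hypothetical sketch).

    `fetch` stands in for the request to the sidecar endpoint; injecting it (and the
    clock) keeps the caching logic testable without a running sidecar.
    """

    def __init__(self, fetch: Callable[[], Dict[str, List[str]]],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._mapping: Optional[Dict[str, List[str]]] = None
        self._loaded_at = 0.0

    def functions(self) -> Dict[str, List[str]]:
        """Return the cached mapping, refetching it once the TTL has elapsed."""
        with self._lock:  # serialize refreshes when many queries plan concurrently
            now = self._clock()
            if self._mapping is None or now - self._loaded_at >= self._ttl:
                self._mapping = self._fetch()
                self._loaded_at = now
            return self._mapping

    def signatures(self, name: str) -> List[str]:
        """Return the signatures registered under name, or an empty list."""
        return self.functions().get(name, [])
```

Keeping the fetch behind the cache means planning a query normally costs a dictionary lookup rather than a round trip to the sidecar; the trade-off is that newly registered functions become visible only after the next refresh.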

Because this functionality will be enabled through an SPI, use of it will be voluntary. However, it is expected that this will eventually be used by all Presto installations which have C++ functions, as it will allow for the validation of these functions at query planning time.

Context

RFC-0003

Projects
Status: Backlog