layout | toc_group | link_title | permalink |
---|---|---|---|
docs |
truffle |
Truffle Native Function Interface |
/graalvm-as-a-platform/language-implementation-framework/NFI/ |
Truffle includes a way to call native functions, called the Native Function Interface (NFI). It is implemented as an internal language on top of Truffle that language implementors can access via the standard polyglot eval interface and Truffle interoperability. NFI is intended to be used, for example, to implement a language's FFI, or to call native runtime routines that are not available in Java.
NFI uses libffi
.
On a standard JVM it calls it using JNI, and on GraalVM Native Image it uses system Java.
In the future it may be optimised by the Graal Compiler in native executables so that native calls are made directly from the compiled code.
The NFI is an internal language designed for language implementors. It is not considered stable and the interface and behavior may change without warning. It is not intended to be used directly by end-users.
The NFI is accessed via the polyglot interface of whatever language you are using. This could be Java, or it could be a Truffle language. This lets you use the NFI from both your Java language implementation code, or from your guest language to reduce the amount of Java that you need to write.
The entry point is the polyglot eval
interface.
This runs a special DSL, and returns Truffle interoperability objects which can then expose more methods.
Below are some examples using Ruby's polyglot interface, but any other JVM or a language implementation could be used instead.
Here is a basic working example, before going into the details:
library = Polyglot.eval('nfi', 'load "libSDL2.dylib"') # load a library
symbol = library['SDL_GetRevisionNumber'] # load a symbol from the library
signature = Polyglot.eval('nfi', '():UINT32') # prepare a signature
function = signature.bind(symbol) # bind the symbol to the signature to create a function
puts function.call # => 12373 # call the function
To load a library, a script written in the 'nfi
' language DSL is evaluated.
It returns an object that represents the loaded library.
library = Polyglot.eval('nfi', '...load command...')
The load command can be any of these forms:
default
load "filename"
load (flag | flag | ...) "filename"
The default
command returns a pseudo-library that contains all symbols already loaded in
the process, equivalent to RTLD_DEFAULT
in the Posix interface.
The load "filename"
command loads a library from a file.
You are responsible for any cross-platform concerns about library naming conventions and load paths.
The load (flag | flag | ...) "filename"
command allows you to specify flags to load the library.
For the default backend (backends will be described later), and when running on a Posix platform, the flags available are RTLD_GLOBAL
, RTLD_LOCAL
, RTLD_LAZY
, and RTLD_NOW
, which have the conventional Posix semantics.
The default is RTLD_NOW
if neither RTLD_LAZY
nor RTLD_NOW
were specified.
To load a symbol from a library, read the symbol as a property from the library object that was previously loaded.
symbol = library['symbol_name']
To get an executable object that you can call in order to invoke the native function, bind the symbol object that was previously loaded, by creating a signature object and calling the bind
method on it.
The signature object needs to match the native function's actual type signature.
signature = Polyglot.eval('nfi', '...signature...')
function = signature.bind(symbol)
The format of the signature is (arg, arg, ...) : return
, where arg
and return
are types.
Types can be one of the simple types:
VOID
UINT8
SINT8
UINT16
SINT16
UINT32
SINT32
UINT64
SINT64
FLOAT
DOUBLE
POINTER
STRING
OBJECT
ENV
Array types are formed by placing another type in square brackets.
For example [UINT8]
. These are C-style arrays.
Function pointer types are formed by writing a nested signature.
For example the signature of qsort
would be (POINTER, UINT64, UINT64, (POINTER, POINTER) : SINT32) : VOID
.
For a function with a signature with variadic arguments, you specify ...
where the variadic arguments start, but then you must specify the actual types that you will be calling the function with.
You may therefore need to bind the same symbol multiple times in order to call it with different types or a different number of arguments.
For example, to call printf
with %d %f
you would use the type signature (STRING, ...SINT32, DOUBLE) : SINT32
.
Type expressions can be nested arbitrarily deep.
Two additional special types, ENV
and OBJECT
, are described in the section on the native API, later in this document.
Types can be written in any case.
You are responsible for any mapping of types from a foreign language such as C into NFI types.
To call a native function, execute it.
return_value = function.call(...arguments...)
Using nested signatures, a function call can get function pointers as arguments.
The managed caller needs to pass a Polyglot executable object, that will be converted to a native function pointer.
When calling this function pointer from the native side, the execute
message is sent to the Polyglot object.
void native_function(int32_t (*fn)(int32_t)) {
printf("%d\n", fn(15));
}
signature = Polyglot.eval('nfi', '((SINT32):SINT32):VOID')
native_function = signature.bind(library['native_function'])
native_function.call(->(x) { x + 1 })
The arguments and return values of callback functions are converted the same as for regular function calls, with the conversion in the other direction, i.e., arguments are converted from native to managed, and return values are converted from managed to native.
Callback function pointers can themselves have function pointer arguments.
That works as you would expect: the function accepts a native function pointer as argument, and it is converted to a Truffle executable object.
Sending the execute
message to that object calls the native function pointer, same as calling a regular NFI function.
Function pointer types are also supported as return types.
You can optionally combine loading a library with loading symbols and binding them.
This is achieved with an extended load
command, which then returns an object with the already bound functions as methods.
These two examples are equivalent:
library = Polyglot.eval('nfi', 'load libSDL2.dylib')
symbol = library['SDL_GetRevisionNumber']
signature = Polyglot.eval('nfi', '():UINT32')
function = signature.bind(symbol)
puts function.call # => 12373
library = Polyglot.eval('nfi', 'load libSDL2.dylib { SDL_GetRevisionNumber():UINT32; }')
puts library.SDL_GetRevisionNumber # => 12373
The definitions in the curly braces {}
can contain multiple function bindings, so that many functions can be loaded from a library at once.
The load command can be prefixed by with
in order to select a specific NFI backend.
Multiple NFI backends are available.
The default is called native
, and will be used if there is no with
prefix, or the selected backend is not available.
Depending on the configuration of components you are running, available backends may include:
native
llvm
, which uses the GraalVM LLVM runtime to run the native codepanama
The Panama backend uses the Foreign Function and Memory APIs introduced by project Panama.
This backend only supports a subset of all the types. Specifically, it does not support STRING
, OBJECT
, ENV
, FP80
or array types.
Although less feature-complete, the backend is typically more performant.
It is available starting from JDK 22.
To build a native image that contains the Truffle NFI, it is sufficient to use the --language:nfi
argument, or specify Requires = language:nfi
in native-image.properties
.
It is possible to select what implementation to use for the native
backend using --language:nfi=<backend>
.
Note that the --language:nfi=<backend>
argument must come before any other arguments that might pull in the NFI as dependency via Requires = language:nfi
.
The first instance of language:nfi
wins and determines what backend will be built into the native image.
Available arguments for --language:nfi=<backend>
are:
libffi
(the default)none
Selecting the none
native backend will effectively disable access to native functions using the Truffle NFI.
This will break users of the NFI that rely on native access (e.g. the GraalVM LLVM Runtime, unless used with --llvm.managed
on EE).
The NFI can be used with unmodified, already compiled native code, but it can also be used with a Truffle-specific API being used by the native code.
The special type ENV
adds an additional parameter TruffleEnv *env
to the signature.
An additional simple type OBJECT
translates to an opaque TruffleObject
type.
The trufflenfi.h
header file provides declarations for working with these types, that can then be used by the native code called through the NFI.
See trufflenfi.h
for more documentation on this API.
This section describes in detail how argument values and return values are converted for all types in the function signature.
The following table shows the possible types in NFI signatures with their corresponding C language types on the native side, and what polyglot values these arguments map to on the managed side:
NFI type | C language type | Polyglot value |
---|---|---|
VOID |
void |
Polyglot object with isNull == true (only valid as return type). |
SINT8/16/32/64 |
int8/16/32/64_t |
Polyglot isNumber that fitsIn... the corresponding integer type. |
UINT8/16/32/64 |
uint8/16/32/64_t |
Polyglot isNumber that fitsIn... the corresponding integer type. |
FLOAT |
float |
Polyglot isNumber that fitsInFloat . |
DOUBLE |
double |
Polyglot isNumber that fitsInDouble . |
POINTER |
void * |
Polyglot object with isPointer == true or isNull == true . |
STRING |
char * (zero-terminated UTF-8 string) |
Polyglot isString . |
OBJECT |
TruffleObject |
Arbitrary object. |
[type] |
type * (array of primitive) |
Java host primitive array. |
(args):ret |
ret (*)(args) (function pointer type) |
Polyglot function with isExecutable == true . |
ENV |
TruffleEnv * |
nothing (injected argument) |
The following sections describe the type conversions in detail.
The type conversion behavior with function pointers can be slightly confusing, because the direction of the arguments is reversed. When in doubt, always try to figure out in which direction arguments or return values flow, from managed to native or from native to managed.
This type is only allowed as return type, and is used to denote functions that do not return a value.
Since in the Polyglot API, all executable objects have to return a value, a Polyglot object with isNull == true
will be returned from native functions that have a VOID
return type.
The return value of managed callback functions with return type VOID
will be ignored.
The primitive number types are converted as you might expect. The argument needs to be a Polyglot number, and its value needs to fit in the value range of the specified numeric type.
One thing to note is the handling of the unsigned integer types.
Even though the Polyglot API does not specify separate messages for values fitting in unsigned types, the conversion is still using the unsigned value ranges.
For example, the value 0xFF
passed from native to managed through a return value of type SINT8
will result in a Polyglot number -1
, which fitsInByte
.
But the same value returned as UINT8
results in a Polyglot number 255
, which does not fitsInByte
.
When passing numbers from managed code to native code, the signedness of the number is ignored, only the bits of the number are relevant.
So for example, passing -1
to an argument of type UINT8
is allowed, and the result on the native side is 255
, since it has the same bits as -1
.
The other way round, passing 255
to an argument of type SINT8
is also allowed, and the result on the native side is -1
.
Since in the current Polyglot API it is not possible to represent numbers outside of the signed 64-bit range, the UINT64
type is currently handled with signed semantics.
This is a known bug in the API, and will change in a future release.
This type is a generic pointer argument. On the native side, it does not matter what exact pointer type the argument is.
A polyglot object passed to POINTER
arguments will be converted to a native pointer if possible (using the isPointer
, asPointer
and toNative
messages as necessary).
An object with isNull == true
will be passed as a native NULL
.
POINTER
return values will produce a polyglot object with isPointer == true
.
The native NULL
pointer will additionally have isNull == true
.
This is a pointer type with special conversion semantics for strings.
Polyglot strings passed from managed to native using the STRING
type will be converted to a zero-terminated UTF-8 encoded string.
For STRING
arguments, the pointer is owned by the caller, and is guaranteed to stay alive for the duration of the call only.
The STRING
values returned from managed function pointers to a native caller are also owned by the caller.
They have to be freed with free
after use.
Polyglot pointer values or null values can also be passed to STRING
arguments.
The semantics is the same as for POINTER
arguments.
The user is responsible for ensuring that the pointer is a valid UTF-8 string.
The STRING
values passed from native functions to managed code behave like POINTER
return values, but in addition they have isString == true
.
The user is responsible for the ownership of the pointer and it might be necessary to free
the return value, depending on the semantics of the called native function.
After freeing the returned pointer, the returned polyglot string is invalid and reading it results in undefined behavior.
In that sense, the returned polyglot string is not a safe object, similar to a raw pointer.
It is recommented that the user of the NFI copies the returned string before passing it along to untrusted managed code.
This argument corresponds to the C type TruffleObject
.
This type is defined in trufflenfi.h
, and is an opaque pointer type.
A value of type TruffleObject
represents a reference to an arbitrary managed object.
Native code can do nothing with values of type TruffleObject
except pass them back to managed code, either through return values or passing them to callback function pointers.
The lifetime of TruffleObject
references needs to be managed manually.
See the documentation in trufflenfi.h
for API functions to manage the lifetime of TruffleObject
references.
A TruffleObject
passed as an argument is owned by the caller, and guaranteed to stay alive for the duration of the call.
A TruffleObject
reference returned from a callback function pointer is owned by the caller, and needs to be freed after use.
Returning a TruffleObject
from a native function does not transfer ownership (but there is an API function in trufflenfi.h
to do that).
This type is only allowed as an argument from managed code to a native function, and only arrays of primitive numeric types are supported.
On the managed side, only Java host objects containing a Java primitive array are supported. On the native side, the type is a pointer to the contents of the array. It is the user's responsibility to pass along the array length as a separate argument.
The pointer is valid for the duration of the native call only.
Modifications to the contents are propagated back to the Java array after returning from the call. The effects of concurrent access to the Java array during the native call are unspecified.
On the native side, a nested signature type corresponds to a function pointer with the given signature, calling back to managed code.
Polyglot executable objects passed from managed to native using a function pointer type are converted to a function pointer that can be called by the native code.
For function pointer arguments, the function pointer is owned by the caller, and is guaranteed to stay alive for the duration of the call only.
Function pointer return values are owned by the caller, and have to be freed manually.
See polyglot.h
for API functions to manage the lifetime of function pointer values.
Polyglot pointer values or null values can also be passed to function pointer arguments.
The semantics is the same as for POINTER
arguments.
The user is responsible for ensuring that the pointer is a valid function pointer.
Function pointer return types are the same as regular POINTER
return types, but in addition they are already bound to the given signature type.
They support the execute
message, and behave the same as regular NFI functions.
This type is a special argument of type TruffleEnv *
.
It is only valid as argument type, not as a return type.
It is an injected argument on the native side, there is no corresponding argument on the managed side.
When used as argument type of a native function, the native function will get an environment pointer on this position.
That environment pointer can be used to call API functions (see trufflenfi.h
).
The argument is injected, for example, if the signature is (SINT32, ENV, SINT32):VOID
.
This function object is expected to be called with two integer arguments, and the corresponding native function will be called with three arguments: first the first real argument, then the injected ENV
argument, and then the second real argument.
When the ENV
type is used as an argument type for a function pointer parameter, that function pointer must be called with a valid NFI environment as an argument.
If the caller already has an environment, threading it through to callback function pointers is more efficient than calling them without an ENV
argument.
Native functions must use the system's standard ABI. There is currently no support for alternative ABIs.