Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call user-defined functions #4542

Open
vanillajonathan opened this issue Jun 5, 2024 · 13 comments
Open

Call user-defined functions #4542

vanillajonathan opened this issue Jun 5, 2024 · 13 comments

Comments

@vanillajonathan
Copy link
Collaborator

What's up?

I would like to call a user-defined function (UDF) named get_magic_name and pass it the name column as parameter.

from animals
derive {
  magic_name = get_magic_name name
}

Error:

Error: 
   ╭─[:3:16]
   │
 3 │   magic_name = get_magic_name name
   │                ───────┬──────  
   │                       ╰──────── expected a function, but found `this.animals.get_magic_name`
───╯

Perhaps user-defined functions could be reached from a pseudo namespace called udf like:

from animals
derive {
  magic_name = udf.get_magic_name name
}
@max-sixty
Copy link
Member

Can we adjust the example so get_magic_name is defined?

@vanillajonathan
Copy link
Collaborator Author

But get_magic_name is not defined in the PRQL script because it is a user-defined function (UDF) that has been created on the database server using the CREATE FUNCTION SQL statement so PRQL doesn't know about it.

@m-span
Copy link
Collaborator

m-span commented Jun 6, 2024

Perhaps get_magic_name can be implemented using s-strings?

@max-sixty
Copy link
Member

But get_magic_name is not defined in the PRQL script because it is a user-defined function (UDF) that has been created on the database server using the CREATE FUNCTION SQL statement so PRQL doesn't know about it.

Ah I see. We could have a call_udf function, which I think could be an s-string...

Can you confirm what the SQL output would look like?

@vanillajonathan
Copy link
Collaborator Author

I suppose it would look like this:

SELECT
  *,
  get_magic_name(name) AS magic_name
FROM
  animals

@max-sixty
Copy link
Member

OK great.

I think this can be handled by an s-string at the moment.

If there were substantial demand we could add a call_udf function, but it might be so trivial that it wouldn't be worthwhile.

What do you think about adding an example to the docs?

@snth
Copy link
Member

snth commented Jun 6, 2024

Hi @vanillajonathan ,

I once answered a question on HackerNews which I think speaks to what you are looking for: https://news.ycombinator.com/item?id=37569946

Unfortunately the JSON DuckDB extension isn't loaded in the playground so the example doesn't produce Query Results but hopefully the SQL produced explains enough:

> prqlc compile <<EOF
let get = func path obj -> s"""{obj} -> {path}"""
let getstr = func path obj -> s"""{obj} ->> {path}"""
let extract = func obj path -> s"""json_extract({obj}, {path})"""

from [{data='{"duck": [1, 2, 3]}'}]
select { data | get '$.duck[0]', data | getstr '$.duck[1]', extract data '$.duck[2]' }
EOF
WITH table_0 AS (
  SELECT
    '{"duck": [1, 2, 3]}' AS data
)
SELECT
  data -> '$.duck[0]',
  data ->> '$.duck[1]',
  json_extract(data, '$.duck[2]')
FROM
  table_0

-- Generated by PRQL compiler version:0.9.4 (https://prql-lang.org)

I've been meaning to add this to the docs somewhere as I think JSON functionality is something that would speak to a lot of people. Will try and get to it.

@vanillajonathan
Copy link
Collaborator Author

I think this is tricky to use.

You cannot just declare an line s-string like:

from animals
derive {
  magic_name = s"get_magic_name" name
}

because you get the error:

Error: 
   ╭─[:3:16]
   │
 3 │   magic_name = s"get_magic_name" name
   │                ────────┬────────  
   │                        ╰────────── expected a function, but found `s"get_magic_name"`
───╯

So you have to do declare the function with a let statement before you can use it.

let get_magic_name = param_name -> s"""get_magic_name({param_name})"""

from animals
derive {
  magic_name = name | get_magic_name
}

which is a bit cumbersome. It would be nice to not have to do that. Would be easier:

from animals
derive {
  magic_name = name | udf.get_magic_name
}

@snth
Copy link
Member

snth commented Jun 7, 2024

Oh I see. That's an interesting idea.

So in that case it would have to infer how many parameters it is being passed and then generate AST node equivalent to the corresponding s-string. Not my area of expertise how feasible that is at the moment but I could see it working in principle.

We had a discussion elsewhere about a possible db namespace for database objects. That would also make it easier for the resolver whether it should find a name internally or just assume that it exists in the database. In that case I would suggest something like the following syntax:

from animals
derive {
  magic_name = name | db.fn.get_magic_name
}

That should indicate to the compiler that get_magic_name is a function that exists in the database and then it can simply treat it as such. One problem with that approach is that the return type becomes opaque to the compiler. That's not such a problem at the moment but as we try to formalise the type system that might be a problem. Forcing the let statement function definition allows you to provide the type annotations, I guess like a foreign function interface (FFI) in other languages. Maybe that could be a requirement for a "strict" mode which is relaxed otherwise.

@vanillajonathan
Copy link
Collaborator Author

Yeah!

@max-sixty
Copy link
Member

max-sixty commented Jun 7, 2024

I definitely see the case here.

I'm not sure the proposal quite fits — it uses a namespace to indicate a type. TBC, it's not crazy — maybe we store all the UDFs in a namespace. But in this case it works for any string, which isn't quite consistent — it's like a magic namespace where we can add anything — sounds like a type!

We could have a function like udf:

let udf = function_name param_name -> s"""{function_name}({param_name})"""

from animals
derive {
  magic_name = name | (udf get_magic_name)
}

...though we would need to implement something to allow passing a tuple of params; this only supports a single param...

@vanillajonathan
Copy link
Collaborator Author

Yeah, a udf function to call user-defined functions would be okay too, it would be better than what we have right now, because right now it is rather awkward having declare functions using s-strings prior to using them.

With risk of sounding whiny, PRQL is meant to make our lives easier but I feel that:

from animals
derive {
  magic_name = name | (udf get_magic_name)
}

looks harder to read, write and understand than SQL:

SELECT
  *,
  get_magic_name(name) AS magic_name
FROM
  animals

@kgutwin
Copy link
Collaborator

kgutwin commented Jun 7, 2024

A couple more thoughts to add to this interesting discussion!

The query above can be written fully equivalently as:

from animals
derive {
  magic_name = udf get_magic_name name
}

which is really close to the original request (just with udf inserted in the right spot). It also might be able to be treated by the compiler as sugar for udf.get_magic_name or db.fn.get_magic_name, which are related to the proposals above. If we're going down the sugar route, we could also consider funky syntax like &get_magic_name.

Also, if PRQL doesn't have awareness of the database schema, then it isn't going to actually know if get_magic_name is a UDF or a database-native function. That's either a good thing or a bad thing, depending how you look at it, because it allows writing:

from animals
derive {
  full_name = udf concat first_name " " last_name
}

which would compile to:

SELECT
  *,
  concat(first_name, ' ', last_name) AS full_name
FROM
  animals

This would end up being another "escape hatch" to regular SQL, just as S-strings are right now.


I'm not sure if I have a preference in any of this. I think my tendency would be to declare all of the UDFs that I care about in a module, which would allow for type checking and make things a bit more explicit - but I'm not sure this is the most user-friendly.

module fns {
  let get_magic_name = func name <text> -> <text> s"get_magic_name({_param.name})"
}

from animals
derive {
  magic_name = fns.get_magic_name name
}

The above snippet works in the playground right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants