fix: dynamically get python udf from scripts table #3774

Closed

Contributor

@xxxuuu xxxuuu commented Apr 22, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

#2434
#2532

What's changed and what's your intention?

Add a version column to the scripts table and retrieve the latest script from the database before each script execution. If it is newer than the one in the local cache, recompile it.

This solves the problem where, in a cluster with multiple frontends, a script updated on one frontend does not take effect on the other frontends (because their script compilation cache is not updated, or the script is not registered there).
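The version check described above can be sketched in a few lines. This is only an illustrative model, not the code in this PR: the `ScriptRow` and `VersionedScriptCache` names, the in-memory dict standing in for the scripts table, and the convention that a script exposes a `run` function are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class ScriptRow:
    """Hypothetical row of the scripts table: source text plus a version."""
    source: str
    version: int


class VersionedScriptCache:
    """Per-frontend compile cache, refreshed when the table holds a newer version."""

    def __init__(self, scripts_table: Dict[str, ScriptRow]) -> None:
        self.scripts_table = scripts_table  # stand-in for the shared table
        self.compiled: Dict[str, Tuple[int, Any]] = {}
        self.compile_count = 0

    def get(self, name: str) -> Any:
        row = self.scripts_table[name]  # looked up before every execution
        cached = self.compiled.get(name)
        if cached is None or cached[0] < row.version:
            # Cache miss or stale entry: recompile and remember the version.
            code = compile(row.source, name, "exec")
            self.compile_count += 1
            cached = (row.version, code)
            self.compiled[name] = cached
        return cached[1]

    def run(self, name: str, *args: Any) -> Any:
        namespace: Dict[str, Any] = {}
        exec(self.get(name), namespace)
        return namespace["run"](*args)  # assumed entry-point convention


table = {"pymax": ScriptRow("def run(a, b):\n    return max(a, b)\n", 1)}
frontend = VersionedScriptCache(table)
first = frontend.run("pymax", 1, 2)

# Another frontend "updates" the script by writing a row with a higher version.
table["pymax"] = ScriptRow("def run(a, b):\n    return min(a, b)\n", 2)
second = frontend.run("pymax", 1, 2)
```

The cost of this design is one table lookup per execution; the compile step itself only runs again when the stored version is higher than the cached one.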

BREAKING CHANGE:
Since scripts are now obtained through the scripts table, the script name must be used instead of the function name when executing a script as a UDF in SQL. This fixes the issue mentioned in #2532. However, where the script name and the function name differ, a program that calls the function name will no longer work.

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR does not require documentation updates.

@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Apr 22, 2024
@xxxuuu xxxuuu force-pushed the feat/invalidate-script-cache branch from ea881ab to 9a06553 Compare April 22, 2024 17:34
@tisonkun
Collaborator

Thanks for your contribution!

You may take a look at #3777. I suppose we could implement PyUDF in a more structured and proven way, such as PL/Python, instead of hackily interpreting decorators and making up new tribal rules.

For example, instead of introducing versions, we should use a statement like:

CREATE FUNCTION pymax (a integer, b integer)
  RETURNS integer
AS $$
  if (a is None) or (b is None):
    return None
  if a > b:
    return a
  return b
$$ LANGUAGE plpython3u;

Then a function named pymax is always stored across the cluster, and there is no more confusion between "name of file", "name of function", and "name in args".

@tisonkun
Collaborator

tisonkun commented Apr 23, 2024

If you have another proposal, we can discuss it in #3777. Generally it's good to discuss before implementing, to avoid going in the wrong direction in the first place.

I found that #2434 has a lazy consensus now. Perhaps we can revisit this direction taking #3777 into account.

For example, in the PL/Python way, we can have DROP FUNCTION to allow users to delete the function and create a new one: https://www.postgresql.org/docs/current/sql-dropfunction.html

This should be more reliable than patching a version into an already hacky implementation.

@xxxuuu
Contributor Author

xxxuuu commented Apr 23, 2024

> Thanks for your contribution!
>
> You may take a look at #3777. I suppose we could implement PyUDF in a more structured and proven way, such as PL/Python, instead of hackily interpreting decorators and making up new tribal rules.
>
> For example, instead of introducing versions, we should use a statement like:
>
> CREATE FUNCTION pymax (a integer, b integer)
>   RETURNS integer
> AS $$
>   if (a is None) or (b is None):
>     return None
>   if a > b:
>     return a
>   return b
> $$ LANGUAGE plpython3u;
>
> Then a function named pymax is always stored across the cluster, and there is no more confusion between "name of file", "name of function", and "name in args".

Yes, this is a better approach, and I've also seen some other DBMSs doing it in a similar way. I will consider closing this PR and then reevaluate and discuss this feature.

@xxxuuu xxxuuu closed this Apr 23, 2024
@tisonkun
Collaborator

@xxxuuu Thanks for your reply. I'm happy to respond in #3777 and to support any effort to implement PyUDF in a cleaner CREATE FUNCTION way. If you're interested in this feature, feel free to drop a comment there so that we can discuss the details.

@tisonkun
Collaborator

cc @xxxuuu @waynexia @discord9

After another round of thinking about this issue, and comparing it with CREATE/DROP FUNCTION, I found that we can easily solve the original issue by banning "update scripts" and adding a delete script API instead. The script cache can then be invalidated on deletion, and otherwise it always holds the correctly compiled script.

Even if we later reimplement scripting with CREATE/DROP FUNCTION, this handling of the system scripts table should stay the same.
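A minimal sketch of the create-and-delete-only idea, modelling a single frontend. The `ScriptStore` class and its method names are hypothetical and chosen only for illustration; how a deletion would reach the caches of other frontends is not modelled here.

```python
from typing import Any, Dict


class ScriptStore:
    """Create-and-delete-only script store with a compile cache (toy model)."""

    def __init__(self) -> None:
        self.sources: Dict[str, str] = {}
        self.compiled: Dict[str, Any] = {}

    def create(self, name: str, source: str) -> None:
        # Updates are banned: an existing name must be deleted first.
        if name in self.sources:
            raise ValueError(f"script {name!r} exists; delete it first")
        self.sources[name] = source

    def delete(self, name: str) -> None:
        del self.sources[name]
        self.compiled.pop(name, None)  # the only cache invalidation point

    def get_compiled(self, name: str) -> Any:
        # Without updates, a cached entry can never be stale.
        if name not in self.compiled:
            self.compiled[name] = compile(self.sources[name], name, "exec")
        return self.compiled[name]


store = ScriptStore()
store.create("pymax", "result = max(1, 2)")
code_v1 = store.get_compiled("pymax")

try:
    store.create("pymax", "result = min(1, 2)")
    update_rejected = False
except ValueError:
    update_rejected = True

# Replacing a script is an explicit delete followed by a create.
store.delete("pymax")
store.create("pymax", "result = min(1, 2)")
code_v2 = store.get_compiled("pymax")
```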

@xxxuuu
Contributor Author

xxxuuu commented Apr 23, 2024

> I found that we can easily solve the original issue by banning "update scripts" and adding a delete script API instead. The script cache can then be invalidated on deletion, and otherwise it always holds the correctly compiled script.

Scripts created on one frontend are not registered on another frontend, and there is a similar issue with deletion. We still need to solve this issue.

@tisonkun
Collaborator

tisonkun commented Apr 23, 2024

> Scripts created on one frontend are not registered on another frontend, and there is a similar issue with deletion. We still need to solve this issue.

This is quite surprising. IIUC the scripts table should be no different from other normal tables, and thus it should be visible to every frontend instance.

@xxxuuu
Contributor Author

xxxuuu commented Apr 23, 2024

> This is quite surprising. IIUC the scripts table should be no different from other normal tables, and thus it should be visible to every frontend instance.

The scripts table is visible, but it is not accessed when a script is executed through SQL, so the script remains unregistered on that frontend. This PR includes the modifications for this issue.
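The registration gap can be illustrated with a toy model. Everything here is an assumption made for the example (the `Frontend` class, the `fallback_to_table` flag, the shared dict standing in for the scripts table, and the `run` entry point); it is not GreptimeDB's actual API.

```python
from typing import Any, Callable, Dict


def compile_script(name: str, source: str) -> Callable[..., Any]:
    """Compile a script and return its entry point (assumed to be `run`)."""
    namespace: Dict[str, Any] = {}
    exec(compile(source, name, "exec"), namespace)
    return namespace["run"]


class Frontend:
    """Toy frontend with a local UDF registry over a shared scripts table."""

    def __init__(self, scripts_table: Dict[str, str], fallback_to_table: bool) -> None:
        self.scripts_table = scripts_table
        self.fallback_to_table = fallback_to_table
        self.registry: Dict[str, Callable[..., Any]] = {}

    def insert_script(self, name: str, source: str) -> None:
        # The inserting frontend writes the table and registers locally.
        self.scripts_table[name] = source
        self.registry[name] = compile_script(name, source)

    def call_udf(self, name: str, *args: Any) -> Any:
        func = self.registry.get(name)
        if func is None:
            # Without the fallback, a visible table row is never consulted.
            if not self.fallback_to_table or name not in self.scripts_table:
                raise LookupError(f"function {name!r} is not registered")
            func = compile_script(name, self.scripts_table[name])
            self.registry[name] = func
        return func(*args)


table: Dict[str, str] = {}
writer = Frontend(table, fallback_to_table=False)
old_reader = Frontend(table, fallback_to_table=False)
new_reader = Frontend(table, fallback_to_table=True)
writer.insert_script("pymax", "def run(a, b):\n    return max(a, b)\n")
```

Here `old_reader.call_udf("pymax", 1, 2)` raises `LookupError` even though the row is visible in the shared table, while `new_reader` resolves the script from the table on first use.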
