[BUG] Fix type annotation on UDF #1807

Merged 2 commits on Jan 22, 2024
7 changes: 5 additions & 2 deletions daft/udf.py
@@ -4,7 +4,7 @@
 import functools
 import inspect
 import types
-from typing import Callable
+from typing import TYPE_CHECKING, Callable, Union

 from daft.datatype import DataType
 from daft.expressions import Expression
@@ -16,7 +16,10 @@
 except ImportError:
     _NUMPY_AVAILABLE = False

-UserProvidedPythonFunction = Callable[..., Series]
+if TYPE_CHECKING:
+    import numpy as np
+
+UserProvidedPythonFunction = Callable[..., Union[Series, "np.ndarray", list]]


 @dataclasses.dataclass(frozen=True)
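The change above relies on the standard `typing.TYPE_CHECKING` idiom: numpy is imported only while a static type checker is running, and the string forward reference `"np.ndarray"` means numpy never needs to be installed at runtime. A minimal standalone sketch of the pattern (the names `MyCallback` and `returns_list` are illustrative, not part of daft):

```python
# Sketch of the TYPE_CHECKING + forward-reference pattern used in the diff above.
# MyCallback / returns_list are hypothetical names for illustration only.
from typing import TYPE_CHECKING, Callable, Union

if TYPE_CHECKING:
    # Evaluated only by static type checkers (mypy, pyright), never at runtime,
    # so numpy does not become a hard runtime dependency.
    import numpy as np

# "np.ndarray" is a string forward reference: type checkers resolve it via the
# guarded import; at runtime it stays an unevaluated string.
MyCallback = Callable[..., Union[list, "np.ndarray"]]


def returns_list() -> list:
    # A callable returning a plain list satisfies the widened annotation.
    return [1, 2, 3]


cb: MyCallback = returns_list
print(cb())  # [1, 2, 3]
```

This is the same reason the PR quotes `"np.ndarray"` in the `Union`: without the string quotes, the annotation would raise `NameError` whenever numpy is absent at import time.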
2 changes: 1 addition & 1 deletion docs/source/10-min.ipynb
@@ -834,7 +834,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "For a full list of all Expression methods and operators, see: [Expressions API Docs](../api_docs/expressions.rst)"
+    "For a full list of all Expression methods and operators, see: [Expressions API Docs](api_docs/expressions.rst)"
    ]
   },
   {
2 changes: 1 addition & 1 deletion docs/source/api_docs/expressions.rst
@@ -180,7 +180,7 @@ Example: ``e1.list.join(e2)``


 Structs
-******
+*******

 Operations on structs, accessible through the :meth:`Expression.image <daft.expressions.Expression.struct>` method accessor:
6 changes: 3 additions & 3 deletions docs/source/user_guide/poweruser/memory.rst
@@ -28,9 +28,9 @@ Spilling to disk is a mechanism that Daft uses to ensure workload completion in

 There are some things you can do that will help with this.

-1. Use machines with more available memory per-CPU to increase each Ray worker's available memory (e.g. `AWS EC2 r5 instances <https://duckdb.org/docs/api/python/spark_api.html>`_)
+1. Use machines with more available memory per-CPU to increase each Ray worker's available memory (e.g. `AWS EC2 r5 instances <https://aws.amazon.com/ec2/instance-types/r5/>`_)
 2. Use more machines in your cluster to increase overall cluster memory size
-3. Use machines with attached local nvme SSD drives for higher throughput when spilling (e.g. `AWS EC2 r5d instances <https://duckdb.org/docs/api/python/spark_api.html>`_)
+3. Use machines with attached local nvme SSD drives for higher throughput when spilling (e.g. AWS EC2 r5d instances)

 For more troubleshooting, you may also wish to consult the `Ray documentation's recommendations for object spilling <https://docs.ray.io/en/latest/ray-core/objects/object-spilling.html>`_.

@@ -51,7 +51,7 @@ These OOMKills are often recoverable (Daft-on-Ray will take care of retrying wor

 There are some options available to you.

-1. Use machines with more available memory per-CPU to increase each Ray worker's available memory (e.g. `AWS EC2 r5 instances <https://aws.amazon.com/ec2/instance-types/r5/>`_)
+1. Use machines with more available memory per-CPU to increase each Ray worker's available memory (e.g. AWS EC2 r5 instances)
 2. Use more machines in your cluster to increase overall cluster memory size
 3. Aggressively filter your data so that Daft can avoid reading data that it does not have to (e.g. ``df.where(...)``)
 4. Request more memory for your UDFs (see: :ref:`resource-requests`) if your UDFs are memory intensive (e.g. decompression of data, running large matrix computations etc)