[DOCS] Refactor expressions docs layout #1816

Merged: 10 commits, merged Feb 1, 2024

3 changes: 2 additions & 1 deletion daft/__init__.py
@@ -66,7 +66,7 @@
from daft.daft import ImageFormat, ImageMode, ResourceRequest
from daft.dataframe import DataFrame
from daft.datatype import DataType, TimeUnit
from daft.expressions import col, lit
from daft.expressions import Expression, col, lit
from daft.io import from_glob_path, read_csv, read_iceberg, read_json, read_parquet
from daft.series import Series
from daft.udf import udf
@@ -85,6 +85,7 @@
"read_parquet",
"read_iceberg",
"DataFrame",
"Expression",
"col",
"DataType",
"ImageMode",
4 changes: 2 additions & 2 deletions daft/dataframe/__init__.py
@@ -1,5 +1,5 @@
from __future__ import annotations

from .dataframe import DataFrame
from .dataframe import DataFrame, GroupedDataFrame

__all__ = ["DataFrame"]
__all__ = ["DataFrame", "GroupedDataFrame"]
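Adding `GroupedDataFrame` to `__all__` matters because `__all__` is the list of names a wildcard import such as `from daft.dataframe import *` binds. The following self-contained demonstration builds a throwaway in-memory module (the name `toy_dataframe` and the helper `star_exports` are hypothetical) and shows what a star import exports with and without the new entry:

```python
import sys
import types
from typing import Iterable, List


def star_exports(all_names: Iterable[str]) -> List[str]:
    """Return the public names that `from toy_dataframe import *` would bind."""
    # Build a throwaway in-memory module; "toy_dataframe" is a hypothetical name.
    mod = types.ModuleType("toy_dataframe")
    exec("class DataFrame: pass\nclass GroupedDataFrame: pass\n", mod.__dict__)
    mod.__all__ = list(all_names)
    sys.modules["toy_dataframe"] = mod
    try:
        namespace: dict = {}
        exec("from toy_dataframe import *", namespace)
    finally:
        sys.modules.pop("toy_dataframe", None)
    # exec() adds __builtins__ to the namespace; drop dunder names.
    return sorted(name for name in namespace if not name.startswith("__"))


print(star_exports(["DataFrame"]))                      # ['DataFrame']
print(star_exports(["DataFrame", "GroupedDataFrame"]))  # ['DataFrame', 'GroupedDataFrame']
```

An explicit `from daft.dataframe import GroupedDataFrame` works either way; `__all__` governs only the wildcard form and tools that honour it, such as Sphinx autodoc when documenting module members.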
3 changes: 2 additions & 1 deletion daft/dataframe/dataframe.py
@@ -662,10 +662,11 @@ def repartition(self, num: Optional[int], *partition_by: ColumnInputType) -> "Da
random repartitioning will occur.

.. NOTE::

This function will globally shuffle your data, which is potentially a very expensive operation.

If instead you merely wish to "split" or "coalesce" partitions to obtain a target number of partitions,
you mean instead wish to consider using :meth:`DataFrame.into_parititions`<daft.DataFrame.into_partitions>
you may instead wish to consider using :meth:`DataFrame.into_partitions <daft.DataFrame.into_partitions>`
which avoids shuffling of data in favor of splitting/coalescing adjacent partitions where appropriate.

Example:
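To make the trade-off in the `repartition` note concrete, here is a pure-Python sketch, assuming a dataframe is simply a list of partitions and each partition a list of rows. `hash_repartition` and `coalesce_adjacent` are hypothetical helpers, not Daft's implementation: the first may route any row to any output partition (a global shuffle), while the second only concatenates neighbouring partitions, which is the cheaper "coalesce" case the note points to.

```python
from typing import Callable, Hashable, List, Sequence, TypeVar

Row = TypeVar("Row")


def hash_repartition(
    partitions: Sequence[List[Row]], num: int, key: Callable[[Row], Hashable]
) -> List[List[Row]]:
    """Global shuffle: any input row may be routed to any output partition."""
    out: List[List[Row]] = [[] for _ in range(num)]
    for part in partitions:
        for row in part:
            out[hash(key(row)) % num].append(row)
    return out


def coalesce_adjacent(partitions: Sequence[List[Row]], num: int) -> List[List[Row]]:
    """Merge runs of neighbouring partitions, keeping rows in their original order."""
    n = len(partitions)
    if not 0 < num <= n:
        raise ValueError("coalescing can only reduce the number of partitions")
    out: List[List[Row]] = []
    for i in range(num):
        # Output partition i is the concatenation of a contiguous slice of inputs.
        merged: List[Row] = []
        for part in partitions[i * n // num : (i + 1) * n // num]:
            merged.extend(part)
        out.append(merged)
    return out


parts = [[1, 2], [3], [4, 5], [6]]
print(coalesce_adjacent(parts, 2))                  # [[1, 2, 3], [4, 5, 6]]
print(hash_repartition(parts, 2, key=lambda r: r))  # [[2, 4, 6], [1, 3, 5]]
```

In the shuffle every input partition contributes rows to every output partition, which is what makes it expensive on a distributed runner; coalescing touches each input partition exactly once.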
47 changes: 37 additions & 10 deletions daft/expressions/expressions.py
@@ -1,6 +1,7 @@
from __future__ import annotations

import builtins
import os
import sys
from datetime import date, datetime
from typing import TYPE_CHECKING, Callable, Iterable, Iterator, TypeVar, overload
@@ -28,6 +29,32 @@
from daft.io import IOConfig


# Implementation taken from: https://github.com/pola-rs/polars/blob/main/py-polars/polars/utils/various.py#L388-L399
# This allows Sphinx to correctly work against our "namespaced" accessor functions by overriding @property to
# return a class instance of the namespace instead of a property object.
accessor_namespace_property: type[property] = property
if os.getenv("DAFT_SPHINX_BUILD") == "1":
from typing import Any

# when building docs (with Sphinx) we need access to the functions
# associated with the namespaces from the class, as we don't have
# an instance; @sphinx_accessor is a @property that allows this.
NS = TypeVar("NS")

class sphinx_accessor(property): # noqa: D101
def __get__( # type: ignore[override]
self,
instance: Any,
cls: type[NS],
) -> NS:
try:
return self.fget(instance if isinstance(instance, cls) else cls) # type: ignore[misc]
except (AttributeError, ImportError):
return self # type: ignore[return-value]

accessor_namespace_property = sphinx_accessor


def lit(value: object) -> Expression:
"""Creates an Expression representing a column with every value set to the provided value

@@ -72,47 +99,47 @@


class Expression:
_expr: _PyExpr
_expr: _PyExpr = None # type: ignore

def __init__(self) -> None:
raise NotImplementedError("We do not support creating a Expression via __init__ ")

@property
@accessor_namespace_property
def str(self) -> ExpressionStringNamespace:
"""Access methods that work on columns of strings"""
return ExpressionStringNamespace.from_expression(self)

@property
@accessor_namespace_property
def dt(self) -> ExpressionDatetimeNamespace:
"""Access methods that work on columns of datetimes"""
return ExpressionDatetimeNamespace.from_expression(self)

@property
@accessor_namespace_property
def float(self) -> ExpressionFloatNamespace:
"""Access methods that work on columns of floats"""
return ExpressionFloatNamespace.from_expression(self)

@property
@accessor_namespace_property
def url(self) -> ExpressionUrlNamespace:
"""Access methods that work on columns of URLs"""
return ExpressionUrlNamespace.from_expression(self)

@property
@accessor_namespace_property
def list(self) -> ExpressionListNamespace:
"""Access methods that work on columns of lists"""
return ExpressionListNamespace.from_expression(self)

@property
@accessor_namespace_property
def struct(self) -> ExpressionStructNamespace:
"""Access methods that work on columns of structs"""
return ExpressionStructNamespace.from_expression(self)

@property
@accessor_namespace_property
def image(self) -> ExpressionImageNamespace:
"""Access methods that work on columns of images"""
return ExpressionImageNamespace.from_expression(self)

@property
@accessor_namespace_property
def partitioning(self) -> ExpressionPartitioningNamespace:
"""Access methods that support partitioning operators"""
return ExpressionPartitioningNamespace.from_expression(self)
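The `sphinx_accessor` shim overrides `property.__get__` so that reading a namespace off the class itself (as Sphinx autodoc does) calls the getter with the class and returns a namespace object whose methods can be introspected, instead of a bare `property` object. Below is a minimal, self-contained sketch of that behaviour; `Expr`, `PlainExpr` and `StringNamespace` are hypothetical stand-ins for `Expression` and its namespace classes. It also suggests (an inference, not something the PR states) why `_expr` gains a class-level default of `None`: without it, reading the attribute off the class raises `AttributeError` and the shim falls back to returning itself.

```python
from __future__ import annotations

from typing import Any


class StringNamespace:
    """Hypothetical stand-in for a namespace class such as ExpressionStringNamespace."""

    _value: Any = None

    @classmethod
    def from_expression(cls, expr: Any) -> StringNamespace:
        ns = cls.__new__(cls)
        # Works on the Expr *class* too, because Expr defines a class-level default for _value.
        ns._value = expr._value
        return ns

    def upper(self) -> str:
        """Upper-case the wrapped value."""
        return str(self._value).upper()


class sphinx_accessor(property):
    """property subclass mirroring the shim in the diff."""

    def __get__(self, instance: Any, cls: Any = None) -> Any:
        try:
            # Instance access passes the instance; class access (instance is None) passes the class.
            return self.fget(instance if isinstance(instance, cls) else cls)
        except (AttributeError, ImportError):
            # Fall back to the descriptor itself, as a plain property would return.
            return self


class Expr:
    _value: Any = None  # class-level default, analogous to `_expr: _PyExpr = None`

    def __init__(self, value: Any) -> None:
        self._value = value

    # In Daft this decorator is only swapped in when DAFT_SPHINX_BUILD=1.
    @sphinx_accessor
    def str(self) -> StringNamespace:
        return StringNamespace.from_expression(self)


class PlainExpr(Expr):
    # Same accessor with a plain @property, i.e. the behaviour outside docs builds.
    @property
    def str(self) -> StringNamespace:
        return StringNamespace.from_expression(self)


print(Expr("abc").str.upper())       # ABC
print(type(Expr.str).__name__)       # StringNamespace
print(type(PlainExpr.str).__name__)  # property
```

Instance access behaves identically in both classes; only class-level access differs, which is why the swap can be limited to documentation builds.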
@@ -448,7 +475,7 @@
) -> Expression:
"""Treats each string as a URL, and downloads the bytes contents as a bytes column

..NOTE::
.. NOTE::
If you are observing excessive S3 issues (such as timeouts, DNS errors or slowdown errors) during URL downloads,
you may wish to reduce the value of ``max_connections`` (defaults to 32) to reduce the amount of load you are placing
on your S3 servers.
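The `url.download` note recommends lowering `max_connections` to reduce the load placed on S3. Here is a stdlib-only sketch of the underlying idea, bounding the number of in-flight requests with a thread pool whose size plays the role of `max_connections`. `download_all` and `fake_fetch` are hypothetical names, the URLs are placeholders, and this is not how Daft's downloader is built.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def download_all(
    urls: List[str], fetch: Callable[[str], bytes], max_connections: int = 32
) -> List[bytes]:
    """Fetch every URL with at most `max_connections` requests in flight at once."""
    # The pool never runs more than max_workers tasks concurrently, which caps server load.
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        # map() yields results in input order.
        return list(pool.map(fetch, urls))


in_flight = 0
peak = 0
lock = threading.Lock()


def fake_fetch(url: str) -> bytes:
    """Hypothetical stand-in for a GET request that records peak concurrency."""
    global in_flight, peak
    with lock:
        in_flight += 1
        peak = max(peak, in_flight)
    time.sleep(0.01)  # simulate network latency
    with lock:
        in_flight -= 1
    return url.encode()


results = download_all([f"s3://bucket/key-{i}" for i in range(20)], fake_fetch, max_connections=4)
print(peak <= 4)  # True
```

A smaller pool trades throughput for fewer simultaneous connections, which is the same lever the note describes for timeouts and slowdown errors.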
@@ -24,3 +24,19 @@ Configure Daft in various ways during execution.

daft.set_planning_config
daft.set_execution_config

I/O Configurations
******************

Configure behavior when Daft interacts with storage (e.g. credentials, retry policies and various other knobs to control performance/resource usage).

These configurations are most often used as inputs to Daft DataFrame reading I/O functions, such as those in :doc:`creation`.

.. autosummary::
:nosignatures:
:toctree: doc_gen/io_configs

daft.io.IOConfig
daft.io.S3Config
daft.io.GCSConfig
daft.io.AzureConfig
2 changes: 1 addition & 1 deletion docs/source/api_docs/creation.rst
@@ -58,7 +58,7 @@ Data Catalogs
-------------

Apache Iceberg
^^^^^^^^^^^^^^
~~~~~~~~~~~~~~

.. autosummary::
:nosignatures:
73 changes: 41 additions & 32 deletions docs/source/api_docs/dataframe.rst
@@ -18,6 +18,15 @@ DataFrame
Data Manipulation
#################

Selecting Columns
*****************

.. autosummary::
:nosignatures:
:toctree: doc_gen/dataframe_methods

DataFrame.__getitem__

Manipulating Columns
********************

@@ -28,10 +37,10 @@ Manipulating Columns
:nosignatures:
:toctree: doc_gen/dataframe_methods

daft.DataFrame.select
daft.DataFrame.with_column
daft.DataFrame.exclude
daft.DataFrame.explode
DataFrame.select
DataFrame.with_column
DataFrame.exclude
DataFrame.explode

Filtering Rows
**************
@@ -43,10 +52,10 @@ Filtering Rows
:nosignatures:
:toctree: doc_gen/dataframe_methods

daft.DataFrame.distinct
daft.DataFrame.where
daft.DataFrame.limit
daft.DataFrame.sample
DataFrame.distinct
DataFrame.where
DataFrame.limit
DataFrame.sample

Reordering
**********
@@ -57,8 +66,8 @@ Reordering
:nosignatures:
:toctree: doc_gen/dataframe_methods

daft.DataFrame.sort
daft.DataFrame.repartition
DataFrame.sort
DataFrame.repartition

Combining
*********
@@ -69,8 +78,8 @@ Combining
:nosignatures:
:toctree: doc_gen/dataframe_methods

daft.DataFrame.join
daft.DataFrame.concat
DataFrame.join
DataFrame.concat

.. _df-aggregations:

@@ -85,13 +94,13 @@ Aggregations
:nosignatures:
:toctree: doc_gen/dataframe_methods

daft.DataFrame.groupby
daft.DataFrame.sum
daft.DataFrame.mean
daft.DataFrame.count
daft.DataFrame.min
daft.DataFrame.max
daft.DataFrame.agg
DataFrame.groupby
DataFrame.sum
DataFrame.mean
DataFrame.count
DataFrame.min
DataFrame.max
DataFrame.agg

Execution
#########
@@ -106,7 +115,7 @@ Materialization
:nosignatures:
:toctree: doc_gen/dataframe_methods

daft.DataFrame.collect
DataFrame.collect

Visualization
*************
@@ -117,7 +126,7 @@ Visualization
:nosignatures:
:toctree: doc_gen/dataframe_methods

daft.DataFrame.show
DataFrame.show


.. _df-write-data:
@@ -131,8 +140,8 @@ Writing Data
:nosignatures:
:toctree: doc_gen/dataframe_methods

daft.DataFrame.write_parquet
daft.DataFrame.write_csv
DataFrame.write_parquet
DataFrame.write_csv

Integrations
************
@@ -143,12 +152,12 @@ Integrations
:nosignatures:
:toctree: doc_gen/dataframe_methods

daft.DataFrame.to_arrow
daft.DataFrame.to_pandas
daft.DataFrame.to_torch_map_dataset
daft.DataFrame.to_torch_iter_dataset
daft.DataFrame.to_ray_dataset
daft.DataFrame.to_dask_dataframe
DataFrame.to_arrow
DataFrame.to_pandas
DataFrame.to_torch_map_dataset
DataFrame.to_torch_iter_dataset
DataFrame.to_ray_dataset
DataFrame.to_dask_dataframe

Schema and Lineage
##################
@@ -157,6 +166,6 @@ Schema and Lineage
:nosignatures:
:toctree: doc_gen/dataframe_methods

daft.DataFrame.explain
daft.DataFrame.schema
daft.DataFrame.column_names
DataFrame.explain
DataFrame.schema
DataFrame.column_names