docs: add docs and example showing how to get the expression data type #9118

Merged (3 commits) on Feb 4, 2024
41 changes: 41 additions & 0 deletions datafusion-examples/examples/expr_api.rs
@@ -18,6 +18,7 @@
use arrow::array::{BooleanArray, Int32Array};
use arrow::record_batch::RecordBatch;
use datafusion::arrow::datatypes::{DataType, Field, Schema, TimeUnit};
use datafusion::common::{DFField, DFSchema};
use datafusion::error::Result;
use datafusion::optimizer::simplify_expressions::{ExprSimplifier, SimplifyContext};
use datafusion::physical_expr::execution_props::ExecutionProps;
@@ -29,6 +30,7 @@ use datafusion_common::{ScalarValue, ToDFSchema};
use datafusion_expr::expr::BinaryExpr;
use datafusion_expr::interval_arithmetic::Interval;
use datafusion_expr::{ColumnarValue, ExprSchemable, Operator};
use std::collections::HashMap;
use std::sync::Arc;

/// This example demonstrates the DataFusion [`Expr`] API.
@@ -45,6 +47,7 @@ use std::sync::Arc;
/// 2. Evaluate [`Exprs`] against data: [`evaluate_demo`]
/// 3. Simplify expressions: [`simplify_demo`]
/// 4. Analyze predicates for boundary ranges: [`range_analysis_demo`]
/// 5. Get the types of the expressions: [`expression_type_demo`]
#[tokio::main]
async fn main() -> Result<()> {
// The easiest way to create expressions is to use the
@@ -68,6 +71,9 @@ async fn main() -> Result<()> {
// See how to analyze ranges in expressions
range_analysis_demo()?;

// See how to get the type of the expression
expression_type_demo()?;

Ok(())
}

@@ -256,3 +262,38 @@ pub fn physical_expr(schema: &Schema, expr: Expr) -> Result<Arc<dyn PhysicalExpr

create_physical_expr(&expr, df_schema.as_ref(), &props)
}

fn expression_type_demo() -> Result<()> {
let expr = col("c");

// Using a schema where the column `c` is of type Utf8
let schema = DFSchema::new_with_metadata(
vec![DFField::new_unqualified("c", DataType::Utf8, true)],
HashMap::new(),
)
.unwrap();
assert_eq!("Utf8", format!("{}", expr.get_type(&schema).unwrap()));

// Using a schema where the column `c` is of type Int32
let schema = DFSchema::new_with_metadata(
vec![DFField::new_unqualified("c", DataType::Int32, true)],
HashMap::new(),
)
.unwrap();
assert_eq!("Int32", format!("{}", expr.get_type(&schema).unwrap()));

// Get the type of an expression that adds 2 columns. Adding an Int32
// and Float32 results in Float32 type
let expr = col("c1") + col("c2");
let schema = DFSchema::new_with_metadata(
vec![
DFField::new_unqualified("c1", DataType::Int32, true),
DFField::new_unqualified("c2", DataType::Float32, true),
],
HashMap::new(),
)
.unwrap();
assert_eq!("Float32", format!("{}", expr.get_type(&schema).unwrap()));

Ok(())
}
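The demo above relies on the fact that an `Expr` does not carry its own type: `get_type` derives it from the schema passed in, which is why `col("c")` reports `Utf8` under one schema and `Int32` under another. The standalone sketch below models that idea with hypothetical toy types; `DataType`, `Expr`, `Schema`, and `col` here are simplified stand-ins written for illustration, not DataFusion's or Arrow's definitions.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for Arrow's `DataType`: only the variants this sketch needs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Utf8,
    Int32,
}

// Hypothetical toy expression type: just column references.
enum Expr {
    Column(String),
}

// Toy helper mirroring the shape of `col("c")`; not the DataFusion function.
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

// Toy schema: maps column names to their data types.
type Schema = HashMap<String, DataType>;

impl Expr {
    // The type is not stored in the expression; it is looked up in the schema,
    // so the same `Expr` can have different types under different schemas.
    fn get_type(&self, schema: &Schema) -> Result<DataType, String> {
        match self {
            Expr::Column(name) => schema
                .get(name)
                .copied()
                .ok_or_else(|| format!("column '{name}' not found in schema")),
        }
    }
}

fn main() {
    let expr = col("c");
    let utf8_schema = Schema::from([("c".to_string(), DataType::Utf8)]);
    let int32_schema = Schema::from([("c".to_string(), DataType::Int32)]);
    println!("{:?}", expr.get_type(&utf8_schema)); // prints Ok(Utf8)
    println!("{:?}", expr.get_type(&int32_schema)); // prints Ok(Int32)
}
```

In DataFusion itself the lookup goes through the `ExprSchema` trait, which `DFSchema` implements; the toy `HashMap` only stands in for that.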
54 changes: 54 additions & 0 deletions datafusion/expr/src/expr_schema.rs
@@ -58,6 +58,60 @@ impl ExprSchemable for Expr {
///
/// Note: [DFSchema] implements [ExprSchema].
///
/// # Examples
///
/// ## Get the type of a single column expression using different schemas
///
/// ```
/// # use arrow::datatypes::DataType;
/// # use datafusion_common::{DFField, DFSchema};
/// # use datafusion_expr::{col, ExprSchemable};
/// # use std::collections::HashMap;
///
/// fn main() {
/// let expr = col("c");
///
/// // Using a schema where the column `c` is of type Utf8
/// let schema = DFSchema::new_with_metadata(
/// vec![DFField::new_unqualified("c", DataType::Utf8, true)],
/// HashMap::new(),
/// )
/// .unwrap();
/// assert_eq!("Utf8", format!("{}", expr.get_type(&schema).unwrap()));
///
/// // Using a schema where the column `c` is of type Int32
/// let schema = DFSchema::new_with_metadata(
/// vec![DFField::new_unqualified("c", DataType::Int32, true)],
/// HashMap::new(),
/// )
/// .unwrap();
/// assert_eq!("Int32", format!("{}", expr.get_type(&schema).unwrap()));
/// }
/// ```
///
/// ## Get the type of an expression that adds 2 columns
///
/// Adding an Int32 and a Float32 results in a Float32.
///
/// ```
/// # use arrow::datatypes::DataType;
/// # use datafusion_common::{DFField, DFSchema};
/// # use datafusion_expr::{col, ExprSchemable};
/// # use std::collections::HashMap;
///
/// fn main() {
/// let expr = col("c1") + col("c2");
/// let schema = DFSchema::new_with_metadata(
/// vec![
/// DFField::new_unqualified("c1", DataType::Int32, true),
/// DFField::new_unqualified("c2", DataType::Float32, true),
/// ],
/// HashMap::new(),
/// )
/// .unwrap();
/// assert_eq!("Float32", format!("{}", expr.get_type(&schema).unwrap()));
/// }
/// ```
///
/// # Errors
///
/// This function errors when it is not possible to compute its
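The `get_type` docs state that the function returns an error when the type cannot be computed. The sketch below shows how a caller might handle that instead of calling `unwrap()`. It assumes, for this toy model only, that the failure case is a column missing from the schema; `Column`, `get_type`, and `describe` are hypothetical helpers written for illustration, not DataFusion APIs.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for Arrow's `DataType` (toy model, not the real enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Int32,
    Utf8,
}

// Toy expression: a reference to a named column.
struct Column(String);

type Schema = HashMap<String, DataType>;

// Returns an error instead of panicking when the type cannot be computed,
// here because the column is not part of the schema.
fn get_type(expr: &Column, schema: &Schema) -> Result<DataType, String> {
    schema
        .get(&expr.0)
        .copied()
        .ok_or_else(|| format!("column '{}' not found in schema", expr.0))
}

// Callers can propagate the error with `?` rather than calling `unwrap()`.
fn describe(expr: &Column, schema: &Schema) -> Result<String, String> {
    let data_type = get_type(expr, schema)?;
    Ok(format!("{} has type {:?}", expr.0, data_type))
}

fn main() {
    let schema = Schema::from([
        ("c1".to_string(), DataType::Int32),
        ("c2".to_string(), DataType::Utf8),
    ]);
    for name in ["c1", "c2", "missing"] {
        match describe(&Column(name.to_string()), &schema) {
            Ok(s) => println!("{s}"),
            Err(e) => println!("error: {e}"),
        }
    }
}
```

Propagating with `?`, as `describe` does, keeps a schema mismatch recoverable, whereas the `unwrap()` calls in the examples would panic on the same error.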
28 changes: 28 additions & 0 deletions docs/source/library-user-guide/working-with-exprs.md
@@ -180,6 +180,34 @@ Projection: Int64(1) + Int64(1) AS added_one

I.e. the `add_one` UDF has been inlined into the projection.

## Getting the data type of the expression

The `arrow::datatypes::DataType` of an expression can be obtained by calling `get_type` with a schema that describes the columns the expression references:

```rust
use datafusion::arrow::datatypes::DataType;
use datafusion::common::{DFField, DFSchema};
use datafusion::logical_expr::{col, ExprSchemable};
use std::collections::HashMap;

let expr = col("c1") + col("c2");
let schema = DFSchema::new_with_metadata(
vec![
DFField::new_unqualified("c1", DataType::Int32, true),
DFField::new_unqualified("c2", DataType::Float32, true),
],
HashMap::new(),
)
.unwrap();
println!("type = {}", expr.get_type(&schema).unwrap());
```

This results in the following output:

```text
type = Float32
```
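The example above shows `Int32 + Float32` resolving to `Float32`: for a binary expression, the result type depends on both operand types and on a coercion rule. The sketch below models this with a hypothetical, deliberately simplified rule (matching types stay the same, `Int32` combined with `Float32` widens to `Float32`, anything else is an error). It is not DataFusion's actual coercion logic, and `DataType`, `Expr`, `col`, and `coerce_add` are toy stand-ins written for illustration.

```rust
use std::collections::HashMap;
use std::ops::Add;

// Hypothetical stand-in for Arrow's `DataType` (toy model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Int32,
    Float32,
    Utf8,
}

// Toy expression tree: column references and addition.
enum Expr {
    Column(String),
    Add(Box<Expr>, Box<Expr>),
}

fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

// Lets the sketch write `col("c1") + col("c2")`, like the guide's example.
impl Add for Expr {
    type Output = Expr;
    fn add(self, rhs: Expr) -> Expr {
        Expr::Add(Box::new(self), Box::new(rhs))
    }
}

type Schema = HashMap<String, DataType>;

// Hypothetical, simplified coercion rule for `+` (not DataFusion's rules).
fn coerce_add(l: DataType, r: DataType) -> Result<DataType, String> {
    use DataType::*;
    match (l, r) {
        (Int32, Int32) => Ok(Int32),
        (Float32, Float32) | (Int32, Float32) | (Float32, Int32) => Ok(Float32),
        _ => Err(format!("cannot add {l:?} and {r:?}")),
    }
}

impl Expr {
    // Computes operand types recursively, then applies the coercion rule.
    fn get_type(&self, schema: &Schema) -> Result<DataType, String> {
        match self {
            Expr::Column(name) => schema
                .get(name)
                .copied()
                .ok_or_else(|| format!("column '{name}' not found in schema")),
            Expr::Add(l, r) => coerce_add(l.get_type(schema)?, r.get_type(schema)?),
        }
    }
}

fn main() {
    let schema = Schema::from([
        ("c1".to_string(), DataType::Int32),
        ("c2".to_string(), DataType::Float32),
        ("c3".to_string(), DataType::Utf8),
    ]);
    let expr = col("c1") + col("c2");
    println!("type = {:?}", expr.get_type(&schema).unwrap()); // prints "type = Float32"
    let bad = col("c1") + col("c3");
    println!("{:?}", bad.get_type(&schema)); // prints Err("cannot add Int32 and Utf8")
}
```

Because the `Add` case computes both operand types first and propagates any error with `?`, a missing column or an unsupported pairing anywhere in the tree surfaces as an error from the top-level call.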

## Conclusion

In this guide, we've seen how to create `Expr`s programmatically, how to rewrite them, and how to get their data type. Rewriting is useful for simplifying and optimizing `Expr`s. We've also seen how to test our rule to ensure it works properly.