Aggregation

Table of contents

Introduction
GROUP BY Clause
Aggregation
Aggregation Functions
- COUNT
- SUM
- AVG
- MAX
- MIN
- VAR_POP
- VAR_SAMP
- VARIANCE
- STDDEV_POP
- STDDEV_SAMP
- STD
- STDDEV
- DISTINCT COUNT Aggregation
HAVING Clause
FILTER Clause

Introduction

Aggregate functions operate on sets of values. They are often used with a GROUP BY clause to group values into subsets.

GROUP BY Clause

Description

The GROUP BY expression could be

Identifier
Ordinal
Expression

Identifier

The group by expression could be identifier:

os> SELECT gender, sum(age) FROM accounts GROUP BY gender;
fetched rows / total rows = 2/2
+----------+------------+
| gender   | sum(age)   |
|----------+------------|
| F        | 28         |
| M        | 101        |
+----------+------------+

Ordinal

The group by expression could be ordinal:

os> SELECT gender, sum(age) FROM accounts GROUP BY 1;
fetched rows / total rows = 2/2
+----------+------------+
| gender   | sum(age)   |
|----------+------------|
| F        | 28         |
| M        | 101        |
+----------+------------+

Expression

The group by expression could be expression:

os> SELECT abs(account_number), sum(age) FROM accounts GROUP BY abs(account_number);
fetched rows / total rows = 4/4
+-----------------------+------------+
| abs(account_number)   | sum(age)   |
|-----------------------+------------|
| 1                     | 32         |
| 13                    | 28         |
| 18                    | 33         |
| 6                     | 36         |
+-----------------------+------------+

Aggregation

Description

The aggregation could be used select.
The aggregation could be used as arguments of expression.
The aggregation could has expression as arguments.

Aggregation in Select

The aggregation could be used select:

os> SELECT gender, sum(age) FROM accounts GROUP BY gender;
fetched rows / total rows = 2/2
+----------+------------+
| gender   | sum(age)   |
|----------+------------|
| F        | 28         |
| M        | 101        |
+----------+------------+

Expression over Aggregation

The aggregation could be used as arguments of expression:

os> SELECT gender, sum(age) * 2 as sum2 FROM accounts GROUP BY gender;
fetched rows / total rows = 2/2
+----------+--------+
| gender   | sum2   |
|----------+--------|
| F        | 56     |
| M        | 202    |
+----------+--------+

Expression as argument of Aggregation

The aggregation could has expression as arguments:

os> SELECT gender, sum(age * 2) as sum2 FROM accounts GROUP BY gender;
fetched rows / total rows = 2/2
+----------+--------+
| gender   | sum2   |
|----------+--------|
| F        | 56     |
| M        | 202    |
+----------+--------+

COUNT Aggregations

Besides regular identifiers, COUNT aggregate function also accepts arguments such as * or literals like 1. The meaning of these different forms are as follows:

COUNT(field) will count only if given field (or expression) is not null or missing in the input rows.
COUNT(*) will count the number of all its input rows.
COUNT(1) is same as COUNT(*) because any non-null literal will count.

Aggregation Functions

COUNT

Description

Usage: Returns a count of the number of expr in the rows retrieved by a SELECT statement.

Example:

os> SELECT gender, count(*) as countV FROM accounts GROUP BY gender;
fetched rows / total rows = 2/2
+----------+----------+
| gender   | countV   |
|----------+----------|
| F        | 1        |
| M        | 3        |
+----------+----------+

SUM

Description

Usage: SUM(expr). Returns the sum of expr.

Example:

os> SELECT gender, sum(age) as sumV FROM accounts GROUP BY gender;
fetched rows / total rows = 2/2
+----------+--------+
| gender   | sumV   |
|----------+--------|
| F        | 28     |
| M        | 101    |
+----------+--------+

AVG

Description

Usage: AVG(expr). Returns the average value of expr.

Example:

os> SELECT gender, avg(age) as avgV FROM accounts GROUP BY gender;
fetched rows / total rows = 2/2
+----------+--------------------+
| gender   | avgV               |
|----------+--------------------|
| F        | 28.0               |
| M        | 33.666666666666664 |
+----------+--------------------+

MAX

Description

Usage: MAX(expr). Returns the maximum value of expr.

Example:

os> SELECT max(age) as maxV FROM accounts;
fetched rows / total rows = 1/1
+--------+
| maxV   |
|--------|
| 36     |
+--------+

MIN

Description

Usage: MIN(expr). Returns the minimum value of expr.

Example:

os> SELECT min(age) as minV FROM accounts;
fetched rows / total rows = 1/1
+--------+
| minV   |
|--------|
| 28     |
+--------+

VAR_POP

Description

Usage: VAR_POP(expr). Returns the population standard variance of expr.

Example:

os> SELECT var_pop(age) as varV FROM accounts;
fetched rows / total rows = 1/1
+--------+
| varV   |
|--------|
| 8.1875 |
+--------+

VAR_SAMP

Description

Usage: VAR_SAMP(expr). Returns the sample variance of expr.

Example:

os> SELECT var_samp(age) as varV FROM accounts;
fetched rows / total rows = 1/1
+--------------------+
| varV               |
|--------------------|
| 10.916666666666666 |
+--------------------+

VARIANCE

Description

Usage: VARIANCE(expr). Returns the population standard variance of expr. VARIANCE() is a synonym VAR_POP() function.

Example:

os> SELECT variance(age) as varV FROM accounts;
fetched rows / total rows = 1/1
+--------+
| varV   |
|--------|
| 8.1875 |
+--------+

STDDEV_POP

Description

Usage: STDDEV_POP(expr). Returns the population standard deviation of expr.

Example:

os> SELECT stddev_pop(age) as stddevV FROM accounts;
fetched rows / total rows = 1/1
+--------------------+
| stddevV            |
|--------------------|
| 2.8613807855648994 |
+--------------------+

STDDEV_SAMP

Description

Usage: STDDEV_SAMP(expr). Returns the sample standard deviation of expr.

Example:

os> SELECT stddev_samp(age) as stddevV FROM accounts;
fetched rows / total rows = 1/1
+-------------------+
| stddevV           |
|-------------------|
| 3.304037933599835 |
+-------------------+

STD

Description

Usage: STD(expr). Returns the population standard deviation of expr. STD() is a synonym STDDEV_POP() function.

Example:

os> SELECT stddev_pop(age) as stddevV FROM accounts;
fetched rows / total rows = 1/1
+--------------------+
| stddevV            |
|--------------------|
| 2.8613807855648994 |
+--------------------+

STDDEV

Description

Usage: STDDEV(expr). Returns the population standard deviation of expr. STDDEV() is a synonym STDDEV_POP() function.

Example:

os> SELECT stddev(age) as stddevV FROM accounts;
fetched rows / total rows = 1/1
+--------------------+
| stddevV            |
|--------------------|
| 2.8613807855648994 |
+--------------------+

DISTINCT COUNT Aggregation

To get the count of distinct values of a field, you can add a keyword DISTINCT before the field in the count aggregation. Example:

os> SELECT COUNT(DISTINCT gender), COUNT(gender) FROM accounts;
fetched rows / total rows = 1/1
+--------------------------+-----------------+
| COUNT(DISTINCT gender)   | COUNT(gender)   |
|--------------------------+-----------------|
| 2                        | 4               |
+--------------------------+-----------------+

HAVING Clause

Description

A HAVING clause can serve as aggregation filter that filters out aggregated values satisfy the condition expression given.

HAVING with GROUP BY

Aggregate expressions or its alias defined in SELECT clause can be used in HAVING condition.

It's recommended to use non-aggregate expression in WHERE although it's allowed to do this in HAVING clause.
The aggregation in HAVING clause is not necessarily same as that on select list. As extension to SQL standard, it's also not restricted to involve identifiers only on group by list.

Here is an example for typical use of HAVING clause:

os> SELECT
...  gender, sum(age)
... FROM accounts
... GROUP BY gender
... HAVING sum(age) > 100;
fetched rows / total rows = 1/1
+----------+------------+
| gender   | sum(age)   |
|----------+------------|
| M        | 101        |
+----------+------------+

Here is another example for using alias in HAVING condition. Note that if an identifier is ambiguous, for example present both as a select alias and an index field, preference is alias. This means the identifier will be replaced by expression aliased in SELECT clause:

os> SELECT
...  gender, sum(age) AS s
... FROM accounts
... GROUP BY gender
... HAVING s > 100;
fetched rows / total rows = 1/1
+----------+-----+
| gender   | s   |
|----------+-----|
| M        | 101 |
+----------+-----+

HAVING without GROUP BY

Additionally, a HAVING clause can work without GROUP BY clause. This is useful because aggregation is not allowed to be present in WHERE clause:

os> SELECT
...  'Total of age > 100'
... FROM accounts
... HAVING sum(age) > 100;
fetched rows / total rows = 1/1
+------------------------+
| 'Total of age > 100'   |
|------------------------|
| Total of age > 100     |
+------------------------+

FILTER Clause

Description

A FILTER clause can set specific condition for the current aggregation bucket, following the syntax aggregation_function(expr) FILTER(WHERE condition_expr). If a filter is specified, then only the input rows for which the condition in the filter clause evaluates to true are fed to the aggregate function; other rows are discarded. The aggregation with filter clause can be use in SELECT clause only.

FILTER with GROUP BY

The group by aggregation with FILTER clause can set different conditions for each aggregation bucket. Here is an example to use FILTER in group by aggregation:

os> SELECT avg(age) FILTER(WHERE balance > 10000) AS filtered, gender FROM accounts GROUP BY gender
fetched rows / total rows = 2/2
+------------+----------+
| filtered   | gender   |
|------------+----------|
| 28.0       | F        |
| 32.0       | M        |
+------------+----------+

FILTER without GROUP BY

The FILTER clause can be used in aggregation functions without GROUP BY as well. For example:

os> SELECT
...   count(*) AS unfiltered,
...   count(*) FILTER(WHERE age > 34) AS filtered
... FROM accounts
fetched rows / total rows = 1/1
+--------------+------------+
| unfiltered   | filtered   |
|--------------+------------|
| 4            | 1          |
+--------------+------------+

Distinct count aggregate with FILTER

The FILTER clause is also used in distinct count to do the filtering before count the distinct values of specific field. For example:

os> SELECT COUNT(DISTINCT firstname) FILTER(WHERE age > 30) AS distinct_count FROM accounts
fetched rows / total rows = 1/1
+------------------+
| distinct_count   |
|------------------|
| 3                |
+------------------+

Files

aggregations.rst

Latest commit

History

aggregations.rst

File metadata and controls

Aggregation

Description

Description

Description

Description

Description

Description

Description

Description

Description

Description

Description

Description