Grouping Related
Rows
Grouping is a
special type of sorting. With sorting, all the rows are sorted
by the columns specified. Grouping sorts the rows into groups
so that multi row functions can be specified at the group
level.
SQL>
select
avg(quantity)
from
sales;
AVG(QUANTITY)
-------------
1105B5
select
store_key,
avg(quantity)
from
sales
group by store_key;
The first example uses the average function
to provide us the average sales for all orders. The second
example provides the average quantity for each store. The
database sorts the rows into groups, with one group per store
key. It then feeds the groups, one at a time to the average
function that computes the average for that group. The GROUP BY
clause can be used with multiple columns and with multiple
functions.
SQL>
select
2 store_key,
3 book_key,
4 sum(quantity) total,
5 avg(quantity) Average,
6 count(quantity) Num
7 from
8 sales
9 group by store_key, book_key;
STOR
BOOK_K TOTAL AVERAGE NUM
---- ------ ---------- ---------- ----------
S101 B101 1000 1000 1
S101 B114 900 900 1
S101 B115 180 180 1
S101 B116 100 100 1
S102 B102 10 10 1
S102 B103 8900 8900 1
S102 B110 1400 700 2
S102 B111 1400 700 2
S102 B112 600 300 2
S102 B113 500 500 1
S102 B114 150 150 1
S102 B115 8800 8800 1
S102 B116 100 100 1
S103 B101 100 100 1
S103 B102 400 200 2
This example groups the retrieved records
by store_key and book_key. It then computes the average and
total quantities for each group. Lastly, it counts the number
of rows in that group.
When you use the GROUP BY clause, you must
include all columns in the SELECT clause. Otherwise, you will
get the error below (I removed the book_key).
SQL>
select
2 store_key,
3 book_key,
4 sum(quantity) total,
5 avg(quantity) Average,
6 count(quantity) Num
7 from
8 sales
9 group by store_key;
book_key,
*
ERROR at line 3:
ORA-00979: not a GROUP BY expression
Here, the database is saying that the
book_key is not part of the GROUP BY clause. You can, however,
include columns in the GROUP BY clause that are not in the
SELECT clause.
SQL>
select
2 sum(quantity) total,
3 avg(quantity) Average,
4 count(quantity) Num
5 from
6 sales
7 group by store_key, book_key;
TOTAL AVERAGE NUM
---------- ---------- ----------
1000 1000 1
900 900 1
180 180 1
100 100 1
10 10 1
8900 8900 1
1400 700 2
1400 700 2
600 300 2
…
All columns listed in the SELECT clause
that are not in a function must be included in the GROUP BY
clause. This is an ANSI
SQL requirement.
We need a way to limit the returned rows
after the grouping has been applied, such as rows with
sum(quantity) less than 500. We can’t use the WHERE clause
because it is used to limit the row set returned before the
grouping takes place, so when Oracle applies this limit, the
sum(quantity) has not yet been calculated.
SQL>
select
2 sum(quantity) Qty
3 from
4 sales
5 where sum(quantity) < 400
6 group by store_key, book_key;
where sum(quantity) < 400
*
ERROR at line 5:
ORA-00934: group function is not allowed here
To limit the returned rows after the
grouping, we use the HAVING
clause. The HAVING clause is applied after the grouping has
taken place.
SQL>
select
2 sum(quantity) Qty
3 from
4 sales
5 group by store_key, book_key
6 having sum(quantity) < 400;
QTY
----------
180
100
10
150
100
100
100
37 rows
selected.
Your SQL statement can have both a WHERE
clause and a HAVING
clause. The WHERE clause filters rows as they are selected from
the table, and before grouping, the HAVING clause filters rows
after the grouping.
Finally, if we want to order the rows that
are returned after grouping, we add the ORDER BY clause at the end.
SQL>
select
2 sum(quantity) Qty
3 from
4 sales
5 group by store_key, book_key
6 having sum(quantity) < 400
7 order by Qty;
QTY
----------
10
100
100
100
…
100
150
180
180
200
200
300
300
300
37 rows selected.
Because of our grouping, we can only order
the results set on an identifier that exists after the grouping,
in our case store_key, book_key or Qty. If we tried to use
order_number, we would get an error because the order_number has
no definition once we have grouped the rows.
Now that we have our results returned in
the order we want, it is time start including data from multiple
tables. For this, we need to join the tables together.