OLAP functions

ROLLUP / CUBE / GROUPING
CUME_DIST
FIRST_VALUE
LAST_VALUE
RATIO_TO_REPORT
ROW_NUMBER
NTILE
WIDTH_BUCKET
RANK
DENSE_RANK
PERCENT_RANK
PERCENTILE_CONT
LAG
LEAD
SUM, AVG, MIN, MAX, VARIANCE, STDDEV, COUNT
KEEP
OLAP Windows
- ROWS
- RANGE

ROLLUP / CUBE / GROUPING

ROLLUP will create subtotals, much like a CONTROL-BREAK report would. If you GROUP BY ROLLUP(a,b,c), it'll produce subtotals by A,B and A. So, you get the results equivalent to:

SELECT a,b,c, <aggregates>
  FROM t
 GROUP by a,b,c
UNION ALL
SELECT a,b,NULL,<aggregates>
  FROM t
 GROUP by a,b,NULL
UNION ALL
SELECT a,NULL,NULL,<aggregates>
  FROM t
 GROUP by a,NULL,NULL
UNION ALL
SELECT NULL,NULL,NULL,<aggregates>
  FROM t
 GROUP by NULL,NULL,NULL
/

but they will be nicely ordered, i.e.

SQL> SELECT
  2    a, b, SUM(b),
  3    DECODE(GROUPING(A), 1, 'SUM OF B FOR ALL A') title1,
  4    DECODE(GROUPING(B), 1, 'SUM OF B FOR A = ' || a) title2
  5  FROM   t
  6  GROUP BY ROLLUP (B,A)
  7  /

         A          B     SUM(B) TITLE1               TITLE2
---------- ---------- ---------- -------------------- --------------------
         1         10         10
         1         20         20
         1                    30                      SUM OF B FOR A = 1
         2         20         20
         2                    20                      SUM OF B FOR A = 2
         3         30         30
         3                    30                      SUM OF B FOR A = 3
         4         40         40
         4                    40                      SUM OF B FOR A = 4
                             120 SUM OF B FOR ALL A   SUM OF B FOR A =

10 rows selected.

*DEV NOTE*
Note, the use of the GROUPING function, which we can use to determine if the current row is an additional row (i.e. aggregated row) which is the grouped "extra" row(s) from the OLAP operation.

CUBE will create every combination -- you'll get the aggregates by:

a,b,c
a,b
a,c
b,c
a
b
c
NULL

However, whereas ROLLUP will guarantee a nice "report", i.e. aggregate totals at the "end" of the result set, CUBE does not.

CUME_DIST

CUME_DIST(x) is the number of values in the set (S) coming before and including x in the specified order divided by the number of rows in S.

The following example shows this :

SQL> SELECT a, CUME_DIST() OVER (ORDER BY a)
  2  FROM t;

         A CUME_DIST()OVER(ORDERBYA)
---------- -------------------------
        10                .333333333
        20                .666666667
        30                         1

3 rows selected.

This can be explained as that 2/3 of the values come before (or include) the value 20, etc.

FIRST_VALUE

FIRST_VALUE allows you access to the first row in the current partition window, i.e.

SQL> SELECT a, FIRST_VALUE(a) OVER ()
  2  FROM t;

         A FIRST_VALUE(A)OVER()
---------- --------------------
        10                   10
        20                   10

2 rows selected.

LAST_VALUE

LAST_VALUE allows you access to the last row in the current partition window, i.e.

SQL> SELECT a, LAST_VALUE(a) OVER ()
  2  FROM t;

         A LAST_VALUE(A)OVER()
---------- --------------------
        10                   20
        20                   20

2 rows selected.

*IMPORTANT*
Note, that it is important to specify the "windowing clause" of the OLAP functions when using functions such as LAST_VALUE which have to operate on the whole set "further on" than the current row, when specifying ORDER BY clauses etc in the OVER () statement.

The reason is that the default windowing clause is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, and this will not take into account rows after the current row. For example :

SQL> SELECT a, LAST_VALUE(a) OVER (ORDER BY a)
  2    FROM t;

         A LAST_VALUE(A)OVER(ORDERBYA)
---------- ---------------------------
        10                          10
        20                          20

See how the value seems incorrect. What is actually being executed is :

SQL> SELECT a, LAST_VALUE(a) OVER (ORDER BY a
  2                                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
  3    FROM t;

         A LAST_VALUE(A)OVER(ORDERBYAROWSBETWEENUNBOUNDEDPRECEDINGANDCURRENTROW)
---------- ---------------------------------------------------------------------
        10                                                                    10
        20                                                                    20

You need to correctly think about the OLAP window in this situation.

Note, in this case, the correct statement is :

SQL> SELECT a, LAST_VALUE(a) OVER (ORDER BY a
  2                                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) x
  3    FROM t;

         A          X
---------- ----------
        10         20
        20         20

RATIO_TO_REPORT

The RATIO_TO_REPORT function computes the ratio of a value to the sum of a set of values. If expr evaluates to NULL, RATIO_TO_REPORT also evaluates to NULL, but it is treated as zero for computing the sum of values for the denominator.

Its syntax is:

RATIO_TO_REPORT ( expr ) OVER ( [query_partition_clause] )

SQL> CREATE TABLE t ( a NUMBER(10) );

Table created.

SQL> INSERT INTO t VALUES (10);

1 row created.

SQL> /

1 row created.

SQL> /

1 row created.

SQL> SELECT a, RATIO_TO_REPORT(a) OVER ()
  2  FROM T;

         A RATIO_TO_REPORT(A)OVER()
---------- ------------------------
        10               .333333333
        10               .333333333
        10               .333333333

3 rows selected.

*DEV NOTE*
You can mimic the functionality of RATIO_TO_REPORT, by using SUM, (which is what RATIO_TO_REPORT will do anyway), but it'll be quicker to do the division in a built-in (and it'll be easier to read and understand what's going on), i.e.

SQL> SELECT a, a / SUM(a) OVER ()
  2  FROM t;

         A A/SUM(A)OVER()
---------- --------------
        10     .333333333
        10     .333333333
        10     .333333333

3 rows selected.

Note, that the comparison between SUM and RATIO_TO_REPORT is not strictly speaking correct (although it explains the situation well), the difference is that RATIO_TO_REPORT will return NULL if the SUM(a) OVER () is zero, so the correct comparison (to prevent the ORA-01476: divisor is equal to zero error) is :

RATIO_TO_REPORT(a) OVER () = a / DECODE(SUM(a) OVER (), 0, NULL, SUM(a) OVER ())

which assumes that a divided by NULL = NULL (which it is in most database platforms), but to be strictly correct,

RATIO_TO_REPORT(a) OVER () = DECODE(SUM(a) OVER (), 0, NULL, a / SUM(a) OVER ())

which, again, DRAMATICALLY shows the improvement in readability of RATIO_TO_REPORT over SUM.

ROW_NUMBER

ROW_NUMBER is a mechanism for doing ROWNUM-type operations but which can be then partitioned over the current partition window. For example, let's say I had a table, and I wanted a increasing counter, but which is then "reset" for a given value of A, i.e.

SQL> SELECT * FROM T;

A
----------
X
X
Y
Y

4 rows selected.

SQL> SELECT
  2    A,
  3    ROWNUM,
  4    ROW_NUMBER() OVER (PARTITION BY A ORDER BY A) partitioned,
  5    ROW_NUMBER() OVER (ORDER BY A) non_partitioned
  6  FROM T
  7  /

A              ROWNUM PARTITIONED NON_PARTITIONED
---------- ---------- ----------- ---------------
X                   1           1               1
X                   2           2               2
Y                   3           1               3
Y                   4           2               4

4 rows selected.

Note, that a partition window over the entire result set will match ROWNUM.

So, when do you use ROWNUM vs ROW_NUMBER()?

here is how you decide :

If you need the TOP-N rows out of a set (eg: show me the highest paid person in the COMPANY), ROWNUM is the best thing to use.
If you need the TOP-N rows out of groups within a set (eg: show me the highest paid person in each department), ROW_NUMBER() (or dense_rank whatever) is the ONLY thing to use (ROWNUM cannot do it)

NTILE

It divides an ordered dataset into a number of buckets indicated by expr and assigns the appropriate bucket number to each row. The buckets are numbered 1 through expr, and expr must resolve to a positive constant for each partition.

The number of rows in the buckets can differ by at most 1.

The remainder values (the remainder of number of rows divided by buckets) are distributed 1 per bucket, starting with bucket 1. If expr is greater than the number of rows, a number of buckets equal to the number of rows will be filled, and the remaining buckets will be empty.

Syntax:
NTILE( expr ) [ OVER ( query_partition_clause ORDER_BY_clause ) ]

Let's say I have a table of data and I utilise a basic NTILE with 2 buckets on that set, I get :

SQL> SELECT a, NTILE(2) over ( order by a )
  2  FROM temp;

A          NTILE(2)OVER(ORDERBYA)
---------- ----------------------
1                               1
2                               1
3                               2
4                               2

We can see that 4 values have been spread over 2 "buckets".

SQL> SELECT a, NTILE(3) over ( order by a )
  2  from temp;

A          NTILE(3)OVER(ORDERBYA)
---------- ----------------------
1                               1
2                               1
3                               2
4                               3

4 rows selected.

*DEV NOTE*
This can be a bit confusing to get your head around, but fortunately there's an easy way to understand it. Take your result set, say 1,1,2,3,4,4,5. Now, if I apply an NTILE(n) on this, work out what the highest integer (say n=3) that will factor into the number of rows (x). In this case, x=7 and the highest integer that will factor into 7 three times is 2, since 2+2+2=6. Now, we have 1 remaining (7-6). This is now placed in the left (or top) most bucket, so we have the following :

1 1 2 3 4 4 5
1 1 1 2 2 3 3
    ^
    remainder value

And hence to prove it :

SQL> SELECT a, NTILE(3) OVER (ORDER BY a)
  2  FROM t;

         A NTILE(3)OVER(ORDERBYA)
---------- ----------------------
         1                      1
         1                      1
         2                      1
         3                      2
         4                      2
         4                      3
         5                      3

7 rows selected.

Note, that NTILE (in OLAP parlance) produces equi-height histograms. For equi-width histograms, use WIDTH_BUCKET.

WIDTH_BUCKET

The WIDTH_BUCKET function lets you construct equiwidth histograms, in which the histogram range is divided into intervals that have identical size. (Compare this function with NTILE, which creates equiheight histograms.)

Ideally each bucket is a "closed-open" interval of the real number line. For example, a bucket can be assigned to scores between 10.00 and 19.999... to indicate that 10 is included in the interval and 20 is excluded. This is sometimes denoted [10, 20).

Syntax

WIDTH_BUCKET(expr, min_value, max_value, num_of_buckets)

Example

SQL> SELECT a, WIDTH_BUCKET(a, 1, 21, 5)
  2  FROM t
  3  /

         A WIDTH_BUCKET(A,1,21,5)
---------- ----------------------
        10                      3
        10                      3
        20                      5
        15                      4
        15                      4

5 rows selected.

*DEV NOTE*
This can be just as confusing as NTILE to understand (probably because they do very similar things), but the way to think about this one is :

Say I have the above example. In t, a has values of 10,10,15,15,20, and I want to "split" these values over 5 buckets. Now, min_value is inclusive, but max_value is exclusive, so :

max_value (21) - min_value (1) = 20 / num_buckets (5) = 4.

Therefore, the range of values in each histogram is 4 (well, actually.. 3.999999...., since m -> n is exclusive of n)

Therefore,
  histogram 0 = < 1
  histogram 1 = 1 -> 4.9999...
  histogram 2 = 5 -> 8.9999...
  histogram 3 = 9 -> 12.9999...
  histogram 4 = 13 -> 16.9999...
  histogram 5 = 17 -> 20.9999...
  histogram 6 = >= 21

And, hence, value of 10 is in histogram (bucket) #3, 15 is in bucket #4 and 20 is in bucket #5.

RANK

It computes the rank of each row returned from a query with respect to the other rows returned by the query, based on the values of the value_exprs in the order_by_clause. Rows with equal values for the ranking criteria receive the same rank. Oracle then adds the number of tied rows to the tied rank to calculate the next rank. Therefore, the ranks may not be consecutive numbers (if you need an element of consecutivity, see DENSE_RANK).

Syntax
RANK() OVER ( query_partition_clause order_by_clause)

Example

SQL> SELECT deptno, ename, sal, comm, RANK() OVER (PARTITION BY deptno ORDER BY sal DESC, comm) as rk
  2  FROM emp;

DEPTNO     ENAME             SAL       COMM         RK
---------- ---------- ---------- ---------- ----------
        10 KING             5000                     1
        10 CLARK            2450                     2
        10 MILLER           1300                     3
        20 SCOTT            3000                     1
        20 FORD             3000                     1
        20 JONES            2975                     3
        20 ADAMS            1100                     4
        20 SMITH             800                     5
        30 BLAKE            2850                     1
        30 ALLEN            1600        300          2
        30 TURNER           1500          0          3
        30 WARD             1250        500          4
        30 MARTIN           1250       1400          5
        30 JAMES             950                     6

In the above query, In case of DEPTNO 20 both SCOTT and FORD have the same salary, So both are assigned rank 1. Resulting rank 2 is skipped and rank 3 is assigned to JONES.

DENSE_RANK

DENSE_RANK works in much the same way as RANK, but where RANK can (and will) return non-consecutive rankings, the rankings from DENSE_RANK will always be consecutive, i.e.

SQL> SELECT a, RANK() OVER (ORDER BY a), DENSE_RANK() OVER (ORDER BY a)
  2  FROM t;

         A RANK()OVER(ORDERBYA) DENSE_RANK()OVER(ORDERBYA)
---------- -------------------- --------------------------
        10                    1                          1
        20                    2                          2
        20                    2                          2
        30                    4                          3

PERCENT_RANK

PERCENT_RANK returns the percentage rank of a row relative to the set of values.

It is calculated as :(rank of row in partition - 1) / (number of rows in partition - 1)

Therefore, PERCENT_RANK returns values in the range of 0 - 1. Row(s) with a RANK of 1 will have a PERCENT_RANK of 0, i.e.

SQL> SELECT a, PERCENT_RANK() OVER ( ORDER BY a ), RANK() OVER (ORDER BY a)
  2  FROM t
  3  /

         A PERCENT_RANK()OVER(ORDERBYA) RANK()OVER(ORDERBYA)
---------- ---------------------------- --------------------
        10                            0                    1
        20                           .5                    2
        30                            1                    3

3 rows selected.

PERCENTILE_CONT

>From the Oracle 9.2 documentation :

PERCENTILE_CONT is an inverse distribution function that assumes a continuous distribution model. It takes a percentile value and a sort specification, and returns an interpolated value that would fall into that percentile value with respect to the sort specification. Nulls are ignored in the calculation.

Syntax
PERCENTILE_CONT ( x ) WITHIN GROUP ( order_by_clause )
                      [ OVER ([ query_partition_clause ]) ]

What this basically boils down to is that given a value (x), this function will work out (for the current partition window) what value would hypothetically fit into that percentile value, so for example, the following answers the question of what value would be at the 50th percentile in the following set :

SQL> SELECT a FROM t;

         A
----------
        10
        20
        30
        40

4 rows selected.

SQL> SELECT PERCENTILE_CONT(0.5) WITHIN GROUP ( ORDER BY a )
  2  FROM t
  3  /

PERCENTILE_CONT(0.5)WITHINGROUP(ORDERBYA)
-----------------------------------------
                                       25

1 row selected.

*IMPORTANT*
Because this is a statistical distribution, the "first" row over the ordered set is at 0th percentile, the "last" row is the 100th percentile, i.e.

SQL> SELECT PERCENTILE_CONT(0) WITHIN GROUP ( order by a )
  2  FROM t;

PERCENTILE_CONT(0)WITHINGROUP(ORDERBYA)
---------------------------------------
                                     10

1 row selected.

SQL> SELECT PERCENTILE_CONT(1) WITHIN GROUP ( order by a )
  2  FROM t
  3  /

PERCENTILE_CONT(1)WITHINGROUP(ORDERBYA)
---------------------------------------
                                     40

1 row selected.

LAG

LAG allows you to access information from prior rows in the result set WITHOUT having to perform a self-join. In combination with LEAD, it offers potential for MASSIVE performance improvements and functionality.

Syntax
LAG(<column> [, n]) OVER ( query_partition_clause ORDER BY clause)

n - number of rows to "lag" by. Defaults to 1.

Example

SQL> SELECT
  2    a,
  3    LAG(a) OVER (ORDER BY a)
  4  FROM
  5    ( SELECT 'x' a FROM dual
  6      UNION ALL
  7      SELECT 'y' a FROM dual
  8      UNION ALL
  9      SELECT 'z' a FROM dual
 10    )
 11  /

A L
- -
x
y x
z y

3 rows selected.

LEAD

LEAD allows you to access information from subsequent rows in the result set WITHOUT having to perform a self-join. In combination with LAG, it offers potential for MASSIVE performance improvements and functionality.

Syntax
LEAD(<column> [, n]) OVER ( query_partition_clause order_by_clause)

n - number of rows to "lead" by. Defaults to 1.

Example

SQL> SELECT
  2    a,
  3    LEAD(a) OVER (ORDER BY a)
  4  FROM
  5    ( SELECT 'x' a FROM dual
  6      UNION ALL
  7      SELECT 'y' a FROM dual
  8      UNION ALL
  9      SELECT 'z' a FROM dual
 10    )
 11  /

A L
- -
x y
y z
z

SUM, AVG, MIN, MAX, VARIANCE, STDDEV, COUNT

Everybody knows these functions within SQL, but they can now be used in combination with the windowing functionality (i.e. OVER) to achieve great functionality.

Syntax

function( [ALL | DISTINCT] expr ) OVER (query_partition_clause ORDER BY clause)

There good for an EXTREMELY wide variety of uses, but the following example shows the use of SUM() OVER (..) for computing running totals.

Example

-- This is the data set.

SQL> SELECT * FROM T;

A
----------
1
2
3
3

SQL> SELECT
  2    a,
  3    SUM(a) OVER (ORDER BY a, rowid)
  4  FROM t
  5  /

         A SUM(A)OVER(ORDERBYA)
---------- --------------------
         1                    1
         2                    3
         3                    6
         3                    9

3 rows selected.

Note, the use of ROWID to ensure the sum is worked out PER ROW (not PER a) (see below).

Note, if the analytic OVER expression does not describe a unique set of column(s), then the function will be applied over the ENTIRE set of matching rows, i.e. if the above example was ( ORDER BY a ), i.e. no ROWID, then the result is :

SQL> SELECT a, SUM(a) OVER (ORDER BY a)
  2  FROM t;

A          SUM(A)OVER(ORDERBYA)
---------- --------------------
1                             1
2                             3
3                             9   <-- Since, A has only 3 distinct values (and 4 rows), the SUM is the sum of ALL As
3                             9

*IMPORTANT*
The query_partition_clause has to be used if you want the values to be "reset" for a changing column, consider :

SQL> SELECT A FROM T
  2  ORDER BY A;

         A
----------
        10
        10
        10
        20
        30

If I want running totals but "reset" for every value of A, the PARTITION BY clause is used, i.e.

SQL> SELECT
  2    A,
  3    SUM(A) OVER (PARTITION BY A ORDER BY A,ROWID)
  4  FROM
  5    T;

         A SUM(A)OVER(PARTITIONBYAORDERBYA,ROWID)
---------- --------------------------------------
        10                                     10
        10                                     20
        10                                     30
        20                                     20
        30                                     30

KEEP

KEEP is not an analytical function in itself, it is an option to OLAP aggregate function(s), in much the same way as OVER, however, it is worth mentioning seperately, because it enables very complex logic to be encapsulated within a much simpler SQL statement. It allows you to instruct the query to consider only a subset of the data for consideration in the OLAP aggregate function.

*IMPORTANT*
Note, you can only use the KEEP syntax with the OLAP-enabled aggregate functions, i.e. SUM, AVG, MIN, MAX, VARIANCE, STDDEV, COUNT.

Here's an example. Let's say we have a table of employees.

SQL> SELECT * FROM t;

     EMPNO     DEPTNO     SALARY   COMM_PCT
---------- ---------- ---------- ----------
         1         10       1000         40
         2         10       2000         20
         3         10       5000         30
         4         20       4000         40
         5         20       8000         50
         6         20       8000         60

6 rows selected.

Now, we ask the question, show me the minimum salaries for each department :

SQL> SELECT
  2    deptno,
  3    MIN(salary),
  4  FROM t
  5  GROUP BY deptno
  6  /

    DEPTNO MIN(SALARY)
---------- -----------
        10        1000
        20        4000

But, now, lets say I wanted to ask the question, "show me the minimum salaries, BUT only out of those set of salaries which occur first when ranked by the commission percentage" using the KEEP syntax, it's relatively straightforward to do :

SQL> SELECT
  2    deptno,
  3    MIN(salary) KEEP (DENSE_RANK FIRST ORDER BY comm_pct)
  4  FROM t
  5  GROUP BY deptno
  6  /

    DEPTNO MIN(SALARY)KEEP(DENSE_RANKFIRSTORDERBYCOMM_PCT)
---------- -----------------------------------------------
        10                                            2000
        20                                            4000

So, now the result of department 10 changes. Only the rows where comm_pct ranked highest were taken, the following query shows the results of the ranking :

SQL> SELECT
  2    deptno,
  3    comm_pct,
  4    DENSE_RANK() OVER (PARTITION BY deptno ORDER BY comm_pct)
  5  FROM t;

    DEPTNO   COMM_PCT DENSE_RANK()OVER(PARTITIONBYDEPTNOORDERBYCOMM_PCT)
---------- ---------- --------------------------------------------------
        10         20                                                  1
        10         30                                                  2
        10         40                                                  3
        20         40                                                  1
        20         50                                                  2
        20         60                                                  3

6 rows selected.

As you can see, because the commision percentage column with a value of 20 is ranked highest, only this record is taken into account, and the MINIMUM of all records with a comm_pct = 20, is 2000.

Of course, you can specify LAST instead of FIRST, and show the lowest ranked records, i.e.

SQL> SELECT * FROM t;

     EMPNO     DEPTNO     SALARY   COMM_PCT
---------- ---------- ---------- ----------
         1         10       1000         40
         2         10       2000         20
         3         10       5000         30
         4         20       4000         40
         5         20       8000         50
         6         20       8000         60

6 rows selected.

SQL> SELECT
  2    deptno,
  3    MIN(salary) KEEP (DENSE_RANK LAST ORDER BY comm_pct)
  4  FROM t
  5  GROUP BY deptno
  6  /

    DEPTNO MIN(SALARY)KEEP(DENSE_RANKLASTORDERBYCOMM_PCT)
---------- ----------------------------------------------
        10                                           1000
        20                                           8000

*DEV NOTE*
Note, that according to the documentation http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96540/functions45a.htm, only DENSE_RANK can currently be used in the KEEP clause.

This is borne out by the error message when trying to use other functions, i.e.

SQL> SELECT
  2    deptno,
  3    MIN(salary) KEEP (RANK LAST ORDER BY comm_pct)
  4  FROM t
  5  GROUP BY deptno
  6  /
SELECT
*
ERROR at line 1:
ORA-02000: missing DENSE_RANK

OLAP Windows

By default, OLAP functions operate over the whole set (or partition window) defined in the OVER () clause. You can, however, specify that the operation only occurs over a "window" of data (different than the partition window).

ROWS

ROWS can be used to specify how many rows "backwards" and "forwards" through the current set, i.e. a "physical" offset.

The following SQL asks the question, "how many rows exist from 1 row back from the current row to the end of the set?".

SQL> SELECT COUNT(a) OVER (ORDER BY a
  2                         ROWS BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING)
  3  FROM mc1;

COUNT(X_Y)OVER(ORDERBYX_YROWSBETWEEN1PRECEDINGANDUNBOUNDEDFOLLOWING)
--------------------------------------------------------------------
                                                                   3
                                                                   3
                                                                   2
3 rows selected.

SQL> SELECT COUNT(a)  OVER (ORDER BY a
  2                          ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING)
  3  FROM mc1;

COUNT(A)OVER(ORDERBYAROWSBETWEEN2PRECEDINGAND1FOLLOWING)
--------------------------------------------------------
                                                       2
                                                       3
                                                       3

3 rows selected.

You can use CURRENT ROW to specify that the current row defines the end of the set.

SQL> SELECT COUNT(a) OVER (ORDER BY a
  2                         ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
  3  FROM mc1;

COUNT(A)OVER(ORDERBYAROWSBETWEEN1PRECEDINGANDCURRENTROW)
--------------------------------------------------------
                                                       1
                                                       2
                                                       2

3 rows selected.

*IMPORTANT*

If you omit the windowing clause from any OLAP function, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This is important to note when using a function such as LAST_VALUE which need to operate not just up to the CURRENT ROW, but actually UNBOUNDED FOLLOWING. For further information, see LAST_VALUE.

RANGE

RANGE can be used to specify the set of rows which are within 1 "value" of the value of the current row. Obviously, this can only be used on numeric or date datatypes.

This SQL asks the question, "how many rows are within 1 of the current value going back, and within 1 of the current value going forward?"

SQL> SELECT a, COUNT(a) OVER (ORDER BY a
  2                           RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING)
  3   FROM mc1;

         A COUNT(A)OVER(ORDERBYARANGEBETWEEN1PRECEDINGAND1FOLLOWING)
---------- ---------------------------------------------------------
         9                                                         2
        10                                                         3
        11                                                         3
        12                                                         2
        20                                                         1

5 rows selected.

The following SQL uses CURRENT ROW to ask "how many rows are within 1 of the current value going backwards?"

SQL> SELECT a, COUNT(a) OVER (ORDER BY a
  2                           RANGE BETWEEN 1 PRECEDING AND CURRENT ROW)
  3  FROM mc1;

         A COUNT(A)OVER(ORDERBYARANGEBETWEEN1PRECEDINGANDCURRENTROW)
---------- ---------------------------------------------------------
         9                                                         1
        10                                                         2
        11                                                         2
        12                                                         2
        20                                                         1

5 rows selected.

Using RANGE on DATE datatypes treats it in DAYS, by default.

SQL> SELECT c, COUNT(c) OVER (ORDER BY c
  2                           RANGE BETWEEN 1 PRECEDING
  3                                     AND CURRENT ROW) range_count
  4  FROM mc1;

C         RANGE_COUNT
--------- -----------
15-OCT-04           1
16-OCT-04           2
17-OCT-04           2

3 rows selected.

In order to get a more granular level of time, you need to use the INTERVAL syntax.

For example, here is how to go one hour or one second back :

SQL> SELECT c, COUNT(c) OVER (ORDER BY c
  2                           RANGE BETWEEN INTERVAL '1' HOUR PRECEDING
  3                                     AND CURRENT ROW) range_count
  4  FROM mc1;

C         RANGE_COUNT
--------- -----------
15-OCT-04           1
16-OCT-04           1
17-OCT-04           1

SQL> SELECT c, COUNT(c) OVER (ORDER BY c
  2                           RANGE BETWEEN INTERVAL '1' SECOND PRECEDING
  3                                     AND CURRENT ROW) range_count
  4  from mc1;

C         RANGE_COUNT
--------- -----------
15-OCT-04           1
16-OCT-04           1
17-OCT-04           1

3 rows selected.

Note, the INTERVAL syntax is quite extensive, you can for example specify 4 years and 2 months, by specifying INTERVAL '4-2' YEAR TO MONTH, or 11 hours and 20 minutes via INTERVAL '11:20' HOUR TO MINUTE.

*IMPORTANT*
Note, most of the INTERVAL types (DAY, MONTH, YEAR etc.) have an optional precision which defaults to 2. You have to manual specify this if greater than 2, i.e. 100 years has to be specified as :

SQL> SELECT c, COUNT(c) OVER (ORDER BY c
  2                           RANGE BETWEEN INTERVAL '100' YEAR(3) PRECEDING
  3                                     AND CURRENT ROW)
  4  FROM mc1;

Trying to omit the precision results in ORA-1873.

SQL> SELECT c, COUNT(c) OVER (ORDER BY c
  2                           RANGE BETWEEN INTERVAL '100' YEAR PRECEDING
  3                                      AND CURRENT ROW)
  4  FROM mc1;

SELECT c, COUNT(c) OVER (ORDER BY c RANGE BETWEEN INTERVAL '100' YEAR PRECEDING
                           AND CURRENT ROW)
                                                           *
ERROR at line 1:
ORA-01873: the leading precision of the interval is too small

See http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96540/sql_elements3a.htm#38599 for further information on INTERVAL literals.