Cardinality


Introduction

One of the most important things to look at when trying to determine the reasons behind a particular access path is the value of the estimated cardinality value in the CBO execution plan (for more information on the various methods for obtaining the execution path of a given SQL query, see
SQL tracing).

You can see this in a few places, but probably the easiest is via an EXPLAIN PLAN or AUTOTRACE output. In 9i and below, this is shown as the "card" column in the explain plan output, i.e.
SQL> SELECT /*+ FIRST_ROWS */ a FROM t;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=2 Card=82 Bytes=574)
   1    0   TABLE ACCESS (FULL) OF 'T1' (Cost=2 Card=82 Bytes=574)
At 10g, it's been renamed to "Rows" :
SQL> SELECT * FROM t;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    82 |   574 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    |    82 |   574 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------
This is the number of rows that Oracle expects that step in the plan to evaluate. For a stricter definition of cardinality, see this Wikipedia entry, or see this AskTom thread.

Trying to get these values as close to "reality" as possible means that Oracle can make better decisions when coming up with an access path.

Default cardinality for database objects

The following table demonstrates the estimated cardinalities (using a 8K blocksize) of various objects which have had no statistics generated for them :

Object Type Estimated Cardinality
Heap Table 82
Global Temporary Table 8168
Index-Organized Table 1
System Generated Materialized View
(such as the output of the TABLE operator)
8168

Showing this is relatively easy, although, you have to be careful when dealing with databases such as 10g, in that the default value of an initialisation parameter, optimizer_dynamic_sampling, is 2 which means that Oracle will effectively analyze the table before selecting from it, causing the "correct" values to be shown, instead of the values displayed here. To replicate the "lack of statistics" testing, ensure that :
SQL> alter session set optimizer_dynamic_sampling = 0;

Session altered.
Example 1 : A heap table (and the FIRST_ROWS hint to force the CBO, although you won't need this at 10g) :
SQL> SELECT /*+ FIRST_ROWS */ a FROM t1;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=2 Card=82 Bytes=574)
   1    0   TABLE ACCESS (FULL) OF 'T1' (Cost=2 Card=82 Bytes=574)
or, at 10g :
SQL> SELECT * FROM t;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    82 |   574 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    |    82 |   574 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------
Example 2 : an index-organised table (IOT) (note, use of IOTs automatically forces the CBO, so no need for any hint) :
SQL> select * from iot;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=1 Bytes=7)
   1    0   INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_62307' (UNIQUE) (Cost=5 Card=1 Bytes=7)
and, again at 10g :
SQL> select * from iot;

Execution Plan
----------------------------------------------------------
Plan hash value: 977529495

------------------------------------------------------------------------------------------
| Id  | Operation            | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                   |     1 |     7 |     9   (0)| 00:00:01 |
|   1 |  INDEX FAST FULL SCAN| SYS_IOT_TOP_13657 |     1 |     7 |     9   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Cardinality mathematics

For table-type objects (i.e. heap tables, GTTs, system materialized views, etc.) without statistics, the value is calculated via the following formula :
Estimated Cardinality = CEIL(( Number of Blocks * ( Block Size - Cache Layer )) / Average Row Length )
This formula looks like a bizarre way of working out cardinality, but think about it. The formula for working out the size of a table is :
Table Size = Number of rows * Row size
A bit of algebra later gives us that
Number of rows (cardinality) = Table size / Row size
But since not every row is the same size, we can only work out the average row size, making the number of rows an approximation, therefore :
Estimated Cardinality = Size of table in bytes / Average size of a row in bytes.
The "size of the table" is the number of blocks used by the table multiplied by the size of a block (i.e. the block size minus any "block overhead" (known as the "cache layer").

Since we are dealing with no statistics here, Oracle has a series of default values for the terms in this formula, which are Therefore, an example in a 8K block size, for a heap table with no statistics is : Estimated cardinality = 1 * ( 8192 - 24 ) / 100 = 81.68 = 82
A collection used within a SQL statement via the TABLE operator will assume to have Estimated cardinality = 100 * ( 8192 - 24 ) / 100 = 8168

Influencing cardinality

There are many ways of influencing the cardinality, from generating the correct statistics in the first place to using the
CARDINALITY hint.

You can also utilise rownum, in certain circumstances, which *can* instruct Oracle of the cardinality you require, i.e.
SQL> SELECT /*+ FIRST_ROWS */ a
  2    FROM t1
  3   WHERE rownum <= 4;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=2 Card=4 Bytes=28)
   1    0   COUNT (STOPKEY)
   2    1     TABLE ACCESS (FULL) OF 'T1' (Cost=2 Card=82 Bytes=574)
Which, although doesn't change the cardinality of the table scan step, DOES affect the cardinality of the SQL statement.

Useful links