Colocation


Colocation

Colocation is a fancy name for making regularly ordered data physically located next to each other on the block. Having a good colocation policy can significantly reduce the amount of Logical IOs (LIOs), i.e. accesses into the buffer cache, for a given query.

There are many mechanisms for acheiving this, which one you choose depends on the majority of queries. If most of the queries in the system are join queries, then it may be a good candidate for
Standard Clusters, but this may negatively impact queries on the individual tables. Tables which are commonly accessed singly by column(s) may be good candidates for Hash Clusters, in which case, an index is not required, the "data is the index", which can benefit primary key lookups, since there's no index I/O, just a hash function lookup. As always, benchmark, since Oracle may not be able to take advantage of a cluster to do things that indexes can, such as fast full scans if the query requires a range of values, etc. etc.

However, for situations where clusters are not appropriate, and dependent on the type of query, it may benefit from having the data sorted upon insert, especially when it comes to index lookup-type operations.

Consider the following example :
SQL> CREATE TABLE temp_disorganised ( a NUMBER, b VARCHAR2(10) );

Table created.

SQL> INSERT INTO temp_disorganised
  2  SELECT
  3    dbms_random.random,
  4    SUBSTR(dbms_random.random, 1, 10)
  5  FROM all_objects;

49755 rows created.

SQL> CREATE INDEX temp_dis_ind ON temp_disorganised ( a );

Index created.

SQL> EXEC dbms_stats.gather_table_stats(user, 'TEMP_DISORGANISED');

PL/SQL procedure successfully completed.
Now, we'll issue a query which forces a index -> table lookup (i.e. select something not in the index). Note, I am using the INDEX hint here to force the use of the index, and 10g output of autotrace trimmed for brevity :
SQL> SELECT /*+ INDEX(temp_disorganised temp_dis_ind) */
  2    COUNT(b)
  3  FROM temp_disorganised
  4  WHERE a > 0
  5  /

  COUNT(B)
----------
     24915

--------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                   |     1 |    18 | 24830   (1)| 00:04:58 |
|   1 |  SORT AGGREGATE              |                   |     1 |    18 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEMP_DISORGANISED | 24880 |   437K| 24830   (1)| 00:04:58 |
|*  3 |    INDEX RANGE SCAN          | TEMP_DIS_IND      | 24880 |       |    67   (2)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      24820  consistent gets
          0  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
Okay, we have a large amount of consistent gets (logical IOs), this is due to the data being spread all around the blocks involved.

Now, let's try the same thing but with ordered data :
SQL> CREATE TABLE temp_sorted AS SELECT a, b FROM temp_disorganised ORDER BY a;

Table created.

SQL> CREATE INDEX temp_sort_ind ON temp_sorted ( a );

Index created.

SQL> SELECT /*+ INDEX(temp_sorted temp_sort_ind) */
  2    COUNT(b)
  3  FROM temp_sorted
  4  WHERE a > 0;

  COUNT(B)
----------
     24915

----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |     1 |    20 |   139   (2)| 00:00:02 |
|   1 |  SORT AGGREGATE              |               |     1 |    20 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEMP_SORTED   | 20602 |   402K|   139   (2)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN          | TEMP_SORT_IND | 20602 |       |    66   (2)| 00:00:01 |
----------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        143  consistent gets
          0  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
Now, significantly less consistent gets, since the data is colocated with a correlation to the index, and the block array fetching of oracle now gets most of its data in less block "hits".

*IMPORTANT*
When doing index lookup operations, the number of logical I/Os (i.e. "probes" into the buffer cache) will be materially affected by two factors :

Array Size

The concept of an index lookup operation i) getting a ROWID from the index, ii) probing the block, iii) getting the next ROWID from the index etc. assumes an arraysize of 1. If arraysize is 2, then *effectively* 2 ROWIDs will be read from the index, then 1 logical IO will be incurred getting the block (assuming the two rows are on the same block, of course). In this way, proper array sizing can significantly reduce the number of logical IOs, here are the results of a previous query showing this.
SQL> set arraysize 1
SQL> SELECT /*+ INDEX(temp_disorganised temp_dis_ind) */
  2    b
  3  FROM temp_disorganised
  4  WHERE a > 0;

24915 rows selected.

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      37326  consistent gets
          0  physical reads
          0  redo size
    2184445  bytes sent via SQL*Net to client
     137408  bytes received via SQL*Net from client
      12459  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      24915  rows processed

SQL> set arraysize 10
SQL> SELECT /*+ INDEX(temp_disorganised temp_dis_ind) */
  2    b
  3  FROM temp_disorganised
  4  WHERE a > 0;

24915 rows selected.

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      27320  consistent gets
          0  physical reads
          0  redo size
     888899  bytes sent via SQL*Net to client
      27782  bytes received via SQL*Net from client
       2493  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      24915  rows processed
Of course, you do get diminishing returns after a certain "sweet spot" :
SQL> set arraysize 5000
SQL> SELECT /*+ INDEX(temp_disorganised temp_dis_ind) */
  2    b
  3  FROM temp_disorganised
  4  WHERE a > 0;

24915 rows selected.

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      24824  consistent gets
          0  physical reads
          0  redo size
     565589  bytes sent via SQL*Net to client
        425  bytes received via SQL*Net from client
          6  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      24915  rows processed
Note, that I am no longer SELECTing COUNT(b), but am rather selecting just "b". This is because, the arraysize affects the number of rows returned to the "client". If I selected COUNT(b), then the "client" (in this case, SQL*Plus) would only ever get 1 row, and hence arraysize would not affect the results, i.e.
SQL> SET ARRAYSIZE 100
SQL> SELECT /*+ INDEX(temp_disorganised temp_dis_ind) */
  2    COUNT(b)
  3  FROM temp_disorganised
  4  WHERE a > 0;

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      24820  consistent gets
          0  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> SET ARRAYSIZE 5000
SQL> SELECT /*+ INDEX(temp_disorganised temp_dis_ind) */
  2    COUNT(b)
  3  FROM temp_disorganised
  4  WHERE a > 0;

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      24820  consistent gets
          0  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
Again, it's down to benchmarking which value suits your query the best.

Clustering Factor

The clustering factor of an index, described in great detail in
Metalink Note : 39836.1, basically, measures the correlation between the "ordered-ness" of your table data relative to your index. If there's a high correlation (i.e. the clustering factor is close to the number of blocks in the table) then the index is well ordered with respect to the table data. In this case, less block reads are likely to occur. For a brilliant (and easily accessible) page on clustering factors, see Howard Rogers' page. Just for interest, however, you can see the difference that having sorted data makes by looking at the clustering factors of the two indexes created in the previous examples :
SQL> SELECT table_name, num_rows, blocks
  2    FROM user_tables
  3   WHERE table_name IN ( 'TEMP_DISORGANISED', 'TEMP_SORTED' );

TABLE_NAME                       NUM_ROWS     BLOCKS
------------------------------ ---------- ----------
TEMP_DISORGANISED                   49755        180
TEMP_SORTED                         49755        172

SQL> SELECT index_name, clustering_factor
  2    FROM user_indexes
  3   WHERE index_name IN ( 'TEMP_SORT_IND', 'TEMP_DIS_IND' );

INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ -----------------
TEMP_DIS_IND                               49455
TEMP_SORT_IND                                160
Here, the TEMP_SORT_IND index clustering factor is MUCH closer to the number of blocks in the table, so you would expect significantly less block reads (which, of course, explains the huge difference in logical IO between the sorted and un-sorted data queries).

It should be obvious that, since the clustering factor is a measure of the number of blocks that Oracle expects to visit in any one operation, then the decision on whether the optimiser should use an index is NOT (as most people think) based on a certain proportion of the rows which will be selected, but is based on a certain proportion of blocks (which the clustering factor is used to try and estimate).

Table Partitioning

Table Partitioning can be a great tool for minimising the amount of data that a given query has to go through in order to resolve itself. It is especially useful on larger tables, but you have to be aware that the creation of a partition has to be done using a partition key, i.e. an identifier that is used to partition the data.

All mechanisms of table partitioning have three common goals : There are three main types, List, Range and Hash.

List Partitioning

List partitioning is a mechanism which was introduced at 9i, and is amazing why it took Oracle so long to introduce such a great feature, especially if you consider that partitioning was introduced at Oracle8.

The idea is simple, you specify that if the value of a column is in a set of values it is put into a certain partition, i.e.
SQL> CREATE TABLE t ( a VARCHAR2(10) )
  2  PARTITION BY LIST (a)
  3  (
  4    PARTITION a1 VALUES ('X'),
  5    PARTITION a2 VALUES ('Y')
  6  )
  7  /

Table created.

You specify the DEFAULT option for the ELSE clause, i.e.

SQL> CREATE TABLE t ( a VARCHAR2(10) )
  2  PARTITION BY LIST (a)
  3  (
  4    PARTITION t1 VALUES (50),
  5    PARTITION t2 VALUES (DEFAULT)
  6  )
  7  /

Table created.
Note, that trying to insert a value which does not map to a partition results in error, i.e.
SQL> INSERT INTO T VALUES ('Z');
INSERT INTO T VALUES ('Z')
            *
ERROR at line 1:
ORA-14400: inserted partition key does not map to any partition
*IMPORTANT*
NULL values will be placed into the DEFAULT partition for list partitions.

Range Partitioning

Range Partitioning allows you to specify partitions based on a range of data, i.e.
CREATE TABLE t
(
  t_id   INT
)
PARTITION BY RANGE (t_id)
(
  PARTITION t1 VALUES LESS THAN (500),
  PARTITION t2 VALUES LESS THAN (1000),
  PARTITION t3 VALUES LESS THAN (1500)
);
Note, the partition HAS to use the VALUES LESS THAN syntax, you cannot use GREATER THAN etc.

You can also use the MAXVALUE option to specify, effectively, an ELSE clause, i.e.
SQL> CREATE TABLE t ( a VARCHAR2(10) )
  2  PARTITION BY RANGE (a)
  3  (
  4    PARTITION t1 VALUES LESS THAN (10),
  5    PARTITION t2 VALUES LESS THAN (MAXVALUE)
  6  )
  7  /

Table created.
*IMPORTANT*
NULL values will be put into the MAXVALUE partition for range partitioning.

Hash Partitioning

HASH partitions basically apply a function to a partition key and then partitions by the output of that. You specify how many partition hash "buckets" to partition over, which will then assign all rows into one of n partitions based on the output of this hash function.

These are incredibly useful for unique identifiers etc, or where you don't have a range that you can easily derive.
SQL> CREATE TABLE t ( a VARCHAR2(10) )
  2  PARTITION BY HASH (a) PARTITIONS 8;

Table created.
*IMPORTANT*
It is "undefined" into which partition NULL values are placed.

Sub-Partitioning

Sub-partitions can be created where you may want to sub-partition an existing partition, i.e.
/* An example of RANGE - HASH "composite" partitioning */

SQL> CREATE TABLE t ( a VARCHAR2(10) )
  2  PARTITION BY RANGE (a)
  3  SUBPARTITION BY HASH(a) SUBPARTITIONS 8
  4  (
  5    PARTITION t1 VALUES LESS THAN (50)
  6  )
  7  /

Table created.

/* An example of RANGE - LIST composite partitioning */

SQL> CREATE TABLE t ( a VARCHAR2(10) )
  2  PARTITION BY RANGE (a)
  3  SUBPARTITION BY LIST (a)
  4  (
  5    PARTITION t1 VALUES LESS THAN (50)
  6    (
  7      SUBPARTITION s1 VALUES (10)
  8    )
  9  )
 10  /

Table created.

Selecting from a Partition

You can specify an individual partition to SELECT from, via the PARTITION option of the table clause, i.e.
SQL> CREATE TABLE t ( a NUMBER )
  2  PARTITION BY RANGE(a)
  3  (
  4    partition p1 VALUES LESS THAN (50),
  5    partition p2 VALUES LESS THAN (100)
  6  );

Table created.

SQL> insert into t VALUES (10);

1 row created.

SQL> insert into t VALUES (70);

1 row created.

SQL> SELECT * FROM t PARTITION (p1);

         A
----------
        10

SQL> SELECT * FROM t PARTITION (p2);

         A
----------
        70

Exchanging Partitions

Believe it or not, there is an argument which suggests that all tables should have at least one defined partition! Why, you may ask? Well, it's to do with how efficient it is to exchange partitions, and thus move the data from one table to another in bulk.

Here is an example to show an efficient mechanism of moving data from three tables (t1, t2, t3) into one partitioned table (T).

First of all, create the three tables that you want to move into the three partitions of a single table. In most systems, these will probably already exist in one form or another :
SQL> CREATE TABLE t1 AS SELECT sysdate dt, all_objects.* FROM all_objects;

Table created.

SQL> CREATE TABLE t2 AS SELECT ADD_MONTHS(sysdate,-12) dt, all_objects.* FROM all_objects;

Table created.

SQL> CREATE TABLE t3 AS SELECT ADD_MONTHS(sysdate,-24) dt, all_objects.* FROM all_objects;

Table created.
Now, create the partitioned table with the partitions for loading the three tables into (blank for now).
SQL> CREATE TABLE t
  2    PARTITION BY RANGE(dt) (
  3      PARTITION part2000 VALUES LESS THAN ( TO_DATE( '01-jan-2001', 'dd-mon-yyyy') ),
  4      PARTITION part2001 VALUES LESS THAN ( TO_DATE( '01-jan-2002', 'dd-mon-yyyy') ),
  5      PARTITION part2002 VALUES LESS THAN ( TO_DATE( '01-jan-2003', 'dd-mon-yyyy') )
  6    )
  7  AS
  8  SELECT sysdate dt, all_objects.* FROM all_objects WHERE 1=0;

Table created.
Now, exchange each of the relevant tables into the relevant partitions of t. Note, WITHOUT VALIDATION is used here to specify that Oracle does not have to validate that each row belongs in the relevant partition, you can specify WITH VALIDATION to instruct Oracle to validate the data.
SQL> ALTER TABLE t
  2  EXCHANGE PARTITION part2000
  3  WITH TABLE t3
  4  WITHOUT VALIDATION
  5  /

Table altered.

SQL> ALTER TABLE t
  2  EXCHANGE PARTITION part2001
  3  WITH TABLE t2
  4  WITHOUT VALIDATION
  5  /

Table altered.

SQL> ALTER TABLE t
  2  EXCHANGE PARTITION part2003
  3  WITH TABLE t1
  4  WITHOUT VALIDATION
  5  /

Table altered.

SQL> SELECT COUNT(*) FROM t;

  COUNT(*)
----------
     70848

SQL> SELECT COUNT(*) FROM t1;

  COUNT(*)
----------
         0
A partition exchange is simply an exchange of segments, no actual movement of data takes place. The change is purely in the data dictionary. This facilitates movement of large amounts of data very quickly.

*IMPORTANT*
Note, any global indexes (see below) defined on the table (t) will become unusable after this type of operation. To ensure that the global indexes are maintained during this operation, you need to use the UPDATE GLOBAL INDEXES option of the ALTER TABLE command, i.e.
ALTER TABLE t
EXCHANGE PARTITION p1
WITH TABLE t1
UPDATE GLOBAL INDEXES  <-- ensures that the global indexes are USABLE after the partition exchange
PARALLEL (DEGREE 4);   <-- updates the global indexes in parallel (if appropriate)

Local and Global Indexes

When dealing with partitions, you quickly come across the problem of how indexing strategies will be affected.

Consider a table, t, with 2 partitions. Each partition is, effectively, it's own segment, so what you have can be thought of as two tables, potentially in two seperate tablespaces. So, is a "standard" index effective on it? Possibly, possibly not, it completely depends on what actions occur upon the table. Oracle provides two mechanisms for indexing partitions, Global and Local Indexes.

Global Indexes can be thought of as a "standard" index with entries for rows in every partition, i.e. a single index covering all partitions.
Local Indexes can be thought of as many "mini"-indexes with rows for every partition (and only one partition), i.e. an Index "local" to the partition.

So, which one to use? Well, it depends on the actions on the table, the type of database etc. Most explanations compare the difference between the two "types" of database, i.e. OLTP and Data Warehouse / DSS. I'm going to be no different.

A data warehouse is characterised by : In these types of systems, partitioning can be very beneficial, when done correctly, since the amount of data that needs to be "scanned" in the sliding window, is now a much smaller subset of data within a few partitions. This is called partition elimination. However, global indexes are usually not appropriate for data warehouses, for the simple reason that partition-level operations are common, and partition maintenance invalidates global indexes. You have to rebuild them often. So, here, local indexes make more sense.

OLTP systems, on the other hand, are characterised by So, here, global indexes make sense, for the simple reason that data can only be partitioned by a single set of columns, but the operations that occur are in multiple different ways, not necessarily via the partition key. So, using local indexes on a non-partition key, will involve having to search "n" local indexes (unless, of course, you search on the partition key). Using global indexes will mean a scan of a single (albeit larger) index, but the searches on the small subset of data can be much more efficient, partition elimination therefore, effectively, occurs via the index.

Clusters

All clusters physically colocate data on disk. This can benefit certain types of queries.

Standard (B*Tree) Clusters

Standard Clusters are a mechanism for grouping common data physically on disk. This can greatly benefit systems where most queries have a common set of join columns, but there may be performance drawbacks on queries which access the tables seperately.

Note, though, that the use of so-called "single table" clusters, can be extremely beneficial to some queries, since data will be co-located on disk via the cluster key. This is subtly different than doing the same thing but via, say, an IOT, since there is no ordering in a cluster, only grouping by the cluster key.

They utilise B*Tree indexes to physically colocate the data.
SQL> CREATE CLUSTER a_b_cluster ( id NUMBER );

Cluster created.

SQL> CREATE TABLE a ( id NUMBER, col1  VARCHAR2(10) )
  2  CLUSTER a_b_cluster ( id );

Table created.

SQL> CREATE TABLE b ( id NUMBER, col2  VARCHAR2(10) )
  2  CLUSTER a_b_cluster ( id );

Table created.

/* The Cluster Index is REQUIRED before operations can be done on the cluster */

SQL> CREATE INDEX a_b_cluster_index ON CLUSTER a_b_cluster;

Index created.

SQL> SELECT 1
  2  FROM
  3    a,b
  4  WHERE a.id = b.id;

no rows selected
The following example EXPLAIN PLAN shows the operations typical when dealing with clusters :
SQL> EXPLAIN PLAN FOR
  2  SELECT a.id
  3  FROM a,b
  4  WHERE a.id = b.id
  5  /

Explained.

SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    26 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS         |      |     1 |    26 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL   | A    |     1 |    13 |     2   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS CLUSTER| B    |     1 |    13 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("A"."ID"="B"."ID")
The previous example shows tables with common column names, i.e. "id", but this does not have to be the case, they just, obviously, have to be the same datatype :
SQL> CREATE CLUSTER a_b_cluster ( id NUMBER );

Cluster created.

SQL> CREATE TABLE a ( id2  NUMBER )
  2  CLUSTER a_b_cluster (id2);

Table created.
Clusters are useful when you want data with the same cluster key values to be physically stored near each other.

Hash Clusters

Hash clusters are useful when you always access the data by primary key. For example, say you have some data in a table T and you ALWAYS query:
select * from T where id = :x;
That might be a good candidate for a hash cluster since there is no index needed. Oracle will hash the value of :x into a physical address and go right to the data. No index range scan, just a table access by the hash value.

Hash clusters differ from normal clusters in that they cluster the data via a hash function (which can be specified manually, with the HASH IS clause of the CREATE CLUSTER statement).
SQL> CREATE CLUSTER a_b_cluster ( id  NUMBER )
  2  HASHKEYS 500;

Cluster created.

SQL> CREATE TABLE a ( id NUMBER )
  2  CLUSTER a_b_cluster ( id );

Table created.

SQL> EXPLAIN PLAN FOR
  2  SELECT * FROM a WHERE id = 10;

Explained.

SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------

---------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    13 |     0   (0)|
|*  1 |  TABLE ACCESS HASH| A    |     1 |    13 |            |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ID"=10)