Colocation
Colocation
Colocation is a fancy name for making regularly ordered data physically located next to each other on the block.
Having a good colocation policy can significantly reduce the amount of Logical IOs (LIOs), i.e. accesses into the buffer cache, for a given query.
There are many mechanisms for acheiving this, which one you choose depends on the majority of queries.
If most of the queries in the system are join queries, then it may be a good candidate for
Standard Clusters, but this may negatively impact
queries on the individual tables. Tables which are commonly accessed singly by column(s) may be good
candidates for Hash Clusters, in which case, an index is
not required, the "data is the index", which can benefit primary key lookups, since there's no index
I/O, just a hash function lookup. As always, benchmark, since Oracle may not be able to take advantage
of a cluster to do things that indexes can, such as fast full scans if the query requires a
range of values, etc. etc.
However, for situations where clusters are not appropriate, and dependent on the type of query, it may
benefit from having the data sorted upon insert, especially when it comes to index lookup-type
operations.
Consider the following example :
SQL> CREATE TABLE temp_disorganised ( a NUMBER, b VARCHAR2(10) );
Table created.
SQL> INSERT INTO temp_disorganised
2 SELECT
3 dbms_random.random,
4 SUBSTR(dbms_random.random, 1, 10)
5 FROM all_objects;
49755 rows created.
SQL> CREATE INDEX temp_dis_ind ON temp_disorganised ( a );
Index created.
SQL> EXEC dbms_stats.gather_table_stats(user, 'TEMP_DISORGANISED');
PL/SQL procedure successfully completed.
Now, we'll issue a query which forces a index -> table lookup (i.e. select something not in the index).
Note, I am using the INDEX hint here to force the use of the index, and 10g output of autotrace trimmed
for brevity :
SQL> SELECT /*+ INDEX(temp_disorganised temp_dis_ind) */
2 COUNT(b)
3 FROM temp_disorganised
4 WHERE a > 0
5 /
COUNT(B)
----------
24915
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 18 | 24830 (1)| 00:04:58 |
| 1 | SORT AGGREGATE | | 1 | 18 | | |
| 2 | TABLE ACCESS BY INDEX ROWID| TEMP_DISORGANISED | 24880 | 437K| 24830 (1)| 00:04:58 |
|* 3 | INDEX RANGE SCAN | TEMP_DIS_IND | 24880 | | 67 (2)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
24820 consistent gets
0 physical reads
0 redo size
413 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Okay, we have a large amount of consistent gets (logical IOs), this is due to the data being spread
all around the blocks involved.
Now, let's try the same thing but with ordered data :
SQL> CREATE TABLE temp_sorted AS SELECT a, b FROM temp_disorganised ORDER BY a;
Table created.
SQL> CREATE INDEX temp_sort_ind ON temp_sorted ( a );
Index created.
SQL> SELECT /*+ INDEX(temp_sorted temp_sort_ind) */
2 COUNT(b)
3 FROM temp_sorted
4 WHERE a > 0;
COUNT(B)
----------
24915
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 20 | 139 (2)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 20 | | |
| 2 | TABLE ACCESS BY INDEX ROWID| TEMP_SORTED | 20602 | 402K| 139 (2)| 00:00:02 |
|* 3 | INDEX RANGE SCAN | TEMP_SORT_IND | 20602 | | 66 (2)| 00:00:01 |
----------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
143 consistent gets
0 physical reads
0 redo size
413 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Now, significantly less consistent gets, since the data is colocated with a correlation to the
index, and the block array fetching of oracle now gets most of its data in less block "hits".
*IMPORTANT*
When doing index lookup operations, the number of logical I/Os (i.e. "probes" into the buffer cache)
will be materially affected by two factors :
Array Size
The concept of an index lookup operation i) getting a ROWID from the index, ii) probing the block,
iii) getting the next ROWID from the index etc. assumes an arraysize of 1. If arraysize is 2, then
*effectively* 2 ROWIDs will be read from the index, then 1 logical IO will be
incurred getting the block (assuming the two rows are on the same block, of course).
In this way, proper array sizing can significantly reduce the number of logical IOs,
here are the results of a previous query showing this.
SQL> set arraysize 1
SQL> SELECT /*+ INDEX(temp_disorganised temp_dis_ind) */
2 b
3 FROM temp_disorganised
4 WHERE a > 0;
24915 rows selected.
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
37326 consistent gets
0 physical reads
0 redo size
2184445 bytes sent via SQL*Net to client
137408 bytes received via SQL*Net from client
12459 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
24915 rows processed
SQL> set arraysize 10
SQL> SELECT /*+ INDEX(temp_disorganised temp_dis_ind) */
2 b
3 FROM temp_disorganised
4 WHERE a > 0;
24915 rows selected.
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
27320 consistent gets
0 physical reads
0 redo size
888899 bytes sent via SQL*Net to client
27782 bytes received via SQL*Net from client
2493 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
24915 rows processed
Of course, you do get diminishing returns after a certain "sweet spot" :
SQL> set arraysize 5000
SQL> SELECT /*+ INDEX(temp_disorganised temp_dis_ind) */
2 b
3 FROM temp_disorganised
4 WHERE a > 0;
24915 rows selected.
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
24824 consistent gets
0 physical reads
0 redo size
565589 bytes sent via SQL*Net to client
425 bytes received via SQL*Net from client
6 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
24915 rows processed
Note, that I am no longer SELECTing COUNT(b), but am rather selecting just "b". This is because, the
arraysize affects the number of rows returned to the "client". If I selected COUNT(b), then
the "client" (in this case, SQL*Plus) would only ever get 1 row, and hence arraysize would not
affect the results, i.e.
SQL> SET ARRAYSIZE 100
SQL> SELECT /*+ INDEX(temp_disorganised temp_dis_ind) */
2 COUNT(b)
3 FROM temp_disorganised
4 WHERE a > 0;
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
24820 consistent gets
0 physical reads
0 redo size
413 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> SET ARRAYSIZE 5000
SQL> SELECT /*+ INDEX(temp_disorganised temp_dis_ind) */
2 COUNT(b)
3 FROM temp_disorganised
4 WHERE a > 0;
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
24820 consistent gets
0 physical reads
0 redo size
413 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Again, it's down to benchmarking which value suits your query the best.
Clustering Factor
The clustering factor of an index, described in great detail in
Metalink Note : 39836.1,
basically, measures the correlation between the "ordered-ness" of your table data relative to your index.
If there's a high correlation (i.e. the clustering factor is close to the number of blocks in the table)
then the index is well ordered with respect to the table data. In this case, less block reads are likely
to occur. For a brilliant (and easily accessible) page on clustering factors, see
Howard Rogers' page.
Just for interest, however, you can see the difference that having sorted data makes by looking at the
clustering factors of the two indexes created in the previous examples :
SQL> SELECT table_name, num_rows, blocks
2 FROM user_tables
3 WHERE table_name IN ( 'TEMP_DISORGANISED', 'TEMP_SORTED' );
TABLE_NAME NUM_ROWS BLOCKS
------------------------------ ---------- ----------
TEMP_DISORGANISED 49755 180
TEMP_SORTED 49755 172
SQL> SELECT index_name, clustering_factor
2 FROM user_indexes
3 WHERE index_name IN ( 'TEMP_SORT_IND', 'TEMP_DIS_IND' );
INDEX_NAME CLUSTERING_FACTOR
------------------------------ -----------------
TEMP_DIS_IND 49455
TEMP_SORT_IND 160
Here, the TEMP_SORT_IND index clustering factor is MUCH closer to the number of blocks in the table,
so you would expect significantly less block reads (which, of course, explains the huge difference
in logical IO between the sorted and un-sorted data queries).
It should be obvious that, since the clustering factor is a measure of the number of blocks
that Oracle expects to visit in any one operation, then the decision on whether the optimiser should
use an index is NOT (as most people think) based on a certain proportion of the rows
which will be selected, but is based on a certain proportion of blocks (which the clustering
factor is used to try and estimate).
Table Partitioning
Table Partitioning can be a great tool for minimising the amount of data that a given query has to go through in order to resolve itself.
It is especially useful on larger tables, but you have to be aware that the creation of a partition has to be done using a partition key,
i.e. an identifier that is used to partition the data.
All mechanisms of table partitioning have three common goals :
- Enhanced DML and Query Performance. Data can be "eliminated" from any scans etc.
- Increased Manageability. Individual partitions can be on seperate tablespaces and
backed up separately, for example
- Increased Availability. Partitions are individual entities. Taking a partition "offline" does not
make the object inaccessible.
There are three main types, List, Range and Hash.
List Partitioning
List partitioning is a mechanism which was introduced at 9i, and is amazing why it took Oracle so long to introduce such a great feature,
especially if you consider that partitioning was introduced at Oracle8.
The idea is simple, you specify that if the value of a column is in a set of values it is
put into a certain partition, i.e.
SQL> CREATE TABLE t ( a VARCHAR2(10) )
2 PARTITION BY LIST (a)
3 (
4 PARTITION a1 VALUES ('X'),
5 PARTITION a2 VALUES ('Y')
6 )
7 /
Table created.
You specify the DEFAULT option for the ELSE clause, i.e.
SQL> CREATE TABLE t ( a VARCHAR2(10) )
2 PARTITION BY LIST (a)
3 (
4 PARTITION t1 VALUES (50),
5 PARTITION t2 VALUES (DEFAULT)
6 )
7 /
Table created.
Note, that trying to insert a value which does not map to a partition results in error, i.e.
SQL> INSERT INTO T VALUES ('Z');
INSERT INTO T VALUES ('Z')
*
ERROR at line 1:
ORA-14400: inserted partition key does not map to any partition
*IMPORTANT*
NULL values will be placed into the DEFAULT partition for list partitions.
Range Partitioning
Range Partitioning allows you to specify partitions based on a range of data, i.e.
CREATE TABLE t
(
t_id INT
)
PARTITION BY RANGE (t_id)
(
PARTITION t1 VALUES LESS THAN (500),
PARTITION t2 VALUES LESS THAN (1000),
PARTITION t3 VALUES LESS THAN (1500)
);
Note, the partition HAS to use the VALUES LESS THAN syntax, you cannot use GREATER THAN etc.
You can also use the MAXVALUE option to specify, effectively, an ELSE clause, i.e.
SQL> CREATE TABLE t ( a VARCHAR2(10) )
2 PARTITION BY RANGE (a)
3 (
4 PARTITION t1 VALUES LESS THAN (10),
5 PARTITION t2 VALUES LESS THAN (MAXVALUE)
6 )
7 /
Table created.
*IMPORTANT*
NULL values will be put into the MAXVALUE partition for range partitioning.
Hash Partitioning
HASH partitions basically apply a function to a partition key and then partitions by the output of that.
You specify how many partition hash "buckets" to partition over, which will then assign all rows into
one of n partitions based on the output of this hash function.
These are incredibly useful for unique identifiers etc, or where you don't have a range that you can easily derive.
SQL> CREATE TABLE t ( a VARCHAR2(10) )
2 PARTITION BY HASH (a) PARTITIONS 8;
Table created.
*IMPORTANT*
It is "undefined" into which partition NULL values are placed.
Sub-Partitioning
Sub-partitions can be created where you may want to sub-partition an existing partition, i.e.
/* An example of RANGE - HASH "composite" partitioning */
SQL> CREATE TABLE t ( a VARCHAR2(10) )
2 PARTITION BY RANGE (a)
3 SUBPARTITION BY HASH(a) SUBPARTITIONS 8
4 (
5 PARTITION t1 VALUES LESS THAN (50)
6 )
7 /
Table created.
/* An example of RANGE - LIST composite partitioning */
SQL> CREATE TABLE t ( a VARCHAR2(10) )
2 PARTITION BY RANGE (a)
3 SUBPARTITION BY LIST (a)
4 (
5 PARTITION t1 VALUES LESS THAN (50)
6 (
7 SUBPARTITION s1 VALUES (10)
8 )
9 )
10 /
Table created.
Selecting from a Partition
You can specify an individual partition to SELECT from, via the PARTITION option of the table
clause, i.e.
SQL> CREATE TABLE t ( a NUMBER )
2 PARTITION BY RANGE(a)
3 (
4 partition p1 VALUES LESS THAN (50),
5 partition p2 VALUES LESS THAN (100)
6 );
Table created.
SQL> insert into t VALUES (10);
1 row created.
SQL> insert into t VALUES (70);
1 row created.
SQL> SELECT * FROM t PARTITION (p1);
A
----------
10
SQL> SELECT * FROM t PARTITION (p2);
A
----------
70
Exchanging Partitions
Believe it or not, there is an argument which suggests that all tables should have at least one
defined partition! Why, you may ask? Well, it's to do with how efficient it is to exchange partitions,
and thus move the data from one table to another in bulk.
Here is an example to show an efficient mechanism of moving data from three tables (t1, t2, t3) into
one partitioned table (T).
First of all, create the three tables that you want to move into the three partitions of a single table.
In most systems, these will probably already exist in one form or another :
SQL> CREATE TABLE t1 AS SELECT sysdate dt, all_objects.* FROM all_objects;
Table created.
SQL> CREATE TABLE t2 AS SELECT ADD_MONTHS(sysdate,-12) dt, all_objects.* FROM all_objects;
Table created.
SQL> CREATE TABLE t3 AS SELECT ADD_MONTHS(sysdate,-24) dt, all_objects.* FROM all_objects;
Table created.
Now, create the partitioned table with the partitions for loading the three tables into (blank for now).
SQL> CREATE TABLE t
2 PARTITION BY RANGE(dt) (
3 PARTITION part2000 VALUES LESS THAN ( TO_DATE( '01-jan-2001', 'dd-mon-yyyy') ),
4 PARTITION part2001 VALUES LESS THAN ( TO_DATE( '01-jan-2002', 'dd-mon-yyyy') ),
5 PARTITION part2002 VALUES LESS THAN ( TO_DATE( '01-jan-2003', 'dd-mon-yyyy') )
6 )
7 AS
8 SELECT sysdate dt, all_objects.* FROM all_objects WHERE 1=0;
Table created.
Now, exchange each of the relevant tables into the relevant partitions of t. Note, WITHOUT VALIDATION
is used here to specify that Oracle does not have to validate that each row belongs in the relevant
partition, you can specify WITH VALIDATION to instruct Oracle to validate the data.
SQL> ALTER TABLE t
2 EXCHANGE PARTITION part2000
3 WITH TABLE t3
4 WITHOUT VALIDATION
5 /
Table altered.
SQL> ALTER TABLE t
2 EXCHANGE PARTITION part2001
3 WITH TABLE t2
4 WITHOUT VALIDATION
5 /
Table altered.
SQL> ALTER TABLE t
2 EXCHANGE PARTITION part2003
3 WITH TABLE t1
4 WITHOUT VALIDATION
5 /
Table altered.
SQL> SELECT COUNT(*) FROM t;
COUNT(*)
----------
70848
SQL> SELECT COUNT(*) FROM t1;
COUNT(*)
----------
0
A partition exchange is simply an exchange of segments, no actual movement of data takes place.
The change is purely in the data dictionary. This facilitates movement of large amounts of data very
quickly.
*IMPORTANT*
Note, any global indexes (see below) defined on the table (t) will become unusable after this type of
operation. To ensure that the global indexes are maintained during this operation, you need to use the
UPDATE GLOBAL INDEXES option of the ALTER TABLE command, i.e.
ALTER TABLE t
EXCHANGE PARTITION p1
WITH TABLE t1
UPDATE GLOBAL INDEXES <-- ensures that the global indexes are USABLE after the partition exchange
PARALLEL (DEGREE 4); <-- updates the global indexes in parallel (if appropriate)
Local and Global Indexes
When dealing with partitions, you quickly come across the problem of how indexing strategies will be
affected.
Consider a table, t, with 2 partitions. Each partition is, effectively, it's own segment, so what
you have can be thought of as two tables, potentially in two seperate tablespaces. So, is a "standard"
index effective on it? Possibly, possibly not, it completely depends on what actions
occur upon the table. Oracle provides two mechanisms for indexing partitions, Global and Local Indexes.
Global Indexes can be thought of as a "standard" index with entries for rows in every partition, i.e. a
single index covering all partitions.
Local Indexes can be thought of as many "mini"-indexes with rows for every partition (and only one
partition), i.e. an Index "local" to the partition.
So, which one to use? Well, it depends on the actions on the table, the type of database etc.
Most explanations compare the difference between the two "types" of database, i.e. OLTP and Data
Warehouse / DSS. I'm going to be no different.
A data warehouse is characterised by :
- Large amounts of data being created in single batches, say nightly
- Majority of operations on the data being SELECTs (reports etc.), typically of "sliding windows" of
data.
- Large degree of indexing
In these types of systems, partitioning can be very beneficial, when done correctly, since the amount of
data that needs to be "scanned" in the sliding window, is now a much smaller subset of data within a few
partitions. This is called partition elimination. However, global indexes
are usually not appropriate for data warehouses, for the simple reason that partition-level operations
are common, and partition maintenance
invalidates global indexes. You have to rebuild them often. So, here, local indexes make more sense.
OLTP systems, on the other hand, are characterised by
- Many small read / write transactions occurring frequently
- Sliding windows are, in general, not required
- Fast access to the rows (or small number of rows) is important
So, here, global indexes make sense, for the simple reason that data can only be partitioned by a single set of columns, but the operations that
occur are in multiple different ways, not necessarily via the partition key. So, using local indexes on a non-partition key, will involve
having to search "n" local indexes (unless, of course, you search on the partition key).
Using global indexes will mean a scan of a single (albeit larger) index, but the searches on the small subset
of data can be much more efficient, partition elimination therefore, effectively, occurs via the index.
Clusters
All clusters physically colocate data on disk. This can benefit certain types of queries.
Standard (B*Tree) Clusters
Standard Clusters are a mechanism for grouping common data physically on disk. This can greatly benefit systems
where most queries have a common set of join columns, but there may be performance drawbacks on queries which
access the tables seperately.
Note, though, that the use of so-called "single table" clusters, can be
extremely beneficial to some queries, since data will be co-located on disk via the cluster key. This
is subtly different than doing the same thing but via, say, an IOT, since there is no ordering in a
cluster, only grouping by the cluster key.
They utilise B*Tree indexes to physically colocate the data.
SQL> CREATE CLUSTER a_b_cluster ( id NUMBER );
Cluster created.
SQL> CREATE TABLE a ( id NUMBER, col1 VARCHAR2(10) )
2 CLUSTER a_b_cluster ( id );
Table created.
SQL> CREATE TABLE b ( id NUMBER, col2 VARCHAR2(10) )
2 CLUSTER a_b_cluster ( id );
Table created.
/* The Cluster Index is REQUIRED before operations can be done on the cluster */
SQL> CREATE INDEX a_b_cluster_index ON CLUSTER a_b_cluster;
Index created.
SQL> SELECT 1
2 FROM
3 a,b
4 WHERE a.id = b.id;
no rows selected
The following example EXPLAIN PLAN shows the operations typical when dealing with clusters :
SQL> EXPLAIN PLAN FOR
2 SELECT a.id
3 FROM a,b
4 WHERE a.id = b.id
5 /
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 3 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 26 | 3 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL | A | 1 | 13 | 2 (0)| 00:00:01 |
|* 3 | TABLE ACCESS CLUSTER| B | 1 | 13 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("A"."ID"="B"."ID")
The previous example shows tables with common column names, i.e. "id", but this does not have to be
the case, they just, obviously, have to be the same datatype :
SQL> CREATE CLUSTER a_b_cluster ( id NUMBER );
Cluster created.
SQL> CREATE TABLE a ( id2 NUMBER )
2 CLUSTER a_b_cluster (id2);
Table created.
Clusters are useful when you want data with the same cluster key values to be physically stored near
each other.
Hash Clusters
Hash clusters are useful when you always access the data by primary key. For example, say you have
some data in a table T and you ALWAYS query:
select * from T where id = :x;
That might be a good candidate for a hash cluster since there is no index needed. Oracle will hash
the value of :x into a physical address and go right to the data. No index range scan, just a table
access by the hash value.
Hash clusters differ from normal clusters in that they cluster the data via a hash function
(which can be specified manually, with the HASH IS clause of the CREATE CLUSTER statement).
SQL> CREATE CLUSTER a_b_cluster ( id NUMBER )
2 HASHKEYS 500;
Cluster created.
SQL> CREATE TABLE a ( id NUMBER )
2 CLUSTER a_b_cluster ( id );
Table created.
SQL> EXPLAIN PLAN FOR
2 SELECT * FROM a WHERE id = 10;
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------
---------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 0 (0)|
|* 1 | TABLE ACCESS HASH| A | 1 | 13 | |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=10)