Deadlocks

What are deadlocks?
The "classic" deadlock
- What happens to both sessions?
- So, what's the fix?
Deadlocks with Autonomous Transactions
Deadlocks with Bitmap Indexes
Deadlocks with unindexed foreign keys
Detecting and identifying deadlocks
Further information

What are deadlocks?

A deadlock is the situation where you have two, or more, Oracle "sessions" (well, transactional "states") competing for mutually locked resources. Oracle deals with deadlocks pretty much immediately by raising an exception (ORA-00060) in one of the sessions.

There are a few reasons why your application may experience deadlocks, most of which are about application design. However, there are a few situations when, due to certain architectural design decisions, you may experience deadlocks simply due to the internal mechanisms of Oracle itself.

The "classic" deadlock

If session 1 is locking row 1, session 2 locks row 2, then session 1 attempts to lock row 2 (which will block since session 2 has the lock on that row), and then session 2 attempts to lock row 1 (which will block since session 1 has the lock on that row), then session 1 is waiting for session 2, and session 2 is waiting on session 1, which of course will never be resolved.

Here's an actual Oracle example. First of all, setup the environment :

SQL> CREATE TABLE t ( a varchar2(10) );

Table created.

SQL> insert into t values ( 'x' );

1 row created.

SQL> insert into t values ( 'y' );

1 row created.

SQL> commit;

Commit complete.

Now, session 1 attempts to delete row 'x', but doesn't commit :

SESS1> delete from t where a = 'x';

1 row deleted.

Okay, now session 2 attempts to delete row 'y', again, not committing :

SESS2> delete from t where a = 'y';

1 row deleted.

Now, since session 2 has not committed the delete against row 'y', session 1 can still "see" it (and attempt to delete it) :

SESS1> delete from t where a = 'y';

However, this will be blocked by session 2 (and hence wait), since row 'y' is locked, so session 1 is holding the lock on row 'x' but is waiting for row 'y'. Now, session 2 attempts to do it the "other way around", i.e. attempting to delete row 'x', while it retains the lock on row 'y' :

SESS2> delete from t where a = 'x';

this will be blocked by session 1 (and hence wait). So, the situation is tied, session 2 is waiting for session 1 and session 1 is waiting for session 2.

So, in reality, what happens in this situation? Well, obviously deadlocks just simply can't be allowed to occur, it's basically an infinite loop, the situation can't be resolved. So, Oracle steps in, automatically detecting the deadlock and resolving it by causing one of the sessions (can't actually guarantee which one) to receive an ORA-00060 error, i.e.

ORA-00060: deadlock detected while waiting for resource

What happens to both sessions?

So, we've seen what happens in terms of the deadlock, one of the sessions will error with the ORA-00060 error, but what about the other one? Well, this may be surprising to some people, but it carries on waiting, because the ORA-00060 error in the other session does not commit or rollback, and hence it retains it's previous lock(s), including those involved in the deadlock scenario. It is the responsibility of the client to commit or rollback if an ORA-00060 error is encountered.

So, what's the fix?

As with all deadlock scenarios, it's about avoidance rather than remedy. For the classic deadlock situation, you have to ensure that sessions do their operations in the same "order", i.e. if a process updates table t1 then t2, that any other process does so in the same order.

Deadlocks with Autonomous Transactions

As mentioned in the previous section, all deadlocks are variations on the same theme, i.e. multiple "sessions" competing for mutually locked resources. Since an Autonomous Transaction (AT) is basically a "session within a session", it's perfectly possible for it to occur here as well. This is maybe a little less obvious, since some people think that somehow they're part of the same "parent" session, but this is not the case. Note, however, that a deadlock is slightly more subtle when using ATs :

Showing this is trivial. First, create an AT process :

SQL> CREATE OR REPLACE PROCEDURE p_at
  2  AS
  3  PRAGMA AUTONOMOUS_TRANSACTION;
  4  BEGIN
  5    DELETE FROM t;
  6    COMMIT;
  7  END p_at;
  8  /

Procedure created.

Now, put a single row into a table :

SQL> insert into t values ( 'x' );

1 row created.

SQL> commit;

Commit complete.

Now, delete the row in the "parent" session :

SQL> delete from t ;

1 row deleted.

Now, since the "parent" session is locking the row in "t", any attempt to delete it via an AT from within the "parent" session, will result in ORA-00060, i.e.

SQL> exec p_at;
BEGIN p_at; END;

*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "MARTIN.P_AT", line 5
ORA-06512: at line 1

So, why does the AT simply not block on the 'x' row? Well, if it did, imagine the situation, "parent" session is waiting for the AT to "finish" (like any other PL/SQL call), but it never will since it's blocked waiting for it's parent session to commit or rollback.

Deadlocks with Bitmap Indexes

Bitmap indexes were designed to be used solely within data warehouses, i.e. where the vast majority of the database activity is reading data, and there's very little (or no) data modification, except for batch processes which occasionally re-populate the warehouse. Each "row" in the bitmap index contains references to potentially many different rowids, in contrast to a B*-tree index which references a single rowid. It should be obvious, therefore, that, since the transactional mechanism is the same for all database operations, that any DML on a table which impacts the bitmap index may end up locking (or attempting to lock) many different "rows" within the index. This is the key concept with deadlocks in bitmap indexes, you're not being deadlocked on the underlying table, but on the index blocks.

The ability of an INSERT statement to deadlock (which is usually impossible) is therefore explained in that the bitmap index is maintaining other rows as a result of the DML.

Fortunately, it's easy to detect these kind of deadlocks, because the trace file has a very particular format :

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-00080027-0000d2a1        12      37     X             15      35           S
TX-000a0016-0000d6d2        15      35     X             12      37           S
session 37: DID 0001-000C-00000002  session 35: DID 0001-000F-00000002
session 35: DID 0001-000F-00000002  session 37: DID 0001-000C-00000002
Rows waited on:
Session 35: no row
Session 37: no row

Note, the "no row" entries. This is the prime indicator that this is a bitmap index deadlock. There's very little you can do about this, apart from not use bitmap indexes in a DML environment.

Deadlocks with unindexed foreign keys

There are various attributes which can be specified when creating foreign key constraints. The ones of interest are UPDATE and DELETE CASCADE. These attributes define what happens to the child record(s) when various actions are performed on the parent record.

The upshot of the deadlock problem is that in these situations (i.e. using the UPDATE or DELETE CASCADE options of the FK constraint), without an index on the foreign key column(s), Oracle has no option but to attempt to get a table lock on the child table while it performs it's maintenance. In this situation, it's obvious that the potential for deadlocking is vastly increased for having to lock the whole child table.

Fortunately, resolving this is quite straightforward, you simply put an index on the foreign key column(s).

So, is there any way of being proactive about this, and identifying the table(s) / column(s) which are the subject of a foreign key constraint, but do not have indexes? Yes, Tom Kyte has written one and it's located here.

For more information about this problem (and other performance related issues regarding unindexed FKs), see http://asktom.oracle.com/~tkyte/unindex/index.html.

Detecting and identifying deadlocks

Well, the obvious way of detecting a deadlock is when you see the ORA-00060 error, however, sometimes ( especially with complex applications ), it may not be obvious what is actually causing the deadlock itself. Fortunately, whenever a deadlock occurs, Oracle creates a trace file (in user_dump_dest), with information which will allow you to see details about the deadlock, what the shared resource is, what the current SQL statement was, etc. etc.

Here's an extract from the trace file generated on a 10g Express Edition (XE) from the above first deadlock situation above. Note, the full trace file contains a lot more information for the developer / DBA to utilise to determine the exact cause of the situation which caused the deadlock.

Trace file extract from two distinct Oracle sessions

*** 2006-11-05 13:29:16.921
*** ACTION NAME:() 2006-11-05 13:29:16.906
*** MODULE NAME:(SQL*Plus) 2006-11-05 13:29:16.906
*** SERVICE NAME:(SYS$USERS) 2006-11-05 13:29:16.906
*** SESSION ID:(27.5) 2006-11-05 13:29:16.906
DEADLOCK DETECTED
[Transaction Deadlock]
Current SQL statement for this session:
delete from t where a = 'x'
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-00070023-0000009c        18      27     X             20      38           X
TX-00060021-0000008d        20      38     X             18      27           X
session 27: DID 0001-0012-00000011  session 38: DID 0001-0014-00000005
session 38: DID 0001-0014-00000005  session 27: DID 0001-0012-00000011
Rows waited on:
Session 38: obj - rowid = 0000354E - AAADVOAAEAAAAAPAAB
  (dictionary objn - 13646, file - 4, block - 15, slot - 1)
Session 27: obj - rowid = 0000354E - AAADVOAAEAAAAAPAAA
  (dictionary objn - 13646, file - 4, block - 15, slot - 0)

Note, the fragment (dictionary objn - 13646, file - 4, block - 15, slot - 1), the objn - 13646 relates to the object_id, which can be queried via dba_objects or similar.

Trace file extract for same session with Autonomous Transaction

*** 2006-11-05 14:04:20.328
*** ACTION NAME:() 2006-11-05 14:04:20.328
*** MODULE NAME:(SQL*Plus) 2006-11-05 14:04:20.328
*** SERVICE NAME:(SYS$USERS) 2006-11-05 14:04:20.328
*** SESSION ID:(27.26) 2006-11-05 14:04:20.328
DEADLOCK DETECTED
[Transaction Deadlock]
Current SQL statement for this session:
DELETE FROM T
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
209DD638         5  procedure MARTIN.P_AT
1D1C566C         1  anonymous block
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-00080003-00000091        18      27     X             18      27           X
session 27: DID 0001-0012-0000001E  session 27: DID 0001-0012-0000001E
Rows waited on:
Session 27: obj - rowid = 0000354E - AAADVOAAEAAAAAPAAA
  (dictionary objn - 13646, file - 4, block - 15, slot - 0)

As you can see, Oracle does a good job of giving as much information about the involved sessions (including "current" SQL statement, etc.).

Further Information

The issue of deadlocks (and how to deal with them) is quite an extensive subject, and there are plenty of web resources for understanding them.
Note, the Oracle Metalink links require a metalink account to access.