Using NULLs


Definition of NULL

NULL is probably the most controversial database (not just Oracle) construct / concept ever. E.F.Codd, inventor of relational theory, defines it as :

Rule 3: Systematic Treatment of Null Values

Null values (distinct from empty character string or a string of blank characters and distinct from zero or any other number) are supported in the fully relational DBMS for representing missing information in a systematic way, independent of data type.

The problem is when you try and use them. It's amazing how many (even experienced) developers forget these basic tenets.

1. Using NULL in an expression always returns NULL.

SQL> SELECT 1 + NULL FROM DUAL;

    1+NULL
----------


1 row selected.

2. Appending NULL to a non-null string does not affect the string.

SQL> SELECT 'XXX' || NULL FROM DUAL;

'XX
---
XXX

1 row selected.

3. Aggregate functions tend to "ignore" NULLs.

SQL> SELECT a
  2  FROM t;

         A
----------
        10
             <-- NULL value

2 rows selected.

SQL> SELECT SUM(a), AVG(a), MAX(a), MIN(a), COUNT(a), COUNT(*)
  2  FROM t;

    SUM(A)     AVG(A)     MAX(A)     MIN(A)   COUNT(A)   COUNT(*)
---------- ---------- ---------- ---------- ---------- ----------
        10         10         10         10          1          2

1 row selected.
All aggregate functions ignore the NULL entry except, of course, COUNT(*).

4. GROUP BY treats NULL as a seperate "value"

SQL> SELECT a, COUNT(a), COUNT(*)
  2  FROM t
  3  GROUP BY a;

         A   COUNT(A)   COUNT(*)
---------- ---------- ----------
        10          1          1
                    0          1

2 rows selected.

5. NULL has to be tested using the IS [NOT] NULL clause

NULL is NOT equal to NULL, nor is it equal to NULL. NULL implies "unknown", so it is "unknown" if NULL is equal to NULL, or NOT equal to NULL, hence you have to test for "nullity" via the IS NULL or IS NOT NULL clause, i.e.
/* An example to show that trying to equate NULL via any method will not work */

SQL> SELECT 1
  2  FROM   dual
  3  WHERE NULL <> NULL
  4  OR    NULL != NULL
  5  OR    NULL = NULL
  6  OR    NULL IN ( SELECT NULL FROM DUAL )
  7  OR    NULL NOT IN ( SELECT 'X' FROM dual );

no rows selected

SQL> SELECT 1
  2  FROM dual
  3  WHERE NULL IS NULL;

         1
----------
         1

1 row selected.

Note, that DECODE assumes that a NULL is the same as another NULL, i.e.

SQL> SELECT DECODE(NULL, NULL, 'NULL', 'NOT NULL') FROM dual;

DECO
----
NULL

1 row selected.

6. Completely NULL values will not be indexed

Think of it like this, when you

CREATE INDEX t_idx ON t(a,b)

it is "as if" it were creating the index on a || b.

If the result of a || b Is NOT NULL -- an entry will be made in the index structure. If it is NULL, it will not be in the index.

NULL columns may be stored in an index (a b*tree index) however, entirely NULL index entries are NOT stored in the b*tree.
Proof :

SQL> CREATE TABLE t ( a VARCHAR2(10), b VARCHAR2(10) );

Table created.

SQL> INSERT INTO t VALUES ( 'X', 'X' );

1 row created.

SQL> INSERT INTO t VALUES ( 'X', NULL );

1 row created.

SQL> INSERT INTO t VALUES ( NULL, 'X' );

1 row created.

SQL> INSERT INTO t VALUES ( NULL, NULL );

1 row created.

SQL> CREATE INDEX t_ind ON t ( a,b );

Index created.

SQL> ANALYZE INDEX t_ind VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT height, lf_rows, lf_blks, br_rows, br_blks, used_space, rows_per_key
  2  FROM   index_stats;

    HEIGHT    LF_ROWS    LF_BLKS    BR_ROWS    BR_BLKS USED_SPACE ROWS_PER_KEY
---------- ---------- ---------- ---------- ---------- ---------- ------------
         1          3          1          0          0         43            1

1 row selected.
Proving that 3 rows are created, all except the all NULL row.

Utilising the index for queries of the form "column IS NULL" will only use the index IF the optimiser determines that the query cannot return a row from the table where each column is NULL, i.e.
SQL> ANALYZE TABLE t COMPUTE STATISTICS;

Table analyzed.

SQL> SELECT a, b
  2  FROM   t
  3  WHERE  a IS NULL;

A          B
---------- ----------
           X


2 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=2 Bytes=4)
   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=2 Card=2 Bytes=4)

FULL TABLE SCAN occurs since the predicate does not remove the fact that B can be NULL

SQL> SELECT a, b
  2  FROM   t
  3  WHERE  a IS NULL
  4  AND    b = 'X';

A          B
---------- ----------
           X

1 row selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=2)
   1    0   INDEX (RANGE SCAN) OF 'T_IND' (NON-UNIQUE) (Cost=1 Card=1 Bytes=2)
INDEX scan now occurs, since B cannot be NULL

7. Default Datatype of NULL is VARCHAR2

NULL is a convention for describing the "absence" or "uncertainty" of data, isn't it? Yes and No. In Oracle, NULL still has a datatype, and by default it is VARCHAR2.

You can show this with the following example.
SQL> SELECT x, DECODE(x, 2, NULL, x)
  2  FROM
  3  ( SELECT 1 x FROM dual
  4    UNION ALL
  5    SELECT -2 x FROM dual );

         X DECODE(X,2,NULL,X)
---------- ----------------------------------------
         1 1
        -2 -2

2 rows selected.
So, as we see, SELECTing x by itself shows it's a number (right justified), but wrapping a DECODE around it with a NULL expression left justifies it and is therefore a string. But how does this tell us that NULL is a string? Simply, because of the way that DECODE is documented to work. When dealing with the return datatype of DECODE, Oracle looks at the first return value datatype and uses that, in this case NULL. NULL must therefore default to a VARCHAR2.

The "fix"? Be explicit, i.e.
SQL> SELECT x, DECODE(x, 2, TO_NUMBER(NULL), x)
  2  FROM
  3  ( SELECT 1 x FROM dual
  4    UNION ALL
  5    SELECT -2 x FROM dual );

         X DECODE(X,2,TO_NUMBER(NULL),X)
---------- -----------------------------
         1                             1
        -2                            -2

2 rows selected.

8. NULLs in Constraints

Foreign Keys

Remarkably, you can use NULL to "circumvent" foreign key constraints. This is not a feature of Oracle, but is defined in ANSI SQL-92.

Here's an example :
SQL> CREATE TABLE parent ( a  VARCHAR2(10) NOT NULL,
                           b  VARCHAR2(10) NOT NULL,
                           CONSTRAINT pk PRIMARY KEY (a,b) );

Table created.

SQL> CREATE TABLE child ( a  VARCHAR2(10) NOT NULL,
                          b  VARCHAR2(10),
                          CONSTRAINT fk FOREIGN KEY (a,b) REFERENCES parent(a,b) );

Table created.
Now, let's try inserting a row into the child table.
SQL> INSERT INTO child VALUES ('X', 'Y');
INSERT INTO child VALUES ('X', 'Y')
*
ERROR at line 1:
ORA-02291: integrity constraint (ORAUSER.FK) violated - parent key not found
We can't since there's no parent record, BUT we can insert a partially NULL value........
SQL> INSERT INTO child VALUES ('X', NULL );

1 row created.
Oracle's documentation (referenced above) mentions that the way to prevent this behaviour is to use NOT NULL or CHECK constraints.

Unique Constraints

Unique constraints will accept entries which are completely NULL, but partial NULLs are rejected, i.e.
SQL> CREATE TABLE t ( a VARCHAR2(10),
  2                   b VARCHAR2(10),
  3                   CONSTRAINT t_uniq UNIQUE (a,b) );

Table created.
Try and insert two rows which are partially NULL, and it is rejected, i.e.
SQL> INSERT INTO t VALUES ('X', NULL);

1 row created.

SQL> /
INSERT INTO t VALUES ('X', NULL)
*
ERROR at line 1:
ORA-00001: unique constraint (ORAUSER.T_UNIQ) violated
Try and insert rows which are completely NULL, and they're accepted, i.e.
SQL> INSERT INTO t VALUES (NULL, NULL);

1 row created.

SQL> /

1 row created.
Primary Keys

This is not a problem with Primary Keys, since by definition columns in a Primary Key, are NOT NULL, i.e.
SQL> CREATE TABLE t ( a VARCHAR2(10),
  2                   CONSTRAINT t_pk PRIMARY KEY (a) );

Table created.

SQL> DESC t;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 A                                         NOT NULL VARCHAR2(10)

9. NULL is NOT CHR(0)!

ASCII charts define CHR(0) as null, but this is not the same as Oracle's NULL, i.e.
SQL> SELECT 1
  2  FROM   dual
  3  WHERE  CHR(0) IS NULL
  4  /

no rows selected
CHR(0) is therefore NOT NULL, i.e.
SQL> SELECT 1
  2  FROM   dual
  3  WHERE  CHR(0) IS NOT NULL
  4  /

         1
----------
         1
*DEV NOTE*
You can use the DUMP command to see the internal structure of data, and this is further proof of the difference between the two :
SQL> SELECT *
  2  FROM ( SELECT 'NULL', DUMP(NULL)
  3         FROM   dual
  4         UNION ALL
  5         SELECT 'CHR(0)', DUMP(CHR(0))
  6         FROM   dual )
  7  /

'NULL' DUMP(NULL)
------ --------------
NULL   NULL
CHR(0) Typ=1 Len=1: 0

10. NULLs affect NOT IN expressions

You have to be extremely careful when dealing with NULLs when they occur as the result of a NOT IN operation (either subquery or value list).

For example, the outcome of the following query is what you'd expect :
SQL> SELECT *
  2    FROM dual
  3   WHERE 'x' IN ( NULL, 'x' );

D
-
X
But, the following query does not behave maybe as you'd expect :
SQL> SELECT *
  2    FROM dual
  3   WHERE 'x' NOT IN ( NULL, 'x' );

no rows selected
The "problem", however, often catches developers out when dealing with subqueries. Let's populate a table with a NULL and a NOT NULL value :
SQL> INSERT INTO t (a) VALUES (NULL);

1 row created.

SQL> INSERT INTO t (a) VALUES ('x');

1 row created.
Now, let's try querying on this table in a subquery :
SQL> SELECT *
  2    FROM dual
  3   WHERE 'x' IN ( SELECT a FROM t );

D
-
X
OK, as we'd expect, but watch what happens with NOT IN and a value that isn't in the subquery result set :
SQL> SELECT *
  2    FROM dual
  3   WHERE 'y' NOT IN ( SELECT a FROM t );

no rows selected
Why did this happen?

It's actually to do with an earlier
"rule", i.e. the issue of checking whether a value IS or IS NOT NULL.

The optimiser "rewrites" IN expressions in the following way :
x IN ( value1, value2 )

as

WHERE x = value1 OR x = value2
So, as you can see, the existence of the OR ensures that if any of the "values" are NULL, then it doesn't affect the checking of the others.

HOWEVER, a NOT IN expression is evaluated as :
x NOT IN ( value1, value2 )

as

WHERE x != value1 AND x != value2
Now, you should be able to see why NULL values affect this resultant expression, if ANY of the values are NULL, the whole expression returns "false", and hence no rows.

Therefore, you can say with certainty that any expression utilising NOT IN must not have any values which are NULL, otherwise, no rows will be returned / it will return "false", and is almost certainly a bug in your code.

11. LIKE ignores NULLs

As mentioned
above, the only way of searching for NULL values is to use the IS NULL clause. LIKE cannot return NULL values, even on an "open" query, i.e.
SQL> CREATE TABLE t ( a VARCHAR2(10) );

Table created.

SQL> INSERT INTO t VALUES (NULL);

1 row created.

SQL> INSERT INTO t VALUES ('x');

1 row created.

SQL> SELECT *
  2    FROM t
  3   WHERE a LIKE '%';

A
------------------------------
x

1 row selected.