Converting delimited lists to collections (and vice versa)

Delimited lists to collections
Collections to delimited lists
- Using SYS_CONNECT_BY_PATH
- Custom Aggregate Functions

It is a common requirement to be able to take a delimited list of values, say A,B,C,D, and treat this data like it was a set of rows in a table, or vice versa.

In an ideal world, this should never occur for many reasons. Firstly, the approach introduces a form of unnecessary limitation, i.e. the maximum length of a string, 32K-1 in PL/SQL for a VARCHAR2 etc. and secondly, the data construct is just wrong. You wouldn't store a set of rows in the database as a delimited string, so why is this any different?

Practically all environments which can interact with an Oracle database, such as VB, C++, Java or, indeed, Oracle Forms, has the concept of an array, or collection, which is a set of data elements, just like a relational table, and which can be mapped to an equivalent Oracle collection type.

I have worked on many systems where there is an inherent limitation in functionality simply because of approaches like this. It's as simple as this: If you want to have a system with no logical limitation in the number of data elements passed to a given process, then forget the following mechanisms! They are simply the wrong way to approach the problem.

Using DBMS_UTILITY.COMMA_TO_TABLE

One method is to take advantage of a DBMS_UTILITY procedure called COMMA_TO_TABLE (there is also a TABLE_TO_COMMA which does the reverse). Note, DBMS_UTILITY assumes a comma delimiter, if you delimit by something else, you will need to do a REPLACE beforehand.

SQL> DECLARE
  2    l_tab   dbms_utility.uncl_array;
  3    l_tablen   number;
  4  BEGIN
  5    dbms_utility.comma_to_table('A,B,C', l_tablen, l_tab);
  6    dbms_output.put_line('TABLE LENGTH : '|| l_tablen);
  7    dbms_output.put_line('TABLE COUNT  : '|| l_tab.COUNT);
  8    for i in 1..l_tablen
  9    loop
 10      dbms_output.put_line(l_tab(i));
 11    end loop;
 12  END;
 13  /
TABLE LENGTH : 3
TABLE COUNT  : 4
A
B
C

PL/SQL procedure successfully completed.

Notice the difference between the .COUNT and the value of the output parameter (l_tablen). Basically, the end of the collection always has a NULL "extra" row. Basically, always use the value of the output parameter when looping through the collection (or of course .COUNT-1), to ensure you don't experience any unwanted side-effects. Why the extra row? No idea! But, I do stress, read the following important caveat about this approach in general.

*IMPORTANT CAVEAT*
COMMA_TO_TABLE (and the reverse TABLE_TO_COMMA) are not written for this purpose! They both use the NAME_TOKENIZE function (again found in DBMS_UTILITY). They are written primarily for use within replication internally by Oracle, and parse IDENTIFIERS rather than strings, and as such have to be valid Oracle object names. Trying to pass numbers, elements greater than 30 characters, reserved words etc. will not work, i.e.

SQL> DECLARE
  2    l_tab   dbms_utility.uncl_array;
  3    l_tablen   number;
  4  BEGIN
  5    dbms_utility.comma_to_table('1,2,3', l_tablen, l_tab);
  6    dbms_output.put_line('TABLE LENGTH : '|| l_tablen);
  7    dbms_output.put_line('TABLE COUNT  : '|| l_tab.COUNT);
  8    for i in 1..l_tablen
  9    loop
 10      dbms_output.put_line(l_tab(i));
 11    end loop;
 12  END;
 13  /
DECLARE
*
ERROR at line 1:
ORA-00931: missing identifier
ORA-06512: at "SYS.DBMS_UTILITY", line 125
ORA-06512: at "SYS.DBMS_UTILITY", line 160
ORA-06512: at "SYS.DBMS_UTILITY", line 202
ORA-06512: at line 5

SQL> DECLARE
  2    l_tab   dbms_utility.uncl_array;
  3    l_tablen   number;
  4  BEGIN
  5    dbms_utility.comma_to_table('this_is_a_long_element_greater_than_30_characters,B,C', l_tablen, l_tab);
  6    dbms_output.put_line('TABLE LENGTH : '|| l_tablen);
  7    dbms_output.put_line('TABLE COUNT  : '|| l_tab.COUNT);
  8    for i in 1..l_tablen
  9    loop
 10      dbms_output.put_line(l_tab(i));
 11    end loop;
 12  END;
 13  /
DECLARE
*
ERROR at line 1:
ORA-00972: identifier is too long
ORA-06512: at "SYS.DBMS_UTILITY", line 125
ORA-06512: at "SYS.DBMS_UTILITY", line 160
ORA-06512: at "SYS.DBMS_UTILITY", line 202
ORA-06512: at line 5

For this reason alone, I would not recommend using this approach which, while fast (and you may be doing this on elements less than 30 characters, and hence "get away with it"), it is limited and, frankly, I wouldn't trust it to always stay this way.

Using a Table Function

A nice approach for converting a delimited list of a collection is via a custom PL/SQL Table Function which you can then SELECT from via the TABLE operator, i.e.

/* Create the output TYPE, here using a VARCHAR2(100) nested table type */

SQL> CREATE TYPE test_type AS TABLE OF VARCHAR2(100);
  2  /

Type created.

/* Now, create the function.*/

SQL> CREATE OR REPLACE FUNCTION f_convert(p_list IN VARCHAR2)
  2    RETURN test_type
  3  AS
  4    l_string       VARCHAR2(32767) := p_list || ',';
  5    l_comma_index  PLS_INTEGER;
  6    l_index        PLS_INTEGER := 1;
  7    l_tab          test_type := test_type();
  8  BEGIN
  9    LOOP
 10      l_comma_index := INSTR(l_string, ',', l_index);
 11      EXIT WHEN l_comma_index = 0;
 12      l_tab.EXTEND;
 13      l_tab(l_tab.COUNT) := SUBSTR(l_string, l_index, l_comma_index - l_index);
 14      l_index := l_comma_index + 1;
 15    END LOOP;
 16    RETURN l_tab;
 17  END f_convert;
 18  /

Function created.

/* Prove it works */

SQL> SELECT * FROM TABLE(f_convert('AAA,BBB,CCC,D'));

COLUMN_VALUE
--------------------------------------------------------------------------------
AAA
BBB
CCC
D

4 rows selected.

A disadvantage of this type of approach is that the whole collection is built-up and then passed back to the calling routine. This may have memory considerations, in which case, you may need to utilise something like pipelined functions.

Using a Pipelined Function

An alternative to a standard PL/SQL function (from 9i onwards) is the use of pipelined functions.

SQL> CREATE TYPE test_type AS TABLE OF VARCHAR2(100);
  2  /

Type created.

SQL> CREATE OR REPLACE FUNCTION f_convert2(p_list IN VARCHAR2)
  2    RETURN test_type
  3  PIPELINED
  4  AS
  5    l_string       LONG := p_list || ',';
  6    l_comma_index  PLS_INTEGER;
  7    l_index        PLS_INTEGER := 1;
  8  BEGIN
  9    LOOP
 10      l_comma_index := INSTR(l_string, ',', l_index);
 11      EXIT WHEN l_comma_index = 0;
 12      PIPE ROW ( SUBSTR(l_string, l_index, l_comma_index - l_index) );
 13      l_index := l_comma_index + 1;
 14    END LOOP;
 15    RETURN;
 16  END f_convert2;
 17  /

Function created.

SQL> SELECT * FROM TABLE(f_convert('AAA,BBB,CCC,D'));

COLUMN_VALUE
--------------------------------------------------------------------------------
AAA
BBB
CCC
D

4 rows selected.

The advantage of pipelined functions is that the PIPE ROW statement means that the whole collection is not populated all at once, as in Table Functions. In that respect, it is much more memory efficient.

It is often required to be able to take a list of values from a table, and convert them into a delimited list of values. There are many ways to skin this particular cat, from pure SQL to PL/SQL or Java stored procedures, to name but three. I will outline some of the more common methods here.

Using SYS_CONNECT_BY_PATH

If a pure SQL approach is what you require, then you can use SYS_CONNECT_BY_PATH, which is a function which can be used in hierarchical queries, and returns a "path" representation of the current "row", delimited by a given value. Note, that this does mean that you need some way of "ordering" the data, since you need to use CONNECT BY.

Note, to highlight the theory, see the intermediate stages. Firstly, set up the data :

SQL> SELECT * FROM x;

A            B
--- ----------
A            1
B            2
C            3

Secondly, write a query using SYS_CONNECT_BY_PATH :

SQL> SELECT SYS_CONNECT_BY_PATH(a, ',')
  2  FROM x
  3  START WITH b = 1
  4  CONNECT BY b = PRIOR b + 1;

SYS_CONNECT_BY_PATH(A,',')
--------------------------------------------------------------------------------
,A
,A,B
,A,B,C

OK, so we now understand what SYS_CONNECT_BY_PATH is intended to do, we just need to now get the path corresponding to the last row and trim off that leading comma. This example uses MAX and LTRIM.

SQL> SELECT
  2    MAX(LTRIM(SYS_CONNECT_BY_PATH(a, ','), ',')) str_path
  3  FROM x
  4  START WITH b = 1
  5  CONNECT BY b = PRIOR b + 1
  6  /

STR_PATH
--------------------------------------------------------------------------------
A,B,C

Custom Aggregate Functions

In 9i, you can create a custom aggregate function, available now after Oracle published the interface via TYPE methods.

SQL> SELECT *
  2  FROM t;

A
----------
X
Y

SQL> CREATE OR REPLACE TYPE string_agg_type AS OBJECT
  2  (
  3     total VARCHAR2(4000),
  4
  5     STATIC FUNCTION
  6          ODCIAggregateInitialize(sctx IN OUT string_agg_type )
  7          RETURN NUMBER,
  8
  9     MEMBER FUNCTION
 10          ODCIAggregateIterate(self IN OUT string_agg_type ,
 11                               value IN VARCHAR2 )
 12          RETURN NUMBER,
 13
 14     MEMBER FUNCTION
 15          ODCIAggregateTerminate(self IN string_agg_type,
 16                                 returnValue OUT  varchar2,
 17                                 flags IN number)
 18          RETURN NUMBER,
 19
 20     MEMBER FUNCTION
 21          ODCIAggregateMerge(self IN OUT string_agg_type,
 22                             ctx2 IN string_agg_type)
 23          RETURN NUMBER
 24  );
 25  /

Type created.

SQL> CREATE OR REPLACE TYPE BODY string_agg_type
  2  IS
  3
  4  STATIC FUNCTION ODCIAggregateInitialize(sctx IN OUT string_agg_type)
  5    RETURN NUMBER
  6  IS
  7  BEGIN
  8      sctx := string_agg_type( null );
  9      RETURN ODCIConst.Success;
 10  END;
 11
 12  MEMBER FUNCTION ODCIAggregateIterate(self IN OUT string_agg_type,
 13                                       value IN varchar2 )
 14    RETURN NUMBER
 15  IS
 16  BEGIN
 17      self.total := self.total || ',' || value;
 18      RETURN ODCIConst.Success;
 19  END;
 20
 21  MEMBER FUNCTION ODCIAggregateTerminate(self IN string_agg_type,
 22                                         returnValue OUT varchar2,
 23                                         flags IN number)
 24    RETURN NUMBER
 25  IS
 26  BEGIN
 27      returnValue := ltrim(self.total,',');
 28      return ODCIConst.Success;
 29  END;
 30
 31  MEMBER FUNCTION ODCIAggregateMerge(self IN OUT string_agg_type,
 32                                     ctx2 IN string_agg_type)
 33    RETURN NUMBER
 34  IS
 35  BEGIN
 36      self.total := self.total || ctx2.total;
 37      return ODCIConst.Success;
 38  END;
 39
 40  end;
 41  /

Type body created.

SQL> CREATE or replace FUNCTION stragg(input VARCHAR2)
  2    RETURN varchar2
  3  PARALLEL_ENABLE AGGREGATE USING string_agg_type;
  4  /

Function created.

Now, you can use this new aggregate function to perform the operations you require, i.e.

SQL> SELECT stragg(a)
  2  FROM   t;

STRAGG(A)
-------------------------
X,Y