Converting delimited lists to collections (and vice versa)
- Delimited lists to collections
- Collections to delimited lists
It is a common requirement to be able to take a delimited list of values, say A,B,C,D, and treat this data like it was a set of rows in a
table, or vice versa.
In an ideal world, this should never occur for many reasons. Firstly, the approach introduces a form of unnecessary limitation, i.e. the maximum length of a string, 32K-1 in PL/SQL for a VARCHAR2 etc. and secondly, the data construct is just wrong. You wouldn't store a set of rows in the database as a
delimited string, so why is this any different?
Practically all environments which can interact with an Oracle database, such as VB, C++, Java or, indeed, Oracle Forms, has the concept of an
array, or collection, which is a set of data elements, just like a relational table, and which can be mapped to an equivalent
Oracle collection type.
I have worked on many systems where there is an inherent limitation in functionality simply because of approaches like this. It's as simple as this: If you want to have a system with no logical limitation in the number of data elements passed to a given process, then forget the following mechanisms!
They are simply the wrong way to approach the problem.
Using DBMS_UTILITY.COMMA_TO_TABLE
One method is to take advantage of a DBMS_UTILITY procedure called COMMA_TO_TABLE (there is also a TABLE_TO_COMMA which does the reverse).
Note, DBMS_UTILITY assumes a comma delimiter, if you delimit by something else, you will need to do a REPLACE beforehand.
SQL> DECLARE
2 l_tab dbms_utility.uncl_array;
3 l_tablen number;
4 BEGIN
5 dbms_utility.comma_to_table('A,B,C', l_tablen, l_tab);
6 dbms_output.put_line('TABLE LENGTH : '|| l_tablen);
7 dbms_output.put_line('TABLE COUNT : '|| l_tab.COUNT);
8 for i in 1..l_tablen
9 loop
10 dbms_output.put_line(l_tab(i));
11 end loop;
12 END;
13 /
TABLE LENGTH : 3
TABLE COUNT : 4
A
B
C
PL/SQL procedure successfully completed.
Notice the difference between the .COUNT and the value of the output parameter (l_tablen). Basically, the end of the collection
always has a NULL "extra" row. Basically, always use the value of the output parameter when looping through the collection (or of course .COUNT-1), to ensure you don't experience any unwanted side-effects. Why the extra row? No idea! But, I do stress, read the following important caveat about this approach in general.
*IMPORTANT CAVEAT*
COMMA_TO_TABLE (and the reverse TABLE_TO_COMMA) are not written for this purpose! They both use the NAME_TOKENIZE function (again found in
DBMS_UTILITY). They are written primarily for use within replication internally by Oracle, and parse IDENTIFIERS rather than strings,
and as such have to be valid Oracle object names. Trying to pass numbers, elements greater than 30 characters, reserved words etc. will not
work, i.e.
SQL> DECLARE
2 l_tab dbms_utility.uncl_array;
3 l_tablen number;
4 BEGIN
5 dbms_utility.comma_to_table('1,2,3', l_tablen, l_tab);
6 dbms_output.put_line('TABLE LENGTH : '|| l_tablen);
7 dbms_output.put_line('TABLE COUNT : '|| l_tab.COUNT);
8 for i in 1..l_tablen
9 loop
10 dbms_output.put_line(l_tab(i));
11 end loop;
12 END;
13 /
DECLARE
*
ERROR at line 1:
ORA-00931: missing identifier
ORA-06512: at "SYS.DBMS_UTILITY", line 125
ORA-06512: at "SYS.DBMS_UTILITY", line 160
ORA-06512: at "SYS.DBMS_UTILITY", line 202
ORA-06512: at line 5
SQL> DECLARE
2 l_tab dbms_utility.uncl_array;
3 l_tablen number;
4 BEGIN
5 dbms_utility.comma_to_table('this_is_a_long_element_greater_than_30_characters,B,C', l_tablen, l_tab);
6 dbms_output.put_line('TABLE LENGTH : '|| l_tablen);
7 dbms_output.put_line('TABLE COUNT : '|| l_tab.COUNT);
8 for i in 1..l_tablen
9 loop
10 dbms_output.put_line(l_tab(i));
11 end loop;
12 END;
13 /
DECLARE
*
ERROR at line 1:
ORA-00972: identifier is too long
ORA-06512: at "SYS.DBMS_UTILITY", line 125
ORA-06512: at "SYS.DBMS_UTILITY", line 160
ORA-06512: at "SYS.DBMS_UTILITY", line 202
ORA-06512: at line 5
For this reason alone, I would not recommend using this approach which, while fast (and you may be doing this on elements less than 30 characters,
and hence "get away with it"), it is limited and, frankly, I wouldn't trust it to always stay this way.
Using a Table Function
A nice approach for converting a delimited list of a collection is via a custom PL/SQL Table Function which you can then SELECT from via the TABLE
operator, i.e.
/* Create the output TYPE, here using a VARCHAR2(100) nested table type */
SQL> CREATE TYPE test_type AS TABLE OF VARCHAR2(100);
2 /
Type created.
/* Now, create the function.*/
SQL> CREATE OR REPLACE FUNCTION f_convert(p_list IN VARCHAR2)
2 RETURN test_type
3 AS
4 l_string VARCHAR2(32767) := p_list || ',';
5 l_comma_index PLS_INTEGER;
6 l_index PLS_INTEGER := 1;
7 l_tab test_type := test_type();
8 BEGIN
9 LOOP
10 l_comma_index := INSTR(l_string, ',', l_index);
11 EXIT WHEN l_comma_index = 0;
12 l_tab.EXTEND;
13 l_tab(l_tab.COUNT) := SUBSTR(l_string, l_index, l_comma_index - l_index);
14 l_index := l_comma_index + 1;
15 END LOOP;
16 RETURN l_tab;
17 END f_convert;
18 /
Function created.
/* Prove it works */
SQL> SELECT * FROM TABLE(f_convert('AAA,BBB,CCC,D'));
COLUMN_VALUE
--------------------------------------------------------------------------------
AAA
BBB
CCC
D
4 rows selected.
A disadvantage of this type of approach is that the whole collection is built-up and then passed back to the calling routine. This may have memory
considerations, in which case, you may need to utilise something like pipelined functions.
Using a Pipelined Function
An alternative to a standard PL/SQL function (from 9i onwards) is the use of pipelined functions.
SQL> CREATE TYPE test_type AS TABLE OF VARCHAR2(100);
2 /
Type created.
SQL> CREATE OR REPLACE FUNCTION f_convert2(p_list IN VARCHAR2)
2 RETURN test_type
3 PIPELINED
4 AS
5 l_string LONG := p_list || ',';
6 l_comma_index PLS_INTEGER;
7 l_index PLS_INTEGER := 1;
8 BEGIN
9 LOOP
10 l_comma_index := INSTR(l_string, ',', l_index);
11 EXIT WHEN l_comma_index = 0;
12 PIPE ROW ( SUBSTR(l_string, l_index, l_comma_index - l_index) );
13 l_index := l_comma_index + 1;
14 END LOOP;
15 RETURN;
16 END f_convert2;
17 /
Function created.
SQL> SELECT * FROM TABLE(f_convert('AAA,BBB,CCC,D'));
COLUMN_VALUE
--------------------------------------------------------------------------------
AAA
BBB
CCC
D
4 rows selected.
The advantage of pipelined functions is that the PIPE ROW statement means that the whole collection is not populated all at once, as in
Table Functions. In that respect, it is much more memory efficient.
It is often required to be able to take a list of values from a table, and convert them into a delimited list of
values. There are many ways to skin this particular cat, from pure SQL to PL/SQL or Java stored procedures, to name but three.
I will outline some of the more common methods here.
Using SYS_CONNECT_BY_PATH
If a pure SQL approach is what you require, then you can use SYS_CONNECT_BY_PATH, which is a function which can be used in hierarchical queries, and returns a "path" representation
of the current "row", delimited by a given value. Note, that this does mean that you need some way of "ordering" the data, since you need to use
CONNECT BY.
Note, to highlight the theory, see the intermediate stages. Firstly, set up the data :
SQL> SELECT * FROM x;
A B
--- ----------
A 1
B 2
C 3
Secondly, write a query using SYS_CONNECT_BY_PATH :
SQL> SELECT SYS_CONNECT_BY_PATH(a, ',')
2 FROM x
3 START WITH b = 1
4 CONNECT BY b = PRIOR b + 1;
SYS_CONNECT_BY_PATH(A,',')
--------------------------------------------------------------------------------
,A
,A,B
,A,B,C
OK, so we now understand what SYS_CONNECT_BY_PATH is intended to do, we just need to now get the path corresponding to the last row and trim off
that leading comma. This example uses MAX and LTRIM.
SQL> SELECT
2 MAX(LTRIM(SYS_CONNECT_BY_PATH(a, ','), ',')) str_path
3 FROM x
4 START WITH b = 1
5 CONNECT BY b = PRIOR b + 1
6 /
STR_PATH
--------------------------------------------------------------------------------
A,B,C
Custom Aggregate Functions
In 9i, you can create a custom aggregate function, available now after Oracle published the interface via TYPE methods.
SQL> SELECT *
2 FROM t;
A
----------
X
Y
SQL> CREATE OR REPLACE TYPE string_agg_type AS OBJECT
2 (
3 total VARCHAR2(4000),
4
5 STATIC FUNCTION
6 ODCIAggregateInitialize(sctx IN OUT string_agg_type )
7 RETURN NUMBER,
8
9 MEMBER FUNCTION
10 ODCIAggregateIterate(self IN OUT string_agg_type ,
11 value IN VARCHAR2 )
12 RETURN NUMBER,
13
14 MEMBER FUNCTION
15 ODCIAggregateTerminate(self IN string_agg_type,
16 returnValue OUT varchar2,
17 flags IN number)
18 RETURN NUMBER,
19
20 MEMBER FUNCTION
21 ODCIAggregateMerge(self IN OUT string_agg_type,
22 ctx2 IN string_agg_type)
23 RETURN NUMBER
24 );
25 /
Type created.
SQL> CREATE OR REPLACE TYPE BODY string_agg_type
2 IS
3
4 STATIC FUNCTION ODCIAggregateInitialize(sctx IN OUT string_agg_type)
5 RETURN NUMBER
6 IS
7 BEGIN
8 sctx := string_agg_type( null );
9 RETURN ODCIConst.Success;
10 END;
11
12 MEMBER FUNCTION ODCIAggregateIterate(self IN OUT string_agg_type,
13 value IN varchar2 )
14 RETURN NUMBER
15 IS
16 BEGIN
17 self.total := self.total || ',' || value;
18 RETURN ODCIConst.Success;
19 END;
20
21 MEMBER FUNCTION ODCIAggregateTerminate(self IN string_agg_type,
22 returnValue OUT varchar2,
23 flags IN number)
24 RETURN NUMBER
25 IS
26 BEGIN
27 returnValue := ltrim(self.total,',');
28 return ODCIConst.Success;
29 END;
30
31 MEMBER FUNCTION ODCIAggregateMerge(self IN OUT string_agg_type,
32 ctx2 IN string_agg_type)
33 RETURN NUMBER
34 IS
35 BEGIN
36 self.total := self.total || ctx2.total;
37 return ODCIConst.Success;
38 END;
39
40 end;
41 /
Type body created.
SQL> CREATE or replace FUNCTION stragg(input VARCHAR2)
2 RETURN varchar2
3 PARALLEL_ENABLE AGGREGATE USING string_agg_type;
4 /
Function created.
Now, you can use this new aggregate function to perform the operations you require, i.e.
SQL> SELECT stragg(a)
2 FROM t;
STRAGG(A)
-------------------------
X,Y