REGEX to Split a Comma-Separated String into Rows

September 3, 2018

in SQL

REGEX to Split String into Rows

A typical scenario that involves splitting a comma-separated string into rows:

  • in a UI-driven query, the user would enter a list of IDs into an input box, and
  • the application should retrieve details for the input list of IDs

Search Input boxThe count of IDs in the list can vary — the user may enter a single value or multiple, and the underlying SQL should fetch the details for all of them.

This sounds simple enough on the face of it: bind a parameter to a comma-separated list of values such as ‘A1,A2,A4’ and then look for corresponding rows using SQL for those IDs.

There’s a catch though.

Value in String != Value in List

Let’s say this is the table data:

SQL> select * from customer;

ID NAME
-- --------------------
A1 Jim
A2 Michael
A3 Pam
A4 Kevin
A5 Angela

…and we need to get the data for ids A1,A2 and A4 from the table.

The user would enter A1,A2,A4 into the search screen and the application would apply it in SQL to match the data in the table.

Directly plugging the values from the search screen’s input box into the query does not work:

SQL> -- Customers with key filter
SQL> -- Wrong SQL!
SQL> select *
  2  from customer
  3  where id in ('A1,A2,A4');

no rows selected

The SQL should instead be looking into a set of discrete values via the IN clause, as in…

SQL> -- Customers with key filter
SQL> -- That's the SQL we want!
SQL> select *
  2  from customer
  3  where id in ('A1','A2','A4');

ID NAME
-- --------------------
A1 Jim
A2 Michael
A4 Kevin

The Classic String to Rows Conundrum

What was entered into the input box was a string: ‘A1, A2, A4’.

‘A1, A2, A4’ as-is in SQL IN will be interpreted as one unit of text — not as a list of discrete values ‘A1’, ‘A2’, and ‘A4’.

We need a way to convert the comma-separated text to rows.
How do we do it?
There are several ways. This article shows you a simple one, using a combination of REGEX_SUBSTR and CONNECT BY.

SQL: String to Rows Solution

The SQL that does get the result we seek:

-- Customers with key filter
-- REGEXP_SUBSTR and CONNECT BY to convert string to rows
select  *
from customer 
where id in
  (select regexp_substr('A1,A2,A4','[^,]+', 1, level) 
   from dual 
   connect BY regexp_substr('A1,A2,A4', '[^,]+', 1, level) 
   is not null);

When run:

SQL> -- Customers with key filter
SQL> -- REGEXP_SUBSTR and CONNECT BY to convert string to rows
SQL> select  *
  2  from customer
  3  where id in
  4    (select regexp_substr('A1,A2,A4','[^,]+', 1, level)
  5     from dual
  6     connect BY regexp_substr('A1,A2,A4', '[^,]+', 1, level)
  7     is not null);

ID NAME
-- --------------------
A1 Jim
A2 Michael
A4 Kevin

The trick is in the subquery that converts the input string ‘A1, A2, A4’ to the list of discrete values ‘A1’, ‘A2’, and ‘A4’.

Let’s examine the solution more closely.

String to Rows Solution: A Closer Look

The building blocks of the solution are:

(1) REGEX_SUBSTR for pattern matching

REGEXP_SUBSTR is a regular expression function in Oracle that finds a substring matching a regular expression pattern in a given text.

Syntax:
regex_substr(text, pattern, position, occurrence)

where

  • text: Source string in which the pattern is to be searched.
    ‘A1, A2, A3’ in this example.
  • pattern: regular expression that matches the substring to be returned.
    [^,]+ in this example, which matches any character sequence in the text that does not contain a comma.
  • position:  Character of text where Oracle should begin the search. In this example, we will begin the search from 1.
  • occurrence:  Nth occurrence of pattern in source string to be returned.

So for example, to get the 2nd occurrence of pattern in text which does not contain a comma:

SQL> -- REGEXP_SUBSTR to pick the 2nd value from string
SQL> select regexp_substr('A1,A2,A4','[^,]+', 1, 2)
  2  from dual;

RE
--
A2

To convert the entire string to rows, *all* occurrences of the substrings within commas are to be returned — so we need a way to iterate through 1..last occurrences of substrings in the SQL. That’s where CONNECT BY comes into play…

(2) CONNECT BY for 1..N row generation

Recursive CONNECT BY along with LEVEL can be used as a basic 1..N row generator.

The value returned by level can be used to indicate occurrence in REGEX_SUBSTR.

SQL> -- 1..N generator using CONNECT BY
SQL> select level
  2  from dual
  3  connect by level <= 4;

     LEVEL
----------
         1
         2
         3
         4

Putting both together, we get…

REGEX_SUBSTR + CONNECT BY for delimiter-separated string —> rows conversion

SQL> -- REGEXP_SUBSTR and CONNECT BY to convert string to rows
SQL> select regexp_substr('A1,A2,A4','[^,]+', 1, level)
  2  from dual
  3  connect BY regexp_substr('A1,A2,A4', '[^,]+', 1, level)
  4  is not null;

REGEXP_SUBSTR('A1,A2,A4'
------------------------
A1
A2
A4

Summary

A delimiter-separated string can be converted to a set of rows in Oracle SQL, with the combination of the regex function REGEX_SUBSTR and recursion via CONNECT BY. This feature can be used for splitting a single input string with comma-separated query parameters, into a list of values.

and

{ 8 comments… read them below or add one }

1 ronald August 29, 2019 at 5:34 pm

Good tutorial
very easy to understand the use of the syntax
Thank you very much

2 Zabih May 3, 2020 at 7:12 pm

thanks.
it is what i need exactly… 🙂

3 Virendra July 23, 2020 at 7:19 pm

Exactly what was needed !
Detailed, accurate and perfect explanation. Thank you for sharing this. May you keep posting such amazing tutorials. (y) 🙂

4 Hayden Hudson January 8, 2021 at 9:41 pm

If you have APEX 5.1 or higher installed in the your database, you can avoid using regex or connect by, using apex_string.split :

select column_value
from table(apex_string.split(‘A1,A2,A4’, ‘,’))

5 Raja March 24, 2021 at 5:03 pm

I tried the below query with 25000 characters, then I got the below error, can you please help me
ORA-01460: unimplemented or unreasonable conversion requested
01460. 00000 – “unimplemented or unreasonable conversion requested”
*Cause:
*Action:

SQL> select regexp_substr(:p,'[^,]+’, 1, level)
from dual
connect BY regexp_substr(:p, ‘[^,]+’, 1, level)
is not null;

6 Manisha Padhy March 25, 2021 at 11:05 am

Is there a way where instead of static string we can use another sql query to fetch the string from a different table.

7 Abhijeet Patil July 27, 2021 at 10:07 pm

I have STRING values as below:
‘AB,BC,CD,DE, EF,FE’

I want to convert it into
AB
BC
CD
DE, EF –Note the space after comma
FE

Please note the value in main string has data as DE, EF (one space between DE and EF) which is getting converted as below with my REGEXP_SUBSTR
I am getting
AB
BC
CD
DE
EF —Note the initial space
FE

Regex I used:
SELECT REGEXP_SUBSTR(‘AB,BC,CD,DE, EF,FE’, ‘[^,]+’, 1, LEVEL) FROM DUAL
CONNECT BY REGEXP_SUBSTR(‘AB,BC,CD,DE, EF,FE’, ‘[^,]+’, 1, LEVEL) IS NOT NULL;

8 Naresh March 17, 2022 at 3:26 pm

It is taking too long time if we amend where condition along with connect by

Leave a Comment

Previous post:

Next post: