Regular Expressions in Oracle Made Easy

April 13, 2010

in SQL

What is a regular expression?

A regular expression (also called regex or regexp for short) is a sequence of characters that describes a pattern in text.

Regular Expressions in Oracle

Some examples of regular expressions:

p..t => A dot stands for a single character. This regular expression will match words that start with a ‘p’, end with a ‘t’ and have any two characters in between them (since there are two dots within).

So, it will match words like ‘part’, ‘post’, ‘pact’, but not ‘cost’, ‘pat’, ‘pull’.

p*t = A * matches zero or more occurrences of the previous subexpression. This will match pt, ppt, pppt, and so on, but not ptt, pct.

See the page on Metacharacters Supported in Regular Expressions to get the full set of metacharacters.

What role do regular expressions play in Oracle?

Regular expressions are a powerful tool for selecting text of a certain pattern. Sure we have LIKE in SQL queries, but that’s hardly adequate. LIKE can get you all phone numbers containing the string “445”, but can it get you all phone numbers of the format “(DDD) DDD-DDDD”?

LIKE cannot. At least not as simply and intuitively as regular expressions can. Let’s see how.

Case Study: Contact details validation

Consider a table that stores emails and phone numbers of superheroes (we can but dream).

SQL> select * from t_superheros;

 ID NAME                   EMAIL                           PHONE
--- ------------- ------------------ ---------------
 1 Superman           superman@kmail.com (655) 981-4455
 2 Batman               batmankmail.com       6a5-981-4455
 3 Spiderman         spiderman@com           657-8154455
 4 Flash Gordon   flashgordon@com       (99) 981-4455
 5 Phantom             phantom@kmail.com   (999) 981-4455

Now, the expected phone number format is “(DDD) DDD-DDDD” but only Superman and Phantom have their phone numbers in that format. Batman’s has a letter instead of a digit, Spiderman’s is missing brackets and -, Flash Gordon’s is a number short within the brackets.

How can such invalid formats be tracked? Let’s construct the regexp for it.

To match a digit, we use \d. To match a fixed count of digits, we add the count in braces immediately after. So, DDD will be represented by \d {3}.

To match ( ), we need to use the escape character “\” before ( and ), to indicate that the bracket is used literally and not as a grouping expression. So, to match (DDD) we need the regexp \(\d{3}\).

Finally, to say that the entire string has to be the phone number in this format and nothing else, we put the expression within the markers ^ (start of line) and $ (end of line).

Putting it all together, the regular expression for “(DDD) DDD-DDDD” turns out to be ‘^\(\d{3}\) \d{3}-\d{4}$’. Use this in the query and – bingo!

SQL> select *
 2   from t_superheros
 3   where regexp_like
 4               (phone
 5               ,'^\(\d{3}\) \d{3}-\d{4}$');

 ID NAME                   EMAIL                           PHONE
--- ------------- ------------------ ---------------
 1 Superman           superman@kmail.com (655) 981-4455
 5 Phantom             phantom@kmail.com   (999) 981-4455

Exercise: In the table above, the email ids are not all in correct format either. How will you validate emailid with a regular expression?

For further reading

Photo by cackhanded

{ 4 comments… read them below or add one }

1 Prashant Pawar November 3, 2011 at 12:46 pm

Very Useful thing….with good explanation…Thanks

2 Cornelia May 6, 2014 at 2:04 am

It’s a plseuare to find someone who can identify the issues so clearly

3 Rithu August 12, 2015 at 5:25 pm

Good Explanation. Thx u so much

4 sachin mishra April 28, 2016 at 4:40 pm

I really liked the way you explain the whole concept. It is Very lucid and helpful.
thank you.

Leave a Comment

Previous post:

Next post: