Hierarchical Queries: A QuickStart Guide

June 28, 2010

in Keywords, Pseudocolumns, SQL

Hierarchical Queries in Oracle

A hierarchical query is one that works on data with a "tree" relationship.

An Example of Hierarchical Data
The employee-manager relationship in SCOTT’s famous EMP table. Each employee record has a manager’s id associated with it. In effect, there is a "tree" of data relationships

At each level, I can get the employee-manager relationship as below:

SQL> select empno, ename, mgr mgrno,
  2        (select e2.ename from emp e2
  3         where e1.mgr = e2.empno) mgrname
  4  from emp e1
  5  /

     EMPNO ENAME         MGRNO MGRNAME
---------- -------- ---------- ----------
      7369 SMITH          7902 FORD
      7499 ALLEN          7698 BLAKE
      7521 WARD           7698 BLAKE
      7566 JONES          7839 KING
      7654 MARTIN         7698 BLAKE
      7698 BLAKE          7839 KING
      7782 CLARK          7839 KING
      7788 SCOTT          7566 JONES
      7839 KING
      7844 TURNER         7698 BLAKE
      7876 ADAMS          7788 SCOTT
      7900 JAMES          7698 BLAKE
      7902 FORD           7566 JONES
      7934 MILLER         7782 CLARK

14 rows selected.

But what if I want to select –

  1. the entire hierarchy of employees under JONES?
  2. the line of reporting above MILLER, till the topmost manager?

These queries require a special way of traversing through the "tree". The next sections introduces you to the SQL keywords that you will need to write such a query, and finally the query.

Processing Hierarchical Data: START WITH.CONNECT BY

The START WITH.CONNECT BY clause can select hierarchical data, to answer the above two requirements.

The structure of a hierarchical query is:

SELECT <data>
FROM <table>
START WITH <root>
CONNECT BY PRIOR <condition>

START WITH specifies the topmost node of the tree or in other words, where to start parsing from.

CONNECT BY specifies the relationship between the parent and the child.

PRIOR achieves the recursive condition to traverse the tree.

Let’s see the query in action.

SQL to get the entire hierarchy below a node

In the EMP table example, the query to get the entire hierarchy of employees under JONES:

SQL> select empno, ename
  2     , sys_connect_by_path(ename, ' -> ') path
  3  from emp
  4  start with ename = 'JONES'    -- the root node
  5  connect by prior empno = mgr  -- the recursive condition
  6  / 

     EMPNO ENAME    PATH
---------- -------- -----------------------------------------
      7566 JONES     -> JONES
      7788 SCOTT     -> JONES -> SCOTT
      7876 ADAMS     -> JONES -> SCOTT -> ADAMS
      7902 FORD      -> JONES -> FORD
      7369 SMITH     -> JONES -> FORD -> SMITH

SYS_CONNECT_BY_PATH(column, char) returns the path of a column value from root to node. The column values in the path are separated by the "char"  specified in the SQL, such as -> in this case.

SQL to get the entire hierarchy above a node

In the EMP table example, the query to get the line of reporting above MILLER, till the topmost manager:

SQL> select sys_connect_by_path(ename, ' -> ') line_of_reporting
  2  from emp
  3  start with ename = 'MILLER'    -- the root node
  4* connect by prior mgr = empno   -- the recursive condition
SQL> / 

LINE_OF_REPORTING
------------------------------------------------------------
 -> MILLER
 -> MILLER -> CLARK
 -> MILLER -> CLARK -> KING

Almost like the previous one with a crucial difference: the connect by prior condition is reversed.

A Memory Aid

To get the reporting hierarchy below an employee, should the condition be written as

connect by prior mgr = empno;

or

connect by prior empno = mgr?

This will not be confusing once you commit this pneumonic to memory:

When the tree to be queried has a heavy bottom (i.e. reporting hierarchy below an employee, starts with one node and branches to many), then the heavy side must be put later.

By this logic, the condition for querying the hierarchy below a node is: connect by prior empno = mgr

Summary

This article covers:

  • The meaning of hierarchical data
  • The type of query to process hierarchical data
  • Keywords START WITH.CONNECT BY and others that feature in hierarchical queries
  • SQL to get all nodes below a specific node in a hierarchy
  • SQL to get all nodes above a specific node in a hierarchy

In the next article, we’ll talk about the LEVEL pseudocolumn in hierarchical queries.

Photo by shapeshift

Leave a Comment

Previous post:

Next post: