Flatten Hierarchical Data using a Single SQL

August 25, 2014

in Pseudocolumns, SQL

Hierarchical Data in a Relational Table

Oracle has some neat ways of handling hierarchical data.

A single Oracle SQL can find the entire tree of data above or below a node in a hierarchy using the CONNECT BY clause.

What if the requirement is to flatten hierarchical data?

For example, given a table containing the employee reporting hierarchy of an organization (image alongside), get a single SQL to return the 4 types of employee roles as columns:

  • PROJECT_MANAGER (Level 1)
    • DBA (Level 2)
    • TEAM_LEAD (Level 2)
      • TEAM_MEMBER (Level 3)

This is trickier than transposing rows as columns in a non-hierarchical table, and needs a little extra to get the result. But it is nowhere near as tough as it appears. No multiple self-joins, no recursive CTE.

This post will show you how to flatten hierarchical data using a single SQL in Oracle.

To build the SQL, let’s populate the hierarchical table EMPL with some data, as below. Hierarchical Data in a Relational Table
Simple SELECT on the data:

SQL> -- Data in EMPL table
SQL> select * from empl;

EMP_NAME         EMP_ROLE        MANAGER_EMP_NA
---------------- --------------- --------------
Peter Matthews   PROJECT_MANAGER
Tom Ledford      TEAM_LEAD       Peter Matthews
Timothy Leigh    TEAM_LEAD       Peter Matthews
David Botham     DBA             Peter Matthews
Trevor Malcolm   TEAM_MEMBER     Tom Ledford
Toby Mayer       TEAM_MEMBER     Tom Ledford
Trevor McDermott TEAM_MEMBER     Timothy Leigh
Thor Martin      TEAM_MEMBER     Timothy Leigh

9 rows selected.

The expected result:

PROJECT_MANAGER DBA             TEAM_LEAD       TEAM_MEMBER
--------------- --------------- --------------- ---------------
Peter Matthews
Peter Matthews                  Tom Ledford
Peter Matthews                  Tom Ledford     Trevor Malcolm
Peter Matthews                  Tom Ledford     Toby Mayer
Peter Matthews                  Timothy Leigh
Peter Matthews                  Timothy Leigh   Trevor McDermott
Peter Matthews                  Timothy Leigh   Thor Martin
Peter Matthews  David Botham

We get to the expected result using a combination of SYS_CONNECT_BY_PATH and regular expressions.

The Magic SQL

SQL> select emplevel1 project_manager
  2  , decode(rolelevel2, 'DBA', emplevel2) dba
  3  , decode(rolelevel2, 'TEAM_LEAD', emplevel2) team_lead
  4  , emplevel3 team_member
  5  from
  6    (
  7     select
  8       regexp_substr(role_path, '[^/]+', 1, 1) rolelevel1
  9     , regexp_substr(emp_path, '[^/]+', 1, 1) emplevel1
 10     , regexp_substr(role_path, '[^/]+', 1, 2) rolelevel2
 11     , regexp_substr(emp_path, '[^/]+', 1, 2) emplevel2
 12     , regexp_substr(role_path, '[^/]+', 1, 3) rolelevel3
 13     , regexp_substr(emp_path, '[^/]+', 1, 3) emplevel3
 14     from
 15       (
 16       select
 17         sys_connect_by_path (emp_role, '/') role_path
 18       , sys_connect_by_path (emp_name, '/') emp_path
 19       from empl
 20       start with manager_emp_name is null
 21         connect by prior
 22         emp_name = manager_emp_name
 23        )
 24     );

PROJECT_MANAGER DBA             TEAM_LEAD       TEAM_MEMBER
--------------- --------------- --------------- ---------------
Peter Matthews
Peter Matthews                  Tom Ledford
Peter Matthews                  Tom Ledford     Trevor Malcolm
Peter Matthews                  Tom Ledford     Toby Mayer
Peter Matthews                  Timothy Leigh
Peter Matthews                  Timothy Leigh   Trevor McDermott
Peter Matthews                  Timothy Leigh   Thor Martin
Peter Matthews  David Botham

8 rows selected.

Making Sense of That SQL

1. The first step is to concatenate hierarchical data in a single “/”-delimited string. This gives us:

SQL>-- Concatenate hierarchical data
SQL>-- in a single /-delimited string
SQL>select
  2   sys_connect_by_path (emp_role, '/') role_path
  3 , sys_connect_by_path (emp_name, '/') emp_path
  4  from empl
  5  start with manager_emp_name is null
  6  connect by prior
  7    emp_name = manager_emp_name;

ROLE_PATH                              EMP_PATH
-------------------------------------- -----------------------------------------
/PROJECT_MANAGER                       /Peter Matthews
/PROJECT_MANAGER/TEAM_LEAD             /Peter Matthews/Tom Ledford
/PROJECT_MANAGER/TEAM_LEAD/TEAM_MEMBER /Peter Matthews/Tom Ledford/Trevor Malcolm
/PROJECT_MANAGER/TEAM_LEAD/TEAM_MEMBER /Peter Matthews/Tom Ledford/Toby Mayer
/PROJECT_MANAGER/TEAM_LEAD             /Peter Matthews/Timothy Leigh
/PROJECT_MANAGER/TEAM_LEAD/TEAM_MEMBER /Peter Matthews/Timothy Leigh/Trevor McDermott
/PROJECT_MANAGER/TEAM_LEAD/TEAM_MEMBER /Peter Matthews/Timothy Leigh/Thor Martin
/PROJECT_MANAGER/DBA                   /Peter Matthews/David Botham

9 rows selected.

So employee names are concatenated in order of reporting hierarchy, as emp_path = emp1/emp2/…/empN, where 1 is the topmost node in the hierarchy. Similarly, the reporting path is role_path = role1/role2/…/roleN

2. Next, separate out the elements from the string using regular expressions. The Nth match of “/” in the strings role_path and emp_path   finds the values for text in rolelevelN and emplevelN.

SQL> -- Regexp to separate out discrete roles and
SQL> -- employees, by level
SQL> select
  2    regexp_substr(role_path, '[^/]+', 1, 1) rolelevel1
  3  , regexp_substr(emp_path, '[^/]+', 1, 1) emplevel1
  4  , regexp_substr(role_path, '[^/]+', 1, 2) rolelevel2
  5  , regexp_substr(emp_path, '[^/]+', 1, 2) emplevel2
  6  , regexp_substr(role_path, '[^/]+', 1, 3) rolelevel3
  7  , regexp_substr(emp_path, '[^/]+', 1, 3) emplevel3
  8  from
  9  (
 10    -- Concatenate hierarchical data
 11    -- in a single /-delimited string
 12    select
 13      sys_connect_by_path (emp_role, '/') role_path
 14    , sys_connect_by_path (emp_name, '/') emp_path
 15    from empl
 16    start with manager_emp_name is null
 17    connect by prior
 18      emp_name = manager_emp_name
 19  );

ROLELEVEL1      EMPLEVEL1       ROLELEVEL2 EMPLEVEL2     ROLELEVEL3  EMPLEVEL3
--------------- --------------- ---------- ------------- ----------- ----------------
PROJECT_MANAGER Peter Matthews
PROJECT_MANAGER Peter Matthews  TEAM_LEAD  Tom Ledford
PROJECT_MANAGER Peter Matthews  TEAM_LEAD  Tom Ledford   TEAM_MEMBER Trevor Malcolm
PROJECT_MANAGER Peter Matthews  TEAM_LEAD  Tom Ledford   TEAM_MEMBER Toby Mayer
PROJECT_MANAGER Peter Matthews  TEAM_LEAD  Timothy Leigh
PROJECT_MANAGER Peter Matthews  TEAM_LEAD  Timothy Leigh TEAM_MEMBER Trevor McDermott
PROJECT_MANAGER Peter Matthews  TEAM_LEAD  Timothy Leigh TEAM_MEMBER Thor Martin
PROJECT_MANAGER Peter Matthews  DBA        David Botham

8 rows selected.

In this data set, we have a decision to make at Level 2, since the DBA as well as the TEAM_LEAD report to the PROJECT_MANAGER. Level 1 can always be taken as PROJECT_MANAGER, Level 3 as TEAM_MEMBER.

As with PIVOT queries on non-hierarchical data, here too we need to be sure that the column headers of flattened data are fixed.

3. Finally, make an inline view of the SQL above, and DECODE on rolelevelN to choose the right column headers.

SQL> -- Transposed columns by rolelevelN
SQL> select emplevel1 project_manager
  2  , decode(rolelevel2, 'DBA', emplevel2) dba
  3  , decode(rolelevel2, 'TEAM_LEAD', emplevel2) team_lead
  4  , emplevel3 team_member
  5  from
  6    (
  7     -- Regexp to separate out discrete roles and
  8     -- employees, by level
  9     select
 10       regexp_substr(role_path, '[^/]+', 1, 1) rolelevel1
 11     , regexp_substr(emp_path, '[^/]+', 1, 1) emplevel1
 12     , regexp_substr(role_path, '[^/]+', 1, 2) rolelevel2
 13     , regexp_substr(emp_path, '[^/]+', 1, 2) emplevel2
 14     , regexp_substr(role_path, '[^/]+', 1, 3) rolelevel3
 15     , regexp_substr(emp_path, '[^/]+', 1, 3) emplevel3
 16     from
 17       (
 18       -- Concatenate hierarchical data
 19       -- in a single /-delimited string
 20       select
 21         sys_connect_by_path (emp_role, '/') role_path
 22       , sys_connect_by_path (emp_name, '/') emp_path
 23       from empl
 24       start with manager_emp_name is null
 25         connect by prior
 26         emp_name = manager_emp_name
 27        )
 28     );

PROJECT_MANAGER DBA             TEAM_LEAD       TEAM_MEMBER
--------------- --------------- --------------- ---------------
Peter Matthews
Peter Matthews                  Tom Ledford
Peter Matthews                  Tom Ledford     Trevor Malcolm
Peter Matthews                  Tom Ledford     Toby Mayer
Peter Matthews                  Timothy Leigh
Peter Matthews                  Timothy Leigh   Trevor McDermott
Peter Matthews                  Timothy Leigh   Thor Martin
Peter Matthews  David Botham

8 rows selected.

Leave a Comment

Previous post:

Next post: