An approach for comparing two tables for differences in data, using a pure SQL solution:

In a recent project, a shadow table had to be compared periodically with its main source table to identify the differences between the two tables.

The nature of differences fell into one of these buckets:

**Insert Required [INS]:**If the source had rows not present in the shadow table**Update Required [UPD]:**If one or more columns in a row (identified by a primary key) had changed values**Delete Required [DEL]:**If the shadow table had rows not present in the source table

Comparing two tables for differences (INS/UPD/DEL) needed a different solution from a simple MERGE SQL statement. Here an upsert was not to be executed, only the nature of differences to be identified.

I came upon this post on Data Detection in Oracle SQL, which outlines a number of good solutions for similar use cases in a data warehouse environment.

We followed the DECODE method – the key reasons for the choice were its simplicity and elegance. New Oracle developers were able to quickly understand and support this approach.

Here’s a detailed breakdown of the solution, with sample SOURCE and TARGET table scripts that you could copy to test out in your own environment.

### Case Study Setup Scripts

#### 1. Create tables SOURCE and TARGET

-- SOURCE: Main table against which -- the shadow table is to be compared create table source ( c1 varchar2(2) -- Primary Key , c2 varchar2(2) , c3 varchar2(2) ); -- TARGET: Shadow table which is to -- be compared against SOURCE create table target ( c1 varchar2(2) -- Primary Key , c2 varchar2(2) , c3 varchar2(2) );

#### 2. Populate data

Using INSERT ALL

-- Data for main table SOURCE insert all into source(c1,c2,c3) values('1','x','y') into source(c1,c2,c3) values('2','xx','y') into source(c1,c2,c3) values('3','x','y') select * from dual; -- Data for shadow table TARGET insert all into target(c1,c2,c3) values('1','x','y') into target(c1,c2,c3) values('2','x','y') into target(c1,c2,c3) values('4','x','yy') select * from dual;

What we have now is representative data in which

- One row has a changed value in SOURCE and needs to be updated in TARGET
*[UPD]* - One row is new in SOURCE and needs to be inserted into TARGET
*[INS]* - One row is missing in SOURCE and needs to be deleted from TARGET
*[DEL]*

### Compare two tables for differences: SQL Solution

A single SQL can give us the required result:

select case when t.c1 is null then 'INS' when s.c1 is null then 'DEL' else 'UPD' end dml_flag , nvl (s.c1, t.c1) c1 , s.c2, s.c3 from source s full join target t on (s.c1 = t.c1) where decode (s.c2, t.c2, 0, 1) + decode (s.c3, t.c3, 0, 1) > 0;

When run:

SQL> select case 2 when t.c1 is null then 'INS' 3 when s.c1 is null then 'DEL' 4 else 'UPD' 5 end dml_flag 6 , nvl (s.c1, t.c1) c1 7 , s.c2, s.c3 8 from source s 9 full join target t 10 on (s.c1 = t.c1) 11 where decode (s.c2, t.c2, 0, 1) 12 + decode (s.c3, t.c3, 0, 1) > 0; DML_FLAG C1 C2 C3 --------- -- -- -- UPD 2 xx y DEL 4 INS 3 x y

### Understanding the SQL solution step-by-step

Read on for help with understanding how the SQL above works and how to apply it to your situation…

#### Step 1: FULL JOIN

The SQL first does a** full join** between SOURCE and TARGET based on the Primary Key (C1). A full join returns rows from both tables, with nulls if a match is not found in the other table.

This is the result of a basic full join between SOURCE and TARGET with no other filters/conditions:

#### Step 2: Filter out unchanged rows

After the full join, the SQL filters out rows in which no change is detected. This is done by

I. applying a **DECODE** on *every corresponding attribute* *pair* in SOURCE and TARGET – to give 0 if no change in value, 1 if there is a change – using this format:

DECODE (.[columnX] , [target].[columnX] , 0 –- Value if .[columnX] = [target.columnX] , 1)

For N columns to be compared, there would be N such DECODES.

II. **adding up** the result of DECODEs

III. **checking the final sum** of the result of DECODEs against zero. A non-zero sum means that at least one attribute showed a variation in values between SOURCE and TARGET, and so the row qualifies for update.

This SQL gives us a listing of rows that have some sort of difference (INS/UPD/DEL) in data between SOURCE and TARGET:

select * from source s full join target t on (s.c1 = t.c1) where decode (s.c2, t.c2, 0, 1) + decode (s.c3, t.c3, 0, 1) > 0;

When run:

SQL> select * 2 from source s 3 full join target t 4 on (s.c1 = t.c1) 5 where decode (s.c2, t.c2, 0, 1) 6 + decode (s.c3, t.c3, 0, 1) > 0; C1 C2 C3 C1 C2 C3 -- -- -- -- -- -- 2 xx y 2 x y 4 x yy 3 x y

#### Step 3: Derive DML_FLAG

The SQL finally applies a searched CASE expression on the SQL formed in Step 2, to derive the value of DML_FLAG:

- If the full join found no rows in TARGET, then DML_FLAG = INS
- If the full join found no rows in SOURCE, then DML_FLAG = DEL
- If the full join found rows in both TARGET and SOURCE, then the row can only be showing up in the Step 2 SQL if at least one attribute has a variation in values. That is, DML_FLAG = UPD.

Putting it all together gives us the final SQL:

select case when t.c1 is null then 'INS' when s.c1 is null then 'DEL' else 'UPD' end dml_flag , nvl (s.c1, t.c1) c1 , s.c2, s.c3 from source s full join target t on (s.c1 = t.c1) where decode (s.c2, t.c2, 0, 1) + decode (s.c3, t.c3, 0, 1) > 0;

When run:

SQL> select case 2 when t.c1 is null then 'INS' 3 when s.c1 is null then 'DEL' 4 else 'UPD' 5 end dml_flag 6 , nvl (s.c1, t.c1) c1 7 , s.c2, s.c3 8 from source s 9 full join target t 10 on (s.c1 = t.c1) 11 where decode (s.c2, t.c2, 0, 1) 12 + decode (s.c3, t.c3, 0, 1) > 0; DML_FLAG C1 C2 C3 --------- -- -- -- UPD 2 xx y DEL 4 INS 3 x y

{ 0 comments… add one now }