DISTINCT and How NOT To Use It

March 15, 2011

in Joins, Performance, SQL, Subqueries

DISTINCT Keyword in Oracle

The DISTINCT keyword placed next to SELECT restricts the result to unique rows from a query.

DISTINCT is also a much abused keyword, often used as a quick fix to bad queries. Take this example from Oracle Applications:

-- Problem
SELECT p.party_id
     , p.party_name
FROM hz_parties p
   , hz_cust_accounts c
WHERE c.party_id=p.party_id
AND upper(p.party_name) LIKE 'BUSINESS WOR%'

  PARTY_ID PARTY_NAME
---------- ------------------
      4429 Business World
      4429 Business World

You see that the query returns two rows. Since you’re only interested in one unique answer, how would you correct the query?

The lazy developer approach:

-- Wrong Approach
SELECT DISTINCT p.party_id
     , p.party_name
FROM hz_parties p
   , hz_cust_accounts c
WHERE c.party_id=p.party_id
AND upper(p.party_name) LIKE 'BUSINESS WOR%'

PARTY_ID   PARTY_NAME
---------- ------------------
      4429 Business World

Some developers simply stick a DISTINCT in, to suppress duplicate rows from an incorrect query. But is this a wise fix?

DISTINCT Abuse and its Consequences

You may have got the right answer with the lazy developer approach BUT:

  • Your query will not perform optimally – DISTINCT causes an additional sort of data.
  • It is possible that one of the joins in the query needs to be an EXISTS subquery, or the join condition is incomplete, or the data model is faulty. This needs to be investigated and the root cause fixed. By using DISTINCT, you are ignoring the problem, not resolving it.

How to Correct The Query Above

That query happened to need an EXISTS subquery in place of a join. The following modified query would work correctly, returning only unique rows:

-- Correct Approach
SELECT p.party_id
     , p.party_name 
FROM hz_parties p
WHERE upper(p.party_name) LIKE 'BUSINESS WOR%'
AND EXISTS ( SELECT null 
             FROM hz_cust_accounts c
             WHERE  c.party_id=p.party_id)

  PARTY_ID PARTY_NAME
---------- ------------------
      4429 Business World

Summary

DISTINCT makes a query return unique rows only, and is often used injudiciously to suppress duplicate rows being returned by a bad query. Don’t use DISTINCT to cover up errors. Fix the problem at the source: run the query without DISTINCT, investigate why the duplicates occur and correct it.

{ 3 comments… read them below or add one }

1 xavier August 26, 2011 at 10:27 pm

Makes sense. So is there a right time to use ‘Distinct’?

2 oratabler August 26, 2011 at 10:48 pm

When we actually want to list unique rows, and the data in question may be repeated.

e.g. To list all the database usernames that own objects:

select distinct owner from all_objects;

In a database with thousands of objects, skipping “distinct” would mean some painful sifting through the result set.

3 Trisha November 19, 2011 at 2:21 pm

hi,
I have a doubt on joining tables. Suppose , in a table there are some 5 rows , and in another table 0 rows ( no records) , then wht shud be the output of cartisian product of the two tables ?
5 or 25 or null?

Leave a Comment

Previous post:

Next post: