I'm trying to find a list of twins in my source database - reason being, twins share a surname, a date of birth, and a postcode. Unfortunately, those are exactly the criteria that the destination database uses to identify and reject duplicates <sigh>. So it's handy to have a list of twins beforehand and do some manual jiggerypokery to make things work as desired.
(In an ideal world, perhaps our skilled team of Oracle experts would modify the destination database so that it considered the forename as well, and said to itself "Hey, twins! Maybe I won't reject this guy after all." But that puts us into a world of development changes and testing and pain - it's easier and quicker to leave the damn thing alone and manually wrestle with the input data.)
So the first step is to identify surnames which occur more than once in the data. Notice that I'm grouping by Surname and Date of Birth:
Select mb1.surname
From tblbods mb1
Group BY mb1.surname, mb1.dob
having count(mb1.surname) > 1
Which gives us:
Surname |
SMITH |
JONES |
FORTESCUE-SMYTHE |
MURPHY |
MCDONALD |
PATEL |
Actually it gives us an awful lot more than this - there are 200,000 records in the table. But I thought you would get bored if I listed more, plus I have had to change all the names and dates of birth so as to anonymise the data.
Second step is to do the same for dates of birth which occur more than once.
Select mb2.dob
From tblbods mb2
Group BY mb2.surname, mb2.dob
having count(mb2.DOB) > 1
DoB |
10/05/1993 |
11/06/1996 |
11/06/2007 |
25/12/2005 |
11/07/2010 |
14/02/2001 |
And then finally I want to select data where the surname is in the list of surnames from Step 1, and the date of birth is in the list of DoBs from Step 2.
Select
mb.BodID, mb.Title, mb.Forename, mb.Surname,
mb.relationship, mb.sex as Gender, mb.DoB
from
-- Table of individual members
tblbods mb
where mb.surname in
(
-- Step 1 - surnames which occur more than once
Select mb1.surname
From tblbods mb1
Group BY mb1.surname , mb1.dob
having count(mb1.surname) > 1
)
and mb.dob in
(
-- Step 2 - Dates of Birth which occur more than once
Select mb2.dob
From tblbods mb2
Group BY mb2.surname, mb2.dob
having count(mb2.DOB) > 1
)
And the end result is:
BodID | Title | Forename | Surname | relationship | Gender | DoB |
523189 | MISS | MARIA | SMITH | Child | F | 11/07/2010 |
523190 | MISS | ALANA | SMITH | Child | F | 11/07/2010 |
531637 | MR | JASPER | FORTESCUE-SMYTHE | Child | M | 14/02/2001 |
531638 | MR | SIMON | FORTESCUE-SMYTHE | Child | M | 14/02/2001 |
466662 | MASTER | RAHUL | PATEL | Child | M | 10/05/1993 |
466664 | MISS | SAMIRA | PATEL | Child | F | 10/05/1993 |
519007 | MASTER | JOSEPH | MURPHY | Child | M | 11/06/2007 |
519008 | MISS | ANNETTE | MURPHY | Child | F | 11/06/2007 |
339731 | MISS | PATRICIA | JONES | Child | F | 11/06/1996 |
339736 | MISS | IMOGEN | JONES | Child | F | 11/06/1996 |
419768 | MASTER | WINSTON | MCDONALD | Child | M | 25/12/2005 |
419769 | MASTER | DOUGLAS | MCDONALD | Child | M | 25/12/2005 |
Simples! Hope you find this useful.
No comments:
Post a Comment