I've been wrestling with a list of predominantly UK phone numbers for the last couple of days. The data has come from a list of all the calls made to and from an organisation in the past year. In particular, I want to identify the dialling code so that I can identify what geographical areas people are calling from.
First port of call is usually Google - I certainly don't want to write some code if I can just copy something that someone else has already done. But sadly every entry I could find was about US phone numbers, which are easy peasy - area dialling codes are 3 digit across North America. How rational! Here in the UK dialling codes may be three, four, five or six digits long. So if you have North American phone numbers to sort out, most of this article will be irrelevant.
Let's start with a bit of research -
here and
here. "In the United Kingdom
, area codes are two, three, four, or, rarely, five digits long (after the initial zero). Regions with shorter area codes, typically large cities, permit the allocation of more telephone numbers as the local number portion has more digits. Local customer numbers are four to eight figures long. The total number of digits is ten, but in a very few areas the total may be nine digits (after the initial zero). The "area code" is also referred to as an "STD (code)" (subscriber trunk dialling) or a "dialling code" in the UK." So, basically, it's a tangle of different standards. OK, so I need some code which checks the first few digits and identifies which one is which.
Data Cleansing
But before we get on to that, let's do some data cleansing. I have a lot of what look like perfectly good phone numbers with a 92 prefix. Like most firms, it's 9 to dial out, so I don't know where 92 comes from. It's easy to deal with though- just a straightforward REPLACE() function.
replace(LEFT(strddi,2), '92', '')
means when the leftmost two characters of the DDI string are 92, replace them with a blank.
-- identify Direct Dialled numbers with 92 prefix - apparent error
update [CUSTOMER].[PhoneCalls]
Set strddi = replace(LEFT(strddi,2), '92', '')
WHERE LEFT(strddi,2) = '92'
AND strDDIAreaCode IS NULL
AND LEN(strddi) > 5;
There's another issue with incoming mobile phone numbers - in the UK, these start with 07, but somehow they have been recorded as starting with 7. Replace does the job.
-- identify mobile numbers omitting 0 prefix - apparent error
-- change to UK standard 0
update [CUSTOMER].[PhoneCalls]
Set strcli = replace(LEFT(strCLI,1), '7', '07')
WHERE LEFT(strCLI,1) = '7'
AND strCLIAreaCode IS NULL
AND LEN(strCLI) = 10;
Then we have the incoming calls from UK numbers, but specifying the full international dialling code 0044. Fine if you are based somewhere else, but really not necessary if you are in the UK and your caller is in the UK.
-- identify UK numbers with an international code 0044
-- change to UK standard 0
update [CUSTOMER].[PhoneCalls]
Set strcli = replace(LEFT(strcli,4), '0044', '0')
WHERE LEFT(strcli,4) = '0044'
AND strCLIAreaCode IS NULL
AND LEN(strcli) > 5;
Having dealt with UK codes that think they are worldwide, I can now go on and identify the legitimate international codes. I could pick out the French calls (0033), German calls (0049) and so on, but I only have a very small percentage of non-UK calls so I'm just going to treat all these Johnny Foreigners the same and lump them together as "International" calls. If you do want to be more accurate with your international calls, click
here for a list of the codes.
-- identify international codes 00
update [CUSTOMER].[PhoneCalls]
set strCLIAreaCode = 'International'
WHERE LEFT(strcli,2) = '00'
AND strCLIAreaCode IS NULL
AND LEN(strCLI) > 5;
And use the same technique to identify Mobile numbers:
-- identify mobile phone numbers 07
update [CUSTOMER].[PhoneCalls]
set strCLIAreaCode = 'Mobile'
WHERE LEFT(strcli,2) = '07'
AND strCLIAreaCode IS NULL
AND LEN(strCLI) > 5;
Freefone numbers are free only to people calling from Landlines, although I understand that there are
plans to make them free to people using mobile phones too.
-- identify Freefone phone numbers 0800 etc
update [CUSTOMER].[PhoneCalls]
set strCLIAreaCode = 'Freefone'
WHERE LEFT(strcli,4) in ('0800', '0500', '0808')
AND strCLIAreaCode IS NULL
AND LEN(strCLI) > 5;
There is a string of special rate numbers beginning with 08 - once upon a time you could identify 0845 as local rate and 0870 as national rate, but the list has proliferated and now you can't really tell how much it is going to cost.
-- identify special rate phone numbers 0845 0870 etc
update [CUSTOMER].[PhoneCalls]
set strCLIAreaCode = 'Special rate'
WHERE LEFT(strcli,2) = '08'
AND strCLIAreaCode IS NULL
AND LEN(strCLI) > 5;
One thing you can be sure of - a call to an 09 number is going to be outrageously expensive...
-- identify premium rate phone numbers 09
update [CUSTOMER].[PhoneCalls]
set strCLIAreaCode = 'Premium rate'
WHERE LEFT(strcli,2) = '09'
AND strCLIAreaCode IS NULL
AND LEN(strCLI) > 5;
Identifying UK Dialling Codes
So - on to the main point of this article. UK dialling codes - area codes - whatever you want to call them, allow you to identify where in the country a caller is based. Aberdeen is 01224, York is 01904. But very often the dialling code and the subscriber number are held in a single field e.g. 01169158424.
Taking this example, it could in theory be divided thus:
011 6915 8424
0116 915 8424
01169 158424
011691 58424
A dialling code can be anything from 3 digits to 6 digits - a subscriber number can be anything from 4 to 8 digits.
So here's how to tackle the problem. There are only five cities with three digit dialling codes, ten with four digits, and 12 with six - the rest have five. So thanks to a bit of research in Wikipedia (see the links I posted earlier) I was able to construct the following CASE statement:
-- UK Dialling codes may have 3, 4 , 5 or 6 digits
-- pick appropriate code
--CLI
update [CUSTOMER].[PhoneCalls]
set strCLIAreaCode =
(
CASE
-- 3 digit dialling codes e.g. London, Belfast
WHEN LEFT(STRcLI,3) IN
('020', '023', '024', '028', '029' )
THEN LEFT(strcli,3)
-- 4 digit dialling codes e.g. Bristol, Leicester
WHEN LEFT(STRcLI,4) IN
('0118', '0117', '0116', '0115', '0114', '0113',
'0121', '0131', '0141', '0151', '0161', '0191')
THEN LEFT(strcli,4)
-- 6 digit dialling codes e.g. Langholm, Keswick
WHEN LEFT(STRcLI,6) IN
('013873', '015242', '015394', '015395',
'015396', '016973', '016974', '016977',
'017683', '017684', '017687', '019467' )
THEN LEFT(strcli,6)
-- The remaining majority of codes are 5 digit
ELSE LEFT(strcli,5)
END
)
WHERE
strcli <> 'WITHHELD'
AND LEN(strcli) > 5
AND strCLIAreaCode IS null;
Looking at 01169158424, it's easy to see that the first four digits match the four digit option, so the area code part of this number is 0116 - which represents Leicester
Finally, a bit of tidying up:
-- Remove any remaining nulls in area code fields
-- nulls remaining represent internal codes or unidentifiable
-- CLI
update [CUSTOMER].[PhoneCalls]
set strCLIAreaCode = 'N/A'
WHERE strCLIAreaCode IS NULL;
Incidentally, the code shown here is half the code I wrote - I had to do the job for both inbound (CLI - Caller Line Identification) numbers and outbound (DDI - Direct Dial In). the code is essentially the same so I haven't troubled you with it.
I hope you find this useful!