So what do you do when some crazy person designs a database with a single field to hold three separate bits of data? Of course it probably wasn’t designed as such – probably somebody using the database decided that it would be neat if they could record three bits of data about the incoming call. Rather than asking the developers to add a couple of extra fields, some bright spark decided to use the Category field to contain two or three things.
So – let me show you what I mean.
Audit Approach - Audit Approach Manual - Other |
NHS - Other NHS |
NHS - Audit Responsibilities |
NHS - Audit Responsibilities |
NHS - Accounting - Financial instruments |
Local Government - Housing Benefits |
Local Government - Housing Benefits |
Local Government - Housing Benefits |
NHS - Audit Responsibilities |
NHS - Audit Responsibilities |
Local Government - IFRS Conversion - PPE and other non-current assets |
So sometimes there are two categories, and sometimes three. Fortunately they always use a hyphen between them.
But now I have to create a report which reads in this single field and splits it into three – before the first hyphen, between the hyphens, and after the second hyphen (if there is one).
Category 1
So, the first bit is easy:
select
--the original single field
ca_full_name,
-- cat 1 - before first hyphen
substring(ca_full_name, 1, charindex(' - ', ca_full_name) )
as Category_1,
SubString() lets you read in a text field, select a start point, then select a finish point. So if your text says “Antidisestablishmentarianism” you can say
select substring ('Antidisestablishmentarianism', 1,4)
and get the result “Anti”.
But in this case we don’t know where to stop, so instead of specifying the end point as a number, we have to use the Charindex() function.
select charindex ('i', 'Antidisestablishmentarianism')
This means look through the text until you find the first letter ‘i’.
The answer is 4
So the following code says look through the name until you find space hyphen space
charindex(' - ', ca_full_name)
Okay so far?
Category 2
Here’s how to get the middle bit. I’m indebted to Stephen Lasham, who lives in New Zealand and wrote this article back in 2004 (isn’t the Internet wonderful?). The trouble with Charindex() is that it only finds the first occurrence of the thing you are looking for.
Extracting a string from between two delimiting characters
-- cat 2 - from first hyphen to second hyphen or end
CASE
-- when no second hyphen,
WHEN CHARINDEX(' - ',(SUBSTRING(ca_full_name,
CHARINDEX(' - ',ca_full_name,1)+1,99))) = 0
-- then extract from first hyphen to the max length of 99
THEN LTRIM(RTRIM(SUBSTRING(ca_full_name,
CHARINDEX(' - ',ca_full_name,1)+3,99)))
-- else extract from the first hyphen up to second hyphen
ELSE LTRIM(RTRIM(SUBSTRING(
/* <text>> */ ca_full_name,
/* <start > */ CHARINDEX(' - ',ca_full_name,1) + 3,
/* <length> */ CHARINDEX(' - ', SUBSTRING(ca_full_name,
CHARINDEX(' - ', ca_full_name, 1)
+1, 99)) - 2 ) ) )
END AS Category_2,
So this time, there are two steps. If there isn’t a second hyphen i.e. there are only two categories in the field, start not at the beginning but at the first hyphen, then read to the end. The Ltrim() and Rtrim() bits get rid of any leading or trailing spaces, so it looks nice.
If there IS a second hyphen, Stephen’s code starts at the first hyphen, then reads to the second hyphen.
Category 3
Okay, nearly there.
-- cat 3 - after second hyphen (if present)
CASE
-- when no second hyphen,
WHEN CHARINDEX(' - ',(SUBSTRING(ca_full_name,
CHARINDEX(' - ',ca_full_name,1)+1,99))) = 0
-- then print nothing
THEN (SUBSTRING(ca_full_name,0,0))
ELSE
-- read back from the end until you get to the hyphen
reverse(substring(reverse(ca_full_name), 1,
charindex(' - ', reverse(ca_full_name))-1))
END as Category_3
So, if there isn’t a second hyphen, then print nothing at all (start at 0, stop at 0). Because there isn’t any third bit, see?
But if there is a third bit, what I’ve done is read it from the end using the Reverse() function
select reverse ('Antidisestablishmentarianism')
Which of course gives us msinairatnemhsilbatsesiditnA.
The code reads in the text, reverses it, reads through it until it gets to the hyphen. Then it takes the backwards text up to that point and reverses it again so that it comes out facing the right way. So we end up with this:
Category_1 | Category_2 | Category_3 |
Local Government | Whole of Government Accounts | Other |
Probation | Accounts and Accounting | Service concessions and leases |
Probation | IFRS Conversion | Employee benefits |
Local Government | Capital Finance | Financial Instruments |
Local Government | Operational and Legal Environment | Other |
Other | Trash | |
Foundation Trusts | Additional Services | Other |
Ethics | Non-Audit Services | Other |
Foundation Trusts | Accounting | Service concessions and leases |
Probation | Audit Responsibilities | |
Local Government | IFRS Conversion | IFRS conversion |
Local Government | Operational and Legal Environment | Other |
Hi Jack,
ReplyDeleteLooking at your code, I feel there is a simpler way of doing it, what you are after is a split function much as exists in VB and C#, this would be able to pass back each data item between the "-" to the calling function. This could be implemented either as a Scalar or as a Table function, whichever is your preference.
Regards, Steve
Thanks Steve. I actually found a function ready made to do that when I searched Google, but thought I would play about with other techniques. Then I mislaid the function... :o)
ReplyDelete