So what do you do when some crazy person designs a database with a single field to hold three separate bits of data? Of course it probably wasn’t designed as such – probably somebody using the database decided that it would be neat if they could record three bits of data about the incoming call. Rather than asking the developers to add a couple of extra fields, some bright spark decided to use the Category field to contain two or three things.

So – let me show you what I mean.

Audit Approach - Audit Approach Manual - Other

NHS - Other NHS

NHS - Audit Responsibilities

NHS - Accounting - Financial instruments

Local Government - Housing Benefits

NHS - Audit Responsibilities

Local Government - IFRS Conversion - PPE and other non-current assets

So sometimes there are two categories, and sometimes three. Fortunately they always use a hyphen between them.

But now I have to create a report which reads in this single field and splits it into three – before the first hyphen, between the hyphens, and after the second hyphen (if there is one).

Category 1

So, the first bit is easy:

select

--the original single field

ca_full_name,

-- cat 1 - before first hyphen

substring(ca_full_name, 1, charindex(' - ', ca_full_name) )

as Category_1,

SubString() lets you read in a text field, select a start point, then select a finish point. So if your text says “Antidisestablishmentarianism” you can say

select substring ('Antidisestablishmentarianism', 1,4)

and get the result “Anti”.

But in this case we don’t know where to stop, so instead of specifying the end point as a number, we have to use the Charindex() function.

select charindex ('i', 'Antidisestablishmentarianism')

This means look through the text until you find the first letter ‘i’.

The answer is 4

So the following code says look through the name until you find space hyphen space

charindex(' - ', ca_full_name)

Okay so far?

Category 2

Here’s how to get the middle bit. I’m indebted to Stephen Lasham, who lives in New Zealand and wrote this article back in 2004 (isn’t the Internet wonderful?). The trouble with Charindex() is that it only finds the first occurrence of the thing you are looking for.

Extracting a string from between two delimiting characters

http://www.sqlservercentral.com/articles/Advanced+Querying/extractingastringfrombetweentwodelimitingcharacter/1397/

-- cat 2 - from first hyphen to second hyphen or end

CASE

-- when no second hyphen,

WHEN CHARINDEX(' - ',(SUBSTRING(ca_full_name,

CHARINDEX(' - ',ca_full_name,1)+1,99))) = 0

-- then extract from first hyphen to the max length of 99

THEN LTRIM(RTRIM(SUBSTRING(ca_full_name,

CHARINDEX(' - ',ca_full_name,1)+3,99)))

-- else extract from the first hyphen up to second hyphen

ELSE LTRIM(RTRIM(SUBSTRING(

/* <text>> */ ca_full_name,

/* <start > */ CHARINDEX(' - ',ca_full_name,1) + 3,

/* <length> */ CHARINDEX(' - ', SUBSTRING(ca_full_name,

CHARINDEX(' - ', ca_full_name, 1)

+1, 99)) - 2 ) ) )

END AS Category_2,

So this time, there are two steps. If there isn’t a second hyphen i.e. there are only two categories in the field, start not at the beginning but at the first hyphen, then read to the end. The Ltrim() and Rtrim() bits get rid of any leading or trailing spaces, so it looks nice.

If there IS a second hyphen, Stephen’s code starts at the first hyphen, then reads to the second hyphen.

Category 3

Okay, nearly there.

-- cat 3 - after second hyphen (if present)

CASE

-- when no second hyphen,

WHEN CHARINDEX(' - ',(SUBSTRING(ca_full_name,

CHARINDEX(' - ',ca_full_name,1)+1,99))) = 0

-- then print nothing

THEN (SUBSTRING(ca_full_name,0,0))

ELSE

-- read back from the end until you get to the hyphen

reverse(substring(reverse(ca_full_name), 1,

charindex(' - ', reverse(ca_full_name))-1))

END as Category_3

So, if there isn’t a second hyphen, then print nothing at all (start at 0, stop at 0). Because there isn’t any third bit, see?

But if there is a third bit, what I’ve done is read it from the end using the Reverse() function

select reverse ('Antidisestablishmentarianism')

Which of course gives us msinairatnemhsilbatsesiditnA.

The code reads in the text, reverses it, reads through it until it gets to the hyphen. Then it takes the backwards text up to that point and reverses it again so that it comes out facing the right way. So we end up with this:

Category_1	Category_2	Category_3
Local Government	Whole of Government Accounts	Other
Probation	Accounts and Accounting	Service concessions and leases
Probation	IFRS Conversion	Employee benefits
Local Government	Capital Finance	Financial Instruments
Local Government	Operational and Legal Environment	Other
Other	Trash
Foundation Trusts	Additional Services	Other
Ethics	Non-Audit Services	Other
Foundation Trusts	Accounting	Service concessions and leases
Probation	Audit Responsibilities
Local Government	IFRS Conversion	IFRS conversion
Local Government	Operational and Legal Environment	Other

All done! I hope this is useful. And just think – none of this would have been necessary with a better database design in the first place.

2 comments:

Unknown25 December 2010 at 01:30
Hi Jack,

Looking at your code, I feel there is a simpler way of doing it, what you are after is a split function much as exists in VB and C#, this would be able to pass back each data item between the "-" to the calling function. This could be implemented either as a Scalar or as a Table function, whichever is your preference.

Regards, Steve
Jack31 December 2010 at 11:32
Thanks Steve. I actually found a function ready made to do that when I searched Google, but thought I would play about with other techniques. Then I mislaid the function... :o)

DBAtasks

Friday, 3 December 2010

Extracting Different Bits Of A Single Field

Extracting a string from between two delimiting characters

2 comments: