Tuesday, 9 August 2011

Data Quality - What could possibly go wrong?

Well, pretty much everything.  If you have enough data, there will be mistakes in it.  Try writing your own name down on a piece of paper.  In CAPITALS.  There's quite a good chance that you will spell your own name wrong - something about the unfamiliar capitals trips up the link between your brain and your fingers.  Now imagine that you are a data entry clerk typing in hundreds of names every day.  Some of them are Arabic, others Polish, some even Welsh.  Can you guarantee that every name is correct?

There are ways to help, of course.  Take addresses, now - most countries have a postal code system which can be used in automatic sorting machinery to help your letter get to the right place.  Most countries use a 4 or 5 digit numeric code - 90210 famously defines Beverly Hills in California, so typing an easy number into your address software automatically fills in the town (and it isn't Beverley Hills) and state.  And by reading the relevant postal address file, the software can check the spelling of the streets.  The postcode systems in the UK and the Netherlands can go right down to street, even sometimes premises level.  But these things have to be kept up to date - and periodically the postal authorities make changes in order to take on board changes out in the wild, even sometimes to fix mistakes.

So your name and address data may contain errors - perhaps your older data even preceded the current computer system and was transcribed from a Rolodex.  Check it.

Check the names and genders and titles as well, while you are at it.  Where you find blanks, there are a few things you can do:

  • Where there is a gender, you can derive a title (if Gender = F, then Title = Ms).  Note that while this is fairly risk-free for men, some women dislike being called Ms.  
  • And vice versa - where there is a title, you can derive a gender (if Title = Mr, then Gender = M).  
  • Where there is a forename, you can - usually - derive a title and gender (if Forename = Andrew, then Title = Mr and Gender = M).  This will still leave a number of names (e.g. Hilary) where it is not possible to determine the gender.  

Maybe you don't have a first name at all, just the initials.  Is that a man or a woman?  If you don't know, how can you address a letter?  Suppose it says Mr J Smith, but then states the gender as F?  Something wrong, but what?   You could of course classify such cases as "Unknown", but that might screw up your letters completely -  “Dear Unknown Smith” is not going to win you much business for your widget factory.  “Dear Customer” might be acceptable.

Suppose you have a list of email addresses - you need to make sure that they are valid before sending off your mailshot.  Lots of things you can check here:

  • Check for spaces in field
  • Check that name is present
  • Check that address does not end in a full stop     
  • Check that suffix e.g. .co.uk is not missing
  • Check that suffix is not truncated
  • Check that @ symbol is present     
  • Check that @ symbol is not duplicated      
  • Check that there is no spurious full stop in the address
I found some SQL code on the internet to do this - you don't expect me to do any actual WORK, do you? (After writing that sentence, the boss made me rewrite my code in PL/SQL in order to do that useful job for an Oracle database - I should have kept my mouth shut).  If you want a copy, let me know and I'll pass it on.  One big problem, of course - you can check all these things and have the most valid email address ever, but it's no good if your customer has changed to a new ISP last month.  

And while I think about it, don't forget that you need to comply with data protection legislation to hold all this stuff.  UK readers can find a handy checklist here, but there are similar rules for most countries.  You can get your data checked for people who have moved house, or people who have died (if you want to upset a bereaved relative, send a cheery letter to the recently deceased).  And if you know that someone has died, make sure that you don't contact them, especially not if you are going to write to Dear Mr Smith (Deceased).  

Tuesday, 7 June 2011

How to Keep Your Job

Emails, eh?  You reply to your friend with a sarcastic comment mocking some corporate announcement that Dilbert would have found extreme, and suddenly everyone in the company is reading your wit and wisdom.

The great thing about emails is that they provide a way to communicate instantly with lots of people.  The trouble with emails is that they provide a way to communicate instantly with lots of people.  In the days when you wrote a memo and put it in the internal mail, it probably wouldn't arrive until tomorrow, and you could always get it back from the out tray if you needed to stop it.  Or even if you thought of something to add after writing it.

Well, here's a handy hint - you can do exactly that with an email.  Remember I was talking about email the other day and how to set up rules to sort them into folders?  Well, I was.  You can set up a rule that slows down your email to a more human pace.  Let me show you how to do it.

Start by setting up a new rule.  In Outlook, click on Tools, and then select Rules and Alerts.


Right now, click on New Rule, and it'll give you a list of ready made rules, none of which do what you want to do.  That's to put a delay on outgoing mails, remember?  But right down at the bottom of the list there's a thing about starting from a blank rule, and the thing you are looking for - check messages after sending.  Select that one, and click on Next.




Right - now you get the following screen - lots of possible conditions.  You could think about setting it to put in a delay if you choose to send a message to Company - All, perhaps.  But you might still reply to a career-challenging individual by accident, or some weasel who forwards your message to the world.  Remember Claire Swire?  Yes, that Claire Swire.  So think twice, okay?  And apply this rule to all your outgoing emails - scroll right down to the very bottom of the list and choose "On this machine only".  Hit Next.



Again, there are lots of things you could do - but the thing we want is down at the bottom of the list again.  Choose "Defer Delivery by a number of minutes".  Then in the box at the bottom, click on the link where it says a number of  and type in 5.  If you don't realise that you have screwed up in five minutes, you probably aren't going to realise in 60.  Oh, go on then, put in 10, 15, whatever you want. See if I care.  



You could stop there, if you want.  Click on Finish, and go home.  Next time you press Send and then think "Ooops!", you can stop worrying - it's sitting in your out-box and you can grab it back, edit it, delete it, whatever you want.  I've saved your backside from corporate hell, and you owe me a beer, or suitable non-alcoholic refreshment if this is more appropriate in your jurisdiction.

Still here?  You want the last bit?  OK, suppose you really, really can't wait five minutes.  What a sad life you must live, to be sure.  Take a break, smell the flowers, gaze into the distance for a few minutes and let your thoughts drift away.  You'll feel better for it.

Right then.  Exceptions.  You really want it to go now, you can't wait five minutes because you have promised your boss to send something right now, and she is waiting, drumming her fingers on the desk and wondering what has gone wrong with the email system.  Before her blood pressure rises from a simmer to a rolling boil, I'll tell you.

Click on the thing that says "except if the body contains specific words".  Click on the link where it says specific words and type in your secret code for making things go straight away.


Like what?  You have to be a bit careful.  Don't type something like "Urgent" or "Priority" because you probably get lots of emails with words like that in them.  You can be dull, dull, dull and type something like "For immediate transmission".  I like to say "Priority Green" - it doesn't mean anything particularly, but you can stick it at the bottom of your signature file next to the corporate thingy about saving the planet by not printing out this email, where it doesn't look out of place.  If anyone notices it, they probably think it is some environmental initiative which they missed, and that you are obviously a very virtuous person.



Last of all, give your new rule a name, something like Delay Outgoing Emails, something like that and click on Finish.


From now on, any email you send has a life-saving delay built into it.  You can get round it if you really need to, but I shouldn't bother - five minutes won't usually be noticed.  The crucial thing is that you have put in a delay longer than the Ohnosecond.

Saturday, 14 May 2011

Passwords

This is copied from an article by Buffalo Bill Brabant...

"There has been a resurgence of hackers  attacking webmail users and then using the stolen accounts to send ads and malware to the address books of the stolen account. This is at the least very embarrassing to those involved as one of the users of my Yard sale groups found out last week. After a post by a hacked AOL user they requested I remove the person's posting privileges. Instead I placed the user on moderation and explained what had happened.
As fate would have it, the complainer, who was also an AOL user got hacked about a week later.

Anyhow because it is timely here is an article I wrote about a year ago on password strength which is your only defense against being hacked.



We probably have all seen at least one of these lately, an email from someone we know to about ten of their friends with no subject line or the name of the person as a subject. When you open the email you find a url to a website and if you click on it and you are lucky the worst you will get is a Viagra ad but it just as easy for them to rig the site with malware downloads like antivirus 2010.

This started out a few years back when people were hacking Yahoo mails to steal choice addresses and after their methods were posted on youtube, spammers started hacking hotmail email accounts last winter and then more recently AOL, and within the past week Yahoo accounts.

There has also been a revival of a scam from last year where a person hacks into your email and changes passwords so you cannot check your email. They then send emails out to everyone in your address book saying that you were on vacation out of the country and that you have lost or been robbed of everything except your passport and need thousands of dollars to get out of your predicament. If the person sends an email to inquire about the circumstances the hacker answers as you and has money wired to the foreign country where armed with a false ID he can claim the money from however many people care for you. 

You can improve your odds against being hacked with a strong password but it doesn't have to be hard to remember. Let's say for example your name is Dave Jones. You can strengthen security by using upper and lower case letters so we will try DaVeJoNeS which is a step up from davejones which is rated weak by Microsoft 

With the DaVeJoNeS it rises to medium security which means that no one will accidentally guess your password even if they know you but it would not stop a hacker so lets go up one more step. We can add number and symbols from the top row of keys to help strengthen the password so we add a number to our medium password and Dave likes 7 so we get DaVe7JoNeS. Adding that one number makes our password in the strong category.

Now we will take Dave's password and we will add a few symbols in this case an asterisk in front and one in back or &*DaVe7JoNeS*& along with two ampersands and our password becomes very strong and hard to crack.

Here is another place where you can check your passwords but even with the Microsoft site I would send them a slightly altered password then what you intend to use because it is always safer to trust no one.

Feel free to share this with your friends because if I can stop one person from being hacked I consider it well worth the time it took me to write it down."


Thanks Buffalo!  

You can generate a random password, but it's difficult to remember things like >J^+B#8d.  

You could follow Buffalo's method but make it even sneakier by starting with someone else's name - the politician you hate most, perhaps?  A one-hit wonder pop-star from the 1980s? 

Thursday, 21 April 2011

The Dreaded Error 64

Remember I said that we backed up databases over the network?  (Try to keep up). 

It meant that all the data went into one convenient place before going to a nightly tape backup.  And at first it all seemed fine, until…

<Cue Twilight Zone music>

Error 64…..


Every now and then a backup would fail.  The error message was – Error 64.  The SQL Server error logs come up with the helpful explanation “unknown error”. 

So for 30 servers, and each running a backup task for User and System databases, I might get half a dozen Error 64s a week.  There was never a pattern to it – at least, not one that I ever found.  Some days I would get several, other days none at all.  It didn’t seem to be linked to the SQL Server database servers – all experienced the problem occasionally, none had it consistently.  So maybe the problem lay at the other end of the network – at the destination media server.  Yet there were no errors there – everything seemed perfectly happy at that end. 

Look it up on Google, and you will probably find that “the specified network name is no longer available”.  From the start I suspected a network glitch, but our network specialists denied all knowledge of such a thing.  And certainly with the only errors appearing in SQL Server’s logs, it was hard to claim that it was a network problem.  Besides, while one backup fails because the network is unavailable, another backup over the same network, to the same place at the same time, worked perfectly happily. 

The majority of our servers are SQL 2005, but after a while the backup failed on a SQL Server 2000 server – but this time …

<Cue Twilight Zone music>

Error 22029…..

Apart from that, the symptoms were the same – “unknown error”, intermittent failure, no pattern discernable.  It’s difficult to discern a pattern when you only have a few 2000 instances, anyway. 


And just for the sake of completeness, when it happened on a SQL Server 2008 box…

<Cue Twilight Zone music>

Error 64…..

But instead of “unknown error” it says “The specified resource cannot be found” which sounds a little bit like “the specified network name is not available”, doesn’t it?

Workaround

First thing I tried was staggering the start times of the backups.  Originally everything set off at 1900, long after the pen-pushers and keyboard jockeys had gone home, and in plenty of time to get stuff copied to the media server for the backup to tape at 0200 the next morning.  So I set system backups going at 2 minute intervals, and then user backups starting at 2000, again at 2 minute intervals.  Did it make a difference?  Nah.  So it probably wasn’t overloading the system with some sort of bottleneck then. 

So how about a fallback position?  If the first backup fails, try taking another backup to a separate location.  Not a bad idea even without

<Cue Twilight Zone music>

Error 64…..

because of course the media server itself might fail.  So a second media server was made available, identical to the first one.  The maintenance plans were modified so that if the backup failed, it had another go and sent it to the second media server.  More detail here

This worked pretty well.  Although the error still happened, the secondary backup swung into action and took the backup.  Only very occasionally did both primary and secondary fail.  But it still happened.


Solved it!

I wish I could take the credit, but it was none of my doing.  One weekend the server team upgraded the two media servers from Windows Server 2003 to Windows server 2008.  They doubled the RAM from 4Gb to 8Gb while they were at it.  And the dreaded Error 64 was never seen again.  So if you have this problem, I hope this helps. 
                                       


Saturday, 26 February 2011

Wrestling with Email

I get lots of them, so do you.  I get far more at home than I do at work, because I own or moderate about 20 Yahoogroups, and am a member of 20 more.  Plus all my LinkedIn stuff goes to my home email.  So how do I deal with them?

To start off, I use a thing called Tonsho to virtually eliminate spam. New senders are challenged which means that if there isn't a human sending it, it doesn't come near me. And the other good thing is that it's free (they make their money out of charging to send ginormous emails). 

Then I have rules set up to sort email into appropriate folders. If the subject is Project X, it automatically goes into the Project X folder - where I can get to it right after I finish Projects A to W <g>. Outlook automatically shows me how many unread messages I have in the Project X folder. At home my [Live-In-France] folder contains all the emails for that particular Yahoogroup - this lets me prioritise.  

Read and decide what to do with the Inbox - skim through it with your finger on the delete button.  If it needs a quick reply, then reply.  If it needs further thought, tag it and add a date for further action.  If it's something I am likely to want to keep, I move it to an appropriate folder (Useful Technical Stuff, maybe) 

I found this post Email Etiquette for the Super Busy by Jocelyn K Glei with some good advice:

Never send an angry or contentious email.  Put a five minute delay on your outgoing mail by setting up a rule in Outlook.  That gives you a chance for second thoughts - five minutes delay won't hurt - although it can seem like a long time when you are waiting for the messages to be sent!  

Be concise.  I don't use my Blackberry for email (I have work email at my office and home email at home, I don't need it in between as well).  But if I did I would be a bit peeved to get three page messages to read on a 5cm screen.  

Include deadlines - if there's a deadline, it's good to know that.  Burying the deadline at the bottom of the message is not as good as putting it at the top.  

Don’t send “Thanks!” emails.  Interesting advice. I usually try to say thank you, for example to people who accept my invites to link in. But Jocelyn argues cogently that I should NOT. That may be a step too far away from the human-human interface for me, but you can always say Thanks in Advance.  


Tuesday, 15 February 2011

Back Up Over Network

Microsoft in its wisdom assumes that backups will be held on the local server e.g.
  • Data on Drive D:
  • Logs on Drive E:
  • Backups on Drive F:
What happens if the local server fails?  Ooops. 
We back up to disk and then to tape (belt and braces) - we've had a few less drastic problems than server failure, mainly because we have run out of space on the local machine because some databases have grown a bit bigger than was initially intended.  It's a bit fiddly to back up to tape from 30 different servers too.  So to resolve these problems we decided to back up over the network (BUON) to a central media server.    This makes the process of backing up to tape faster and more reliable.  Our server bods reckon they saved about 900Gb of space on local servers.  And just in case there is a problem with the central media server, we can fall back to an alternative.


This blog is about SQL Server 2005 and 2008.  The technique is a bit different for SQL Server 2000 - let me know if you want this and I'll send you my notes.  But you're moving off 2000 now, aren't you?

Here's how I did it:
Set up a maintenance plan called BUON ServerName System Maintenance Plan (or whatever you like, it's just a name).  I like to set up one plan for System databases and another for User databases. 
a)      Schedule plan for 19:00 daily (for us, most people are finished by 18:00 and the tape backup occurs at 02:00 so this should give it plenty of time, even if the first attempt fails and it has to take another backup).  Don't forget to set up a schedule, otherwise your colleagues will mock you. 
b)      Check Integrity.  There's a nice new thing in SQL Server 2008 that lets the maintenance plan ignore a database if you have taken it offline - tick it.  If you take a database offline, then the maintenance plan carries on happily without whingeing at you that the database is offline.  More for user databases than system, admittedly. 


c)      Reorganise Index; ignore if offline

d)   Backup with destination \\acbriap024\SQL_Backups\<servername>

Here's the meat in the sandwich.  ACBRIAP024 is the primary media server, and SQL_Backups is a shared drive on that machine.  I set up a directory on that share named after the SQL Server instance I want to back up. 


System databases get a .sys.bak extension, User databases get usr.bak. 
And I always tick the Verify Backup Integrity box - it takes a bit longer because it runs a "Restore Database with Verify Only" command after taking the backup, but it's a useful precaution. 

e)      Maintenance Cleanup - clear out backups older than 2-3 days
f)      If step (d) fails, here's the fall back position - backup with destination
\\acbriap023\SQL_Backups\<servername>
This is the secondary media server - the belt and braces machine. 
g)      Maintenance Cleanup

Here's what it looks like:



Notice the red line? Red for failure - if you right click on the line, you can specify whether you want something to happen if the previous step succeeds (green) or fails (red) or just completes regardless (blue). In this case, if the backup to the primary server works, it just tidies up the primary server by deleting old backups and all is complete. If the primary backup fails, then it goes to the secondary backup.

If you want more detail on the individual tasks in a maintenance plan, check out this estimable free download from Brad McGehee.  Brad lives the dream and administers databases in Hawaii, so he must be right!

Monday, 7 February 2011

Telesales Ju-Jitsu

Telemarketers - don’t you just hate them?


Well, up to a point.  When I’m at work, I’m fairly cool about someone who rings up and tries to sell me something.  This may be because it doesn’t happen all that much in my present job and it provides a welcome relief from the slog of administering databases, and maybe because business to business sales are better targeted and have at least some chance of interesting me.


At home though, I’m registered with the Telephone Preference Service which pretty much eliminates calls from reputable companies.  That means that anyone who still tries is not a reputable company.  If they ring up, they want to sell you something dodgy.  So they are fair game for a bout of telesales ju-jitsu.  String them along, and revel in the pleasure of stringing them along.  See how outlandish you can be.

Gavin rang me up all the way from India to tell me about the terrible virus problems I was having on my computer.  Obviously hoping that I didn’t know a computer from a pile of rocks.  I think he wanted me to type in a website URL which of course would pretend to scan my system and Shock Horror!  find a virus.  Not sure if he wanted to sell me an anti virus system, or if he would just run the "scanner" again and miraculously have it clear the problem. Or it may be worse than that and they empty your bank account - never got that far - scam update here.  

Anyway, I was all ears, asked Gavin to help me.  He asked me to type in the URL and I asked him what that was.  So we discussed the Start button and of course I don’t have one on the screen.  After about ten minutes of him getting increasingly frantic as none of the things he tried to get me to do worked, he asked me how I logged on to the internet.  I told him I didn’t have internet access on my PC.  Ha ha!


I told an energy salesman who wanted me to switch to his supplier that I had an experimental cold fusion reactor in my basement which generated all the power I needed, and asked him if his company would be interested in buying my surplus energy.  Still waiting to hear back.


And I told a shares scammer that I was part of an anarcho-syndicalist commune that believed in holding all property in common, and that capitalism was a jackboot on the neck of the people, and share ownership is a weapon used against the working class, etc, etc, until he gave up.

I look forward to telesales calls now...