It meant that all the data went into one convenient place before going to a nightly tape backup. And at first it all seemed fine, until…
<Cue Twilight Zone music>
Error 64…..
Every now and then a backup would fail. The error message was – Error 64. The SQL Server error logs came up with the helpful explanation “unknown error”.
So with 30 servers, each running backup tasks for user and system databases, I might get half a dozen Error 64s a week. There was never a pattern to it – at least, not one that I ever found. Some days I would get several, other days none at all. It didn’t seem to be linked to the SQL Server database servers – all experienced the problem occasionally, none had it consistently. So maybe the problem lay at the other end of the network – at the destination media server. Yet there were no errors there – everything seemed perfectly happy at that end.
Look it up on Google and you will probably find that Error 64 means “the specified network name is no longer available”. From the start I suspected a network glitch, but our network specialists denied all knowledge of such a thing. And certainly, with the only errors appearing in SQL Server’s logs, it was hard to claim that it was a network problem. Besides, while one backup failed because the network was supposedly unavailable, another backup over the same network, to the same place, at the same time, worked perfectly happily.
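For context, each of those scheduled backups boils down to a plain BACKUP DATABASE statement aimed at a UNC path on the media server. The sketch below shows the idea – the server name, share and database name are made up for illustration, and the real jobs were maintenance plan tasks rather than hand-written scripts:

    -- Roughly what each nightly job runs per database, writing across
    -- the network to the media server (hypothetical names throughout).
    BACKUP DATABASE [SomeUserDB]
    TO DISK = N'\\MediaServer1\SQLBackups\SomeUserDB.bak'
    WITH INIT, STATS = 10;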
The majority of our servers were SQL Server 2005, but after a while a backup failed on a SQL Server 2000 server – and this time…
<Cue Twilight Zone music>
Error 22029…..
Apart from that, the symptoms were the same – “unknown error”, intermittent failure, no pattern discernible. It’s difficult to discern a pattern when you only have a few SQL Server 2000 instances, anyway.
And just for the sake of completeness, when it happened on a SQL Server 2008 box…
<Cue Twilight Zone music>
Error 64…..
But instead of “unknown error” it said “The specified resource cannot be found”, which sounds a little like “the specified network name is no longer available”, doesn’t it?
Workaround
The first thing I tried was staggering the start times of the backups. Originally everything kicked off at 1900, long after the pen-pushers and keyboard jockeys had gone home, and in plenty of time to get everything copied to the media server for the backup to tape at 0200 the next morning. So I set the system backups going at two-minute intervals, and then the user backups starting at 2000, again at two-minute intervals. Did it make a difference? Nah. So it probably wasn’t some sort of bottleneck from overloading the system, then.
So how about a fallback position? If the first backup fails, try taking another backup to a separate location. Not a bad idea even without
<Cue Twilight Zone music>
Error 64…..
because of course the media server itself might fail. So a second media server was made available, identical to the first one. The maintenance plans were modified so that if the backup failed, it had another go, this time sending the backup to the second media server. More detail here.
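For anyone wondering what that fallback amounts to in T-SQL terms, here is a rough sketch of the same idea – try the primary media server first, and if the backup throws an error, have another go against the secondary. It assumes SQL Server 2005 or later for TRY/CATCH, and again the server, share and database names are invented; the actual retry logic lived inside the maintenance plans:

    BEGIN TRY
        -- First attempt: the primary media server (hypothetical names).
        BACKUP DATABASE [SomeUserDB]
        TO DISK = N'\\MediaServer1\SQLBackups\SomeUserDB.bak'
        WITH INIT, STATS = 10;
    END TRY
    BEGIN CATCH
        -- Primary attempt failed (hello, Error 64) – have another go
        -- against the second, identical media server.
        BACKUP DATABASE [SomeUserDB]
        TO DISK = N'\\MediaServer2\SQLBackups\SomeUserDB.bak'
        WITH INIT, STATS = 10;
    END CATCH;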
This worked pretty well. Although the error still happened, the secondary backup swung into action and took the backup. Only very occasionally did both primary and secondary fail. But it still happened.
Solved it!
I wish I could take the credit, but it was none of my doing. One weekend the server team upgraded the two media servers from Windows Server 2003 to Windows Server 2008. They doubled the RAM from 4GB to 8GB while they were at it. And the dreaded Error 64 was never seen again. So if you have this problem, I hope this helps.