Monday 16 December 2013

The Curse of SQL Server Embedded Edition


Help!  The database is writing a log file which has filled up drive C of server XYZMGT02!

Huh?  That isn’t one of our database servers – in fact I’ve never even heard of it!  Not only that, I don’t even have permission to log onto it!  Nuffink to do with me, guv!

It turned out that there was a database involved, sure enough, which is why the DBA team got called. But it wasn't something that we had ever set up. Windows Server Update Services (WSUS) downloads updates from Microsoft and sends them out to the computers on the corporate network. It runs under a free, cut-down version of SQL Server called Embedded Edition (SSEE for short), and, much like Express Edition, when you want to manage it you find that the things you need have more often than not been disabled.

The underlying problem in this case was that normally, updates get distributed to the network and can then be purged from the WSUS system. But if for some reason a computer on the network is unavailable, that update cannot be delivered, and so it is not purged. Drive F:\, which holds the WSUS data, had filled up. The software then writes a message in the log on drive C saying something like:
“Could not allocate space for object 'dbo.tbXml'.'PK__tbXml__0E6E26BF' in database 'SUSDB' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.”

53,881 error messages, and all but a dozen of them say that. Keep on writing that message for long enough and you fill up 10 GB of drive C, which then grinds to a halt, bringing the whole server down with it.

Now, in an ideal world I would have configured that log to live somewhere else: drive D has twice the space, and even if it filled right up, it wouldn't give the server heart failure. But as far as I can tell, there is no way to change the destination drive, because the edit option has been disabled. Alternatively, I might get SQL Server to send an email to the WSUS administrator, but email has been disabled too.
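To check a claim like "all but a dozen say that" on your own server, a short script can tally the distinct messages in a copy of the error log. This is a minimal Python sketch, not part of the setup described here. It assumes the usual SQL Server ERRORLOG line layout: a date, a time, a source such as spid52, then the message text. It also assumes the file is UTF-16 encoded. The function names are hypothetical.

```python
import sys
from collections import Counter
from typing import Iterable


def message_counts(lines: Iterable[str]) -> Counter:
    """Count identical messages, ignoring the date, time and source columns.

    Assumes each entry looks like
    '2013-09-15 10:23:45.67 spid52      Could not allocate space ...';
    lines without that leading timestamp are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split(None, 3)  # date, time, source, message text
        if len(parts) == 4 and parts[0][:4].isdigit() and "-" in parts[0]:
            counts[parts[3].strip()] += 1
    return counts


def tally_errorlog(path: str, encoding: str = "utf-16") -> Counter:
    # Assumption: the ERRORLOG file is UTF-16 with a byte-order mark;
    # pass a different encoding if yours is not.
    with open(path, encoding=encoding, errors="replace") as f:
        return message_counts(f)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the five most frequent messages, truncated for readability.
    for message, n in tally_errorlog(sys.argv[1]).most_common(5):
        print(n, message[:100])
```

Pass the path of an archived log (for example ERRORLOG.1) as the only argument. If a single message accounts for almost the whole count, cycling the log frequently throws away very little of value.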

Hmm, tricky. Let's think about those error logs for a minute. By default, SQL Server carries on writing to the same error log until the service is restarted, which might mean forever. The error log can then get very large indeed, and slow to open if you ever want to look at its contents. So on most of the servers I work with, I like to start a new log every month by setting up a SQL Server Agent job to run this:
exec master..sp_cycle_errorlog

exec msdb..sp_cycle_agent_errorlog

That's one for the error log, and one for the agent error log - which of course doesn't exist in SSEE (duh, because it has been disabled).

Again by default, SQL Server keeps the current log plus the six previous logs. This seems very sensible: with monthly cycling, you are probably never going to want to check further back than six months. And you can change that default if you need to.

But in this case we don't have room on the disk to keep all that stuff, and since the error messages are in effect all identical, we don't really care. So what I did was set up a scheduled task to cycle the error log daily. That way it retains the error messages for the past seven days, and then slings them.
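The retention arithmetic is easy to check. SQL Server keeps the current log plus (by default) six archived ones, so the history on disk covers roughly seven cycling intervals. This Python sketch just does that multiplication. The function name is hypothetical. It also ignores the extra logs created when the service restarts, since a restart also starts a new error log.

```python
from datetime import timedelta


def log_history(cycle_interval: timedelta, archived_logs: int = 6) -> timedelta:
    """Approximate span of error-log history kept on disk.

    The current log plus `archived_logs` older ones are retained, and each
    covers one cycle interval (assuming no extra cycles from restarts).
    """
    return cycle_interval * (archived_logs + 1)


# Daily cycling with the default six archives: about a week of messages.
print(log_history(timedelta(days=1)))        # 7 days, 0:00:00
# Monthly cycling (approximated here as 30 days): about seven months.
print(log_history(timedelta(days=30)).days)  # 210
```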

A scheduled task is a Windows feature, and not nearly as flexible as SQL Server Agent, but if you can't use Agent, it can come in handy.

So - I created a folder called scripts on drive C.  
I created a text file in that folder called Cycle_Errorlog.sql, which contains:

exec master..sp_cycle_errorlog
 
I created a batch file called Cycle_Errorlogs.bat, which changes to drive C, goes to the correct directory, and runs SQLCMD with the SQL script above. Notice that the connection string for the Embedded Edition is a bit weird: instead of a server name, it connects through a named pipe.

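REM Change to drive C and the SQL Server 2005 tools folder in one step
cd /d "C:\Program Files\Microsoft SQL Server\90\Tools\binn"
REM -E = Windows authentication, -S = the SSEE named pipe, -i = the script to run
sqlcmd -E -S \\.\pipe\MSSQL$MICROSOFT##SSEE\sql\query -i "c:\scripts\Cycle_Errorlog.sql"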


And I set up a scheduled task to run the batch file daily.

Three months on, WSUS is still filling up Drive F with updates that can't be deployed, the WSUS Administrator is tearing his hair out, but drive C has plenty of room, and the server isn't crashing. 

Saturday 14 December 2013

Book Review - The Phoenix Project

A parable of life for IT folk, told from the point of view of mild-mannered Bill Palmer, who is suddenly promoted out of his comfortable middle-management niche to Vice President of IT Operations. Then everything starts to go wrong.

The payroll fails. This is a BAD thing. Trying to fix it, they mess up the SAN (storage area network), another bad thing. Bill and his team sit down to create a change management system to stop this from happening in the future. Then the auditors strike - to comply with the rules, they have to do something about a stack of issues six inches high. But they can't do that because the number one priority is Phoenix, which will save the company from bankruptcy (yet another bad thing).


Luckily, Bill has the advice of his mentor, Erik, to fall back on, as well as his own common sense. They beat back the dreaded auditors, help Phoenix limp into production, and introduce far better ways of doing things, which rapidly overtake Phoenix and leave their competitors struggling in their wake. I say a parable rather than a novel because the authors want you to behave in a certain way with your IT, so they show you the mistakes to avoid and the good practices to follow. And, surprisingly, I rather enjoyed it.

Size Property is NULL so indexing fails

Here’s an oddball story.  As part of the weekly maintenance plan, I rebuild the indexes.  Usually this is fine – even on big databases – they have a whole weekend to sort themselves out in, after all.

But then one day I got a message to say the job had failed:

Executing the query "ALTER INDEX [IX_Events_eventId] ON [dbo].[Events] REBUILD WITH ( PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY  = OFF, ONLINE = OFF )
" failed with the following error: "Could not allocate a new page for database 'Gandalf01' because of insufficient disk space in filegroup 'PRIMARY'. Create the necessary space by dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.
The statement has been terminated.". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.


Insufficient disk space? Thinking that 2 terabytes ought to be enough room for most databases to turn round in, I right-clicked on the database and asked for Properties, and it told me that property Size is not available.


One of the joys of SQL Server is that there are almost always at least two ways to skin a cat.  I ran this code:

SELECT SUM(size)*1.0/128 AS [size in MB] FROM [Gandalf01].sys.database_files

It worked and told me the database size (and as I suspected, it wasn't much).
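The division by 128 works because sys.database_files reports size as a count of 8 KB pages, and 1 MB is 1024 / 8 = 128 pages. Here is a quick Python check of the same arithmetic. The page counts are made up purely for illustration.

```python
PAGE_SIZE_KB = 8                     # SQL Server data and log pages are 8 KB
PAGES_PER_MB = 1024 // PAGE_SIZE_KB  # = 128


def size_in_mb(page_counts):
    """Mirror SUM(size)*1.0/128 over the rows of sys.database_files."""
    return sum(page_counts) * 1.0 / PAGES_PER_MB


# Hypothetical file sizes in pages: a 1 GB data file and a 256 MB log file.
print(size_in_mb([131072, 32768]))  # 1280.0
```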

I tried the GUI again and this time the property dialog came up fine and showed me the size, as expected.
I tried my re-index again and it worked. 

I've seen something like this before - the GUI refuses to tell me the database properties because the database owner has somehow got set to NULL. 

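EXEC sp_helpdb   -- reveals that the database owner is NULL and the GUI refuses to work
USE [Gandalf01]  -- sp_changedbowner acts on the current database, so switch to it first
EXEC sp_changedbowner 'sa' -- changes the owner to sa and the GUI now works
-- (from SQL Server 2005 on, ALTER AUTHORIZATION ON DATABASE::[Gandalf01] TO sa does the same job)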

Conceivably the owner was someone who has since left: Fred has gone and his access has been removed, so it doesn't seem unreasonable that his database might no longer have an owner. But how can it no longer have a size? I would be interested to know if anyone has an explanation (other than "It's a bug, Jack").