Wednesday, 5 December 2012

Implementing Maintenance Plans

OK, so I've been reading stuff, a dangerous hobby I know.  Lenin read Das Kapital, and before you knew it he had overthrown the Tsar and his ministers, and declared a Bolshevik Republic in Russia.  (Speaking of Lenin, have you ever noticed that all his statues show him hailing a taxi?)


In my case I've been reading Brad McGehee's guide to maintenance plans, and I've been looking at ways to put it into practice.  Of course there is no single right way to create a maintenance plan.  Just looking at backups, you can do it in any number of ways.  Using a maintenance plan is one way, you could also use Red Gate Backup or another third party tool and set up a job for backup.  Or I know a colleague who set up a cursor to work through each database in turn, backing it up.  The important thing is that you take a backup, not how you do it.
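For the curious, that cursor approach might look something like this. This is my own sketch, not my colleague's actual code - the backup path and the filter conditions are assumptions:

```sql
-- Sketch: loop through every online database and back each one up.
-- The E:\Backups path is illustrative only.
DECLARE @name sysname, @sql nvarchar(max);

DECLARE db_cursor CURSOR FOR
    SELECT name FROM sys.databases
    WHERE state_desc = 'ONLINE'    -- skip offline/restoring databases
      AND name <> 'tempdb';        -- tempdb cannot be backed up

OPEN db_cursor;
FETCH NEXT FROM db_cursor INTO @name;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'BACKUP DATABASE ' + QUOTENAME(@name)
             + N' TO DISK = N''E:\Backups\' + @name + N'.bak''';
    EXEC (@sql);
    FETCH NEXT FROM db_cursor INTO @name;
END

CLOSE db_cursor;
DEALLOCATE db_cursor;
```

It works, but as you'll see below, a maintenance plan gets you the same result with rather less typing.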

But the virtue of maintenance plans, which came in with SQL Server 2005, is that they provide a clean straightforward way of performing routine maintenance tasks, without forcing you to do much in the way of hard work.  That's the good thing; the bad thing is that you can't script them out and install them on another server - you have to create a new plan from scratch. Fortunately it's an easy drag and drop process.  Maintenance plans use a subset of SSIS - SQL Server Integration Services - to carry out the necessary maintenance tasks.  So you start by selecting a task from a "toolbox" menu, and dragging a box onto the screen.  Each box describes a particular task: backup, update statistics, indexing, whatever - you configure it, and then go on to the next task.  I find this an elegant way of doing stuff, particularly when you want to do things in order, and you don't want to start the second task until you are sure that the first thing has worked.

Here's one I did earlier - backing up over a network.   This example shows another nice feature - see the red line?  This tells the system what to do if there is a failure - if the first backup fails, take another backup to a different backup server.



Let me show you what works for me - and I'll try to talk about what you might like to change.

First of all, I like to set up two separate maintenance plans - one for the system databases (master, model, msdb) and another for all the user databases.  The system DBs are usually relatively small and straightforward.  Having separate maintenance plans for system and user databases allows the system DBs to be backed up in seconds, and a failure in the user databases does not stop the system databases from being backed up.  If there is a failure, chances are it will be with one of the big user DBs, and by having a separate maintenance plan, I know that one important thing at least is working - the system DBs are fine.

The description is optional, but it is good practice to enter one, with initials and date.  Each plan has appropriate sub-plans - for Daily, Weekly and Hourly Tasks as required.  So for example:


DBAdmin System DB Maintenance
Planned Preventative Maintenance for System DBs - jw 13/11/2012
    • Daily Tasks
    • Weekly Tasks
DBAdmin User DB Maintenance
Planned Preventative Maintenance for User DBs - jw 13/11/2012

    • Hourly tasks
    • Daily Tasks
    • Weekly Tasks
Here's what it looks like on screen:
SQL Server 2012 (and onwards) has an extra column, Run As - e.g. run as the SQL Server Agent service account.
Notice I've given each sub-plan a schedule - it's probably better not to do this immediately, but wait until you have tested each element.  At the very least, don't schedule the transaction log backups until you have taken a full backup!

I don't anticipate a need to take transaction log backups of the system databases, so the Hourly Tasks sub-plan can be skipped for the system DBs.

Did I mention that SQL Server will create a log report of all your maintenance plan activities?  A quick flashback to the menu - see the button that vaguely resembles two pieces of paper?  If you hover over it, a tooltip pops up to say Reporting and Logging. 




Click on that Reporting and Logging button, and you get this dialog box.  There are just a couple of things to do here.  First of all, tick the box to say Log Extended Information.  This fills the log with extra data.  You will never need this extra data unless you haven't got it, in which case it will be crucial to troubleshooting your problem.  Second, copy the folder location to your clipboard - you'll need it in a few minutes.

  


Weekly Tasks

I've put down three tasks to be done weekly - Rebuild the Indexes, clean up the History, and clear out the log reports.  


Rebuild Index 

Once a week is probably enough, but of course that depends on your particular circumstances.  The Rebuild might be needed more frequently if you have a high volume of data edits, inserts or deletes, in which case you can do this as one of the Daily Tasks.  If you have a narrow maintenance window, you might like to consider the need for a SQL task to run custom code to rebuild, reorganise or leave indexes alone, depending on the level of fragmentation it detects.  If you want to do this with a separate job, right click on the box above and select Disable.  It's also a good idea to leave a reminder note to yourself - right click on the background next to the box and select Annotation.  This lets you write something like "Replaced by Special Indexing Job jw 01/12/2012" - so you don't come across the disabled job in three months' time and enable it.  
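If you do go down the custom-code route, the usual approach is to query sys.dm_db_index_physical_stats and act on the fragmentation figure - a common rule of thumb is reorganise between 5% and 30%, rebuild above 30%.  Here's a rough sketch for the current database; the thresholds are my own assumptions, and a proper production script would handle a good deal more (partitions, online rebuilds, page counts and so on):

```sql
-- Sketch: rebuild, reorganise or skip each index depending on fragmentation
DECLARE @schema sysname, @table sysname, @index sysname,
        @frag float, @sql nvarchar(max);

DECLARE ix_cursor CURSOR FOR
    SELECT s.name, t.name, i.name, ps.avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ps
    JOIN sys.indexes i ON i.object_id = ps.object_id AND i.index_id = ps.index_id
    JOIN sys.tables  t ON t.object_id = ps.object_id
    JOIN sys.schemas s ON s.schema_id = t.schema_id
    WHERE i.name IS NOT NULL                      -- ignore heaps
      AND ps.avg_fragmentation_in_percent > 5;    -- below 5%: leave alone

OPEN ix_cursor;
FETCH NEXT FROM ix_cursor INTO @schema, @table, @index, @frag;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'ALTER INDEX ' + QUOTENAME(@index) + N' ON '
             + QUOTENAME(@schema) + N'.' + QUOTENAME(@table)
             + CASE WHEN @frag > 30 THEN N' REBUILD'    -- heavy fragmentation
                    ELSE N' REORGANIZE' END;            -- moderate fragmentation
    EXEC (@sql);
    FETCH NEXT FROM ix_cursor INTO @schema, @table, @index, @frag;
END
CLOSE ix_cursor;
DEALLOCATE ix_cursor;
```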

But if you have time to Rebuild, why not?  Here's the dialog box.  I won't go into the details, because usually all you have to do is pick the Rebuild Index task off the toolbox menu, drag it onto the screen and tell it which databases you want.  If you want to know all the whys and wherefores, have a look at Brad McGehee's guide.



I like to pick either all the system databases or all the user databases, depending on the plan.  You can do different things to different databases by picking individual ones, but you then have the problem of keeping on top of changes - if someone adds a new database, you have to remember to go through all your maintenance plans deciding what to do with it.  Keep it standard if you can.  One box I do recommend you tick is the one to ignore databases where the state is not online - it saves a lot of hassle.  Obviously you don't want to re-index an offline database, but if you forget to tick this box, SQL Server will try, and fail, and then the whole job fails.



Clean out history 
Once a week, clear out stuff older than four weeks.  That's the default; you can change it if you can think of a good reason for doing so.
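Behind the scenes, the History Cleanup task runs something along these lines - press View T-SQL to see the real thing on your server (the four-week cutoff below matches the default):

```sql
-- Clear out backup history, Agent job history and maintenance plan
-- history older than four weeks
DECLARE @oldest datetime = DATEADD(week, -4, GETDATE());

EXEC msdb.dbo.sp_delete_backuphistory @oldest_date = @oldest;
EXEC msdb.dbo.sp_purge_jobhistory @oldest_date = @oldest;
EXEC msdb.dbo.sp_maintplan_delete_log NULL, NULL, @oldest;
```

Worth knowing, because on a server that has never had this task, msdb can grow surprisingly large and the first run can take a while.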




Clean out old log reports 

Remember the reporting and logging folder which you copied to your clipboard a few minutes ago?  This is where you need it.  The Maintenance Cleanup Task is usually used for clearing out old backups, but you can also use it here for clearing out old log reports.  If you think about it, you have a couple of daily tasks and maybe a dozen hourly transaction log backup tasks, each producing a report.  Over a few weeks, this will mount up.  So the default is to sling those that are older than four weeks.  You don't have to - you can keep them for ever if you want, it's no skin off my nose - but four weeks seems reasonable.
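Under the covers, the Maintenance Cleanup Task calls xp_delete_file, something along these lines.  A word of caution: xp_delete_file is undocumented, so treat the parameter meanings here as best-effort, and the folder path is just an example - use whatever you pasted from the Reporting and Logging dialog:

```sql
-- Delete maintenance plan log reports (.txt files) older than the cutoff
EXECUTE master.dbo.xp_delete_file
    1,                                                          -- report files
    N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log',
    N'txt',                                                     -- file extension
    N'2012-11-07T00:00:00',                                     -- delete older than this
    0;                                                          -- this folder only
```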



Warning - SQL Server can be a bit flaky at deleting old files, especially things that aren't old BAK files.  In theory, this was resolved years ago when they brought out Service Pack 2 for SQL Server 2005.  In practice, it may work as designed, or it may not.  If it doesn't, check out this blog post - Maintenance Plan Log Files Won't Die.

Daily Tasks

I've put down four tasks to be done each day - check database integrity, update statistics, take the backup and remove old backups.




Check Database Integrity

This is going to check for corruption in your databases.  It actually runs DBCC CheckDB(database) for each of the databases you select.


Don't believe me?  Just press the View T-SQL button:



USE [ReportServer]
GO
DBCC CHECKDB(N'ReportServer')  WITH NO_INFOMSGS
GO
USE [adventureworks2012]
GO
DBCC CHECKDB(N'adventureworks2012')  WITH NO_INFOMSGS
GO



You can type out these commands yourself if you want, but why would you ever go to all that trouble?  I think this is a good illustration of maintenance plans making life simple.

Here's the other good thing.  Ninety-nine times out of a hundred, probably even more often than that, DBCC CheckDB will find no problems.  But that one time, there's a corruption in the database, and that's potentially bad news.  And if it happens, the maintenance plan stops, sends me a warning message (I'll get on to that) and doesn't back up the corrupt database or delete the old, good backup.  Turn it the other way round - by doing the check before taking the backup, you know that you are not backing up a corrupt database.

Update Statistics


Updating the statistics can make your queries run quicker.  If statistics are out of date, the query optimiser is relying on out of date information and could pick a bad execution plan.  I've never seen proof of this in practice, but that might be because I like to update the statistics every day!
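As with the integrity check, View T-SQL shows what the task boils down to - one statement per table, along these lines (the table name is just an AdventureWorks example, and FULLSCAN corresponds to the task's full scan option):

```sql
USE [adventureworks2012]
GO
-- One UPDATE STATISTICS per table, reading every row for maximum accuracy
UPDATE STATISTICS [Sales].[SalesOrderDetail] WITH FULLSCAN
GO
-- Alternatively, outside a maintenance plan, this hits every table
-- in the current database in one go:
EXEC sp_updatestats;
GO
```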


Back Up Database

This is the big one, the one that really matters.  If you don't back up your databases, perhaps DBA work is not the right niche for you?  Here's how:


I'll stick with my policy of backing up all the user databases, and taking a full backup of all of them every night.  But this might not always be possible.  If a database is very large, it might be best to take a full backup at the weekend, and take differential backups on all other days.  Check how long the backups are taking.
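For that large-database case, the weekend-full-plus-weekday-differential pattern looks like this - a sketch only, with illustrative paths; a differential backs up just the pages changed since the last full backup, so it's much quicker:

```sql
-- Sunday: full backup - the baseline the differentials depend on
BACKUP DATABASE [adventureworks2012]
    TO DISK = N'E:\Backups\adventureworks2012\adventureworks2012_full.usr.bak'
    WITH COMPRESSION, STATS = 10;

-- Monday to Saturday: differential - only pages changed since the last full
BACKUP DATABASE [adventureworks2012]
    TO DISK = N'E:\Backups\adventureworks2012\adventureworks2012_diff.usr.bak'
    WITH DIFFERENTIAL, COMPRESSION, STATS = 10;
```

Remember that to restore, you need the last full backup plus the most recent differential, so don't let the cleanup task delete the full while its differentials are still current.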

I like to create a sub-directory for every database - I find this particularly useful if I'm taking full and transaction log backups - this keeps them neatly together, and makes it easy to find the backups you need in a hurry.  Not everyone likes this - try it both ways and decide.

I like to use the file extension usr.bak for user databases, sys.bak for system databases.  This ensures that the maintenance cleanup task only clears out the correct backups.  (It probably isn't critical, unless you have very different deletion schedules for user and system backups).


Here's the code to take the backup: 


BACKUP DATABASE [adventureworks2012]
       TO  DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Backup\adventureworks2012\adventureworks2012_backup_2012_12_05_203624_3408922.usr.bak'
       WITH NOFORMAT,
              NOINIT, 
              NAME = N'adventureworks2012_backup_2012_12_05_203624_3408922',
              SKIP,
              REWIND,
              NOUNLOAD,
              COMPRESSION, 
              STATS = 10
GO


Then, if you tick the Verify box, it runs an extra chunk of code - RESTORE with VERIFYONLY - in other words, it simulates running a restore without actually doing so.  This is a good test of whether you have a good backup or not.  (The ultimate test, of course, is actually restoring your database - some would argue that this is the only test that matters.)


declare @backupSetId as int
select @backupSetId = position from msdb..backupset
       where database_name=N'adventureworks2012'
       and backup_set_id=(select max(backup_set_id) from msdb..backupset where database_name=N'adventureworks2012' )

if @backupSetId is null
begin
       raiserror(N'Verify failed. Backup information for database ''adventureworks2012'' not found.', 16, 1)
       end

RESTORE VERIFYONLY FROM  DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Backup\adventureworks2012\adventureworks2012_backup_2012_12_05_203827_4199319.usr.bak'
       WITH  FILE = @backupSetId, 
       NOUNLOAD, 
       NOREWIND
GO

The downside of running Verify is that the maintenance plan will take longer to run.  Perhaps 10% longer, perhaps twice as long.  The default is OFF, but I would always turn it on unless I have good reason to be concerned about running time.  Try it and see.  

Finally, if you are running SQL Server 2008 onwards, you can choose to compress your backups.  Third party products like the excellent Red Gate Backup offer different levels of compression; the native Microsoft product offers only on or off.  On is fine - I don't really see the value in fine-tuning the amount of compression - just do it!  One word of warning - you can set compression to "Use the default server setting", which sounds all well and good until you realise that you don't know what the default server setting is.  Best to say explicitly whether you want it or not, if you ask me.
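Mind you, if you do want to know what the default server setting actually is, it only takes a moment to check:

```sql
-- 0 = backup compression off by default, 1 = on (SQL Server 2008 onwards)
SELECT value_in_use
FROM sys.configurations
WHERE name = N'backup compression default';

-- And to set it explicitly at the server level:
EXEC sp_configure 'backup compression default', 1;
RECONFIGURE;
```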


Clean up old backups

This is the Maintenance Cleanup task again - we saw this earlier in the weekly tasks when clearing out the log files.  It's the same, except that we are removing usr.bak files, and probably the retention period is something like three days.


Hourly Tasks

A quick one - back up the transaction log, and remove old backups.




Backup Transaction Log

Pretty much as before - the essential differences are that the Backup Type is Transaction Log, and the file extension is trn.
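The generated T-SQL follows the same shape as the full backup, with BACKUP LOG in place of BACKUP DATABASE (path and timestamped name illustrative):

```sql
-- Hourly transaction log backup - the database must be in FULL
-- (or BULK_LOGGED) recovery model, or this will fail
BACKUP LOG [adventureworks2012]
    TO DISK = N'E:\Backups\adventureworks2012\adventureworks2012_backup_2012_12_05_210000.trn'
    WITH NOFORMAT, NOINIT,
         NAME = N'adventureworks2012_backup_2012_12_05_210000',
         SKIP, REWIND, NOUNLOAD, COMPRESSION, STATS = 10
GO
```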





Clean up old backups

And again - same as before, remembering to clear out the trn files.  


Scheduling

It rather depends on your business - let's assume that your business works office hours (Monday to Friday, 9am to 5pm).  If it doesn't, you are going to have to think about when you can do things to be minimally disruptive.  

Weekly jobs - well, duh - once a week, perhaps on a Saturday or Sunday evening.  Let's say 1800 Sunday.

Daily jobs - every evening, let's say at 2000.  This should mean that the weekly job is finished on Sunday before the daily job starts.  But you'll need to check the typical duration.  

Hourly jobs - start an hour or two before normal start time, finish an hour or two after knocking-off time.  So every hour from 0600 to 1900 should cover it.  If nothing is happening overnight, there isn't much point in backing up the transaction logs.

Whatever you decide, keep tabs on it for the first week at least to make sure that things aren't overlapping and generally screwing up.   


Problems

I'm going to make this the subject of another blog post, as this one is getting rather long-winded.