Friday, 15 May 2020

How to Kill a Database

It's not very likely that you will need to get rid of databases on a production server, but development boxes can certainly be full of junk - test databases, experiments that didn't work out, projects that got cancelled... So you just drop the junk DB, right?

Wrong...

Get the business owner to confirm in writing before you do any of this, and be prepared to retrieve the database in case they change their mind.  This is pretty much the opposite of business as usual, so beware.

1) You can make a database read-only.  It is no longer in use, but the users might want to look at historical data occasionally.  They can still see their data, but they can't change anything - so if an application tries to record something like date_last_accessed, it will fail.  Consider running optimisation processes, such as a full UPDATE STATISTICS, before changing the status to read-only.




-- update statistics before the database goes read-only
use Vanilla
go

exec sp_updatestats
go



-- to make a database read only
alter database Vanilla set read_only with rollback immediate;
go

-- and back to normal
alter database Vanilla set read_write;
go

The rollback immediate option deals with anyone who is in the middle of working on the database (the one they told you is no longer in use) - their transactions are rolled back and they are disconnected.  Of course, it may be a process rather than a person.  You may also need to make the database single-user:

-- if other people are using it, take control
alter database Vanilla set single_user with rollback immediate;

--and give it back
alter database Vanilla set multi_user;





2) You can take a database offline.  This is a step beyond read-only: the users can't see it, but you can.  If it turns out that they need it after all, it's usually the work of a moment to bring it back online.  If not, this may help.

Taking a database offline is a wise precaution before deleting it.  Tell the helpdesk to let you know if anyone screams.  Wait for a month or two, maybe three, depending.  Put a note in Outlook to remind you in three months, otherwise it will stay offline forever.  If you are using SQL Server 2005, you may need to tinker with your maintenance plans - the option to ignore offline databases only arrived in 2008.

-- take the database offline
alter database Vanilla set offline with rollback immediate;
go

-- bring the database back online
alter database Vanilla set online;
go




3) You can detach a database.  It's still there, but you can't see it or get it back without a bit of effort.   If the file naming isn't routine, it might be very fiddly indeed.

Out of sight, out of mind - once you have detached a database "temporarily" you can't see it unless you go on the box and browse to the folder(s) where the files live.  So before you detach the Vanilla database, run this:

execute sp_helpdb Vanilla

This will show you where the files live; copy the results into an Outlook task and mark it for attention a couple of weeks hence.  Something like this:

name         fileid  filename                           filegroup  size     maxsize    growth  usage
Vanilla      1       E:\MSSQL.1\MSSQL\DATA\Vanilla.MDF  PRIMARY    7808 KB  Unlimited  10%     data only
Vanilla_Log  2       F:\MSSQL\Logs\Vanilla_Log.LDF      NULL       1024 KB  Unlimited  10%     log only
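
With the file locations safely recorded, the detach itself is straightforward; a sketch using the example paths above:

-- kick everyone off, then detach
use master
go
alter database Vanilla set single_user with rollback immediate;
go
exec sp_detach_db @dbname = N'Vanilla';
go

-- and to re-attach it later, point CREATE DATABASE at the files
create database Vanilla
on (filename = N'E:\MSSQL.1\MSSQL\DATA\Vanilla.MDF'),
   (filename = N'F:\MSSQL\Logs\Vanilla_Log.LDF')
for attach;
go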

There's a fair chance that you or your predecessor forgot to take this precaution some time in the last 10 years.  This might mean that the server (especially, as I say, if it's a dev box) is cluttered with long forgotten detached databases.

The script below looks for databases which are not attached to the current instance.  Unfortunately if you have other instances on the box, it will detect those databases too.  (If you can tell me how to fix that, I'll be very grateful). But apart from that, it's very good at finding potentially forgotten files which are taking up space that you can use.

It uses xp_cmdshell, which is disabled by default; enable it before you run the script and disable it again afterwards for security reasons.  (Be careful though - if it was already enabled, don't blindly switch it off afterwards; something else might depend on it.)  The script creates a temporary table, populates it with a list of .mdf files found on disk, and compares them with the list of files that are actually attached.


-- first, show advanced options so that xp_cmdshell is configurable
sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO

-- enable xp_cmdshell
sp_configure 'xp_cmdshell', 1;
GO
RECONFIGURE;
GO

-- based on http://www.sqlservercentral.com/Forums/Topic6166-5-1.aspx
-- create temporary table
create table #temp_mdf_files
(
      full_filename varchar(260) -- 260 covers the maximum Windows path length
)

-- populate the temp table with any MDF files found
insert #temp_mdf_files
       exec xp_cmdshell 'dir c:\*.mdf /s/b'

insert #temp_mdf_files
       exec xp_cmdshell 'dir d:\*.mdf /s/b'

insert #temp_mdf_files
       exec xp_cmdshell 'dir e:\*.mdf /s/b'


select
       -- strip off the directory path, leaving just the file name
       upper(reverse(substring(reverse(full_filename), 1, charindex('\', reverse(full_filename)) - 1))) as MDF_FileName,
       full_filename
from
       #temp_mdf_files
where
       -- exclude rows which contain system messages or nulls
       full_filename like '%\%'

and    -- exclude system databases
       upper(reverse(substring(reverse(full_filename), 1, charindex('\', reverse(full_filename)) - 1)))
not in
       ('DISTMDL.MDF', 'MASTER.MDF', 'MODEL.MDF', 'MSDBDATA.MDF', 'MSSQLSYSTEMRESOURCE.MDF', 'TEMPDB.MDF')

and    -- exclude files which are attached to the current instance
       upper(full_filename)
not in
       (select upper(filename) from sys.sysdatabases)

-- -- SQL Server 2000
--     not in (select upper(filename) from sysdatabases)
order by MDF_FileName



-- Housekeeping
drop table #temp_mdf_files



-- now lock things down again
-- show advanced options (still needed to reach xp_cmdshell)
sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO

-- disable xp_cmdshell
sp_configure 'xp_cmdshell', 0;
GO
RECONFIGURE;
GO

-- hide advanced options
sp_configure 'show advanced options', 0;
GO
RECONFIGURE;
GO



4) You can drop a database

You're sure?   Really sure?  


One catch: not if it's offline, you can't - or rather, you shouldn't.  DROP DATABASE on an offline database removes it from SSMS, but the physical files are left lurking on disk, so bring it back online before you drop it.  In fact, I suspect that is where my collection of "detached" databases came from: someone dropped a database while it was offline, the files never got deleted, and it lived on looking like a detached DB.




And take a backup before you drop it.  Zip up the backup and save it in your Final Backups folder. 
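
Something like this; a sketch, with an assumed backup path:

-- bring it back online first - you can't back up an offline database,
-- and dropping it while offline leaves the files behind
alter database Vanilla set online;
go

-- final backup (compression needs 2008 or later)
backup database Vanilla
to disk = N'G:\FinalBackups\Vanilla_Final.bak'
with init, compression;
go

drop database Vanilla;
go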

Monitoring Mirroring

Since mirroring is now deprecated in favour of Always On, I think it is a minority sport these days.

But it works perfectly well, especially if you haven't got Always On.  In a perfect world, we would all upgrade to the latest version as soon as it comes out, and take advantage of all the whizzy new features.  But we don't, of course.

Anyway, monitoring mirroring.  A bit of a tongue twister, that.

Assuming you have set up mirroring already, right-click on the database you want to monitor, select Tasks, and choose Launch Database Mirroring Monitor.

Click on the Warnings tab.  Do this for each of the mirrored databases.

Initially, it doesn't monitor anything.  Tucked away in the bottom right of the screen is a button marked Set Thresholds.  Click on it and tick all the boxes.

Microsoft sets these defaults:
Unsent log exceeds 1 KB
Unrestored log exceeds 1 KB
Oldest unsent transaction exceeds 1 minute
Mirror commit overhead exceeds 1 millisecond

I would recommend starting with these and seeing how it goes.  You will almost certainly end up raising those defaults, but it probably isn't my place to tell you what to do.
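
If you would rather script those thresholds than click through the GUI on every database, msdb has a stored procedure for it.  A minimal sketch, assuming the documented alert ids (1 = oldest unsent transaction in minutes, 2 = unsent log in KB, 3 = unrestored log in KB, 4 = mirror commit overhead in milliseconds) and using the Microsoft defaults above; Vanilla is an example database name:

-- run in msdb on each partner server
use msdb
go

exec sp_dbmmonitorchangealert 'Vanilla', 1, 1, 1;  -- oldest unsent transaction: 1 minute
exec sp_dbmmonitorchangealert 'Vanilla', 2, 1, 1;  -- unsent log: 1 KB
exec sp_dbmmonitorchangealert 'Vanilla', 3, 1, 1;  -- unrestored log: 1 KB
exec sp_dbmmonitorchangealert 'Vanilla', 4, 1, 1;  -- mirror commit overhead: 1 ms
go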

Remember to go to the SQL Agent jobs - find the Database Mirroring Monitor Job - and set it to notify you if it finds a problem.

Be sure to set up the alerts and the Agent job on both servers, so that it continues to alert you in the event of a failover.

Tuesday, 12 May 2020

Notifications

I have a script to check that I have got my SQL Server Agent jobs right.

#1 Owner not SA

The default owner of any agent job you set up is you.  Well, that's fine until you leave the company, they disable your account, and everything falls over.  So I tend to make sure that all the jobs have the owner SA.  In a perfect world you might think about setting up an account just to run SQL Agent jobs.  I'll make an exception for SharePoint jobs, which I prefer not to touch with a bargepole.



-- 1 - Owner not SA
--**************************************************************
select
 '1 - Owner not SA' as Problem,
       suser_sname(sj.owner_sid ) as Owner,
       replace(replace(sj.name,char(10),' - '),char(13),' ') as Job_Name,
       replace(replace(sj.[Description],char(10),' - '),char(13),' ') as Job_Description
from
       msdb.[dbo].[sysjobs]                          sj
where
 suser_sname(sj.owner_sid ) not in
 (
  'SA',
  'SharePoint'
 )
order by
       [enabled] desc,
       sj.name
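
Fixing one is a one-liner with sp_update_job; a sketch, with a made-up job name:

-- hand the job over to sa (job name is an example)
exec msdb.dbo.sp_update_job
       @job_name = N'DBAdmin - Heartbeat Monitor',
       @owner_login_name = N'sa';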



#2 Jobs with no description

Whoever set the job up didn't bother to put in a description.  Well, you don't actually NEED a description.  But when the job fails, you will be left sitting there wondering what on earth the job is supposed to do, who set it up, and when.

So put in a brief sentence, maybe a paragraph, plus the name of the person to contact and the date.  Even if it's you - chances are you won't remember it in a year's time, and that description will help a lot.


-- 2 - No description available.
--**************************************************************
select
 '2 - No description available' as Problem,
       suser_sname(sj.owner_sid ) as Owner,
       replace(replace(sj.name,char(10),' - '),char(13),' ') as Job_Name,
       replace(replace(sj.[Description],char(10),' - '),char(13),' ') as Job_Description
from
       msdb.[dbo].[sysjobs]                          sj
where
 sj.[Description] = 'No description available.'
or
 sj.[Description] = ''
or
 sj.[Description] is null
order by
       [enabled] desc,
       sj.name
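
And to fix an offender, sp_update_job again; a sketch (the job name and text are examples):

-- give the job a meaningful description
exec msdb.dbo.sp_update_job
       @job_name = N'DBAdmin - Heartbeat Monitor',
       @description = N'Emails DBAdmin hourly to prove the server is alive. Contact Jack Whittaker. Set up 12 May 2020.';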



#3 Not Logging completion

You don't NEED to put stuff in the logs.
But if you don't, what's the point of having a log?


-- 3 - Jobs which are not set to update event log on completion
--**************************************************************
select
 '3 - update event log on completion' as Problem,
       suser_sname(sj.owner_sid ) as Owner,
       replace(replace(sj.name,char(10),' - '),char(13),' ') as Job_Name,
       replace(replace(sj.[Description],char(10),' - '),char(13),' ') as Job_Description,
       sj.[enabled],
    sj.notify_email_operator_id,
    sj.notify_level_eventlog,
    sj.notify_level_email
from
       msdb.[dbo].[sysjobs]                          sj
where
 sj.notify_level_eventlog not in
 (
  2 -- failure
  ,3 -- completion
 )
order by
       [enabled] desc,
       sj.name

/*
-- update event log on completion (i.e. success AND failure)
 update msdb.[dbo].[sysjobs]
 set notify_level_eventlog  = 3  -- on completion
 where notify_level_eventlog in
 (
  0 --not set
  ,2 -- on failure
 )
*/



#4 Who you gonna call?

You really don't want a job to fail with nobody knowing about it, do you?
Oh, you check every day?  Every single job on every single server?  Even during your vacations?  Well, good for you.

I don't.  I get SQL Server to send an email to the DBAdmin team if anything falls over.
The exception is the Heartbeat Monitor, which should email on success.


-- 4 - Jobs which are not set to email on failure
-- DBAdmin Heartbeat Monitor should email on success (1)
--**************************************************************
select
  '4 - not set to email on failure' as Problem,
       suser_sname(sj.owner_sid ) as Owner,
       replace(replace(sj.name,char(10),' - '),char(13),' ') as Job_Name,
       replace(replace(sj.[Description],char(10),' - '),char(13),' ') as Job_Description,
       sj.[enabled],
    sj.notify_email_operator_id,
    sj.notify_level_eventlog,
    sj.notify_level_email
from
       msdb.[dbo].[sysjobs]                          sj
where
 sj.notify_level_email not in
 (
  2 -- failure
  ,3 -- completion
 )
order by
       [enabled] desc,
       sj.name
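
Once more, sp_update_job will set the notification; a sketch, assuming an operator called DBAdmin already exists:

-- email the DBAdmin operator when the job fails
exec msdb.dbo.sp_update_job
       @job_name = N'DBAdmin - Heartbeat Monitor',
       @notify_level_email = 2,   -- 2 = on failure; use 1 (on success) for the heartbeat job itself
       @notify_email_operator_name = N'DBAdmin';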







Monday, 11 May 2020

SSIS Meaningless Error Messages

My SSIS package failed.

It's a simple job that copies data older than 15 days from Database A on-premises to Database B in Azure.  It then deletes the old data from Database A.  So, a simple archive job.  There's not much there to fail - and it runs every day - just not today.

Well, let's have a look at the error messages in the Progress tab:

[ADO NET Destination [16]] Error: An exception has occurred during data insertion, the message returned from the provider is: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.

[SSIS.Pipeline] Error: SSIS Error Code DTS_E_PROCESSINPUTFAILED.  The ProcessInput method on component "ADO NET Destination" (16) failed with error code 0xC020844B while processing input "ADO NET Destination Input" (19). The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running.  There may be error messages posted before this with more information about the failure.

[OLE DB Source [1]] Error: The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020.

[SSIS.Pipeline] Error: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED.  The PrimeOutput method on component "OLE DB Source" (1) returned error code 0xC02020C4.  The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing.  There may be error messages posted before this with more information about the failure.

There now, isn't that informative?
Well, now that I have worked out what went wrong, it is - here are the important words in all that guff:

An exception has occurred during data insertion

It is trying to add data to the destination and fails, reporting a timeout. 

The hexadecimal stuff doesn't really help me much.  It looks very specific - but when you google error codes 0xC02020C4 and 0xC020844B, there are all sorts of suggestions: a network glitch (that old standby), memory settings, all sorts of things.

What I THINK happened was a problem delivering data to Database B in Azure.  And the problem in my case is that the Azure database has been set to 100GB, and I'm trying to put more data than that into it!

It's a standard-tier Azure DB, so the maximum size is 250GB.  Of course you can go much bigger than that, but it would involve paying Microsoft big bucks.  Let's stick to the 250.

-- *** Note this needs to be run on the MASTER database
ALTER DATABASE [Archive_2019Q34] MODIFY (MAXSIZE = 250GB);
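
To check what the cap actually is before and after, DATABASEPROPERTYEX can tell you; a quick sketch (MaxSizeInBytes is an Azure-specific property, so run this on the Azure database itself):

-- current size cap in bytes for the Azure database
select databasepropertyex('Archive_2019Q34', 'MaxSizeInBytes') as max_size_in_bytes;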

I suppose that if Azure databases grew of their own accord, I would be bitching about how they had grown so big that they cost a fortune.  It would be nice if SSIS gave us something more meaningful, though.

Tuesday, 3 March 2020

Missing Indexes



Think of a telephone directory.
It shows everyone in the area, sorted by Surname and Firstname.
If it were sorted by phone number, it would take an awfully long time to find Zachary Young.
So that's why it is sorted the way it is.

A French phone directory is different:
it's sorted by Village, then Surname and Firstname.

In database language, those are two different clustered indexes.  Clustered just means that the list of information is physically sorted that way.





But suppose you were Detective Chief Inspector Tom Barnaby, every week hunting down the killer of half the population of the village of Midsomer.  You find a clue - a phone number scrawled in the victim's blood at the murder scene - and want to find out whose number it is.  Yeah, obviously the murderer is going to turn out to be Zachary Young, but you don't know that yet.

Your database query might look something like this:

select
       firstname,
       lastname,
       address1,
       Town
from 
       Directory
where 
       PhoneNumber = '0123456789'



It's going to take a long time to run that, because the Directory table is missing an index on PhoneNumber.

When you run the query, SQL Server has to scan all the way through the list of names until eventually it finds a match.

SQL Server keeps a record of all the queries it runs, and you can get it to count up all the times it scans through a table because the index it needs is missing.
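
Those counts live in the missing-index DMVs; a minimal sketch of how to read them:

-- tables which SQL Server wishes had an index, busiest first
select
       mid.statement as table_name,
       mid.equality_columns,
       mid.included_columns,
       migs.user_seeks,
       migs.avg_user_impact
from
       sys.dm_db_missing_index_details mid
join
       sys.dm_db_missing_index_groups mig on mig.index_handle = mid.index_handle
join
       sys.dm_db_missing_index_group_stats migs on migs.group_handle = mig.index_group_handle
order by
       migs.user_seeks desc;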

So - you REALLY need that missing index.  You will find the murderer eventually, but the index makes it a lot easier.

Create the index, and everything runs a whole lot more smoothly.
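
For the directory example, the fix might look like this; a sketch, assuming the table and column names from the query above:

-- index on the column we search by, covering the columns we display
create nonclustered index IX_Directory_PhoneNumber
on Directory (PhoneNumber)
include (firstname, lastname, address1, Town);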

Want something a bit more technical?  Here's how to find the missing indexes.

Monday, 24 February 2020

The Joys of Admin


I'm admin on a LinkedIn group called French Connections – 65,621 members at the time of writing.  The other day I got a join request to review.  The profile was a little bit thin, with only 15 connections, so I sent a message… 

But read for yourself.  

Monday Jack Whittaker sent the following message at 12:47 PM

You are both members of French Connections on LinkedIn

Hi Charlie!   Thanks for applying to join the French Connections group! I have a slight problem - we like members to have some sort of connection to Francophonie (it helps to keep spammers at bay), and unfortunately you don't seem to have one. Probably because you haven't added it to your profile yet, or possibly I just missed it!   Perhaps you live in France? Drive a Peugeot? Love French cuisine?   Please reply and tell me why you would like to join, and if it’s appropriate, add it to your profile. Do please let me know within a few days or I’ll assume you aren’t interested. I look forward to hearing from you   Regards Jack Whittaker Moderator

  • Today Charlie O sent the following messages at 3:30 PM 

You are totally wrong in your assumptions.   first of all i am Belgian and Flemish which speaks french and 12 other languages   secondly i lived in nice france for 5 years even got married to my first wife in Nice   But your reaction as a moderator says it alll   Forget about me and be less biased next time. little bonafide Ape.   I know more about france and the culture it stand for then you ever will. You  missed a golden opportunity and cant even write french I bet.   SO, dear wanker: au revoir, Le Petit Pois Un grand salut du LION Flament, which would gobble up little minded Brits or Yankess for breakfast. You must have nothing but 'faux' wannabees trying to understand what I did a long time ago in your group.  I was willing to help you learn about what you claim interest you. Learn to do proper research and get an education  upgrade in social communication on this type of platform you badly need dear FOOL.  Carpe Diem,  I rest my case Dear Moderator you are now challenged and I will make sure the public at large knows how biased you are ....

And you do not dictate what goes on or comes off my profile dear fool!

 <Charlie O sends me a link to his brother’s Linkedin profile for no obvious reason – to spare his brother's blushes, I’ve removed it from this transcript>

my brother would slpa you for saying what you said to me, shall I prosecute you in a court room?

Dumb fool

Moderate that, Mr Le Professuer de math qui est devenue un venduer de petit pois!


  • Jack Whittaker sent the following message at 3:34 PM

Jack Whittaker 3:34 PM

Hi Charlie None of that stuff appears in your profile Any one of those things would have got you welcomed to the group without question - but I'm not psychic So - let's assume you are having a bad day - would you still like to join or have I offended you too much?



  • Charlie O sent the following messages at 3:35 PM

People like me are protecting what you can find out about me for good reasons as is proven yet again!!!

Lets assume you dont know google exsist A simple charlie o would have answered all your dumb questions

learn to do your job as a moderator you dumb ape

now leave me be and stop wasting my time with your spam pre created paragraphs

No one wil tell me what to put on a profile Unlike you I dont need this linkedin app or site to achiev what I achieved in lifeI was trying to help your group You dumb fool which means I am the good kind of apple you kicked in the balls, bravo fool! ILL GLADLY TAKE SOME TIME NOW TO DOCUMENT THIS ON SOCIAL MEDIA AND MAKE YOUR ATITUDE GO VIRAL

Lets see how you deal with that as a moderator and my internet foot print is huge and I know how to SEO the right way ...

Skills which will garantee your group gets seen as it needs to be seen a bunch of wankers from the UK trying to make french clients or contacts!!!

No smart come backs I see you are a coward as well?

Last but no0t least you should never have been made a moderator ... you haven't got the skills for the role!

lets see where this ape works  and if i have him as client if so your company will be booted off the books

sql dba admin what a joke

Do you even speak french ape?  <enough to know how to spell "vendeur">

Brexit YES voter I bet ....

grow up fool!



  • Jack Whittaker sent the following message at 3:49 PM

You seem to be slightly upset Charlie All because I asked a man who lives in Sheffield if he has any connection to francophonie I'm curious - why do you feel that I insulted you?



Sadly, Charlie O never replied

I blocked him from the group, of course – well, obviously

But I took his advice and googled his name.  I found his Facebook account - and it's chock full of adverts for all sorts of crap.  Remember this line in my initial message to him?  "We like members to have some sort of connection to Francophonie (it helps to keep spammers at bay)".  Turns out he IS a spammer!

Thursday, 20 February 2020

Availability Group not synchronizing

There was an availability group failover in the night.  Some sort of glitch - glitch happens.  The Primary flicked over to the Secondary and the Secondary became the Primary, no problem.
The old Primary became the new Secondary.
BUT…

Databases on the New Secondary are marked as Not Synchronizing
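
You can see the same state from T-SQL; a minimal sketch against the AlwaysOn DMVs, run on the new secondary:

-- synchronization state of the availability databases on this replica
select
       db_name(drs.database_id) as database_name,
       drs.synchronization_state_desc,
       drs.synchronization_health_desc
from
       sys.dm_hadr_database_replica_states drs
where
       drs.is_local = 1;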




The error log says: A connection timeout has occurred on a previously established connection to availability replica with id [5FA3CBDB-5504-43E2-8697-2C713110307B].  Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role.

Well, the replica has certainly transitioned.  Other instances on the nodes were unaffected.


A bit of googling finds this article.
Microsoft recommends restarting the service on the secondary replica.
The problem occurs because of an internal race condition.
What on earth does that mean?  Ah - it's a bit like a deadlock.

And another article: Resolve Secondary Replica Database Not Synchronizing.

"You cannot ONLINE a secondary replica availability group database.  Since a secondary replica availability group database is really in an ongoing restoring state, it cannot be set Online like the primary replica.  Remember, it is already in the Online role, according to the DMVs.

Solution: Restart the SQL Server instance hosting the secondary replica.  This will initiate recovery on all databases including availability group databases, and so long as SQL Server can access all the database files, the availability group database affected should recover successfully and resume synchronization with the primary replica."

OK, that all seems pretty clear.

Next question - can we do it now?  In the working day?  

Well, if we don't, we have lost our Availability group resilience - that's not a Good Thing

If you look at the picture above, we have an estimated recovery time of 0 seconds.  I've added a few extra columns to my Dashboard - there's a list of them here.  Now that seems too good to be true.  It's been down for a few hours; are you seriously expecting me to believe that there is no data waiting to go across to the Secondary?  Pshaw! I say to you.
One of my colleagues thinks that Dave has mentioned this issue before – but he isn’t sure.  Dave is on holiday - because of course he is.  Whenever you need someone who knows stuff, Sod's Law mandates that they are not around.  
So we are a bit scared of restarting the services on the secondary - is it going to impact the primary as it catches up with the backlog of stuff to be transferred?
We decide to wait until everyone goes home for the night. 
I stop the Agent service.
I stop the SQL Server service.
I start the SQL Server service.
I start the Agent service.
I go and have a look at the Dashboard.



The two small databases have already sorted themselves out.  
The two big ones estimate 20 minutes and 10 minutes to recover, but it quickly becomes obvious that this is an over-estimate: the actual time taken was about 3 minutes for the larger and 1 minute for the smaller.  There doesn't appear to have been any impact on the Primary, but it fixed itself before I could check.
When Dave returns from his hols, he confirms that restarting the services on the secondary fixes this; it takes seconds, he says.  He reports that it sometimes happens when he manually fails over, e.g. to apply patches.  Out of hours, of course.
So with hindsight, I would feel much more confident about restarting the service on the Secondary node, and would do it during the working day rather than run the risk of losing data.