Friday 10 August 2018

Login Failed on Named Instance

I have a SQL Server 2016 server – call it Gandalf

There are three instances – Default, Frodo, and Sam

Every day I check the error logs.  Every day I see that someone or something is trying to log in at 0200. 

Error: 18456, Severity: 14, State: 5. 
Login failed for user ‘agent_default'. Reason: Could not find a login matching the name provided. [CLIENT: <local machine>]  -

And on Sam, Login failed for user ‘agent_default' and Login failed for user ‘agent_Frodo'

Microsoft's syspolicy_purge_history job runs every night at 0200 on every instance.  It always succeeds.  But it is the only SQL Server agent job that runs at 0200.  So it must be that, surely?  I change it to run at 0215.  Next day, the error log says that our mysterious friend attempted to log in at 0200. 

Hmm, not that then.  Is it an automated scan from our white hat penetration testing server?  I wasted a lot of time trying to pin this down until I was able to find confirmation that Nope - it is not set up to look on Gandalf. 

And then suddenly I noticed that the login attempt had happened at 0215, not 0200.  It IS syspolicy_purge_history !  But it is the job on the OTHER instance which is causing the problem. 

1)      When the job   runs on the Frodo instance, no issues
2)      When the job runs on the default instance, the error log on Frodo says Default agent account failed to log in -
Error: 18456, Severity: 14, State: 5. 
Login failed for user ‘agent_default'. Reason: Could not find a login matching the name provided. [CLIENT: <local machine>]  -
3)      When the job runs on the Sam instance, the error logs on Default and Frodo says
Login failed for user ‘agent_default' and
Login failed for user ‘agent_Frodo'

If you check out that job, Step 3 is running a chunk of Powershell.  It gets the Instance name and runs a task to erase records (older than the specified number of days).

if ('$(ESCAPE_SQUOTE(INST))' -eq 'MSSQLSERVER') {$a = '\DEFAULT'} ELSE {$a = ''};
(Get-Item SQLSERVER:\SQLPolicy\$(ESCAPE_NONE(SRVR))$a).EraseSystemHealthPhantomRecords()

Except that it also looks on all the other instances.  And because I have a different service account for each instance, it fails to find a login, which is why the job succeeds on each instance, but the other instance records a failed login attempt!
It's a known bug with Powershell, apparently.  And that link gives me the Powershell code to fix it!

$applicationName = "SQLPS ($env:USERNAME@$env:COMPUTERNAME)"
$SQLServerConnection = New-Object System.Data.SqlClient.SqlConnection
$SQLServerConnection.ConnectionString = "Data Source=$(ESCAPE_NONE(SRVR));Initial Catalog=master;Integrated Security=SSPI;Application Name=$applicationName"
$PolicyStoreConnection = New-Object Microsoft.SqlServer.Management.Sdk.Sfc.SqlStoreConnection($SQLServerConnection)
$PolicyStore = New-Object Microsoft.SqlServer.Management.Dmf.PolicyStore ($PolicyStoreConnection)

Just cut and paste that into Step 3 of the syspolicy_purge_history job, and the mysterious errors stop.

1 comment:

  1. Hi We have the same issue for all our sql servers.
    But like in a multi instance environment all our instances running using same Service account.
    suppose we have SQLServ1 and there we have 3 named instances
    Instance1, instance2, Instance3 all these 3 instances using same service account .. So why I am also getting the same issue.
    Initially I started explicitly adding the service account in the security login and it stopped generating the error but I think that should not be the resolutions.