RMM Check: UniFi Health

At Slingshot, we’ve moved our Managed clients over to UniFi networking systems, and I wanted to use our RMM to monitor the UniFi Controller directly. The API isn’t officially documented, but I did find other resources: the community’s API documentation, CyberDrain’s examples, and the UniFi API browser. After a lot more research and trial and error, I ended up with a PowerShell RMM script that monitors device connections, resources/ports, and alerts, and reports on a good deal more.

Enough with the intros. Here’s an instance currently alerting in SolarWinds RMM:

…and More Info gives the full report (looks better in console — I wish SWRMM preserved whitespace):

Before I forget, some quick notes:

  • Needs a local account (limited admin/readonly) on controller
  • For UDMPs with firmware 1.6 or greater, use port 443; for older controllers, use port 8443
  • It defaults the Controller IP to the detected Gateway IP (we do a lot of UDMPs).
  • Several of the Device Status codes are documented nowhere, so this script might have the only public record of them (for posterity, I’ve figured out 2=pending adoption, 9=inform error, and 11=isolated).
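The port rule and the status codes from these notes can be sketched as two small lookups. This is a minimal illustration, not the script itself: the helper names Get-UnifiEndpoint and Get-UnifiStateName are hypothetical, the URL paths and state names are the ones the script below uses, and nothing here comes from official documentation.

```powershell
# Hypothetical helpers for illustration; neither exists in the script below.

# Map the controller port to the base and login URLs.
function Get-UnifiEndpoint {
    param(
        [Parameter(Mandatory=$true)] [string]$Address,  # controller IP or hostname
        [int]$PortNum = 443
    )
    switch ($PortNum) {
        443  { $prefix = '/proxy/network'; $loginPath = '/api/auth/login' }  # UDMP, firmware 1.6 or greater
        8443 { $prefix = '';               $loginPath = '/api/login' }       # older controllers
        default { throw "Unsupported port $PortNum (expected 443 or 8443)" }
    }
    $root = "https://$($Address):$PortNum"
    New-Object psobject -Property @{ BaseURI = "$root$prefix"; LoginURI = "$root$loginPath" }
}

# Device state codes, as listed in the script's own lookup table.
$UnifiStateNames = @{
    0 = 'disconnected'; 1 = 'connected'; 2 = 'pending adoption'
    4 = 'upgrading'; 5 = 'provisioning'; 6 = 'heartbeat missed'
    9 = 'inform error'; 11 = 'isolated'
}

function Get-UnifiStateName($State) {
    # Cast first: the table keys are Int32, and JSON numbers may be parsed
    # as Int64 on some PowerShell versions, which would miss the key.
    $key = [int]$State
    if ($UnifiStateNames.ContainsKey($key)) { $UnifiStateNames[$key] }
    else { "unknown ($key)" }
}

$ep = Get-UnifiEndpoint -Address '192.168.1.1' -PortNum 8443
$ep.LoginURI                    # https://192.168.1.1:8443/api/login
Get-UnifiStateName 11           # isolated
Get-UnifiStateName ([long]9)    # inform error
```

Throwing on an unexpected port is a design choice for the sketch: it fails loudly instead of sending requests to an empty URL.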

And finally the PowerShell. It is (self-consciously) still in progress, with plenty of debug stubs, but it already has lots of useful production miles under its belt:

<#	checkHealth-UnifiController.ps1
Purpose: for all sites on a controller, checks device connections, resources, ports, alerts
Author: 	Slingshot Solutions, www.slingfive.com
Params:		$IP (or hostname) of controller (default: detected gateway), [int]$PortNum (default:443), Username, PASSWORD
Notes:
* needs a local account (limited admin/readonly) on controller
* For UDMPs with firmware 1.6 or greater, use port 443;  For older controllers, use port 8443
#>

# PREP SCRIPT:
param(
  $IP = (Get-wmiObject Win32_networkAdapterConfiguration |where-object{$_.IPEnabled -and $_.DefaultIPGateway}).DefaultIPGateway[0],
  $PortNum = 443, # or 8443 depending on controller version
  $Username,
  $Password
)
# tweaks:
trap { $_; exit 1 }	# force RMM to treat unhandled runtime errors as ERROR!
#widen host buffer for better output on old PS versions:
$pswindow=(Get-Host).UI.RawUI
$newsize=$pswindow.BufferSize; if($newsize.width -lt 100){$newsize.width=100}; $pswindow.BufferSize=$newsize
$newsize=$pswindow.WindowSize; if($newsize.width -lt 100){$newsize.width=100;$newsize.height=20}; $pswindow.WindowSize=$newsize
function short([string]$str,[int]$size){if($str.length -gt $size){return ($str.substring(0,$size-3)+"...")}else{ return $str}}
# vars:
$exitcode = 0
$report = ""
$FullRpt = ""
$StopWatch = [system.diagnostics.stopwatch]::startNew()

if(!$Password){
	write-host "PARAMETERS:"
	foreach ( $key in ((Get-Command -Name $MyInvocation.InvocationName).Parameters).Keys ) {
		$val = ((Get-Variable $key -ea SilentlyContinue).Value) -join ", "
		Write-Host "  $($key): $val" 
	}
	write-error "Hey dummy, set your params!"
	exit 1000
}


$Credential = @{  username="$Username"; password="$Password"; remember=$True; strict=$True; } |ConvertTo-Json
if($PortNum -eq 443){ # UniFi OS 1.6 or newer:
	$BaseURI = "https://$($IP):$($PortNum)/proxy/network"
	$LoginURI = "https://$($IP):$($PortNum)/api/auth/login"
}elseif($PortNum -eq 8443){ # older controllers:
	$BaseURI = "https://$($IP):$($PortNum)"
	$LoginURI = "https://$($IP):$($PortNum)/api/login"
}else{ # fail early rather than build requests from empty URIs
	write-error "Unsupported PortNum '$PortNum' (expected 443 or 8443)"
	exit 1000
}

# CONNECT TO CONTROLLER
[Net.ServicePointManager]::SecurityProtocol = "tls12, tls11, tls"
[Net.ServicePointManager]::ServerCertificateValidationCallback = { $True }
#$LoginURI
Try {
	Invoke-RestMethod -uri $LoginURI -Method POST -Body $Credential -ContentType "application/json" -SessionVariable UniFiSession | out-null
}catch{	#wrong username/password = "The remote server returned an error: (400) Bad Request"
	write-error "* ERROR - Api Connection Error: $($_.Exception.Message)"    
	$exitcode = 1010
}
if($exitcode -eq 0){
	#write-host "CONNECTED`r`n"
}



# GET ALL SITES
if($exitcode -eq 0){
	try {
		$urlSites = "$($BaseURI)/api/stat/sites"
		$sites = Invoke-Restmethod -Uri $urlSites -Method GET -WebSession $UniFiSession
	}catch{
		#$report += "* ERROR - Sites Query Failed: $($_.Exception.Message)`r`n"
		write-error "* ERROR - Sites Query Failed: $($_.Exception.Message)`r`n"
		$exitcode = 1011
	}
}

# LOOP THROUGH SITES
if($exitcode -eq 0){
	# DIRTY HACK: UDMPs lose their auth on the second call (Something ServerCertificateValidationCallback related), so just pound it back every time:
	Invoke-RestMethod -uri $LoginURI -Method POST -Body $Credential -ContentType "application/json" -SessionVariable UniFiSession | out-null

	$rptDevices = ""
	$rptAlarms = ""
	Foreach ($site in $sites.data){  # SITE
		try {
			$urlDevices = "$($BaseURI)/api/s/$($site.name)/stat/device/"
			$devices = Invoke-Restmethod -Uri $urlDevices -Method GET -ContentType "application/json" -Headers @{"Accept"="application/json"} -WebSession $UniFiSession
		}catch{
			write-error "* ERROR - Device Query Failed: $($_.Exception.Message)`r`n"
			$devices = $null	# don't report the previous site's devices under this site
		}
		$shortsitename = short $site.desc 7

		$FullRpt += "`r`nSITE '$($site.desc)' - $urlDevices :`r`n"

		$DeviceStateNames = @{0="disconnected"; 1="connected"; 2="pending adoption"; 3="3(?)"; 4="upgrading"; 5="provisioning"; 6="heartbeat missed"; 9="inform error"; 11="isolated" }
		#https://ubntwiki.com/products/software/unifi-controller/api:
		$DeviceModelNames = @{"BZ2"="UniFi AP"; "BZ2LR"="UniFi AP-LR"; "U2HSR"="UniFi AP-Outdoor+"; "U2IW"="UniFi AP-In Wall"; "U2L48"="UniFi AP-LR"; "U2Lv2"="UniFi AP-LR v2"; "U2M"="UniFi AP-Mini"; "U2O"="UniFi AP-Outdoor"; "U2S48"="UniFi AP"; "U2Sv2"="UniFi AP v2"; "U5O"="UniFi AP-Outdoor 5G"; "U7E"="UniFi AP-AC"; "U7EDU"="UniFi AP-AC-EDU"; "U7Ev2"="UniFi AP-AC v2"; "U7HD"="UniFi AP-HD"; "U7SHD"="UniFi AP-SHD"; "U7NHD"="UniFi AP-nanoHD"; "UFLHD"="UniFi AP-Flex-HD"; "UHDIW"="UniFi AP-HD-In Wall"; "UCXG"="UniFi AP-XG"; "UXSDM"="UniFi AP-BaseStationXG"; "UCMSH"="UniFi AP-MeshXG"; "U7IW"="UniFi AP-AC-In Wall"; "U7IWP"="UniFi AP-AC-In Wall Pro"; "U7MP"="UniFi AP-AC-Mesh-Pro"; "U7LR"="UniFi AP-AC-LR"; "U7LT"="UniFi AP-AC-Lite"; "U7O"="UniFi AP-AC Outdoor"; "U7P"="UniFi AP-Pro"; "U7MSH"="UniFi AP-AC-Mesh"; "U7PG2"="UniFi AP-AC-Pro"; "p2N"="PicoStation M2"; "US48PRO"="UniFi Switch Pro 48"; "US8"="UniFi Switch 8"; "US8P60"="UniFi Switch 8 POE-60W"; "US8P150"="UniFi Switch 8 POE-150W"; "S28150"="UniFi Switch 8 AT-150W"; "USC8"="UniFi Switch 8"; "US16P150"="UniFi Switch 16 POE-150W"; "S216150"="UniFi Switch 16 AT-150W"; "US24"="UniFi Switch 24"; "US24P250"="UniFi Switch 24 POE-250W"; "US24PL2"="UniFi Switch 24 L2 POE"; "US24P500"="UniFi Switch 24 POE-500W"; "S224250"="UniFi Switch 24 AT-250W"; "S224500"="UniFi Switch 24 AT-500W"; "US48"="UniFi Switch 48"; "US48P500"="UniFi Switch 48 POE-500W"; "US48PL2"="UniFi Switch 48 L2 POE"; "US48P750"="UniFi Switch 48 POE-750W"; "S248500"="UniFi Switch 48 AT-500W"; "S248750"="UniFi Switch 48 AT-750W"; "US6XG150"="UniFi Switch 6XG POE-150W"; "USXG"="UniFi Switch 16XG"; "UGW3"="UniFi Security Gateway 3P"; "UGW4"="UniFi Security Gateway 4P"; "UGWHD4"="UniFi Security Gateway HD"; "UGWXG"="UniFi Security Gateway XG-8"; "UP4"="UniFi Phone-X"; "UP5"="UniFi Phone"; "UP5t"="UniFi Phone-Pro"; "UP7"="UniFi Phone-Executive"; "UP5c"="UniFi Phone"; "UP5tc"="UniFi Phone-Pro"; "UP7c"="UniFi Phone-Executive";
		"UDMPRO"="UniFi Dream Machine Pro"}

		# LOOP THROUGH DEVICES FOR SITE
		Foreach ($device in ($devices.data)){
			$DeviceModelName = $DeviceModelNames[$($device.model)]
			$DeviceStateName = $DeviceStateNames[[int]$device.state]	# cast: table keys are Int32, JSON numbers may parse as Int64

			if( $device.default -eq $True){	# PENDING ADOPTION:
				$vwireEnabled = ($device.vwireEnabled -eq $True)
				$discovered_via = $device.discovered_via	#scan or l2
				
				$FullRpt += "* $($device.type) $DeviceModelName, ip:$($device.ip), mac:$($device.mac), state:$DeviceStateName $($device.state), discovered_via:$discovered_via, wireless:$vwireEnabled`r`n"				
				$rptDevices += "* PENDING ADOPTION$(if($vwireEnabled){" (WIRELESS)"}): $shortsitename > ($DeviceModelName)`r`n"
				$exitcode = 1010

			}elseif ($device.adopted -eq $True){
				if($device.state -eq 0 -or $device.state -eq 9 -or $device.state -eq 11){	# DISCONNECTED, INFORM ERROR, ISOLATED:
					$FullRpt += "* '$($device.NAME)' - $($device.type) $DeviceModelName, ip:$($device.ip), mac:$($device.mac),`r`n    state:$($device.state) $DeviceStateName, adopted:$($device.adopted), disabled:$($device.disabled -eq $True)`r`n"				
					if($device.disabled -ne $True){	# don't care if it's disabled
						$rptDevices += "* $($DeviceStateName.toUpper()): $shortsitename > '$($device.name)'`r`n"
						$exitcode = 1011
					}	
	
				}else{	# OTHERWISE:					
					$uptime = new-TimeSpan -Seconds (0+ $device.'system-stats'.uptime)

					$FullRpt += "* '$($device.NAME)' - $($device.type) $DeviceModelName, ip:$($device.ip), mac:$($device.mac),`r`n    state:$($device.state) $DeviceStateName, uptime:$uptime, cpu:$([int]$device.'system-stats'.cpu)%, mem:$([int]$device.'system-stats'.mem)%, current:$(!$device.upgradable) (FW v$($device.version))`r`n"

					if( [math]::Round($device.'system-stats'.uptime) -lt 300) {	# up for under 5 minutes = restarted recently
						$rptDevices += "* $shortsitename > $($device.name) : Restarted recently (uptime under 5 min) `r`n"
						$exitcode = 1010
					}
					if( [math]::Round($device.'system-stats'.cpu) -gt "90.0") { 
						$rptDevices += "* $shortsitename > $($device.name) : CPU usage of $($device.'system-stats'.cpu)% `r`n"
						$exitcode = 1012
					}
					if( [math]::Round($device.'system-stats'.mem) -gt "90.0") { 
						$rptDevices += "* $shortsitename > $($device.name) : Memory usage of $($device.'system-stats'.mem)% `r`n"
						$exitcode = 1013
					}
					if( $device.upgradable -eq $true ) {
	#					$rptDevices += "* $shortsitename > $($device.name) : Firmware upgrade available <-- IGNORING til QA recovers`r`n"
	#					$exitcode = 1014
					}

					#PORTS:
					#if($device.port_table.count -gt 0){	#don't need to see AP Pros' secondary ports
					if($device.type -eq 'usw'){
						$FullRpt += "  "+ ($device.port_table |ft port_idx,name,enable,up,is_uplink,port_poe,autoneg,speed,full_duplex,network_name -AutoSize |out-string).trim().replace("`n", "`n  ") +"`r`n"
					}
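					Foreach ($port in $device.port_table){	# port_table is already the array of ports (the same one piped to ft above)
						if($port.stp_state -like "discard*"){	# matches both "discard" and "discarding"
							$rptDevices += "* $shortsitename > $($device.name) > PORT $($port.port_idx) '$($port.name)' : blocked due to STP issues `r`n" 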
							$exitcode = 1017
						}
					}
		
				}

			}
			
		}

		Foreach ($device in $devices.data.wan1 |where-object {$_.name}){	#weirdly duplicates with blanks otherwise
			$FullRpt += "* FW WAN1 '$($device.name)' - is_uplink:$($device.is_uplink), up:$($device.up), ip:$($device.ip), netmask:$($device.netmask), gateway:$($device.gateway)  `r`n"
			if($device.is_uplink -and $device.up -ne $True) { 
				$rptDevices += "* $shortsitename > WAN1 $($device.name) : link down `r`n" 
				$exitcode = 1015
			}
		}
		Foreach ($device in $devices.data.wan2 |where-object {$_.name}){#weirdly duplicates with blanks otherwise
			$FullRpt += "* FW WAN2 '$($device.name)' - is_uplink:$($device.is_uplink), up:$($device.up), ip:$($device.ip), netmask:$($device.netmask), gateway:$($device.gateway)  `r`n"
			if($device.is_uplink -and $device.up -ne $True) { 
				$rptDevices += "* $shortsitename > WAN2 $($device.name) : link down`r`n" 
				$exitcode = 1016
			}
		}

		
		# DIRTY HACK: UDMPs lose their auth on the second call (Something ServerCertificateValidationCallback related), so just pound it back every time:
		Invoke-RestMethod -uri $LoginURI -Method POST -Body $Credential -ContentType "application/json" -SessionVariable UniFiSession | out-null

		try {
			$urlAlarms = "$($BaseURI)/api/s/$($site.name)/stat/alarm/"
			$alarms = Invoke-Restmethod -Uri $urlAlarms -Method GET -ContentType "application/json" -Headers @{"Accept"="application/json"} -WebSession $UniFiSession
		}catch{
			$report += "* ERROR - Alarm Query Failed: $($_.Exception.Message)`r`n"
			write-error "* ERROR - Alarm Query Failed: $($_.Exception.Message)"
			$alarms = $null	# don't report the previous site's alarms under this site
		}
		Foreach ($alarm in ($alarms.data)){
			if(! $($alarm.handled_time)) {	# ???
				$rptAlarms += "* ALERT: $shortsitename > '$($alarm.ap_name)' : $($alarm.msg) @$(([datetime]$alarm.datetime).ToString("yyyy-MM-dd-HHmmss")) `r`n" 
				$exitcode = 1018
			} 
		}

	}
	
}


if($rptDevices){
	$report += "$rptDevices `r`n"
}
if($rptAlarms){
	$report += "$rptAlarms `r`n"
}


# REPORT PASS/FAIL
if($exitcode -gt 0){
	write-host "FAIL - problems found:"
}else{
	write-host "OK - no problems found!"
}
$report


# REPORT REST:
 write-host "CHECKED DATA:"
 write-host " "($FullRpt.replace("`n", "`n  ")).trim() #format


write-host 
write-host "PARAMETERS:"
foreach ( $key in ((Get-Command -Name $MyInvocation.InvocationName).Parameters).Keys ) {
	$val = ((Get-Variable $key -ea SilentlyContinue).Value) -join ", "
	Write-Host "  $($key): $val" 
}
	
write-host
write-host "CONTEXT:"
write-host "  Script Path:" (get-item $MyInvocation.InvocationName)
write-host "  Script Last Updated:" (get-item $MyInvocation.InvocationName).LastWriteTime " (try -5hrs [SW saves as UTC])"
write-host "  Execution Time: Total $([math]::Round($StopWatch.Elapsed.TotalSeconds,1))sec"


# WRAP UP:
write-host 
write-host "exitcode: $exitcode" 
exit $exitcode
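
Since the exit code is the first thing the RMM shows, a lookup of the codes the script sets can help when reading alerts. The following is a sketch only: the helper name Get-UnifiExitMeaning is hypothetical, and the codes and their meanings are read straight from the script above, where a few codes are reused for more than one condition.

```powershell
# Exit codes as set by the script above. Some codes are shared by several
# conditions, so each entry lists every cause that can produce it.
$UnifiExitCodes = @{
    0    = 'OK, no problems found'
    1    = 'Unhandled runtime error (caught by the trap)'
    1000 = 'Password parameter not set'
    1010 = 'Login failed, device pending adoption, or device uptime under 300 seconds'
    1011 = 'Sites query failed, or device disconnected / inform error / isolated'
    1012 = 'Device CPU above 90%'
    1013 = 'Device memory above 90%'
    1015 = 'WAN1 uplink down'
    1016 = 'WAN2 uplink down'
    1017 = 'Switch port blocked by STP'
    1018 = 'Unhandled alarm'
}

# Hypothetical helper: translate an exit code into its meaning(s).
function Get-UnifiExitMeaning([int]$Code) {
    if ($UnifiExitCodes.ContainsKey($Code)) { $UnifiExitCodes[$Code] }
    else { "Unknown exit code $Code" }
}

Get-UnifiExitMeaning 1015    # WAN1 uplink down
```

Because each check overwrites $exitcode, the code that reaches the RMM belongs to the last problem found, not necessarily the most serious one; the text report still lists every problem.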

NOTE: updates since publishing this have so far added:
* proper detection of the “pending adoption” state
* awareness of the undocumented “adopting” state
* awareness of the undocumented UBB, USP and USW-Flex-Mini models
If anyone’s dying to see the latest version, let me know in the comments.
