What's a "sanity check" and why would I use one?
AlwaysUp is designed to keep your important applications functioning 24/7.
To fulfill that mandate, AlwaysUp will quickly detect when your application crashes, hangs or even uses too
much RAM — and automatically restart it in seconds.
But as you probably know, applications can fail in many ways beyond the obvious.
In fact, we've seen countless situations where an application says it's running
but when we investigate we discover that it's not actually doing any work.
The underlying process is in a vegetative state, showing up in Task Manager but otherwise dead to the world.
Fortunately, you can extend AlwaysUp by plugging in additional failure detection logic that catches
a range of subtle and unusual problems.
We call those extensions "sanity checks" — because they check the sanity of your applications.
Today, you can choose from several predefined sanity checks that verify
basic network connectivity, check for log file activity, ensure that a web server is responding, and more.
Or you can even write your own sanity check, if no built-in option fits your needs.
Why should you use a sanity check?
To quickly identify tricky problems and ultimately reduce downtime for your mission-critical applications, of course!
How do I set up a sanity check in AlwaysUp?
To add a sanity check to your application in AlwaysUp:
- Edit your application in AlwaysUp, either by double-clicking its row or by highlighting it and selecting
  Application > Edit/View from the menu.
- Switch to the Monitor tab.
- Check the Whenever it fails a periodic sanity check box and click the Set button on the right:
- You'll be looking at the Add Sanity Check window, which will guide you through the process of adding a new sanity check:
  The built-in sanity checks are listed in the dropdown:
  Each one is described below.
- After choosing a sanity check and providing the necessary settings, you'll specify how often AlwaysUp should run the sanity check:
- Finally, confirm your settings and click the Add button to save your new sanity check:
Built-in sanity checks
For your convenience, AlwaysUp comes with a few sanity checks that have helped customers before.
They're very easy to deploy and require only basic information about your application or situation to install.
Is a drive letter mapped and accessible?
If a specific drive needs to be accessible while your application is working, you should deploy this sanity check.
Simply choose the drive letter to monitor, and AlwaysUp will do the rest:
Was a special file updated recently?
Does your application update a file while it's working normally?
If that file is not updated, does it mean that the application should be restarted?
If that strikes a chord, you should add this sanity check.
Enter the full path to the file to monitor (and the timeout value) and you'll be good to go:
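To make the idea concrete, here is a minimal Python sketch of this kind of freshness test. It is not AlwaysUp's own implementation; the function name `file_updated_recently` and its parameters are hypothetical, and it assumes "updated" means the file's last-modified timestamp.

```python
import os
import time


def file_updated_recently(path: str, max_age_seconds: float) -> bool:
    """Return True if `path` was modified within the last `max_age_seconds`.

    Assumption: "updated" is judged by the last-modified timestamp.
    """
    try:
        modified = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable file counts as "not updated".
        return False
    return (time.time() - modified) <= max_age_seconds
```

For example, `file_updated_recently(r"C:\MyServer\heartbeat.log", 300)` (an illustrative path) would report whether the file changed in the last five minutes.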
Is a TCP/IP server accepting connections?
Use this sanity check when you want to make sure that a TCP/IP endpoint is always accepting connections. All protocols are supported.
Enter the IP address or host name of the server and the port to check:
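As a rough illustration of what "accepting connections" means, the Python sketch below tries to open a TCP connection and reports whether it succeeded. The function name `tcp_port_accepting` and the default timeout are assumptions made for this example, not details of AlwaysUp's built-in check.

```python
import socket


def tcp_port_accepting(host: str, port: int, timeout_seconds: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection resolves the host name and connects with a timeout.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Because the probe only completes the TCP handshake and then disconnects, it says nothing about the protocol spoken on top of the connection.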
Is a web server responding properly?
If your application implements an HTTP/HTTPS web server, this sanity check will periodically request its URL and make sure it returns a valid response.
Enter the complete URL to fetch (including the port number, if necessary) and a timeout value to install this check:
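A similar probe can be sketched in Python with the standard library. This example assumes that a "valid response" means an HTTP status in the 2xx range received before the timeout; AlwaysUp's own criteria may differ, and the function name `web_server_responding` is hypothetical.

```python
import urllib.request


def web_server_responding(url: str, timeout_seconds: float = 10.0) -> bool:
    """Return True if fetching `url` yields a 2xx status within the timeout.

    Assumption: only 2xx counts as healthy. urlopen follows redirects and
    raises HTTPError (an OSError subclass) for 4xx/5xx responses.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # OSError: refused, timed out, DNS failure, HTTP error status.
        # ValueError: malformed URL.
        return False
```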
Is an important/helper program running?
If your application relies on the presence of another program to function properly, this sanity check is for you.
You'll need to enter the name of the helper's executable, as seen in Task Manager:
Does your application have one or more network connections open?
If you're running a TCP/IP network server, this sanity check will confirm that it has open inbound or outbound connections (or both):
Was an adverse Windows event reported while your application was running?
Some events that Windows writes to the event logs signal a serious problem. Use this sanity check if you'd like to stop/restart
your application whenever Windows reports one or more key events.
Select the event log and enter the numeric identifiers of the triggering events. You'll have to separate multiple IDs with commas:
Writing your own sanity check
If there's no built-in sanity check that detects the unique problems in your application, you can write
your own sanity checking program or script and plug it into AlwaysUp.
In terms of requirements, a sanity check can be either:
- an executable (*.exe) written in any programming language, or
- a Windows batch file (*.bat).
The only obligation is that it exits with a return code of:
- 0 when the check succeeds and AlwaysUp shouldn't do anything to the application;
- 1 when the check fails and AlwaysUp should stop/restart the application as you have configured;
- 10 when the check fails and AlwaysUp should reboot the computer;
- 100 when the check fails and AlwaysUp should stop your application and NOT restart it
(despite what you've configured in AlwaysUp);
- any other value when your utility encounters a problem independent of the application being monitored
and AlwaysUp shouldn't do anything to the application.
In the last case, AlwaysUp will not restart the application but it will write the result to the event log (and send an email if so configured).
Note that if your sanity check utility fails to complete in 120 seconds, AlwaysUp will stop/restart the application as configured
— same as if your utility exited with 1.
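The return codes above could be captured in a small Python helper like the sketch below. It assumes the script is launched through an executable (for example `python.exe` with the script path as a parameter) or packaged as an .exe. The names `SanityResult` and `run_check` are hypothetical, and 2 is an arbitrary pick for the "any other value" case.

```python
from enum import IntEnum
from typing import Callable


class SanityResult(IntEnum):
    OK = 0                # check succeeded; leave the application alone
    RESTART_APP = 1       # check failed; stop/restart as configured
    REBOOT_COMPUTER = 10  # check failed; reboot the computer
    STOP_APP = 100        # check failed; stop and do NOT restart
    CHECK_ERROR = 2       # assumption: any other value means the check itself had a problem


def run_check(check: Callable[[], bool]) -> int:
    """Run `check` and translate its outcome into an exit code."""
    try:
        healthy = check()
    except Exception:
        # The check itself broke, which says nothing about the application.
        return int(SanityResult.CHECK_ERROR)
    return int(SanityResult.OK if healthy else SanityResult.RESTART_APP)


# In a real script, finish with: sys.exit(run_check(my_probe))
# where my_probe is your own function returning True when the app is healthy.
```

Since a check that runs longer than 120 seconds is treated as a failure, any probe passed to `run_check` should carry its own, shorter timeout.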
How to plug your custom sanity check into AlwaysUp
To have AlwaysUp use your sanity check, simply choose
Check your application with a custom executable/script
from the list and follow the prompts:
On the settings page, you will be asked to enter the full path to your sanity check executable or batch file.
You can enter parameters as well:
Using special command line variables in your sanity check
AlwaysUp can pass your sanity check utility one or more variable values.
Compose your command line with the appropriate string and AlwaysUp will make the
substitution before invoking your program.
Variable | Replaced With
$ALWAYSUP_APPPID | The process identifier (PID) of your running application, as seen in the Task Manager, or -1 if the application isn't running.
$ALWAYSUP_APPEXENAME | The name of the executable run to invoke your application. (Just the name, not the full path.)
$ALWAYSUP_APPRUNNUMBER | The number of times AlwaysUp has started your application. This will be 0 before AlwaysUp starts your application and will increase by 1 each time that AlwaysUp starts/restarts your program.
$ALWAYSUP_APPSTARTTIME | The time when the application being monitored was started, in the format "YYYY/MM/DD HH:MM:SS". This value is the empty string if the application is not running.
$ALWAYSUP_APPUPTIME | The number of seconds that have elapsed since the application being monitored was started. This value is -1 if the application is not running.
$ALWAYSUP_APPNAME | The name of your application in AlwaysUp.
$ALWAYSUP_APPPATH | The full path to the application that AlwaysUp runs.
$ALWAYSUP_SERVICENAME | The name of the service created by AlwaysUp to run your application. If your application is named "MyApp", the service name will likely be "MyApp (managed by AlwaysUpService)".
$ALWAYSUP_SERVICESTARTTIME | The time when the service created by AlwaysUp was started, in the format "YYYY/MM/DD HH:MM:SS".
$ALWAYSUP_SERVICEUPTIME | The number of seconds that have elapsed since the service created by AlwaysUp was started.
For example, to have AlwaysUp pass your sanity check executable the main application's process identifier (PID),
your command line might resemble this:
C:\MyServer\my_check.exe $ALWAYSUP_APPPID
If your main application is running with PID 563, then, when it's time, AlwaysUp will invoke your sanity
check executable like this:
C:\MyServer\my_check.exe 563
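On the receiving side, a check written in Python might read the substituted PID from its first argument, roughly as sketched here. The function name `check_from_pid_argument`, the `probe` callback and the choice of 2 as the "check had a problem" code are all assumptions for illustration. Returning 0 when the PID is -1 is likewise a design choice, not something AlwaysUp requires.

```python
from typing import Callable, Sequence


def check_from_pid_argument(argv: Sequence[str], probe: Callable[[int], bool]) -> int:
    """Turn the PID passed as argv[1] into a sanity check exit code.

    `probe` is your own (hypothetical) test taking a PID and returning
    True when the application is healthy.
    """
    if len(argv) < 2:
        return 2  # no PID supplied: a problem with the check, not the app
    try:
        pid = int(argv[1])
    except ValueError:
        return 2  # argument was not a number, e.g. the variable was not substituted
    if pid == -1:
        return 0  # application isn't running, so there is nothing to probe
    return 0 if probe(pid) else 1


# In a real script, finish with:
# sys.exit(check_from_pid_argument(sys.argv, my_probe))
```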
Sanity checks for AlwaysUp 15 and earlier
If you're running a version of AlwaysUp before version 16, you'll have to plug in your sanity checks the "old" way
— by specifying an executable or script.
The legacy Sanity Check Plugins for AlwaysUp and Service Protector page describes the previous method.