SUSE Conversations


Application Monitoring Made Easy for Java Applications Using Nagios



By: jarbaby

March 7, 2008 10:28 am


When it comes to monitoring your backroom, keeping your company’s applications healthy is a big part of how you distinguish yourself from the competition. In my experience, however, applications written in-house tend to get the least attention from system administrators, especially when they are running (or, as far as the administrator is concerned, hidden …) within a J2EE application container.

This shouldn’t come as a surprise: most monitoring tools come bundled with capabilities to check everything from the health of your filesystem to the temperature of your CPUs, so monitoring these resources is usually as easy as accepting the default setup. They also tend to offer at least one or two options for refactoring your applications to include a proprietary agent (translation: you’ll need to change your code to get it to work …). The problem with this model is that few companies are willing to invest the time and resources necessary to properly integrate an application monitoring agent into their architecture. That’s why so many companies simply punt when it comes to enterprise-level application monitoring.

This article describes a method for generating a maintenance-free, system-level view of the status of your backroom applications without refactoring any of your code. The monitoring platform is Nagios (www.nagios.org), a popular open source tool that has been around for years and enjoys a large community. The application we are going to monitor is called HelloWorldApplication. The only assumption we make is that the application uses log4j, an industry-standard logging library that many Java applications already rely on.

Nagios Installation

Since we are focusing on using Nagios, we won’t be going into the installation process. Here are some links to guide you through this process:

Nagios Plugins – Active Service Check

There are only 3 things you need to do to monitor anything in Nagios:

  1. Define your SERVICE (can be anything from a resource to a custom application)
  2. Define each HOST where your SERVICE runs
  3. For active service checks, write a plugin that can be run on the HOST and deterministically report on the health of the SERVICE, returning one of four exit values:
    • OK: exit code 0: indicates a service is working properly.
    • WARNING: exit code 1: indicates a service is in warning state.
    • CRITICAL: exit code 2: indicates a service is in critical state.
    • UNKNOWN: exit code 3: indicates a service is in unknown state.

… along with a message of up to 512 bytes written to STDOUT.

It’s that simple. The elegant part of this design is that the dependency is on the exit value and STDOUT, so the language is irrelevant.
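To make that contract concrete, here is a minimal sketch of an active-check plugin written in Java. The thing being checked (usable disk space on a path) and the 1024 MB / 256 MB thresholds are illustrative assumptions, not part of this article's setup; what matters is the single line printed to STDOUT and the exit code.

```java
import java.io.File;

// Minimal active-check plugin sketch. The check (usable disk space) and the
// thresholds are illustrative assumptions.
class CheckFreeSpace {
    // Nagios plugin exit codes
    static final int OK = 0, WARNING = 1, CRITICAL = 2, UNKNOWN = 3;
    static final String[] LABELS = {"OK", "WARNING", "CRITICAL", "UNKNOWN"};

    /** Maps free space (MB) to a Nagios state; less free space is worse. */
    static int evaluate(long freeMb, long warnMb, long critMb) {
        if (freeMb < 0) return UNKNOWN;
        if (freeMb <= critMb) return CRITICAL;
        if (freeMb <= warnMb) return WARNING;
        return OK;
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "/";
        File target = new File(path);
        int code;
        String message;
        if (!target.exists()) {
            code = UNKNOWN;
            message = "path not found: " + path;
        } else {
            long freeMb = target.getUsableSpace() / (1024 * 1024);
            code = evaluate(freeMb, 1024, 256);
            message = freeMb + " MB free on " + path;
        }
        // Nagios reads one line of text from STDOUT ...
        System.out.println(LABELS[code] + " - " + message);
        // ... and decides the service state from the exit code.
        System.exit(code);
    }
}
```

Registered as a Nagios command, it would be invoked with something like `java -cp <plugin dir> CheckFreeSpace /var`, where the classpath and path are placeholders.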

Once you register the plugin with Nagios, it will be run at the desired frequency on the desired HOST. Any result that indicates a problem automatically triggers the notifications you have registered in Nagios. In my experience, these active service checks (pull model) are the most common pattern in use.

Nagios Plugins – Passive Service Check

But Nagios supports another model which I believe is much better suited to scaling when it comes to collecting information from your business applications: passive service checks (push model). With passive checks, some other system uses a Nagios client program (send_nsca, written in C) to send the same information you would expect from a plugin. This information is forwarded to Nagios via the nsca server (a Nagios add-on).
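For comparison, feeding a passive result to the native client looks roughly like this. send_nsca reads check results from standard input, one per line, as tab-separated host name, service description, return code and plugin output. The binary location, the /etc/nagios/send_nsca.cfg path and the class name below are assumptions for illustration only.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Sketch: handing one passive service result to the native send_nsca client.
class SendNscaSketch {
    /** One service result in send_nsca's default tab-delimited input format. */
    static String formatServiceResult(String host, String service, int returnCode, String output) {
        // Tabs or newlines inside the message would break the record, so flatten them
        String clean = output.replace('\t', ' ').replace('\n', ' ');
        return host + "\t" + service + "\t" + returnCode + "\t" + clean + "\n";
    }

    /** Pipes one line to send_nsca; binary and config paths are assumed locations. */
    static int send(String nscaHost, String line) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                "/usr/sbin/send_nsca", "-H", nscaHost, "-p", "5667",
                "-c", "/etc/nagios/send_nsca.cfg");
        // Let send_nsca's own status messages go to our console
        pb.redirectOutput(ProcessBuilder.Redirect.INHERIT);
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();
        try (OutputStream stdin = p.getOutputStream()) {
            stdin.write(line.getBytes(StandardCharsets.US_ASCII));
        } // closing stdin tells send_nsca there is no more input
        return p.waitFor(); // send_nsca's exit status
    }
}
```

Everything this wrapper depends on (an installed binary, a readable config file, a forked process per message) is exactly what the NagiosAppender avoids by speaking the protocol itself.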

One of the interesting features of Nagios is the ability to create a federation of installations. This architecture makes it possible to configure satellite Nagios installations to forward all of their results to a central server, so administrators can monitor the entire system from one place. Such an architecture requires a design capable of handling significant throughput, and the passive service check mechanism was the result of that requirement.

The remainder of this article describes how the passive service check model can be used to monitor any Java-based application that uses log4j to report errors, warnings, etc., by installing and configuring the NagiosAppender (http://sourceforge.net/projects/nagiosappender/). A side benefit of using the NagiosAppender is that we can skip writing a plugin for active service checks. (whoopeee!!)

NOTES:

  • The NagiosAppender contains a pure Java implementation of the protocol used by the send_nsca program, so there is no need to install send_nsca.
  • If you are starting with a fresh installation of Nagios, be sure to install the standard package of plugins (http://nagiosplugins.org) before using the nsca server.
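The wire format behind that first note is small. The following sketch assembles one NSCA version 3 data packet, assuming the layout of the data_packet struct in the NSCA 2.x sources as compiled on a typical platform: 720 bytes in network byte order, fixed-width host (64), service (128) and plugin output (512) fields, a CRC-32 computed with the checksum field zeroed, and a timestamp echoed from the server's initialization packet. It illustrates the protocol; it is not the NagiosAppender's code, and it leaves out encryption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Sketch of an NSCA v3 data packet; offsets assume 2 bytes of struct padding
// after packet_version and 2 at the end (assumed layout, see lead-in).
class NscaPacket {
    static final int HOST_LEN = 64, SERVICE_LEN = 128, OUTPUT_LEN = 512;
    static final int SIZE = 720;

    static byte[] build(int timestamp, int returnCode, String host, String service, String output) {
        ByteBuffer buf = ByteBuffer.allocate(SIZE);  // big-endian = network byte order
        buf.putShort(0, (short) 3);                  // packet_version (bytes 2-3 are padding)
        buf.putInt(4, 0);                            // crc32_value, zero while the CRC is computed
        buf.putInt(8, timestamp);                    // timestamp from the server's init packet
        buf.putShort(12, (short) returnCode);        // 0=OK 1=WARNING 2=CRITICAL 3=UNKNOWN
        putField(buf, 14, HOST_LEN, host);
        putField(buf, 14 + HOST_LEN, SERVICE_LEN, service);
        putField(buf, 14 + HOST_LEN + SERVICE_LEN, OUTPUT_LEN, output);
        CRC32 crc = new CRC32();                     // standard CRC-32, as used by NSCA
        crc.update(buf.array(), 0, SIZE);
        buf.putInt(4, (int) crc.getValue());
        return buf.array();
    }

    /** Copies text into a fixed-width field, leaving room for a terminating NUL. */
    private static void putField(ByteBuffer buf, int offset, int width, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        int n = Math.min(bytes.length, width - 1);
        for (int i = 0; i < n; i++) {
            buf.put(offset + i, bytes[i]);
        }
        // the remaining bytes of the field stay 0, which terminates the C string
    }
}
```

The fixed 512-byte plugin_output field is where the message-size limit mentioned earlier comes from on the passive side.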

Nagios – Virtual HOST

First, we’re going to take advantage of a ‘feature’ inherent in the passive service check approach. Since passive service checks simply arrive at the nsca server, the HOST they are associated with does not have to correspond to a real machine (… unless Nagios is also periodically running active service checks against a plugin you supplied, which is possible). What does this mean? Rather than limiting the HOST values in the Nagios configuration to actual server names in the backroom, we can define what I will call a virtual host. In other words, instead of creating (and maintaining!!!) the following configurations in Nagios:

SERVICE                  HOST
HelloWorldApplication    server1
HelloWorldApplication    server2
HelloWorldApplication    server3
HelloWorldApplication    server4

… we’ll just create a single configuration for the HelloWorldApplication …

SERVICE                  HOST
HelloWorldApplication    production

Note the name of the server is production. In this backroom, there IS no server called simply production. It’s a name we invented to describe all of the servers in the backroom. That means that as we add to or take away from the pool of servers in the physical backroom, we don’t need to worry about changing anything in the Nagios configuration. (Your system administrator is going to love you for this …)
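A toy model helps show why this works. Nagios tracks each service by its HOST/SERVICE pair, and roughly speaking the most recent result for that pair is what you see, so passive results pushed from four physical servers under the virtual host production collapse into a single entry. The class below is purely illustrative and simulates only that bookkeeping; it is not Nagios code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model only: status is keyed by (HOST, SERVICE), so the physical
// server name never enters the key and survives only in the message text.
class VirtualHostModel {
    private final Map<String, String> status = new LinkedHashMap<>();

    /** Records a passive result, replacing any earlier result for the same pair. */
    void submit(String host, String service, String physicalServer, String message) {
        status.put(host + ";" + service, "server: " + physicalServer + ": " + message);
    }

    int entries() {
        return status.size();
    }

    String latest(String host, String service) {
        return status.get(host + ";" + service);
    }

    public static void main(String[] args) {
        VirtualHostModel nagios = new VirtualHostModel();
        for (String s : new String[] {"server1", "server2", "server3", "server4"}) {
            nagios.submit("production", "HelloWorldApplication", s, "heartbeat");
        }
        // One entry, however many physical servers report into it
        System.out.println(nagios.entries() + " entry: "
                + nagios.latest("production", "HelloWorldApplication"));
    }
}
```

The model also shows the trade-off: only the message text still says which physical server a result came from, which is why the layout later in this article prefixes it.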

Log4j – NagiosAppender Setup

Now all we need to do is define a static log4j configuration for the HelloWorldApplication running in our production backroom, install it on each of the servers where the HelloWorldApplication is running (without modifying it!!), and add the NagiosAppender to the application classpath. Once you restart the application, Nagios will receive and react to any passive service checks that arrive for SERVICE=HelloWorldApplication, HOST=production.

(At this point, seasoned Nagios users will be tempted to abandon this article, as it would appear that they will never be able to identify the physical hardware where the errors are occurring at 2AM on Christmas Eve. I suggest you read to the end before casting stones …)

Here is an example of the appender we’ll add for HelloWorldApplication:

<appender class="org.apache.log4j.nagios.NagiosAppender" name="NAGIOS-HelloWorld">

  <param name="Host" value="monitor1"/>
  <param name="Port" value="5667"/>

  <param name="ConfigFile" value="../server/all/conf/nsca_send_clear.cfg"/>

  <param name="ServiceNameDefault" value="HelloWorldApplication"/>
  <param name="useMDCServiceName" value="false"/>
  <param name="MDCServiceNameKey" value="nagios_service_name"/>

  <param name="useMDCHostName" value="true"/>
  <param name="MDCHostNameKey" value="virtual_host"/>
  <param name="InitializeMDCHostNameValue" value="production"/>

  <param name="useShortHostName" value="false"/>
  <param name="MDCCanonicalHostNameKey" value="nagios_canonical_hostname"/>

  <param name="Log4j_Level_WARN"     value="NAGIOS_WARN"/>
  <param name="Log4j_Level_ERROR"    value="NAGIOS_CRITICAL"/>
  <param name="Log4j_Level_FATAL"    value="NAGIOS_CRITICAL"/>

  <param name="IncludeFilterEnabled"    value="false"/>
  <param name="ExcludeFilterEnabled"    value="false"/>
  <param name="PatternFilterFile"  value="../server/all/conf/NagiosIncludeExcludeFilters.properties"/>

  <param name="SendStartupMessageOK" value="Application Errors Cleared"/>

  <layout class="org.apache.log4j.PatternLayout">
      <param name="ConversionPattern" value="server: %X{nagios_canonical_hostname}: %m%n"/>
  </layout>  
</appender>

Let’s go over each of the components of this appender that are unique to Nagios.

Define the Nagios Server Destination

First we have the Host and Port parameters. These identify the nsca server that must be running alongside Nagios (see http://nagios.sourceforge.net/download/contrib/documentation/misc/NSCA_Setup.pdf). In this case the server name is monitor1, and the port is 5667 (the nsca default).

  <param name="Host" value="monitor1"/>
  <param name="Port" value="5667"/>
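Behind these two parameters is a short TCP exchange. As we understand the NSCA 2.x protocol, the server speaks first, sending a 132-byte initialization packet (a 128-byte IV followed by a 4-byte timestamp), and the client then sends its data packet on the same connection. A minimal sketch of that exchange, with the packet-building callback left as a hypothetical parameter:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.function.Function;

// Sketch of one NSCA session: read the server's init packet, then answer with
// a data packet on the same connection (layout as we read the NSCA 2.x sources).
class NscaSession {
    static final int IV_SIZE = 128;        // TRANSMITTED_IV_SIZE

    final byte[] iv = new byte[IV_SIZE];   // seeds the encryption, e.g. the XOR scheme
    int timestamp;                         // must be echoed in the data packet

    static NscaSession readInit(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        NscaSession s = new NscaSession();
        data.readFully(s.iv);
        s.timestamp = data.readInt();      // big-endian, i.e. network byte order
        return s;
    }

    /** buildPacket is a hypothetical callback that assembles (and encrypts) the packet. */
    static void send(String host, int port, Function<NscaSession, byte[]> buildPacket)
            throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 10_000);
            NscaSession session = readInit(socket.getInputStream());
            OutputStream out = socket.getOutputStream();
            out.write(buildPacket.apply(session));
            out.flush();
        }
    }
}
```

With the settings above, a call would look like `NscaSession.send("monitor1", 5667, builder)`.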

Locate the send_nsca Configuration File

Next we have the configuration file. This is the same file the native send_nsca client reads to determine which encryption options to use when sending passive service checks. The native client supports more than a dozen encryption algorithms. Unfortunately, the only options currently available in the NagiosAppender are to turn encryption off entirely or to use the XOR scheme. Make sure the value you select is consistent with the setup of your nsca server, which has a similar configuration file.

<param name="ConfigFile" value="../server/all/conf/nsca_send_clear.cfg"/>
NOTES: Even though you don’t need to install the send_nsca program, you will need to install its configuration file in a directory accessible to your application.
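For reference, the XOR scheme is simple enough to show in full. The sketch below follows our reading of the NSCA 2.x sources (encryption_method=1 in the client and server configuration files): each byte of the data packet is XORed with the 128-byte IV received from the server, cycling through it, and then with the configured password, also cycling. It is obfuscation rather than real security, and applying it twice restores the original bytes.

```java
// Sketch of NSCA's "simple XOR" method (encryption_method=1), as we read the sources.
class NscaXor {
    /** Transforms the packet in place; applying it again with the same keys undoes it. */
    static void apply(byte[] packet, byte[] iv, byte[] password) {
        // pass 1: XOR with the IV the server sent in its init packet, repeated as needed
        for (int i = 0; i < packet.length; i++) {
            packet[i] ^= iv[i % iv.length];
        }
        // pass 2: XOR with the shared password, if one is configured
        if (password != null && password.length > 0) {
            for (int i = 0; i < packet.length; i++) {
                packet[i] ^= password[i % password.length];
            }
        }
    }
}
```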

Define Your Nagios Service Name

Next, we set the ServiceNameDefault. This is the value associated with the SERVICE that Nagios has been configured to monitor.

<param name="ServiceNameDefault" value="HelloWorldApplication"/>

The parameter name ends in …Default because the SERVICE can also be set programmatically through the log4j MDC (Mapped Diagnostic Context) facility. For our purposes, we’ll turn off the MDC feature and simply set the SERVICE statically in our log4j file with the following:

<param name="useMDCServiceName" value="false"/>
<param name="MDCServiceNameKey" value="nagios_service_name"/>
NOTES: To use the MDC facility for the service name, you would need to add log4j calls to your code. While this is a very useful tool, the premise of this article rules out code changes.

Define Your Nagios Host Name

Now we get to the interesting part, where we define the HOST portion of the Nagios message.

<param name="useMDCHostName" value="true"/>
<param name="MDCHostNameKey" value="virtual_host"/>
<param name="InitializeMDCHostNameValue" value="production"/>

Here we are leveraging the log4j MDC facility (without modifying our code) to:

  • Indicate that the appender should use the MDC facility to store the value of the HOST portion of the service check (useMDCHostName = true)
  • Define the MDC key under which the appender stores the value of HOST (MDCHostNameKey = virtual_host)
  • Initialize that value to production (InitializeMDCHostNameValue)
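One way to picture what the SERVICE and HOST parameters do is the following hypothetical sketch. A plain map stands in for log4j's MDC (which in log4j 1.2 is a per-thread map), and the rule that a missing MDC value falls back to ServiceNameDefault or to the real host name is our assumption about the appender's behavior, not something taken from its source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain map stands in for log4j's MDC, and the fallback
// rules are assumptions rather than NagiosAppender's actual source.
class NagiosTargetResolver {
    final Map<String, String> mdc = new HashMap<>();  // stand-in for the MDC

    // Field values mirror the appender parameters in the example configuration
    boolean useMdcServiceName = false;
    String mdcServiceNameKey = "nagios_service_name";
    String serviceNameDefault = "HelloWorldApplication";
    boolean useMdcHostName = true;
    String mdcHostNameKey = "virtual_host";

    /** Like InitializeMDCHostNameValue: seed the HOST value once at start-up. */
    void initialize(String initialHostValue) {
        mdc.put(mdcHostNameKey, initialHostValue);
    }

    /** SERVICE: the MDC value if enabled and present, else ServiceNameDefault. */
    String service() {
        String fromMdc = useMdcServiceName ? mdc.get(mdcServiceNameKey) : null;
        return fromMdc != null ? fromMdc : serviceNameDefault;
    }

    /** HOST: the MDC value if enabled and present, else the physical host name. */
    String host(String physicalHost) {
        String fromMdc = useMdcHostName ? mdc.get(mdcHostNameKey) : null;
        return fromMdc != null ? fromMdc : physicalHost;
    }
}
```

With the configuration above, every server reports HOST=production and SERVICE=HelloWorldApplication, whatever its real name.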

Define Your MDC Key for Your Physical Server Name

So … now that we have wired in the settings for the virtual server name, how do we deal with the physical server name? The answer is rather simple. When the NagiosAppender initializes, it can determine the name of the physical server and store it in the log4j MDC for later use in the layout. This makes it quite clear to the system administrator which physical server is causing the problem.

<param name="useShortHostName" value="false"/>
<param name="MDCCanonicalHostNameKey" value="nagios_canonical_hostname"/>

Here we have:

  1. Identified an MDC key which the appender will use to store the physical server’s hostname (MDCCanonicalHostNameKey)
  2. Indicated whether the canonical host name stored under that key should be shortened or remain a fully qualified domain name (useShortHostName)
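The JDK already provides what is needed to discover the physical name. This sketch shows one plausible way to obtain it and to apply the useShortHostName option; whether the NagiosAppender uses exactly these calls is an assumption.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch: discovering the physical server name with standard JDK calls.
class PhysicalHostName {
    /** Fully qualified name of this machine, or "unknown" if it cannot be resolved. */
    static String canonical() {
        try {
            return InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            return "unknown";
        }
    }

    /** Like useShortHostName=true: keep only the first DNS label of a host name. */
    static String shorten(String name) {
        // getCanonicalHostName() returns an IP literal when reverse lookup fails; leave those alone
        if (name.matches("[0-9.]+") || name.contains(":")) {
            return name;
        }
        int dot = name.indexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }
}
```

At start-up, the value of canonical() (or shorten(canonical())) would be what ends up stored under the nagios_canonical_hostname key.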

Define Mappings Between Log4j Levels and Nagios Levels

Now we need to map log4j levels to Nagios levels. Log4j is typically used to log quite a bit of information, especially if DEBUG or TRACE is turned on. Since we’re using Nagios to track problems that require intervention, we’ll only provide mappings for the message levels we’re interested in. The following settings indicate that we are only interested in log4j WARN, ERROR, and FATAL messages, along with their Nagios counterparts. (This would be a good time to have your system administrator review your settings!!!)

<param name="Log4j_Level_WARN"     value="NAGIOS_WARN"/>
<param name="Log4j_Level_ERROR"    value="NAGIOS_CRITICAL"/>
<param name="Log4j_Level_FATAL"    value="NAGIOS_CRITICAL"/>
NOTES: The NagiosAppender only forwards messages whose level has a mapping.
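In code, the three Log4j_Level_* parameters amount to a small lookup table. The illustrative class below mirrors the configuration shown (WARN to WARNING = 1, ERROR and FATAL to CRITICAL = 2) and returns nothing for unmapped levels such as INFO or DEBUG, matching the note above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

// Illustrative lookup mirroring the Log4j_Level_* parameters above.
class LevelMapping {
    // log4j level name -> Nagios return code
    private static final Map<String, Integer> MAPPING = new HashMap<>();
    static {
        MAPPING.put("WARN", 1);   // NAGIOS_WARN     -> WARNING
        MAPPING.put("ERROR", 2);  // NAGIOS_CRITICAL -> CRITICAL
        MAPPING.put("FATAL", 2);  // NAGIOS_CRITICAL -> CRITICAL
    }

    /** An empty result means "do not forward this event to Nagios". */
    static OptionalInt nagiosCode(String log4jLevel) {
        Integer code = MAPPING.get(log4jLevel);
        return code == null ? OptionalInt.empty() : OptionalInt.of(code);
    }
}
```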

Define Filters

The include/exclude filters can be used for more granular control over which messages get passed on. We’ll leave them off for this example.

<param name="IncludeFilterEnabled"    value="false"/>
<param name="ExcludeFilterEnabled"    value="false"/>
<param name="PatternFilterFile"  value="../server/all/conf/NagiosIncludeExcludeFilters.properties"/>
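Since the article does not describe the filter semantics, here is a hypothetical sketch of how such a pair of filters commonly works: with includes enabled, a message must match at least one include pattern; with excludes enabled, any matching exclude pattern blocks it. The rules, the use of java.util.regex, and the class name are all assumptions, not the NagiosAppender's documented behavior or the format of its properties file.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical filter: the pass/block rules and the use of java.util.regex are
// assumptions, not NagiosAppender's documented behavior or file format.
class IncludeExcludeFilter {
    private final boolean includeEnabled;
    private final List<Pattern> includes;
    private final boolean excludeEnabled;
    private final List<Pattern> excludes;

    IncludeExcludeFilter(boolean includeEnabled, List<Pattern> includes,
                         boolean excludeEnabled, List<Pattern> excludes) {
        this.includeEnabled = includeEnabled;
        this.includes = includes;
        this.excludeEnabled = excludeEnabled;
        this.excludes = excludes;
    }

    /** True if the message should be forwarded to Nagios. */
    boolean passes(String message) {
        // With includes on, at least one include pattern must match somewhere in the message
        if (includeEnabled && includes.stream().noneMatch(p -> p.matcher(message).find())) {
            return false;
        }
        // With excludes on, any matching exclude pattern blocks the message
        if (excludeEnabled && excludes.stream().anyMatch(p -> p.matcher(message).find())) {
            return false;
        }
        return true;
    }
}
```

With both flags set to false, as in the example configuration, every mapped message passes.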

Define a Startup Message

Here is a feature your administrators may find useful. When the NagiosAppender is initialized, which in most cases means when the application starts, the appender can send a one-time message. Since a common reason for a restart is to clear an error state, it is convenient for the appender to reset the state in Nagios to OK, instead of making your system administrator do it manually. (You can also send a one-time message for UNKNOWN, WARN, and CRITICAL, but I can’t think of any useful applications for this.)

<param name="SendStartupMessageOK" value="Application Errors Cleared"/>

Define Your Message Layout

The last section defines the layout of the text portion of the Nagios message. With a little care, we can use the MDC facility to incorporate the physical server name into the prefix of the message. In the following example, the message begins with the value stored under the MDC key nagios_canonical_hostname.

<layout class="org.apache.log4j.PatternLayout">
  <param name="ConversionPattern" value="server: %X{nagios_canonical_hostname}: %m%n"/>
</layout>
NOTES: The key nagios_canonical_hostname must match the value specified earlier for MDCCanonicalHostNameKey.
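To see where the physical name ends up, here is a toy expansion of the ConversionPattern above. It handles only %X{key}, %m and %n (log4j's PatternLayout supports far more) and is meant only to illustrate the substitution, with a plain map standing in for the MDC.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy layout: expands only %X{key}, %m and %n; not log4j's PatternLayout.
class MiniLayout {
    private static final Pattern TOKEN = Pattern.compile("%X\\{([^}]*)\\}|%m|%n");

    static String format(String pattern, Map<String, String> mdc, String message) {
        Matcher m = TOKEN.matcher(pattern);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(pattern, last, m.start());   // literal text before the token
            String token = m.group();
            if (token.equals("%m")) {
                out.append(message);                  // the logged message
            } else if (token.equals("%n")) {
                out.append(System.lineSeparator());   // platform line separator
            } else {
                out.append(mdc.getOrDefault(m.group(1), ""));  // missing key prints nothing
            }
            last = m.end();
        }
        out.append(pattern, last, pattern.length()); // trailing literal text
        return out.toString();
    }
}
```

For example, with nagios_canonical_hostname set to server1.example.com, a logged "Disk full" becomes "server: server1.example.com: Disk full" followed by a line separator.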

System Level View

Now that you have your backroom configured, you should be able to monitor the health of all of your application instances via the Nagios web interface. Here is a sample shot of the system we have running at Tideworks. Of particular interest is the fact that each of these applications runs as a separate deployable entity within a cluster of JBoss instances. Without the notion of a virtual server, this view would contain an entry for each application on each physical instance of the JBoss cluster, making for a very cluttered view. Also note (sorry for the poor resolution …) that the message associated with the WARNING alert has the physical server name prefixed to it. This is a critical piece of information for your system administrator.

Nagios View of Services Detail – Virtual Backroom Application Monitoring


Quick Review

Nagios

  • Add the nsca server, if necessary
  • Define a virtual host within Nagios
  • Define each application running on the virtual host and turn on passive service checks

Application Server

  • Add the NagiosAppender to your classpath
  • Configure a log4j appender for each separate application

Conclusion

Application monitoring is often neglected because of the perception that it is difficult to accomplish in a timely and cost-effective manner. This article has demonstrated a simple solution to this problem, showing how Java applications that use the log4j logging system can easily be monitored with Nagios using a maintenance-free configuration: you’ll never need to update your configuration, on either the application servers or the Nagios server, as your backroom grows.

(This approach is not limited to Java applications. Any language capable of executing the send_nsca program directly could take advantage of this design as well.)

Author

Jarlath Lyons
Senior Software Engineer
Tideworks Technology, Inc.
jarlyons@gmail.com

Technical References

Nagios Server version 2.9
NagiosAppender version 1.2.1
Log4j version 1.2.9

Categories: Enterprise Linux, Technical Solutions

Disclaimer: As with everything else at SUSE Conversations, this content is definitely not supported by SUSE (so don't even think of calling Support if you try something and it blows up).  It was contributed by a community member and is published "as is." It seems to have worked for at least one person, and might work for you. But please be sure to test, test, test before you do anything drastic with it.

5 Comments

  1. By:AdaPopescu

    Thanks for your post! It really helped me because I didn’t know many things about java and I really needed some help. http://www.yissum.co.il/

  2. By:Anonymous

    The narrow page format is a pain for reading the text.
    Not user friendly at all.

    Application Monitoring Made Easy for Java Applications Using Nagios
    http://www.novell.com/communities/node/4131/application-monitoring-made-easy-java-applications-using-nagios

    Move the left side tabs to the top of the page to allow a wider text area..

    Aussiegeek

  3. By:Anonymous

    Have you tried NSCA integration with Java Applications if yes can you please give me the examples.

    mail id :dini_cbit@yahoo.com

  4. By:Anonymous

    You may also want to try http://code.google.com/p/jsendnsca/.

    An API for sending passive checks from within your java code, examples on the site

  5. By:jarbaby

    The latest version of the NagiosAppender library (1.4.0) contains a class that provides the equivalent of nsca_send.c

    Check out the NEWS link at …

    http://sourceforge.net/forum/forum.php?forum_id=904934
