Resin JAMM | Java Monitoring | Server Monitoring

Resin Java Application Monitoring and Management (JAMM) does Java and server (OS) monitoring and management.

Because your site’s reliability is important, Resin polls its internal sensor net every 60 seconds, recording your server’s memory, CPU, network, database and cluster status. Resin monitors JVM metrics, Java application server metrics and OS metrics. Resin JAMM saves this data so you can adjust servers based on load and analyze problems after they occur, and so Resin can identify trends and anomalies.

Genesis of the Resin JAMM

The Resin JAMM was born and forged out of real-life support needs. Case in point: a Fortune 100 customer needed help diagnosing some very tricky issues. We employed the Resin JAMM (formerly the Resin Health System) and quickly found the issues in their code and in some third-party library code. Tracking down these issues with conventional mechanisms would have been arduous. As a reward, they greatly expanded their deployment of Resin Pro.

Resin JAMM grew organically, not in a vacuum. When you buy Resin Pro, you get our world-class support. When you send a question or issue, our core engineering team answers. Our core engineers use Resin JAMM to develop Resin and to provide support. We use it. We improve it. Once you use it to debug an issue, you will wonder how you lived without it for so long.

Resin JAMM | Cloud-wide Java Monitoring

The Resin cloud system participates in the health system by sending a heartbeat every 60 seconds, checking that each server in the cluster can connect to the triple-redundant triad hub.
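The heartbeat idea can be illustrated with a minimal sketch. This is not Resin's implementation: the class name HeartbeatTracker, the server ids and the timeout are all hypothetical. It only shows the general technique of recording the last heartbeat per server and flagging servers that have been silent for longer than a timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: tracks the last heartbeat seen from each server. */
final class HeartbeatTracker {
    private final long timeoutMillis;
    // TreeMap keeps server ids sorted so the result order is predictable.
    private final Map<String, Long> lastSeen = new TreeMap<>();

    HeartbeatTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Record a heartbeat from a server at the given time. */
    void record(String serverId, long nowMillis) {
        lastSeen.put(serverId, nowMillis);
    }

    /** Return the servers whose last heartbeat is older than the timeout. */
    List<String> silentServers(long nowMillis) {
        List<String> silent = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                silent.add(entry.getKey());
            }
        }
        return silent;
    }
}
```

With a 60-second heartbeat, a timeout of two intervals (120,000 ms) would tolerate one missed beat before a server is reported as silent. That tolerance is an assumption of this sketch, not a documented Resin setting.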

Every 15 minutes, the health system checks the sensor data and reports a status: OK, WARNING, CRITICAL or FAIL. You can direct Resin’s health system to act on the health check results by mailing a notification, gathering further thread and memory information, or even restarting.

Picture showing graphs of common metrics collected by the Resin Health System

  • health checks: validate system health every minute, recording the results for further action or review.
  • rule-based health actions: when load is high or memory low, Resin can send mail, dump the heap, dump the threads, perform Java profiling or even restart based on administration rules.
  • flexible monitoring: gathers, archives and displays any JMX value.
  • postmortem analysis: when a server fails, the saved monitors and logs are available to determine and resolve issues.
  • trends and anomalies: follow trends and spot anomalies, with the ability to do just-in-time Java profiling.
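As a rough illustration of how health checks and rule-based actions fit together, here is a minimal sketch. The status names come from the text above; everything else (the HealthRules class, the 80% and 95% heap thresholds, the action names) is a hypothetical assumption and not Resin's configuration or API.

```java
import java.util.List;

/** Status levels named in the text: OK, WARNING, CRITICAL, FAIL (ordered by severity). */
enum HealthStatus { OK, WARNING, CRITICAL, FAIL }

/** Hypothetical rules; thresholds and action names are illustrative only. */
final class HealthRules {
    /** Classify heap usage into a status. */
    static HealthStatus checkHeap(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return HealthStatus.FAIL; // no usable reading
        }
        double ratio = (double) usedBytes / maxBytes;
        if (ratio >= 0.95) return HealthStatus.CRITICAL;
        if (ratio >= 0.80) return HealthStatus.WARNING;
        return HealthStatus.OK;
    }

    /** The overall status is the most severe individual result. */
    static HealthStatus worst(List<HealthStatus> results) {
        HealthStatus worst = HealthStatus.OK;
        for (HealthStatus status : results) {
            if (status.compareTo(worst) > 0) {
                worst = status;
            }
        }
        return worst;
    }

    /** Map a status to an (assumed) administrative action. */
    static String actionFor(HealthStatus status) {
        switch (status) {
            case WARNING:  return "send-mail";
            case CRITICAL: return "dump-threads-and-heap";
            case FAIL:     return "restart";
            default:       return "none";
        }
    }
}
```

Keeping the check, the aggregation and the action mapping separate mirrors the split the list describes: checks record a status, and rules decide what to do with it.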

Benefits and Features

Administrator Friendly

  • Monitoring facilities available through the built-in console or through third-party applications via JMX or REST
  • Versioned application deployment for graceful upgrade of applications
  • One-step distributed deployment in clustered environments, including cloud
  • Sophisticated request rewriting mechanism to simplify deployment architecture

Cloud

  • Cloud-aware administration shows virtual server status across the cloud.
  • Heartbeats detect failed servers immediately.

Learn more about JAMM

Resin Watchdog: Non Stop Resin uses the Resin Health System

As you may know, Resin runs in a Non Stop Resin mode called Watchdog mode. This is achieved with the Resin Watchdog process. Watchdog mode differentiates Resin from the crowd of application servers. The Watchdog process is one of the reasons that Resin Pro is the server of choice for very large deployments and for OEM products like network appliances, where reliability is paramount. To be honest, this level of reliability is the difference between sleeping at night and getting support calls at 3 AM. You don’t have to be Salesforce.com or Cisco to have mission-critical requirements; even small department-level deployments can be mission critical.

To achieve Resin Non Stop mode, Resin employs a lightweight watchdog process that monitors the responsiveness of the Resin process and restarts the Resin application server if it becomes unresponsive. For a while now, the watchdog process has worked in concert with the Resin JAMM to improve the ability to detect issues. As Resin JAMM improves, so does Resin’s Watchdog support.

The issue that causes a restart can be a bug in your code, a bug in library code that you use, a denial of service attack, an unexpected spike in use, queries that are suddenly taking a lot longer to run, and so on. When you combine Resin Watchdog mode with Resin Health System reporting, you can not only respond to these issues but also easily diagnose and fix them.

Resin JAMM | JVM Monitoring Metrics

The Resin Health System can monitor runtime metrics for the Resin server, the operating system and the Java virtual machine (JVM), such as request count per minute, heap space, tenured memory, GC time, thread count, JDBC pool size, blocked thread count, SQL query time, CPU utilization, file descriptor usage, and much more. The Resin Health System has a web interface, a REST interface, a JMX interface and a CLI for JVM monitoring. You can visualize what is going on with every node in your deployment, along with baseline data to see if anything has changed.
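Several of the JVM metrics listed above are exposed by the standard java.lang.management MXBeans, independent of Resin. The following sketch reads a few of them. The class name JvmMetricsSnapshot and the metric key names are made up for illustration, and this is not how Resin itself gathers its data.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: samples a few JVM and OS metrics through standard MXBeans. */
final class JvmMetricsSnapshot {
    static Map<String, Number> sample() {
        Map<String, Number> metrics = new LinkedHashMap<>();

        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        metrics.put("heap.used.bytes", memory.getHeapMemoryUsage().getUsed());

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        metrics.put("thread.count", threads.getThreadCount());

        // Sum accumulated GC time over all collectors; -1 means "undefined".
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime();
            if (time > 0) {
                gcMillis += time;
            }
        }
        metrics.put("gc.time.millis", gcMillis);

        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // The load average is negative on platforms where it is unavailable.
        metrics.put("os.load.average", os.getSystemLoadAverage());
        metrics.put("os.processors", os.getAvailableProcessors());
        return metrics;
    }

    public static void main(String[] args) {
        sample().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Calling sample() on a schedule and storing each result with a timestamp would give the kind of baseline history the paragraph above describes.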

The key to the Resin Health System is the ability to set limits and rules that trigger actions like performing a thread dump, running a CPU profile, running a heap dump, generating a report, sending an email and restarting the server. Resin Pro is preconfigured so you can take advantage of this system right away.
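To make the idea of a limit that triggers an action concrete, here is a small sketch of one such action, a thread dump taken when the live thread count exceeds a limit. It uses only the standard ThreadMXBean API. The class name ThreadDumpAction and the dumpIfAbove helper are hypothetical and do not represent Resin's own actions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical sketch: a thread dump action guarded by a thread-count limit. */
final class ThreadDumpAction {
    /** Build a plain-text dump of all live threads with their full stack traces. */
    static String dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip locked monitor and synchronizer details.
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // defensive: thread ended while dumping
            }
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    /** Return a dump only when the live thread count exceeds the limit, else null. */
    static String dumpIfAbove(int threadLimit) {
        int live = ManagementFactory.getThreadMXBean().getThreadCount();
        return live > threadLimit ? dump() : null;
    }
}
```

A rule engine would typically write the returned text to a log or attach it to a report rather than hold it in memory.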

Resin JAMM | PDF Reports for Java Monitoring and Server Monitoring

Resin uses JVM monitoring and server monitoring data to create summary reports, post mortem reports and other custom reports (PDF reports). These reports contain key configuration data, identity data, and graphs of server metrics. You can configure Resin to generate a weekly server status report. You can keep these reports around as a historical record of the state of your servers. These reports are essential for diagnosing issues that might happen in the future. Baseline historical data can give you a lot of perspective in interpreting current server state when problems occur. To know where you are, it is important to know where you have been.

Resin JAMM | Health System Presentation

Health Monitor Triggers Explained

A good way to describe this is as follows: you can set up action triggers based on maximum limits. For example, if the server CPU is at 95% for three minutes straight, Resin can generate a thread dump, a heap dump and a lightweight CPU profile, and then restart. It is a bit more complicated than this, as Resin does it efficiently with a primary monitor and then a recheck monitor once a limit is first met. This capability has been in Resin 4 for a while. Resin’s Health System goes beyond a server monitoring system: it can be reactive and proactive, while a normal server monitoring system just watches.
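The "95% for three minutes straight" rule can be sketched as a trigger that fires only after a limit is met on several consecutive samples. This is a simplified stand-in for the primary and recheck monitors mentioned above, not Resin's actual mechanism. The class name SustainedLimitTrigger and the one-sample-per-minute assumption are hypothetical.

```java
/** Hypothetical sketch: fires once a limit is met for N consecutive samples. */
final class SustainedLimitTrigger {
    private final double limit;
    private final int requiredConsecutive;
    private int consecutive;

    SustainedLimitTrigger(double limit, int requiredConsecutive) {
        if (requiredConsecutive < 1) {
            throw new IllegalArgumentException("requiredConsecutive must be >= 1");
        }
        this.limit = limit;
        this.requiredConsecutive = requiredConsecutive;
    }

    /**
     * Feed one sample. Returns true exactly once per episode, on the sample
     * that completes the required run; a sample below the limit resets the run.
     */
    boolean offer(double value) {
        if (value < limit) {
            consecutive = 0;
            return false;
        }
        if (consecutive < requiredConsecutive) {
            consecutive++;
            return consecutive == requiredConsecutive;
        }
        return false; // already fired for this episode
    }
}
```

With one CPU sample per minute, new SustainedLimitTrigger(0.95, 3) corresponds to the three-minute example. Firing once per episode avoids generating a new set of dumps on every sample while the load stays high.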

Resin JAMM: Post Mortem Analysis, critical data when it is needed most

In this release, the post mortem analysis report produces a much fuller snapshot of the server. When Resin restarts for a health problem, the Resin Health System takes the data collected about the state of the server and generates a post mortem PDF report. This post mortem report has the following:

  • a summary of the system
  • metering graphs on everything Resin Health System tracks (heap space, number of threads, etc.)
  • a complete sorted and filtered thread dump
  • a CPU profile
  • warnings and errors from the log file
  • JMX parameter dump
  • and much more

The post mortem report can tell you exactly what was going on in the system just before it died. For example, not only does the post mortem report tell you that the CPU was busy, it tells you what the CPU was busy doing.

You can also trigger snapshot summary reports from the command line or from the admin page (for example, if you are running a load test and want to take snapshots at different points during the test).

Consider the alternative. Without such a system, a server may be about to die from a memory leak, a denial of service attack, or a long-running query. The query backs up threads, which keeps objects around longer, which causes memory to become tenured instead of being collected in the eden space, which causes high GC times and an eventual out-of-memory error. Whew!

Now the system goes down. What happened? Will it happen again? With Resin Pro, not only will the server restart and continue to operate, but a report is generated so you can diagnose what happened. Track down bugs, denial of service attacks and their ilk like never before.

Now, for a small deployment of one to three servers, this can be convenient and helpful. For a large cloud deployment, this is a lifeline. Cloud-enabled Java EE application servers need a Non Stop mode with health snapshots. The more server nodes you run, the more you need just-in-time profiling and snapshots.