|
|
 |  |
|
|
Resin provides a number of diagnostic messages using the JDK logging
facility. Full debug logging is enabled with an empty name (meaning match all
names) and a level of 'all'. Since this will produce many messages, it is
useful to put the messages in a a seperate file.
The following configuration creates a debug log for each day, and when a
problem occurs the debug log is used to help find out what the problem is.
Since the entry is in resin.conf, the log will
capture messages from the server itself and all applications. The resulting
log messages are found in the file $RESIN_HOME/log/debug.log.
<!-- resin.conf -->
<resin>
...
<log name='' level='finer' path='log/debug.log'
timestamp="[%H:%M:%S.%s]"
rollover-period='1D'/>
...
</resin>
More specific areas can be targetted with appropriate values for
name. The documentation for provides a
table of useful values for name.
Often you can narrow down the logging requirements to a single web
application. The log entry can be placed in
<web-app-root>/WEB-INF/web.xml and only log messages from that
web application will go to the log file.
The following configuration creates a debug log for each day in <web-app-root>/WEB-INF/work/debug.log.
<!-- <web-app-root>/WEB-INF/web.xml -->
<web-app>
...
<log name='' level='finer' path='WEB-INF/work/debug.log'
timestamp="[%H:%M:%S.%s]"
rollover-period='1D'/>
...
</web-app>
If an application seems stuck, or is running out of resources, a thread dump
will reveal the state of the server.
Java's thread dumps are a vital tool for server debugging. Because
servlets are intrinsically multithreaded, it is very possible to create
deadlocks without realizing it, or to have runaway threads that
consume resources and cause OutOfMemory exceptions. That's
especially true when you start adding third-party software like
databases, EJB, and Corba ORBs.
jps and jstack are useful tools included in JDK 5, providing a
quick command line method for obtaining stack traces of all current threads.
# jps
12903 Jps
20087 Resin
# jstack 20087
Attaching to process ID 20087, please wait...
Debugger attached successfully.
Client compiler detected.
JVM version is 1.5.0-beta2-b51
Thread 12691: (state = BLOCKED)
- java.lang.Object.wait(long) (Compiled frame; information may be imprecise)
- com.caucho.util.ThreadPool.runTasks() @bci=111, line=474 (Compiled frame)
- com.caucho.util.ThreadPool.run() @bci=85, line=423 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=595 (Interpreted frame)
Thread 12689: (state = BLOCKED)
- java.lang.Object.wait(long) (Compiled frame; information may be imprecise)
- com.caucho.util.ThreadPool.runTasks() @bci=111, line=474 (Compiled frame)
- com.caucho.util.ThreadPool.run() @bci=85, line=423 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=595 (Interpreted frame)
...
On Windows, ctrl-break may produce a thread dump.
On Unix, "kill -QUIT" will produce a thread dump. On the Linux IBM JDKs,
you'll need to signal one of the child processes. On the Linux Sun JDK, you'll
need to kill the grandparent process of most of the threads:
unix> ps -ef | grep java
ferg 14243 10792 0 Jan13 pts/0 00:00:00 sh -c /usr/java/bin/java
ferg 14244 14243 0 Jan13 pts/0 00:00:00 java -Dres
ferg 14253 14244 0 Jan13 pts/0 00:00:01 java -Dres
ferg 14254 14253 0 Jan13 pts/0 00:00:00 java -Dres
ferg 14255 14253 0 Jan13 pts/0 00:00:00 java -Dres
ferg 14256 14253 0 Jan13 pts/0 00:00:00 java -Dres
ferg 14257 14253 0 Jan13 pts/0 00:00:00 java -Dres
ferg 14258 14253 0 Jan13 pts/0 00:00:00 java -Dres
unix> kill -QUIT 14244
The Solaris 1.3 JDK has a single process, so it's easy.
The Solaris 1.2 JDK has an extra quirk. You need to run Resin in the
foreground (i.e. not using "start".) Then you'll need to answer some
questions on the console.
You get a thread dump without signalling the process by starting the JVM
with some extra arguments to allow a debugger to attach. You can then attach
with the debugger at any time to get a thread dump. This
technique works on all operating systems.
Here are some step by step instructions:
- Start Resin with some extra arguments that allow a debugger to attach:
$RESIN_HOME/bin/httpd -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5432
- Wait until you believe the application is in a state of deadlock
or has runaway threads.
- In another terminal (window), use jdb to connect to the running instance of Resin:
$JAVA_HOME/bin/jdb -connect com.sun.jdi.SocketAttach:hostname=localhost,port=5432
jdb will show something like:
Set uncaught java.lang.Throwable
Set deferred uncaught java.lang.Throwable
Initializing jdb ...
>
- Use the "suspend" command and then the "where all" command to get
a thread dump:
>
suspend
All threads suspended.
>
where all
tcpConnection-6862-3:
[1] java.lang.Object.wait (native method)
[2] com.caucho.server.TcpServer.accept (TcpServer.java:650)
[3] com.caucho.server.TcpConnection.accept
(TcpConnection.java:208)
[4] com.caucho.server.TcpConnection.run (TcpConnection.java:131)
[5] java.lang.Thread.run (Thread.java:536)
tcpConnection-543-2:
[1] java.lang.Object.wait (native method)
[2] com.caucho.server.TcpServer.accept (TcpServer.java:650)
[3] com.caucho.server.TcpConnection.accept
(TcpConnection.java:208)
[4] com.caucho.server.TcpConnection.run (TcpConnection.java:131)
[5] java.lang.Thread.run (Thread.java:536)
..
- Use the "resume" command to resume the process
>
resume
Unix users (and Cygwin users on Windows) will recognize the
opportunity to make a script:
#!/bin/sh
echo -e "suspend\nwhere all\nresume\nquit" | $JAVA_HOME/bin/jdb -connect \
com.sun.jdi.SocketAttach:hostname=localhost,port=5432
Although no rigorous benchmarking has been done, there appears to
be little overhead or performance penalties involved in having the JVM
start with the options that allow a debugger to attach.
In any case, you'll eventually get a trace that looks something like the following (each JDK is slightly different):
Full thread dump:
"tcpConnection-8080-2" daemon waiting on monitor [0xbddff000..0xbddff8c4]
at java.lang.Object.wait(Native Method)
at com.caucho.server.TcpServer.accept(TcpServer.java:525)
at com.caucho.server.TcpConnection.accept(TcpConnection.java:190)
at com.caucho.server.TcpConnection.run(TcpConnection.java:136)
at java.lang.Thread.run(Thread.java:484)
"tcpConnection-8080-1" daemon waiting on monitor [0xbdfff000..0xbdfff8c4]
at java.lang.Object.wait(Native Method)
at com.caucho.server.TcpServer.accept(TcpServer.java:525)
at com.caucho.server.TcpConnection.accept(TcpConnection.java:190)
at com.caucho.server.TcpConnection.run(TcpConnection.java:136)
at java.lang.Thread.run(Thread.java:484)
"tcpConnection-8080-0" daemon waiting on monitor [0xbe1ff000..0xbe1ff8c4]
at java.lang.Object.wait(Native Method)
at com.caucho.server.TcpServer.accept(TcpServer.java:525)
at com.caucho.server.TcpConnection.accept(TcpConnection.java:190)
at com.caucho.server.TcpConnection.run(TcpConnection.java:136)
at java.lang.Thread.run(Thread.java:484)
"tcp-accept-8080" runnable [0xbe7ff000..0xbe7ff8c4]
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:413)
at java.net.ServerSocket.implAccept(ServerSocket.java:243)
at java.net.ServerSocket.accept(ServerSocket.java:222)
at com.caucho.server.TcpServer.run(TcpServer.java:415)
at java.lang.Thread.run(Thread.java:484)
"resin-cron" daemon waiting on monitor [0xbe9ff000..0xbe9ff8c4]
at java.lang.Thread.sleep(Native Method)
at com.caucho.util.Cron$CronThread.run(Cron.java:195)
"resin-alarm" daemon waiting on monitor [0xbebff000..0xbebff8c4]
at java.lang.Thread.sleep(Native Method)
at com.caucho.util.Alarm$AlarmThread.run(Alarm.java:268)
"Signal Dispatcher" runnable [0..0]
"Finalizer" daemon waiting on monitor [0xbf3ff000..0xbf3ff8c4]
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:108)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:123)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:162)
"Reference Handler" daemon waiting on monitor [0xbf5ff000..0xbf5ff8c4]
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:420)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:110)
"main" waiting on monitor [0xbfffd000..0xbfffd210]
at java.lang.Thread.sleep(Native Method)
at com.caucho.server.http.ResinServer.waitForExit(ResinServer.java:674)
at com.caucho.server.http.ResinServer.main(ResinServer.java:821)
at com.caucho.server.http.HttpServer.main(HttpServer.java:95)
Each thread is named. Here are some of the common names:
| Thread Name | Description
|
| tcp-accept-8080 | Resin thread listening for new connections
on port 8080.
|
| tcpConnection-8080-3 | Resin servlet thread handling
a connection from port 8080.
|
| tcp-cron | Resin's run-at thread
|
| tcp-alarm | Resin's alarm thread
|
There should be one tcp-accept-xxx thread for each <http> and
<srun> that Resin's listening for. The tcp-accept-xxx thread should
almost always be in socketAccept.
There should be several tcpConnection-xxx-n threads. Each of these
is the servlet thread. On a busy server, these can appear anywhere in
your code. If several appear in one location, you've likely got some sort
of deadlock or at least a slow lock. Idle threads are either in tcpAccept or
httpRequest or runnerRequest (for keepalive threads.)
For deadlocks, you'll want to look at the "waiting on monitor" threads
and any case where lots of threads are stuck at the same location.
Most memory problems are due to memory leaks in the application
program. For example, a cache or a vector that fills up with stale
data, or a singleton or static variable which doesn't properly
detect a web-app reload. Some more exotic memory issues
relate to running out of heap memory or virtual memory when using
a large number of threads (> 256).
The steps to track down a memory problem are:
- Enable -J-verbosegc with the httpd.sh start or httpd -install.
The -verbosegc flag logs the garbage collection of the heap, which will
let you know if you're running out of heap memory (the most common case).
- Get a heap profiler or use the heap dump in the JVM. JProfiler is an
inexpensive commercial heap profiler. Although the JVM's heap dump is
not very user friendly, it's always available. You should be using
a heap profiler as part of your development process and certainly use one
before any production launch.
- With the heap profiler, find the 2-3 top memory users and fix those
memory leaks.
- Common application errors include:
- ThreadLocal variables that are not properly cleared at the end of each request.
- Singleton or static hash maps and caches, esp check for clearing web-app restarts.
- Spawned threads which are not stopped on web-app restart.
- web-app variables (like the "application" variable), stored in a static variable.
- If the heap is clean, i.e. -verbosegc shows a steady heap, you'll need to
look at non-heap memory:
- Thread stack usage (-Xss1m). Each thread takes up some non-heap memory.
The default on some systems is 8 meg. Because of 32-bit virtual memory limits of about 2G on some systems, even 256 threads with 8 meg stacks can chew up the virtual memory. You can drop the stack size with the -Xss directive.
- JNI memory. If you're using JNI libraries or drivers that use JNI,
it's possible that the JNI can allocate more memory than is available.
- fork/exec and OS limits. If the OS does not have enough available
swap space, for example, the OS might refuse a "jikes" compilation.
- NIO, memory mapping, and .jar files. The JDK memory-maps .jar files.
In some cases with very large numbers of jars, this can result in
running out of virtual memory. The same issue can appear for NIO memory mapping.
- If all of these are eliminated, it might be a Resin bug. However,
you will need to have located the memory leak as Resin-related before
any memory-related bug report, i.e. it's necessary to go through the
above steps before reporting a bug. Bug reports that complain
about OutOfMemory errors, but have not even gotten a JDK memory
dump are most likely application errors. You must provide a
heap dump when reporting any potential Resin memory problems.
-verbosegc is a debugging output from the JVM. It's very convenient
for checking basic memory usage and garbage collection time.
It's a good idea to use -verbosegc for any production system.
When starting Resin, you'll use -J-verbosegc.
The specific output depends on the JVM, and looks something like
the following:
[GC 9176K->8647K(9768K), 0.0014790 secs]
[GC 9287K->8668K(9768K), 0.0011120 secs]
[GC 9308K->8668K(9768K), 0.0007810 secs]
The "(9768K)" is the most important number, showing that the
maximum allocated heap is about 10 meg. The other numbers show the
actual heap used before and after garbage collection.
If an application gobbles up memory until it throws an OutOfMemory
exception, or seems to be spending an excessive amount of time doing garbage
collection, a heap dump can help track down the source of the problem.
What you really need to do is get a good CPU and heap profile.
The JDK has a simple one (but not very
user-friendly), so you don't absolutely need to buy a profiler.
An alternative for simple heap monitoring (garbage collection issues), are
the jvmstat tools.
It's a standard java argument, so "java -Xrunhprof:help" will tell you how to start it. For example, you could
start Resin (unix or windows) as
> httpd.sh -J-Xrunhprof:heap=sites,cpu=samples
(On Unix, Resin's startup script has a -cpuprof-ascii argument that sets it for you automatically.)
Run a load against it for a decent amount of time (you can even use something like Apache's "ab" for 10
minutes), and then stop the server nicely. i.e. you can't just kill it with ctrl-C, but you need to make it do a
clean exit.
That will dump a java.hprof.txt. Skip to the bottom to get the trace.
Assuming you're being cheap, and using the JDK's heap profiling instead of
buying a profiler, you'll need a bit of help interpreting it.
The following is an example of a heap dump running Resin. Here's what you
might see skipping down to the "SITES BEGIN" section. (I did mention this is
the cheap way of doing things.) As with most profiling, you only want
to pay attention to the top 20. Everything else is in the noise. Ignore it.
SITES BEGIN (ordered by live bytes) Tue Jan 9 17:44:33 2001
percent live alloc'ed stack class
rank self accum bytes objs bytes objs trace name
1 11.87% 11.87% 983520 120 1393320 170 2553 [B
2 9.89% 21.76% 819600 100 1286772 157 4070 [B
3 9.09% 30.85% 753756 23 3539376 108 4970 [L<Unknown>;
4 5.83% 36.68% 483564 59 778620 95 7180 [B
5 5.74% 42.42% 475368 58 745836 91 7178 [B
6 4.35% 46.77% 360624 44 696660 85 7182 [B
7 2.97% 49.74% 245880 30 450780 55 7176 [B
8 2.37% 52.11% 196704 24 352428 43 7254 [B
9 1.88% 53.99% 155724 19 262272 32 7174 [B
10 1.78% 55.77% 147528 18 245880 30 7137 [B
11 1.53% 57.30% 126988 1063 16973092 129113 3271 [C
12 1.34% 58.64% 110684 3953 20362832 727244 1213 sun/io/CharToByteISO8859_1
13 1.25% 59.88% 103320 738 141820 1013 5942 java/lang/Class
14 1.21% 61.10% 100548 49 221616 108 5003 [L<Unknown>;
15 1.21% 62.31% 100548 49 221616 108 5005 [L<Unknown>;
16 1.07% 63.38% 89080 1532 18393580 317347 1340 [B
17 0.79% 64.18% 65568 8 81960 10 8408 [B
18 0.79% 64.97% 65552 4 65552 4 27630 [C
19 0.70% 65.67% 58232 24 1110128 386 5038 [C
20 0.68% 66.35% 56200 450 116816 980 7186 [C
There are two things to look for. First, if any one class starts
chewing up a large number in the "live objs" column, you'll need to
take a look. That might be a memory leak.
Second, if some class has a huge number in the "alloc'ed objs" column,
you may be wasting lots of garbage collection time that could be easily
solved with some caching.
The [C in the class name means a character array.
To see what that really means, you'll need to look at the stack
trace (3271):
TRACE 3271:
java/lang/String.<init>(String.java:244)
com/caucho/util/CharBuffer.close(CharBuffer.java:714)
com/caucho/vfs/FilesystemPath.normalizePath(FilesystemPath.java:162)
com/caucho/vfs/FilesystemPath.schemeWalk(FilesystemPath.java:127)
That's part of Resin's VFS code. Maybe in the future it'll make sense
to try to reduce that.
You can get a longer trace using the "-prof-depth 10" argument to Resin.
(Or the corresponding depth in -Xrunhprof.) That will often give more
information.
The CPU profiling is a little easier to read. On some JDK's you'll need to
run it with the JIT disabled.
CPU SAMPLES BEGIN (total = 424614) Tue Jan 9 17:44:33 2001
rank self accum count trace method
1 21.36% 21.36% 90704 7266 com/caucho/server/http/VirtualHost.logAccess
2 10.84% 32.20% 46041 7269 java/net/SocketInputStream.socketRead
3 5.99% 38.19% 25428 1213 java/lang/Class.newInstance0
4 5.11% 43.31% 21715 7896 com/caucho/util/CharBuffer.toString
5 4.82% 48.13% 20463 1286 sun/io/CharToByteISO8859_1.convert
6 3.54% 51.66% 15018 1242 sun/io/CharToByteConverter.<init>
7 2.68% 54.35% 11388 7241 java/io/PrintWriter.<init>
8 2.47% 56.82% 10508 7748 com/caucho/server/http/Request.fillCookies
9 2.27% 59.09% 9650 1214 sun/io/ByteToCharConverter.<init>
10 1.85% 60.94% 7857 5097 java/lang/String.<init>
11 1.59% 62.53% 6754 1341 java/lang/String.substring
12 1.57% 64.10% 6650 1340 java/lang/String.getBytes
13 0.92% 65.02% 3907 7897 java/lang/String.<init>
14 0.76% 65.78% 3227 3259 com/caucho/vfs/FilePath.fsWalk
15 0.75% 66.53% 3195 7895 com/caucho/server/http/Request.fillCookie
16 0.71% 67.25% 3031 7321 java/lang/String.getBytes
17 0.71% 67.95% 2996 3270 com/caucho/util/CharBuffer.close
18 0.68% 68.63% 2892 3271 java/lang/String.<init>
19 0.66% 69.29% 2782 7318 com/caucho/vfs/FilePath.openWriteImpl
20 0.61% 69.90% 2604 7320 java/io/FileOutputStream.<init>
You should only pay attention to the top 20 or so. You'll
probably need to ignore a few of the top 10, because they're just waiting
for a user response. The SocketInputStream.socketRead is an example.
You can use the trace number to get a call trace:
TRACE 7266:
com/caucho/server/http/VirtualHost.logAccess(VirtualHost.java:487)
com/caucho/server/http/Application.logAccess(Application.java:1846)
com/caucho/server/http/Response.finish(Response.java:1345)
com/caucho/server/http/Request.finish(Request.java:416)
(thanks to Rob Lockstone for his comments)
Run Resin with the additional argument -Xloggc:gc.log where
"gc.log" is the name of the log file which will be created in the root Resin
directory, e.g. /resin/gc.log.
Once the server has run under load for a while, or when problems start
appearing, look in the gc.log file and search for "Full". They will happen
fairly infrequently at first, and then towards the end of the file, they will
become progressively more frequent until they are happening almost continuously
(with little or no change in the size of the heap). Note that the "timestamp"
in the first column is the number of seconds since the process started.
Garbage collection logging will have a small imapct on performance but is
invaluable in diagnosing garbe collection related problems.
Possible reasons for excessive garbage collection are
memory leaks and
inadequate heap memory
Conflicts often arise with older or incompatible versions of classes.
Starting Resin with an empty CLASSPATH environmental variable is a first
step to eliminating these classes.
win> set CLASSPATH=
win> bin/httpd.exe
unix.sh> export CLASSPATH=""
unix.sh> bin/httpd.sh
If you have manually placed any jars in $RESIN_HOME/lib or you
jdk directories, those are possible conflicts as well.
If a new version of Resin s installed overtop of an older
version of Resin (i.e. in the same directory), some old jars
might be left. It is best to give each version it's own directory.
If the RESIN_HOME environment variable is not set, Resin may be
picking up an older version.
You can start httpd with the -verbose option to see the CLASSPATH that
is being used when Resin is started.
To monitor just the HTTP headers, the following debug logging can be
enabled in $RESIN_HOME/resin.conf:
<resin xmlns="http://caucho.com/ns/resin">
...
<log name='com.caucho.server.http' level='finer' path='log/http.log'/>
<log name='com.caucho.server.connection' level='finer' path='log/http.log'/>
...
</resin>
Intercepting and monitoring the raw data passed between a web browser and
Resin can provide valuable insight. This raw data reveals the headers and
content that are submitted by the browser, and the headers and content that are
returned by Resin.
The Apache Axis jar includes a utility
"tcpmon" that can be used to intercept and monitor the communication between a
browser and Resin. Using tcpmon, you specify a "listen port" and a "target
host" and a "target port". For example, if you usually have Resin running on
port 8080, you can start tcpmon with a "listen port" os 9090, and a target host
of localhost and a target port of 8080. Now you will use a url
http://localhost:9090 in your browser. This causes your browser
to use tcpmon. tcpmon will log the request, pass it through to Resin (on port
8080), and log the response before returning it to the browser.
Resin's default is to use the `internal' compiler because it is the most
likely to be available. Sometimes the internal compiler will start to cause
errors, causing exceptions or simply hanging and taking up a thread. The
solution is to change the compiler to `javac' in resin.conf:
<javac compiler="javac" args=""/>
Jikes is another option for the compiler, as discussed in the documentation
for .
Each thread is allocated a stack, if the size of the stack is too large,
then you may run out of memory when a higher number of threads is reached.
Techniques to adress this are discussed in JVM Tuning.
The netstat utility is useful for obtaining information from the
operating system about the current state of network usage.
unix$ netstat -anp
win> netstat -an
The -a indicates that both listening (waiting for a connection) and
non-listening (active) sockets should be displayed. The -n causes the
display of port numbers (i.e. 80) instead of port names (i.e. http). The -p
causes the display of the process that is using the socket. Since the windows
netstat is different than the Unix one, -p is not used on Windows.
The connection state is probably the most important information. The man page
for netstat has a full description of each state that is displayed.
"LISTEN" or "LISTENING" is an indication that there is a process waiting for
connections on the socket.
"TIME_WAIT" indicates that the socket is waiting after close to handle packets
still in the network. After a connection is closed, the socket is kept open by
the operating system for a short period of time. Even after the close is done,
there may be some stray packets on the network that are destined for the
connection. The idea of TIME_WAIT is that it keeps the socket open long enough
to catch those stray packets, so that they don't get erronously delivered to a
new connection opened on the same socket.
If Resin is indicating that it cannot bind to a port, it may be an
indication that another process is using the port. netstat may
help determine the process that is using the port.
Since netstat produces a lot of information, it is helpful to filter the
output to limit it to the port of interest. The following example is for port
80:
unix$ netstat -anp | grep ":80" | less
Copyright (c) 1998-2009 Caucho Technology, Inc. All rights reserved. caucho® ,
resin® and
quercus®
are registered trademarks of Caucho Technology, Inc.
|
Copyright (c) 1998-2009 Caucho Technology, Inc. All rights reserved. caucho® ,
resin® and
quercus®
are registered trademarks of Caucho Technology, Inc.
|