JBOSS – total CPU time compared to total page response time

I thought this was interesting. By taking the total response time of pages from the access log, you can take the amount of server CPU used and compare it to the total elapsed request time. This works for web servers, so if you have a lot of business logic code on the server that wouldn’t be accounted for in page response times, this won’t be accurate. It’s not perfect, but it’s definitely good enough for conversation.

So how do we do this?

First, we take the CPU used in the period in which we are interested…

-bash-4.1$ sar -s 11:30:00 -e 11:40:00
Linux 2.6.32-220.el6.x86_64 (foobar1.com)        09/27/2014      _x86_64_        (1 CPU)

11:30:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
11:31:01 AM     all     72.68      0.00     11.04      0.00      0.00     16.28
11:32:01 AM     all     39.67      0.00      3.35      0.03      0.00     56.95
11:33:01 AM     all     31.82      0.00      1.08      0.02      0.00     67.08
11:34:01 AM     all     31.81      0.00      2.01      0.08      0.00     66.09
11:35:01 AM     all     33.78      0.00      1.63      0.03      0.00     64.56
11:36:01 AM     all     26.67      0.00      1.22      0.03      0.00     72.08
11:37:01 AM     all     28.18      0.00      1.17      0.08      0.00     70.57
11:38:01 AM     all     28.43      0.00      1.17      0.02      0.00     70.39
11:39:01 AM     all     24.21      0.00      0.95      0.07      0.00     74.77
Average:        all     35.19      0.00      2.61      0.04      0.00     62.16
-bash-4.1$

We then take the time period in which we are interested and get the total page response time. As shown above, we have only a single CPU, so we adjust the printed elapsed time percentages to account for this…

-bash-4.1$ awk '$3 ~ "2014:11:3" {split($3,t,":");s[t[2]":"t[3]]+=$NF} END {for (i in s) {print i,s[i]/1000/(1*60)*100}}' /var/log/jbossas/p_ecm_01/localhost_access_log.2014-09-27.log  | sort
11:30 293.29
11:31 174.033
11:32 57.7817
11:33 60.3083
11:34 61.2817
11:35 56.05
11:36 55.3567
11:37 51.3367
11:38 48.24
11:39 34.8517

As you can visually see above, early in the test we used 72% of our available CPU time, and 293% of wall clock time (any system has an infinite capacity for waiting). As such, CPU was less than a quarter of wait time. However, as we near the end of the test, we see we use 24% of available CPU time, and the corresponding wall clock time is 34%. As such, CPU as a percentage of total response time increases to over 70% as we get farther into the test. This is a cached application, so early in the test, wait times were higher as data was fetched from the database. As a result, CPU time as a percentage of total response time was lower. However, later in the test when data was being managed in memory, CPU started to take more of the pie as less external waits occurred.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.