
Monitoring and logging CPU Utilization of Virtual Machines in Xen

Oct 6 update: Added logging of disk [d] and network [n] info.
Oct 4 update: Added availability [a] option. Now uses xentop internally.
Oct 2 update: added graphing to xenstat.pl. Now xenstat.pl detects Guest VM start/shutdown and resets itself. Number of vcpus also shown. Misc bug fixes.

You can download xenstat.pl here.

Syntax

perl xenstat.pl [$mode] [$intervalsecs=5] [$nsamples=0] [$urlToPostStats]

Quick Guide

perl xenstat.pl          -- generate cpu stats every 5 secs
perl xenstat.pl 10       -- generate cpu stats every 10 secs
perl xenstat.pl 5 2      -- generate cpu stats every 5 secs, 2 samples

perl xenstat.pl d 3      -- generate disk stats every 3 secs
perl xenstat.pl n 3      -- generate network stats every 3 secs
perl xenstat.pl a 5      -- generate cpu avail (i.e. cpu idle) stats every 5 secs

perl xenstat.pl 3 1 http://server/log.php    -- gather 3 secs cpu stats and send to URL
perl xenstat.pl d 4 1 http://server/log.php    -- gather 4 secs disk stats and send to URL
perl xenstat.pl n 5 1 http://server/log.php    -- gather 5 secs network stats and send to URL
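
The positional arguments in the syntax line are read left to right, with an optional mode letter first. Below is a minimal Python sketch of that parsing. It is not the Perl script's own code: `Options`, `parse_args` and the assumption that `nsamples=0` means "keep sampling until interrupted" (like vmstat with no count) are mine.

```python
from dataclasses import dataclass
from typing import List, Optional

# Mode letters from the quick guide; no letter means CPU stats.
MODES = {"d": "disk", "n": "network", "a": "availability"}


@dataclass
class Options:
    mode: str = "cpu"
    interval: int = 5          # seconds between samples
    nsamples: int = 0          # assumed: 0 = sample until interrupted
    url: Optional[str] = None  # if given, stats are posted to this URL


def parse_args(argv: List[str]) -> Options:
    """Hypothetical helper: parse [mode] [interval] [nsamples] [url] in order."""
    args = list(argv)
    opts = Options()
    if args and args[0] in MODES:
        opts.mode = MODES[args.pop(0)]
    if args:
        opts.interval = int(args.pop(0))
    if args:
        opts.nsamples = int(args.pop(0))
    if args:
        opts.url = args.pop(0)
    return opts
```

For example, `parse_args(["d", "4", "1", "http://server/log.php"])` yields disk mode, a 4-second interval, one sample and the posting URL, matching the quick-guide entry above.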

Requires xentop from Xen 3.2 or newer, or a newer xentop backported to Xen 3.1.
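
Since the script gets its figures from xentop, here is a rough Python sketch of turning `xentop -b` (batch mode) output into per-domain records. It assumes whitespace-separated columns under a header row whose first field is `NAME`, and that each batch iteration starts with its own header. The column set differs between Xen versions, so values are looked up by header name rather than position. The "no limit" substitution is a precaution for Domain-0's MAXMEM(k) column, which xentop may print as two words.

```python
from typing import Dict, List, Optional


def parse_xentop_batch(text: str) -> Dict[str, Dict[str, str]]:
    """Return {domain name: {column: value}} for the last frame in `text`."""
    header: Optional[List[str]] = None
    frame: Dict[str, Dict[str, str]] = {}
    for line in text.splitlines():
        # Join the two-word "no limit" value so it stays a single field.
        fields = line.replace("no limit", "no_limit").split()
        if not fields:
            continue
        if fields[0] == "NAME":
            # A header line starts a new frame (one per batch iteration).
            header = fields
            frame = {}
        elif header is not None:
            row = dict(zip(header, fields))
            frame[row["NAME"]] = row
    return frame
```

In practice you would feed it the captured output of something like `xentop -b -i 2 -d 5`, then read columns such as `CPU(sec)` or `VCPUS` from each domain's record.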

Usage

To use it, run "perl xenstat.pl" in domain 0 (Dom-0). It generates the following output, with a new line of statistics every 5 seconds:

[root@server ~]# perl xenstat.pl 
cpus=2
       40_falcon   2.67%    2.51 cpu hrs  in 1.96 days ( 2 vcpu,  2048 M)
       52_python   0.24%  747.57 cpu secs in 1.79 days ( 2 vcpu,  1500 M)
     54_garuda_0   0.44% 2252.32 cpu secs in 2.96 days ( 2 vcpu,   750 M)
           Dom-0   2.24%    9.24 cpu hrs  in 8.59 days ( 2 vcpu,   564 M)

                    40_falc 52_pyth 54_garu   Dom-0    Idle
2009-10-02 19:31:20     0.1     0.1    82.5    17.3     0.0 *****
2009-10-02 19:31:25     0.1     0.1    64.0     9.3    26.5 ****
2009-10-02 19:31:30     0.1     0.0    50.0    49.9     0.0 *****


In the output above, the first lines summarise the physical CPUs and each running domain (average CPU utilisation, total CPU time, uptime, vcpus and memory). They are followed by a new line of statistics every 5 seconds, ending in a simple graph: 5 stars means 90% or more CPU utilisation, 4 stars means 70% or more, and so on.
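
The star graph can be sketched in a few lines of Python. The post only states the 90% and 70% thresholds; the 50/30/10 steps for 3, 2 and 1 stars are my guess, chosen because they are consistent with the sample rows (for example, 10.7% busy gets one star and 7.5% gets none). Busy is taken as 100 minus the Idle column.

```python
# Assumed thresholds: only 90 (5 stars) and 70 (4 stars) come from the post;
# 50/30/10 are a guess that fits the sample output.
THRESHOLDS = (10, 30, 50, 70, 90)


def star_graph(busy_pct: float) -> str:
    """One star for each threshold that busy_pct (100 - Idle) reaches."""
    return "*" * sum(1 for t in THRESHOLDS if busy_pct >= t)
```

With these thresholds, the 19:31:25 row (Idle 26.5, so 73.5% busy) gets four stars, as in the sample.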

You can also specify the polling interval (in seconds) and the number of samples, just like vmstat:

[root@server ~]# perl xenstat.pl 3 2
cpus=2
       40_falcon   2.67%    2.51 cpu hrs  in 1.96 days ( 2 vcpu,  2048 M)
       52_python   0.24%  748.07 cpu secs in 1.79 days ( 2 vcpu,  1500 M)
     54_garuda_0   0.44% 2258.38 cpu secs in 2.96 days ( 2 vcpu,   750 M)
           Dom-0   2.24%    9.24 cpu hrs  in 8.59 days ( 2 vcpu,   564 M)

                    40_falc 52_pyth 54_garu   Dom-0    Idle
2009-10-01 12:14:59     0.0     0.0     1.7     5.7    92.5
2009-10-01 12:15:02     0.0     0.0     0.3    10.4    89.3 *

[root@server ~]#

Logging Using REST web service

To log CPU utilisation from the Perl script, I didn't want to install a database client in Dom-0. So I added another parameter: the URL of a web server, which the script calls with the CPU figures as GET parameters. It assumes wget is installed in your Dom-0.

[root@server ~]# perl xenstat.pl 10 1  http://192.168.0.1/
cpus=2
     54_garuda_0  0.49%  165.81 cpu sec over 3.62 days (2 vcpu,   750 M)
    59_gyrfalcon  0.62%   69.03 cpu sec over 0.80 days (2 vcpu,  2000 M)
           Dom-0  1.57%    2.15 cpu hrs over 3.62 days (2 vcpu,   564 M)


--10:46:42--  http://192.168.0.1/?54_garuda_0=0.1&59_gyrfalcon=2.1&Dom%2D0=2.2&
Connecting to 192.168.0.1:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 498 [text/html]
Saving to: `STDOUT'

100%[============================================>] 498         --.-K/s   in 0s

10:46:42 (67.8 MB/s) - `-' saved [498/498]

2009-09-29 10:46:42  0.1  2.1  2.2 95.6

This accumulates statistics for 10 seconds, then sends them to the URL above in this format:

 http://192.168.0.1/?54_garuda_0=0.1&59_gyrfalcon=2.1&Dom%2D0=2.2&

This allows you to log the data using a RESTful web service.
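
Here is a minimal Python sketch of both ends of that exchange. `build_stats_url` reproduces the query-string layout of the sample URL, and `parse_stats` plus `StatsHandler` show one way a receiving web service might decode it and append rows to a CSV file. The function names, the handler and the `xenstat-log.csv` file name are hypothetical; your real endpoint might be the PHP page at http://server/log.php instead.

```python
from datetime import datetime
from http.server import BaseHTTPRequestHandler
from typing import Dict
from urllib.parse import parse_qs, quote, urlsplit

LOG_FILE = "xenstat-log.csv"  # hypothetical log destination


def build_stats_url(base_url: str, stats: Dict[str, float]) -> str:
    """Sender side: name=value& pairs in the same layout as the sample URL.

    quote() leaves "-" unescaped, so it is replaced by %2D by hand to
    reproduce the Dom%2D0 spelling in the sample.
    """
    pairs = []
    for name, pct in stats.items():
        key = quote(name, safe="").replace("-", "%2D")
        pairs.append(f"{key}={pct:.1f}&")
    return base_url + "?" + "".join(pairs)


def parse_stats(url_or_path: str) -> Dict[str, float]:
    """Receiver side: decode the query string back into {domain: cpu %}."""
    query = urlsplit(url_or_path).query
    return {name: float(values[-1]) for name, values in parse_qs(query).items()}


class StatsHandler(BaseHTTPRequestHandler):
    """Appends one CSV row (timestamp, domain, cpu %) per domain for each GET."""

    def do_GET(self) -> None:
        stats = parse_stats(self.path)
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        with open(LOG_FILE, "a", encoding="utf-8") as log:
            for name, pct in stats.items():
                log.write(f"{stamp},{name},{pct}\n")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"OK\n")
```

To try the receiver, run `HTTPServer(("", 8080), StatsHandler).serve_forever()` (from `http.server`) and pass the server's address as the last argument to xenstat.pl. parse_qs() decodes `%2D` back to `-` and ignores the trailing `&`.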

Network mode [n]

Shows each domain's total network reads and writes (data received plus sent) during each interval, in KBytes, or in MBytes where the value ends in M.

perl xenstat.pl n

    Network I/O (K)  52_pyth 54_garu 59_gyrf   Dom-0
 2009-10-05 19:55:08       7     979       1       3
 2009-10-05 19:55:13       6    1.2M       1       1
 2009-10-05 19:55:18       5     600       2       3
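
A rough Python sketch of how such a column could be computed, assuming the per-domain send and receive counters (xentop's NETTX(k) and NETRX(k)) are cumulative KBytes, so each interval's figure is the difference between two readings. The switch to an M suffix at 1024 K is my assumption; the post does not say where the script changes units.

```python
from typing import Tuple


def network_kb(prev: Tuple[int, int], curr: Tuple[int, int]) -> int:
    """KBytes sent plus received between two cumulative (tx, rx) readings."""
    return (curr[0] - prev[0]) + (curr[1] - prev[1])


def format_kb(kb: float) -> str:
    """Show KBytes as a whole number, or MBytes with an M suffix (1024 K assumed)."""
    if kb >= 1024:
        return f"{kb / 1024:.1f}M"
    return str(int(kb))
```

For instance, an interval with 1229 K of traffic would print as `1.2M`, while 979 K stays `979`.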

Disk IO mode [d]

Shows the total number of disk read and write requests for each domain during each interval.

 perl xenstat.pl d

  Disk I/O (Reqs)    52_pyth 54_garu 59_gyrf   Dom-0
 2009-10-05 19:51:02       4       0    1317       0
 2009-10-05 19:51:07      27       0    1140       0
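
The disk figures are also differences between cumulative counters (assumed here to be xentop's VBD_RD and VBD_WR). Because a guest that restarts begins counting from zero, a naive difference would go negative. This Python sketch treats a new domain, or one whose counters went backwards, as having no valid delta until the next sample; that is loosely in the spirit of the script resetting itself when a guest starts or shuts down, not a copy of its logic.

```python
from typing import Dict, Optional, Tuple

# Hypothetical type: domain name -> cumulative (read requests, write requests)
Counters = Dict[str, Tuple[int, int]]


def disk_requests(prev: Counters, curr: Counters) -> Dict[str, Optional[int]]:
    """Read plus write requests per domain since the previous sample.

    New or restarted domains (counters went backwards) get None, because
    there is no valid baseline yet. Domains that have disappeared are dropped.
    """
    result: Dict[str, Optional[int]] = {}
    for name, (rd, wr) in curr.items():
        before = prev.get(name)
        if before is None or rd < before[0] or wr < before[1]:
            result[name] = None
        else:
            result[name] = (rd - before[0]) + (wr - before[1])
    return result
```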

Availability Option [a]

Shows CPU Availability % (which is the same as CPU Idle %) instead of CPU Utilisation %.

The problem with showing CPU Utilisation arises when you have multiple Guest VMs with different numbers of vcpus. If the CPU Utilisation of a guest VM is 50%, can you tell whether it is already capped (its vcpus are 50% of the physical cpus), or whether it can go higher?

The solution is to invert the CPU figures and view them in terms of Available CPU % remaining (100 - CPU Utilisation %). The advantage is that you can see when a guest VM's CPU is exhausted, because its figure approaches zero. In the example below, note that garuda has only 1 vcpu, which means its available CPU is capped at 50%.
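
The availability arithmetic can be sketched in Python as follows, assuming each domain's utilisation is expressed as a percentage of the whole machine. A domain with fewer vcpus than physical cpus can never use more than vcpus/pcpus of the machine, so its available figure starts from that cap instead of 100%. Treating `cpu_free` as the equivalent of the CPU-free column is also an assumption.

```python
from typing import Iterable


def cpu_available(util_pct: float, vcpus: int, pcpus: int) -> float:
    """Available CPU % for one domain; util_pct is a % of the whole machine."""
    cap = 100.0 * min(vcpus, pcpus) / pcpus  # e.g. 1 vcpu of 2 cpus -> 50%
    return max(cap - util_pct, 0.0)


def cpu_free(domain_utils: Iterable[float]) -> float:
    """Physical CPU % left over after every domain's usage, floored at zero."""
    return max(100.0 - sum(domain_utils), 0.0)
```

So a 1-vcpu guest on a 2-cpu host that uses 49.97% of the machine has only 0.03% left, even though a plain utilisation view would show it at about half the machine.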

[server~ ]# xenstat a

Output:
-------
cpus=2
     40_falcon   2.33%    2.53 cpu hrs  in 2.26 days (2 vcpu,  2048 M)
     52_python   0.26%  940.55 cpu secs in 2.08 days (2 vcpu,  1500 M)
   54_garuda_0   1.48%   18.47 cpu secs in 0.01 days (1 vcpu,   750 M)
         Dom-0   2.28%    9.73 cpu hrs  in 8.89 days (2 vcpu,   564 M)

    Available CPU %  40_falc 52_pyth 54_garu   Dom-0 CPU-free
2009-10-07 18:25:20   100.0    49.9    99.8    99.5    99.1
2009-10-07 18:25:22   100.0    48.2    42.1    91.7    32.0 ***
2009-10-07 18:25:24   100.0    45.2    25.5    79.3     0.0 *****
2009-10-07 18:25:26    99.9    50.0     0.3    99.8     0.0 *****
2009-10-07 18:25:28   100.0    50.0    16.7    87.7     4.3 *****
2009-10-07 18:25:30   100.0    50.0    73.7    99.8    73.3 *

In the first line of statistics, everything is quiet. With CPU Availability as the statistic, we can immediately see that garuda has 1 vcpu (50% of 2 physical cpus) and all the others have 2 vcpus:

    Available CPU %  40_falc 52_pyth 54_garu   Dom-0 CPU-free
 2009-10-07 18:25:20   100.0    49.9    99.8    99.5    99.1

In the 2nd line, we can see:

    Available CPU %  40_falc 52_pyth 54_garu   Dom-0 CPU-free
 2009-10-07 18:25:22   100.0    48.2    42.1    91.7    32.0 ***

Now the server is getting busy (with garuda the busiest), and CPU-free is lower than the available figure for every domain. This means that the python domain has 48.2% idle virtual capacity, but at that moment only 32% of the physical CPU is free to service it.

In the 4th line (18:25:26), python is heavily loaded and there is no spare CPU capacity left.

    Available CPU %  40_falc 52_pyth 54_garu   Dom-0 CPU-free
 2009-10-07 18:25:26    99.9    0.03    50.0    99.8     0.0 *****

If we looked only at CPU utilisation, it would not be obvious that python is overloaded, as the CPU usage view of the same sample (18:25:26) shows:

[server~]# xenstat 
                     40_falc 52_pyth 54_garu   Dom-0    Idle
 2009-10-07 18:25:26     0.1   49.97    50.0     0.2     0.0 *****

I hope this is useful for anyone using Xen. It has also been a nice trip down memory lane, as I haven't coded in Perl for nearly 10 years!

Download xenstat.pl.

History

In Sept 2009, we started experimenting with the Xen hypervisor. In my testing, Linux performed better on Xen than on VMware, and we are considering Xen for Linux rollouts.

Normally when we roll out a new server for a customer, we have a simple PHP script installed as a cron job that runs vmstat and logs the CPU utilization of the server into our database every 5 minutes. It's very useful for benchmarking, monitoring and troubleshooting mysterious performance problems. I needed a similar script for Xen.

A Google search turned up a Perl script by Tom Brown that records Xen domain CPU utilisation.

However, the following limitations led me to modify it:

  • I wanted total CPU utilisation to be capped at 100%, which is how "top" works but not how "xm top" works.
  • It does not work properly with multi-core CPUs: CPU utilisation can go over 100%.
  • sleep() does not sleep for precisely the requested number of seconds, which again pushes CPU utilisation over 100%. There is some timing perturbation, either because Dom-0 is itself virtualised or for some other reason.
  • There is no easy way to log to a database.
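
The last two fixes amount to dividing by the time that actually elapsed rather than the time you asked sleep() for, and clamping the result. A minimal Python sketch of that idea follows; `read_cpu_secs` is a hypothetical callable returning total CPU seconds consumed so far (for example, the sum of xentop's CPU(sec) column), and none of this is taken from the Perl source.

```python
import time
from typing import Callable


def utilisation_pct(cpu_secs_used: float, elapsed_secs: float, ncpus: int) -> float:
    """CPU % of the whole machine over the measured interval, capped at 100."""
    if elapsed_secs <= 0:
        return 0.0
    pct = 100.0 * cpu_secs_used / (elapsed_secs * ncpus)
    return min(max(pct, 0.0), 100.0)


def sample_once(read_cpu_secs: Callable[[], float], interval: float, ncpus: int) -> float:
    """Sleep for roughly `interval`, but divide by the wall time that really passed."""
    cpu_before, wall_before = read_cpu_secs(), time.monotonic()
    time.sleep(interval)
    cpu_after, wall_after = read_cpu_secs(), time.monotonic()
    return utilisation_pct(cpu_after - cpu_before, wall_after - wall_before, ncpus)
```

If sleep(5) on a 2-cpu host actually takes 5.5 seconds and both cpus are fully busy, the domains consume 11 cpu seconds. A naive 11 / (5 × 2) reports 110%, whereas dividing by the measured 5.5 seconds gives exactly 100%.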

So I rewrote parts of the script and renamed it xenstat.pl (after vmstat).


Other tools: see xentop, which can run in batch mode but cannot post to a web server.

The original script was written by Tom Brown.