Console Monitoring Tools for SUSE Linux Enterprise Server
In this article you will learn about tools for finding errors, spotting bottlenecks, or just keeping an eye on your server. SUSE Linux Enterprise Server has some built-in command-line tools that are suitable for these purposes. We discuss the following commands: top, iostat, sar, free, pmap, uptime, smartctl, and strace.
Packages
The following packages are needed:
- sysstat (sysstat-6.0.2-16.4 in SLES 10)
- smartmontools (smartmontools-5.33-20.2 in SLES 10)
- coreutils (coreutils-5.93-22.2 in SLES 10)
- strace (strace-4.5.14-15.2 in SLES 10)
top
Top is a commonly used tool for viewing the processes that consume the most resources. It also displays a summary of CPU and memory usage. Example 1 shows sample top output (using the default fields). The most important interactive commands while top is running are:
- h: Displays the help.
- d: Sets the update interval. The default is 3 seconds.
- k {PID}: Kills the process identified by PID.
- F: Selects the sort order. A second screen appears where you can select the sort field.
- f: Selects the fields to display. A second screen appears where you can select the fields.
top - 12:47:23 up 3:21, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 51 total, 2 running, 49 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 99.7%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 646256k total, 298608k used, 347648k free, 56388k buffers
Swap: 530104k total, 0k used, 530104k free, 186056k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
    1 root      16   0   716  280  244 S  0.0  0.0   0:01.31 init
    2 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
    3 root      10  -5     0    0    0 S  0.0  0.0   0:00.13 events/0
    4 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 khelper
    5 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kthread
    7 root      10  -5     0    0    0 S  0.0  0.0   0:00.02 kblockd/0
    8 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 kacpid
  112 root      20   0     0    0    0 S  0.0  0.0   0:00.00 pdflush
  113 root      15   0     0    0    0 S  0.0  0.0   0:00.12 pdflush
  115 root      19  -5     0    0    0 S  0.0  0.0   0:00.00 aio/0
  114 root      25   0     0    0    0 S  0.0  0.0   0:00.00 kswapd0
  321 root      16  -5     0    0    0 S  0.0  0.0   0:00.00 cqueue/0
  322 root      10  -5     0    0    0 S  0.0  0.0   0:00.02 kseriod
  362 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kpsmoused
  763 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 reiserfs/0
  827 root      12  -4  1832  648  348 S  0.0  0.1   0:00.63 udevd
 1346 root      20   0     0    0    0 S  0.0  0.0   0:00.00 shpchpd_event
The fields in this sample are:
- PID: The Process ID of the running software.
- USER: The user who is running the command.
- PR: Priority of the process.
- NI: Niceness level.
- VIRT: Total virtual memory used by the process, in kB. It includes code, data and shared libraries, plus pages that have been swapped out.
- RES: Usage of physical memory, in kB.
- SHR: Memory shared with other processes, in kB.
- S: State of the process. State can be D (uninterruptible sleep), S (sleeping), R (running), T (stopped or traced) or Z (zombie).
- %CPU: CPU usage, in percent.
- %MEM: Memory usage, in percent.
- TIME+: Total CPU time the process has used, shown with hundredths of a second.
- COMMAND: The name of the process.
You can exit top by pressing q or Ctrl-C.
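For use in scripts, top can also run non-interactively: -b selects batch mode and -n sets the number of iterations. The following is only a minimal sketch and not part of the original article. The function name busy_procs is hypothetical, and the code assumes the default field order shown in Example 1, where %CPU is the 9th column and COMMAND the 12th.

```shell
#!/bin/sh
# busy_procs THRESHOLD
# Reads `top -b -n 1` output on stdin and prints "PID COMMAND %CPU" for
# every process whose CPU usage is at or above THRESHOLD percent.
# Assumption: default top fields, so %CPU is column 9 and COMMAND column 12.
busy_procs() {
    awk -v min="$1" '
        # the process list starts after the header row
        $1 == "PID" { in_list = 1; next }
        in_list && NF >= 12 && $9 + 0 >= min + 0 { print $1, $12, $9 }
    '
}
```

A typical call would be: top -b -n 1 | busy_procs 10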
free
Free is used for viewing memory usage. It displays the total, used and available amounts of memory and swap space. The -b, -k, -m and -g options show the output in bytes, kB, MB or GB. Example 2 shows sample output of free.
server01:~ # free -m
             total       used       free     shared    buffers     cached
Mem:           631        291        339          0         55        181
-/+ buffers/cache:         54        576
Swap:          517          0        517
server01:~ #
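The "-/+ buffers/cache" line is often the more useful one, because memory held in buffers and cache can be given back to applications when needed. As a rough sketch (the function name app_free_mb is hypothetical, and the column positions assume the older output layout shown in Example 2, which newer versions of free no longer use), the same figure can be derived from the Mem: line:

```shell
#!/bin/sh
# app_free_mb
# Reads old-style `free -m` output on stdin and prints
# free + buffers + cached from the "Mem:" line.
# Assumption: columns are total, used, free, shared, buffers, cached.
app_free_mb() {
    awk '$1 == "Mem:" { print $4 + $6 + $7 }'
}
```

For the sample above this gives 339 + 55 + 181 = 575, one less than the 576 that free prints, presumably because free rounds each value separately. A typical call would be: free -m | app_free_mb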
uptime
Uptime has three often used functions (see Example 3):
- Shows how long the computer has been running
- Displays the number of logged in users
- Shows system load. You can find more info about system load in this Wikipedia article.
server01:~ # uptime
 1:32pm up 4:06, 3 users, load average: 1.41, 0.52, 0.19
server01:~ #
In this case, the system time is 1:32pm, the system has been running for 4 hours and 6 minutes, 3 users are logged in and the load averages are 1.41, 0.52 and 0.19 (over the last 1, 5 and 15 minutes).
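In a script, the 1-minute load average can be picked out of the uptime output and compared with a limit. This is a minimal sketch only: the function names load1 and load_above are hypothetical, and it assumes the load figures use a dot as the decimal separator, as in Example 3.

```shell
#!/bin/sh
# load1: reads uptime output on stdin and prints the 1-minute load average.
load1() {
    # keep only the first number after "load average:"
    sed -n 's/.*load average: *\([0-9.]*\),.*/\1/p'
}

# load_above LIMIT: succeeds (exit status 0) if the 1-minute load
# read from uptime output on stdin is greater than LIMIT.
load_above() {
    load1 | awk -v max="$1" '
        BEGIN { rc = 1 }
        $1 + 0 > max + 0 { rc = 0 }
        END { exit rc }
    '
}
```

A typical call would be: uptime | load_above 4 && echo "load is high"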
pmap
Pmap shows the memory usage of a process along with the underlying files. With pmap you can track down processes that eat up memory. Example 4 shows the memory map of the acpid process.
server01:~ # pmap 1972
1972: acpid
START       SIZE     RSS   DIRTY PERM MAPPING
08048000     16K     16K      0K r-xp /sbin/acpid
0804c000      4K      4K      4K rw-p /sbin/acpid
0804d000    136K     20K     20K rw-p [heap]
b7dea000      4K      4K      4K rw-p [anon]
b7deb000   1124K    380K      0K r-xp /lib/libc-2.4.so
b7f04000      8K      8K      8K r--p /lib/libc-2.4.so
b7f06000      8K      8K      8K rw-p /lib/libc-2.4.so
b7f08000     12K      8K      8K rw-p [anon]
b7f11000      8K      8K      8K rw-p [anon]
b7f13000    104K     32K      0K r-xp /lib/ld-2.4.so
b7f2d000      8K      8K      8K rw-p /lib/ld-2.4.so
bfdb8000     88K      8K      8K rw-p [stack]
ffffe000      4K      0K      0K ---p [vdso]
Total:     1524K    504K     76K

268K writable-private, 1256K readonly-private, and 0K shared
server01:~ #
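To see which files or regions account for most of a process's resident memory, the RSS column can be summed per mapping. The sketch below is illustrative only: the function name rss_by_mapping is hypothetical, and it assumes the six-column layout shown above (START SIZE RSS DIRTY PERM MAPPING), which differs from the output of other pmap versions.

```shell
#!/bin/sh
# rss_by_mapping
# Reads pmap output (layout as in the sample above) on stdin and prints
# "RSS_in_kB MAPPING" per mapping name, largest first.
rss_by_mapping() {
    awk '
        # data rows have exactly 6 columns and an RSS value such as 380K
        NF == 6 && $3 ~ /^[0-9]+K$/ {
            sub(/K$/, "", $3)
            rss[$6] += $3
        }
        END { for (m in rss) print rss[m], m }
    ' | sort -rn
}
```

A typical call would be: pmap 1972 | rss_by_mapping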
smartctl
Smartctl displays S.M.A.R.T. information and statistics for hard disks; it is usable only when the hard drive is S.M.A.R.T. capable. The most important options for this command are:
- -i: Displays general information about the hard drive (Example 5). Note: If the drive is SMART capable but the feature is turned off you can turn it on using the smartctl -s on {device} command.
server01:~ # smartctl -i /dev/hda
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     Maxtor 2B020H1
Serial Number:    B1HZYECE
Firmware Version: WAK21R90
User Capacity:    20,490,559,488 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Tue Jan 2 15:05:06 2007 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
server01:~ #
- -c: Shows the hard drive’s capabilities.
- -H: Reports the drive’s overall health self-assessment.
- -A: Displays the drive’s attributes. This is very useful for spotting a hard drive that is going to fail. Example 6 shows output for an old disk:
server01:~ # smartctl -AH /dev/hda
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   233   232   063    Pre-fail  Always       -       6399
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       179
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   249   238   187    Pre-fail  Always       -       56532
  9 Power_On_Minutes        0x0032   251   251   000    Old_age   Always       -       1030h+44m
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       254
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       43
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       313
194 Unknown_Attribute       0x0032   253   253   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       13
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       0
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   253   253   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
server01:~ #
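When reading the attribute table, the columns to compare are VALUE and THRESH: smartmontools treats an attribute as failed once its normalized VALUE is less than or equal to THRESH. The following sketch lists attributes that are close to their threshold. The function name smart_low_margin and the idea of a "margin" are assumptions made for illustration; attributes with a threshold of 000 are skipped because they cannot fail this way.

```shell
#!/bin/sh
# smart_low_margin [MARGIN]
# Reads `smartctl -A` output on stdin and prints every attribute whose
# normalized VALUE is within MARGIN points of its THRESH (default 0,
# which means VALUE <= THRESH, i.e. the attribute has failed).
smart_low_margin() {
    awk -v margin="${1:-0}" '
        # attribute rows start with a numeric ID; columns 4 and 6 are VALUE and THRESH
        $1 ~ /^[0-9]+$/ && NF >= 10 && $6 + 0 > 0 && ($4 - $6) <= margin + 0 {
            print $2, "value=" ($4 + 0), "thresh=" ($6 + 0)
        }
    '
}
```

A typical call would be: smartctl -A /dev/hda | smart_low_margin 70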
Note 1: smartmontools works only if you use independent disks, software RAID or 3ware RAID controllers. For other RAID controllers please use software supplied by the vendor.
Note 2: smartmontools has a daemon called smartd which monitors hard disks continuously.
Note from the YaST installer: To prevent system hangs from buggy devices, smartd is turned off by default. Please test smartd manually first and then turn it on via the Runlevel Editor or with /sbin/chkconfig --add smartd.
iostat
The iostat tool reports statistics about CPU and input/output rates of disks or partitions. The main options for this command are:
- -c: Reports only CPU statistics.
- -d: Reports only device utilization. Note: Cannot be used together with the -c option.
- -p {device | ALL}: Display statistics for the partitions of a drive. Statistics for all block devices will be displayed if used with ALL.
- -x: Display extended disk reports. Note: Cannot be used together with the -p option.
Command usage: iostat [options] [delay] [repeats]
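In Example 7, iostat displays usage for all partitions on /dev/hda five times with a 1-second delay, in kB. The first report shows averages since boot; each later report covers only the preceding interval. It appears that /dev/hda2 is in use while the swap partition (/dev/hda1) is idle.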
server01:~ # iostat -d -k -p hda 1 5
Linux 2.6.16.21-0.8-default (server01)  01/03/07

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
hda               2.30        31.00         7.75    3271938     817668
hda2              3.86        30.84         2.49    3254805     262476
hda1              1.33         0.16         5.14      16805     542232

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
hda             315.84      1263.37         0.00       1276          0
hda2            315.84      1263.37         0.00       1276          0
hda1              0.00         0.00         0.00          0          0

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
hda             214.00       856.00         0.00        856          0
hda2            214.00       856.00         0.00        856          0
hda1              0.00         0.00         0.00          0          0

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
hda             273.00      1056.00      3608.00       1056       3608
hda2           1163.00      1052.00      3600.00       1052       3600
hda1              0.00         0.00         0.00          0          0

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
hda             315.00       472.00      2780.00        472       2780
hda2           1093.00       476.00      3896.00        476       3896
hda1              0.00         0.00         0.00          0          0

server01:~ #
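Because the first iostat report covers the time since boot, it is usually left out when judging current activity. As a minimal sketch (the function name avg_tps is hypothetical, and the code assumes that each report starts with a "Device:" header row and that tps is the second column, as in the sample above), the average tps per device over the remaining reports can be computed like this:

```shell
#!/bin/sh
# avg_tps
# Reads `iostat -d` output with several reports on stdin, skips the first
# report (averages since boot) and prints "DEVICE average_tps" per device.
avg_tps() {
    awk '
        # every report begins with a header row
        $1 ~ /^Device:?$/ { report++; next }
        report >= 2 && NF >= 2 && $2 ~ /^[0-9.]+$/ { sum[$1] += $2; n[$1]++ }
        END { for (d in sum) printf "%s %.2f\n", d, sum[d] / n[d] }
    ' | LC_ALL=C sort
}
```

A typical call would be: iostat -d -k -p hda 1 5 | avg_tps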
strace
Strace is a diagnostic tool for debugging and examining programs and scripts. It shows every system call made by the program you trace.
The best way to understand strace is through an example (Example 8). In this example we create a text file called test.txt and trace the viewing of this file with cat.
The file contains this text: “>>>> This is the test file’s content <<<<”. To trace the command, we enter: strace cat test.txt
server01:~ # strace cat test.txt
execve("/bin/cat", ["cat", "test.txt"], [/* 55 vars */]) = 0
brk(0) = 0x804d000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fa3000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=28008, ...}) = 0
mmap2(NULL, 28008, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f9c000
close(3) = 0
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\300Y\1"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1404242, ...}) = 0
mmap2(NULL, 1176988, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7e7c000
madvise(0xb7e7c000, 1176988, MADV_SEQUENTIAL|0x1) = 0
mmap2(0xb7f95000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x118) = 0xb7f95000
mmap2(0xb7f99000, 9628, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f99000
close(3) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7e7b000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7e7b6b0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb7f95000, 8192, PROT_READ) = 0
munmap(0xb7f9c000, 28008) = 0
brk(0) = 0x804d000
brk(0x806e000) = 0x806e000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/locale.alias", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=2528, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fa2000
read(3, "# Locale name alias data base.\n#"..., 4096) = 2528
read(3, "", 4096) = 0
close(3) = 0
munmap(0xb7fa2000, 4096) = 0
open("/usr/lib/locale/en_US.UTF-8/LC_CTYPE", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/locale/en_US.utf8/LC_CTYPE", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=208464, ...}) = 0
mmap2(NULL, 208464, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7e48000
close(3) = 0
open("/usr/lib/gconv/gconv-modules.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=25404, ...}) = 0
mmap2(NULL, 25404, PROT_READ, MAP_SHARED, 3, 0) = 0xb7f9c000
close(3) = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
open("test.txt", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=42, ...}) = 0
read(3, ">>>> This is the test file\'s con"..., 4096) = 42
write(1, ">>>> This is the test file\'s con"..., 42>>>> This is the test file's content <<<<
) = 42
read(3, "", 4096) = 0
close(3) = 0
close(1) = 0
exit_group(0) = ?
Process 3382 detached
server01:~ #
The important parts are the execve, open, read and write calls. You can see that the /bin/cat command has been executed (it uses shared objects like libc.so.6), then cat opens the file called test.txt, reads its content and writes it to standard output. In the last line strace tells us that the process has finished.
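A full trace is long, so it often helps to filter it. The sketch below pulls out only the files that were opened successfully. The function name opened_files is hypothetical, and the pattern assumes one system call per line in the form open("path", ...) = fd, as in Example 8 (it also accepts openat, which newer systems tend to show instead).

```shell
#!/bin/sh
# opened_files
# Reads strace output on stdin and prints the path of every open/openat
# call that returned a file descriptor (failed calls return -1 and are skipped).
opened_files() {
    sed -n 's/^open[a-z0-9]*([^"]*"\([^"]*\)".*) = [0-9][0-9]*$/\1/p'
}
```

Strace writes the trace to standard error, so a typical call would be: strace cat test.txt 2>&1 >/dev/null | opened_files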
You may find it hard to use in the beginning but in some cases strace is indispensable.
Try these tools on your system: they are not hard to use, and you can rely on them when nothing else works (i.e. when graphical tools are not available).