Awk
Introduction
Awk is named after its authors: Aho, Weinberger, and Kernighan. It is well suited to manipulating rows of text, because the language splits each input line into fields for you. It often resembles Perl, which borrowed much of awk's syntax.
Awk can do most of what sed can do, and more. The two go hand in hand for a script-writing system administrator.
Awk and /etc/passwd
It's usually a better idea to use one command instead of using a pipe to another command, where possible. A common mistake:
# grep root /etc/passwd | awk -F: '{print $7}'
/bin/bash
which can be easily done all in awk:
# awk -F: '/root/ {print $7}' /etc/passwd
/bin/bash
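Note that /root/ matches the text "root" anywhere in the line, so an account whose home directory is /root would be printed as well. A minimal sketch of a stricter match that compares only the first field; the sample file and its accounts are made up for illustration:

```shell
# Build a small passwd-style sample (hypothetical accounts, for illustration only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:*:0:0:Charlie Root:/root:/bin/ksh
toor:*:0:0:Backup admin:/root:/bin/sh
francisco:*:1000:1000:Francisco:/home/francisco:/bin/ksh
EOF

# /root/ matches anywhere in the line, so both root and toor are selected.
loose=$(awk -F: '/root/ {print $1 "=" $7}' "$sample")

# Comparing field 1 exactly selects only the root account.
strict=$(awk -F: '$1 == "root" {print $7}' "$sample")

echo "$loose"
echo "$strict"
rm -f "$sample"
```

The exact comparison costs nothing extra and avoids surprises on systems with several uid 0 accounts.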
Awk and ps
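Say you want to find out how much memory xfce4 is using on your system, and most xfce applications appear to start with "xf". Field 5 of ps auwx output is the virtual memory size (VSZ) in kilobytes; the resident set size (RSS) is field 6: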
$ ps auwx | awk '/xf/{print $5}'
15924
14668
11948
11764
12944
16264
1860
If you wanted to use awk to add the results together instead of doing it manually:
$ ps auwx | awk '/xf/{ tot += $5 } END { print tot }'
69108
N.B. This can be misleading for programs that use large amounts of shared memory (such as Java), because shared pages are counted once for every process that maps them.
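The pattern /xf/ is tested against the whole line, so it may also match the awk process itself, whose command line contains "xf". A sketch that tests only the command name in field 11 instead; the sample lines below are invented and merely follow the usual ps auwx column order:

```shell
# Sample lines in "ps auwx" column order (made-up values, for illustration only).
# Columns: USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
ps_sample='francisco 101 0.0 1.0 15924 9000 ?? S 9:00AM 0:01.00 xfce4-session
francisco 102 0.0 1.0 14668 8000 ?? S 9:00AM 0:02.00 xfwm4
francisco 103 0.0 0.1 1860 500 p1 R+ 9:05AM 0:00.00 awk /xf/{print $5}
francisco 104 0.0 2.0 30000 20000 ?? S 9:01AM 0:09.00 firefox'

# Sum field 5 only when the command name (field 11) starts with "xf",
# so the awk line and unrelated programs are not counted.
total=$(printf '%s\n' "$ps_sample" | awk '$11 ~ /^xf/ { tot += $5 } END { print tot }')
echo "$total"
```

On a live system the same program can be fed from ps auwx directly in place of the sample text.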
Awk and multi-field documents
Suppose you have a file of IP addresses in which several addresses can share one row, and you need to flatten it to one address per row. Here is what the solution looks like:
francisco$ cat ipfile
192.168.0.1 192.168.0.2 192.168.0.3
10.0.0.1 10.0.0.2 10.0.0.3
francisco$ awk '{ for (i = 1; i <= NF ; i++) print $i; }' ipfile
192.168.0.1
192.168.0.2
192.168.0.3
10.0.0.1
10.0.0.2
10.0.0.3
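The same loop over NF can also drop repeated addresses while it flattens. A minimal sketch, assuming duplicates should be printed only once; the array name seen and the sample data are illustrative:

```shell
# Sample input with one repeated address (made up for illustration).
ipfile=$(mktemp)
cat > "$ipfile" <<'EOF'
192.168.0.1 192.168.0.2 192.168.0.3
10.0.0.1 192.168.0.2 10.0.0.3
EOF

# seen[$i]++ evaluates to 0 the first time an address appears,
# so !seen[$i]++ is true only once per distinct address.
unique=$(awk '{ for (i = 1; i <= NF; i++) if (!seen[$i]++) print $i }' "$ipfile")
echo "$unique"
rm -f "$ipfile"
```

Addresses come out in order of first appearance, so no separate sort is needed.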
Awk in checksumming files
Suppose you have some files. Here is a list:
francisco$ ls
four md5list one three two
You want to check these against a list called md5list (it appears in the listing above). The list is created like so:
francisco$ md5 * > md5list
francisco$ cat md5list
MD5 (four) = 48a24b70a0b376535542b996af517398
MD5 (md5list) = d41d8cd98f00b204e9800998ecf8427e
MD5 (one) = b026324c6904b2a9cb4b88d6d61c81d1
MD5 (three) = 4fbafd6948b6529caa2b78e476359875
MD5 (two) = 26ab0db90d72e28ad0ba1e22ee510510
Later, to check the files against these checksums and see what has changed, you would use this awk command:
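find . -type f | awk '
BEGIN {
    # Load the saved list, indexed by checksum; stop on EOF or read error.
    while ((getline < "md5list") > 0) { array[$NF] = $2 }
}
{
    command = "md5 " $1
    command | getline    # replaces $0 with the md5 output for this file
    if (!($NF in array)) print $2 " has changed"
    close(command)
}'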
It would print (if I haven't changed any files):
(./md5list) has changed
Only the list itself is reported in this case, because md5list was still empty when its own checksum was recorded. If I change one of the files and run the command again:
francisco$ echo 87 >> three
(./three) has changed
(./md5list) has changed
It reports that the file "three" has changed.
You can run this on your system with slight modifications. Note that memory use grows with the size of the list, because the whole list is loaded into an array at the beginning. In the professional world, people use a tool called "tripwire" to checksum files and check them for signs of replacement.
Also note that this example was run on an OpenBSD system, where md5 does the same job as Linux's md5sum utility but formats its output slightly differently. In this example md5 is in the PATH.
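For comparison, here is a rough sketch of the same check for md5sum's output format, which puts the checksum first and the file name second. It is not the command used above: it assumes md5sum is installed and that file names contain no whitespace, it keys the array by file name rather than by checksum, and the scratch directory, file names and variable names are hypothetical:

```shell
dir=$(mktemp -d)      # scratch directory holding hypothetical files
list=$(mktemp)        # checksum list, kept outside the scanned directory
echo 1 > "$dir/one"; echo 2 > "$dir/two"; echo 3 > "$dir/three"

# Record "<checksum>  <name>" lines in md5sum's format.
(cd "$dir" && find . -type f -exec md5sum {} + > "$list")

echo 87 >> "$dir/three"   # modify one file after the list was made

changed=$(cd "$dir" && find . -type f | awk -v list="$list" '
    BEGIN {
        # Load the saved list: known[name] = checksum.
        while ((getline line < list) > 0) {
            split(line, f, " ")
            known[f[2]] = f[1]
        }
    }
    {
        # Checksum the current file and compare with the saved value.
        cmd = "md5sum " $0
        cmd | getline line
        close(cmd)
        split(line, f, " ")
        if (known[f[2]] != f[1]) print f[2] " has changed"
    }')
echo "$changed"
rm -rf "$dir" "$list"
```

Keying by name means a file that is missing from the list is reported as well. On Linux, md5sum -c md5list performs a similar comparison directly.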