Awk

From Hackepedia

Latest revision as of 12:20, 5 July 2008

Introduction

Awk is named after its authors: Aho, Weinberger, and Kernighan. It is good at manipulating rows of text, because it splits each input line into fields that a script can test and rewrite. Perl borrows much of its syntax from awk, so the two often look similar.

Awk can do many things sed can do, and more. The two go hand in hand for a script-writing system administrator.

Awk and /etc/passwd

It's usually a better idea to use one command instead of using a pipe to another command, where possible. A common mistake:

# grep root /etc/passwd | awk -F: '{print $7}'
/bin/bash

which can easily be done entirely in awk:

# awk -F: '/root/ {print $7}' /etc/passwd
/bin/bash
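Note that /root/ matches any line that contains "root" anywhere, just as grep root does. The following is a minimal sketch of restricting the match to the login name in the first field. It uses a made-up two-line sample file (the passwd_sample variable and the "chroot" account are hypothetical) rather than the real /etc/passwd:

```shell
# Hypothetical sample data standing in for /etc/passwd
passwd_sample=$(mktemp)
cat > "$passwd_sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
chroot:x:1001:1001:chroot helper:/home/chroot:/bin/sh
EOF

# /root/ matches both lines, because "chroot" also contains "root"
loose=$(awk -F: '/root/ {print $7}' "$passwd_sample")

# $1 == "root" compares the whole first field, so only the root account matches
exact=$(awk -F: '$1 == "root" {print $7}' "$passwd_sample")

echo "$exact"
rm -f "$passwd_sample"
```

On a real system the same test would be written as awk -F: '$1 == "root" {print $7}' /etc/passwd.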

Awk and ps

Say you wanted to find out how much memory xfce4 was using on your system, and it appears most xfce applications start with "xf". Field 5 of ps auwx output is VSZ, the virtual size in kilobytes (the resident set size, RSS, is field 6):

$ ps auwx | awk '/xf/{print $5}'
15924
14668
11948
11764
12944
16264
1860


If you wanted to use awk to add the results together instead of doing it manually:

$ ps auwx | awk '/xf/{ tot += $5 } END { print tot }'
69108

N.B. This can be misleading in the case of programs that use large amounts of shared memory (like Java).
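A pattern such as /xf/ is tested against the whole line, so it also matches the awk process itself, whose command line contains "xf". The sketch below matches on the command name only and sums both memory columns. It runs against a canned snapshot (the ps_sample variable and its rows are invented for illustration) and assumes the BSD-style column order USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND:

```shell
# Hypothetical ps-style snapshot; the numbers are made up for illustration
ps_sample='USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
user 101 0.0 1.2 15924 9000 ?? S 10:00AM 0:01.00 xfwm4
user 102 0.0 1.0 14668 8000 ?? S 10:00AM 0:00.50 xfdesktop
user 103 0.0 0.1 1860 700 p1 R+ 10:05AM 0:00.00 awk /xf/{ tot += $5 }'

# Test only field 11 (the command name), so the header line and the awk
# process are skipped; add up VSZ (field 5) and RSS (field 6) separately
summary=$(printf '%s\n' "$ps_sample" |
    awk '$11 ~ /^xf/ { vsz += $5; rss += $6; n++ }
         END { printf "%d processes, VSZ %d KB, RSS %d KB\n", n, vsz, rss }')

echo "$summary"
```

With live data, the printf pipeline would be replaced by ps auwx, and the totals change from one run to the next as processes start and exit.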

Awk and multi-field documents

Pretend you have a file of IP addresses in which several addresses can appear on one row, and you need to flatten it to one address per row. Here is what the solution looks like:

francisco$ cat ipfile
192.168.0.1
192.168.0.2 192.168.0.3
10.0.0.1 10.0.0.2 10.0.0.3
francisco$ awk '{ for (i = 1; i <= NF ; i++) print $i; }' ipfile
192.168.0.1
192.168.0.2
192.168.0.3
10.0.0.1
10.0.0.2
10.0.0.3

Awk in checksumming files

Pretend you have some files. Here is a list:

francisco$ ls
four    md5list one     three   two

You want to check these against a list called md5list (also shown in the listing above). The list is created like so:

francisco$ md5 * > md5list
francisco$ cat md5list
MD5 (four) = 48a24b70a0b376535542b996af517398
MD5 (md5list) = d41d8cd98f00b204e9800998ecf8427e
MD5 (one) = b026324c6904b2a9cb4b88d6d61c81d1
MD5 (three) = 4fbafd6948b6529caa2b78e476359875
MD5 (two) = 26ab0db90d72e28ad0ba1e22ee510510

If at a later time you wanted to check these checksums against your files to see what has changed, you would use this awk command:

find . -type f | awk 'BEGIN { while ( (getline < "md5list") > 0 ) { array[$NF] = $2 } }
{ file=$1; command="md5 " file; command | getline ; if (array[$NF] == "") print $2 " has changed";
close (command); }'

It would print (if I haven't changed any files):

(./md5list) has changed

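Only the list itself is reported in this case (its recorded checksum is that of an empty file, because the list was still empty when md5 read it). But if I change one of the files and rerun the command:

francisco$ echo 87 >> three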
(./three) has changed
(./md5list) has changed

It reports that the "three" file has changed.

You can run this on your own system with slight modifications. Note that memory use grows with the size of the checksum list, because the whole list is loaded into an array in the BEGIN block. In the professional world, people use a tool called "tripwire" to checksum files and check them for signs of replacement.

Also note that this example was run on an OpenBSD system, where md5 does the same job as Linux's md5sum utility but produces slightly different output. The md5 command is assumed to be in the PATH.
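For Linux, the same idea might be sketched as follows. This is an adaptation, not the command used above: it assumes GNU md5sum, whose lines have the form "checksum  filename", and it assumes filenames without spaces or shell metacharacters. The scratch directory and the files one, two and three are created only for the demonstration, and the variable names are illustrative:

```shell
# Build a throwaway directory with three small files and record their checksums
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo 1 > one; echo 2 > two; echo 3 > three
md5sum one two three > md5list

# Modify one file after the list was written
echo 87 >> three

changed=$(find . -type f ! -name md5list | awk '
BEGIN {
    # Load the recorded checksums, keyed by filename
    while ((getline line < "md5list") > 0) {
        split(line, f, " ")
        recorded[f[2]] = f[1]
    }
    close("md5list")
}
{
    file = $0
    sub(/^\.\//, "", file)        # strip the leading "./" that find adds
    cmd = "md5sum " file
    cmd | getline out             # read the current checksum line
    close(cmd)
    split(out, cur, " ")
    if (recorded[file] != cur[1]) print file " has changed"
}')

echo "$changed"
cd - > /dev/null && rm -rf "$workdir"
```

Unlike the original command, this version keys the array by filename and compares checksums, so a file that is missing from the list is also reported as changed.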