UNIX/Linux & C Programming:
Chapter nn: Regular Expressions, Pattern Matching, and Filters



Coverage: [UPE] Chapter 4
(regular expressions §4.1 (pp. 101-105),
filters §4.2 (pp. 106-108),
sed §4.3 (pp. 108-114)
awk §4.4 (pp. 114-131))


Outline


Regular expressions

  • a regular expression (RE) defines one or more strings of characters; is said to match any string it defines (e.g., /abc/ is an RE which matches abc)
  • the strings matched by a regular expression can be recognized with a finite state automaton (FSA)
    • has limited recognition capabilities (e.g., no memory) and, therefore, cannot match balanced (nested) parentheses
    • FSA for recognizing positive integers and identifiers in C: [1-9][0-9]* + [_a-zA-Z][_a-zA-Z0-9]*
  • built using a combination of literal and metacharacters
  • a literal character matches itself and can be any character except a newline: a-z A-Z 0-9 () = ; : ,
  • a metacharacter (or special character) is a character which represents something other than itself:
      . * [] ^ - $ / + ? | ( ) \{ \}
  • a delimiter is a special character marking the start or end of a regular expression; we use / here
  • see regexp(5) manpage


Who /uses/ [Rr]eg.lar [Ee]xpres*ions\?

Regular expressions are used by many UNIX utilities, including editors and filters:

  • the shell
  • ex (UNIX line editor; interactive)
  • vi (UNIX visual editor; interactive)
  • emacs (general-purpose editor)
  • tr (character translation tool)
  • grep (global regular expression print; file searching tool/utility; returns entire matched line, not just matched string)
  • sed (UNIX stream editor; non-interactive)
  • awk (pattern scanning and processing language)
  • perl (practical extraction and report language; draws on the UNIX shell, sed, and awk)
  • python (general-purpose scripting language)


Using grep

  • print lines matching a pattern
  • examples:
      $ grep "abc" filename
      
      prints out all lines in the given file containing abc somewhere in them
      $ grep -i "abc" filename
      
      same as above, but ignores case of the desired string
      $ grep -v "abc" filename
      
      prints out all lines in the given file which do not contain abc anywhere in them
      $ grep -i path .login .tcshrc
      
      $ grep -f searchstrings .login .tcshrc
      
      causes grep to look for search strings in the file following the -f

  • quotes are optional around regular expressions which do not contain spaces or other shell metacharacters
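
A minimal sketch checking these flags against a throwaway file (the file name and contents are invented for illustration):

```shell
# Invented demo file; any text file works.
printf 'abc here\nnothing to see\nABC loud\n' > /tmp/grepdemo.txt

plain=$(grep "abc" /tmp/grepdemo.txt)       # only the lowercase match
icase=$(grep -ic "abc" /tmp/grepdemo.txt)   # -i ignores case; -c counts matching lines
inverted=$(grep -v "abc" /tmp/grepdemo.txt) # lines which do not contain abc

rm -f /tmp/grepdemo.txt
```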


Special characters

  • period: .
    • matches any single character

    • /a.c/ matches abc  adc  aec  a=c  a:c
      /x..x/ matches xaax  xavx  x=kx

  • asterisk: *
    • matches zero or more occurrences of the previous RE
    • notice that this is different than the shell wildcard meaning
    • /ab*c/ matches ac  abc  abbc  abbbbbbbbbbbbbbbbc
      /a*/ matches ""  a  aa  aaaaaaaaaa
      /a*b*c*/ matches ?
      /.*/ matches ?

  • square brackets, the character class symbol: []
    • indicates a set of characters, any one of which can match
    • * and $ lose their special meaning
    • ^ at the start means NOT
    • - between characters refers to a range

    • /[Mm]ark/ matches mark  Mark
      /t[aeiou]x/ matches tax  tex  tix  tox  tux
      /[abc].*/ matches anything beginning with a or b or c
      /[a-z][a-z]/ matches any two-letter lower-case string
      /[a-zA-Z]*/ matches any word made of letters
      /[^abc].*/ matches anything starting with something besides a or b or c
      /[a-zA-Z0-9_]*/ matches ?

    • to match a literal ^ in a character class, put it somewhere other than in the first position (e.g., [a-z^])
    • to match a literal - in a character class, put it somewhere other than in between two characters (e.g., [-a-z])
    • all other metacharacters are literal in a character class
    • therefore, context matters

  • caret: ^; outside a character class means `beginning of line'

    /^T/ matches all lines starting with T
    /^[0-9]/ matches ?

  • dollar sign: $; outside of a character class means `end of line'

    /T$/ matches all lines ending with T
    /^$/ matches ?

  • backslash: \; used to escape special characters

    /\./ matches .
    /a\*b/ matches a*b
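
A quick sanity check of the anchors and escapes above, which also answers the /^[0-9]/ and /^$/ questions; the sample lines are made up:

```shell
printf 'Tall tale\n9 lives\na*b\n\nend\n' > /tmp/redemo.txt

caret=$(grep '^T' /tmp/redemo.txt)      # /^T/: lines beginning with T
digit=$(grep '^[0-9]' /tmp/redemo.txt)  # /^[0-9]/: lines beginning with a digit
blanks=$(grep -c '^$' /tmp/redemo.txt)  # /^$/: empty lines
star=$(grep 'a\*b' /tmp/redemo.txt)     # \* matches a literal asterisk

rm -f /tmp/redemo.txt
```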


Protecting metacharacters from multiple levels of interpretation

  • kernel
  • shell
  • grep, ...


Regular expression examples

  • social security numbers: [0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9] (yes, it is rather long-winded, but we shorten it; see below)
  • legal C identifier: [a-zA-Z_][a-zA-Z0-9_]*
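
The identifier pattern can be exercised with grep; is_ident is a made-up helper, and -x requires the whole line to match:

```shell
# -q suppresses output, leaving only the exit status; -x anchors to the whole line.
is_ident() { printf '%s\n' "$1" | grep -qx '[a-zA-Z_][a-zA-Z0-9_]*'; }

is_ident _tmp42 && ok=yes || ok=no   # legal: may start with an underscore
is_ident 9lives && bad=yes || bad=no # illegal: cannot begin with a digit
```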


Regular expression rule

  • REs always match the longest string possible starting from the beginning of the line
  • example:
    This (rug) is not what it once was (a long time ago), is it?
    
    /Th.*is/ matches ?
    /(.*)/ matches ?


Full regular expressions

(used in egrep)


  • as opposed to the basic (also called simple or limited) regular expressions used in grep
  • egrep is extended grep

  • plus: +; like *, but matches one or more occurrences of the preceding RE

    /ab+c/ matches abc  abbc  abbbc but not ac
    ..* = .+

  • question mark: ?; matches zero or one occurrences of the previous RE

    /ab?c/ matches ac  abc

  • logical or: |; matches either the RE before or the RE after the vertical bar

    /abc|def/ matches abc  def

  • parentheses ( ); can be used to group REs for use with *, ?, +, |, and so on

    /ab(c|d)ef/ matches abcef  abdef
    /((abcef)|(abdef))/ matches abcef  abdef
    /ab(cd|de)fg/ matches abcdfg  abdefg

    depending on the program (see below), you may need to use \( and \) instead

  • set braces \{ \}; used to specify repetitions of a RE

    /[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}/ matches ssns
    /a\{4,\}/ matches 4 or more a's (n or more)
    /[a-z]\{3,5\}/ matches 3 to 5 lower case letters (n thru m, with n <= m; range)

  • fgrep: self-study
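
A sketch of the full-RE operators using grep -E (the portable spelling of egrep); note that the braces are unescaped in full REs, and the file contents are invented:

```shell
printf 'abc\nac\nabbc\nabdef\n123-45-6789\n' > /tmp/eredemo.txt

plus=$(grep -Ec 'ab+c' /tmp/eredemo.txt)   # abc and abbc, but not ac
quest=$(grep -Ec 'ab?c' /tmp/eredemo.txt)  # abc and ac
group=$(grep -E 'ab(c|d)ef' /tmp/eredemo.txt)
ssn=$(grep -E '^[0-9]{3}-[0-9]{2}-[0-9]{4}$' /tmp/eredemo.txt)

rm -f /tmp/eredemo.txt
```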


Subtle point about regular expressions

  • in grep and ex/vi, ( and ) characters used alone match themselves, while \( and \) are used for grouping
  • egrep uses the opposite conventions
  • \{ and \} are special in grep and ex/vi
  • see [UIAN] Chapter 6 (pp. 295-301) and, especially, Tables 6-1 and 6-2 (pp. 296-297)


ex (line editor)

  • vi is close to a full programming language because of its use of ex
  • a masterpiece in user-interface software design
  • approaches to studying: memorize commands or learn/know general syntax
  • general syntax of ex commands: :[address]command[options]
  • deleting all blank lines:
    • :g/^$/d
    • grep -v '^$'
  • example addresses
    • 10,20 (lines 10 thru 20)
    • .,100 (current line thru line 100)
    • .,$ (current line thru last line of file)
    • % = 1,$
  • :set list (display each TAB as ^I and each end-of-line as $)
  • :set nolist
  • search and replace: :%s/RE/replacement_text/g (same as 1,$s/RE/replacement_text/g); examples:
    • :%s/Alice/Lucy/g (the g makes it global, i.e., replace all occurrences, not just the first, on each line)

    • %s/hello/& world/g (& represents the matched text)

    • :%s/[TAB]/   /g (replaces TABs with 3 consecutive spaces on every line)

    • %s/[ TAB][ TAB]*$// (purges trailing whitespace on every line)

    • :%s/fprintf/FPRINTF/g (replaces all occurrences of fprintf with FPRINTF)

    • :.,$s/fprintf/FPRINTF/g (replaces occurrences of fprintf from the current line (.) to the last line of the file ($) with FPRINTF)

    • :10,20s/fprintf/FPRINTF/g (replaces occurrences of fprintf from line 10 to 20 with FPRINTF)

    • :%s/^\([A-Z][a-z-]*\)[,][ ]\([A-Z][a-z-]*\)$/\2 \1/ (converts names from <last>, <first> format to <first> <last> format)

    • :%s/^\([[:alpha:]]*\)[ ]\([[:alpha:]]*\)$/\2, \1/ (undoes the previous transformation)

  • move text: :100,200m. (moves lines 100 thru 200 to the current line)
  • another example: :10,20w newfile (extracts lines 10 thru 20 and writes them to newfile)


sed (stream editor)


Essential sed

  • (non-interactive) stream editor

  • beginnings of a complete command language
  • execution model for each line in the input stream:
    1. read input line into pattern space,
    2. apply commands to pattern space,
    3. send pattern space to stdout.



  • similar syntax to ex
  • basic syntax: <condition><action>
  • detailed syntax: [<address>[,<address>]][!]<command>[<arguments>]



  • conditions actions
    /RE/ d
    m,n p
    $ q
    <condition>! s/RE/string/
    <condition1>,<condition2>! w <filename>
    i
    a

  • invoking sed
    • sed '<edit commands>' <file(s)>
    • cat <file(s)> | sed '<edit commands>'
    • sed -f <edit commands file> <file(s)>

  • -e option
    • sed -e '{ ... }' file (... represents multiple editing-command expressions on separate lines; the address applies to all commands within the braces)
    • if the curly braces ({ }) are omitted, give each editing-command expression its own, possibly distinct, address



  • use of -n option which suppresses output (step 3 above) (with and without p action or d action)
  • without -n option, p action assumed
  • two examples which produce the same output
    • one with -n: sed -n '/one/p' file
    • one without -n: sed '/one/!d' file
  • sed -n '/<RE>/p' file = grep <RE> file

  • sed is Turing complete
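
The equivalences above can be sketched directly (file contents are invented):

```shell
printf 'one fish\ntwo fish\nred fish\n' > /tmp/seddemo.txt

with_n=$(sed -n '/one/p' /tmp/seddemo.txt)  # -n suppresses output; p prints matches
without_n=$(sed '/one/!d' /tmp/seddemo.txt) # delete every line that does NOT match
via_grep=$(grep 'one' /tmp/seddemo.txt)

rm -f /tmp/seddemo.txt
```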


Some representative examples

  • sed 's/[TAB]/   /g' main.c # converts every TAB to three consecutive spaces on every line (will the changes take effect in the file main.c?)
  • sed 's/[ TAB][ TAB]*$//' main.c # purges trailing whitespace from each line
  • sed 's/index1/index2/g' main.c # replace string index1 with string index2
  • sed -n '20,30p' file
  • sed '1,10d' file
  • sed '$d' file
  • du -a | sed 's/.*[TAB]//' (ref. [UPE] p. 109)
  • sed 's/^\([A-Z][a-z-]*\)[,][ ]\([A-Z][a-z-]*\)$/\2 \1/' file
  • sed '10,20w newfile' file
  • sed '1,/^$/d' file
  • sed -n '/^$/,/^end/p' file
  • sed 's/^/[TAB]/' file (ref. [UPE] p. 109)
  • sed '/./s/^/[TAB]/' file (ref. [UPE] p. 110)
  • sed '/^$/!s/^/[TAB]/' file (! inverts the condition) (ref. [UPE] p. 110)
  • deleting line(s) which contain the strings one or two
    sed '/one/d
         /two/d' file
    
  • put the editing commands above in a file commands.sed and invoke: sed -f commands.sed <file(s)>


More examples

For the remainder of these notes, consider the following file named faculty.details:
    Name: Mehdi Zargham Office: 139 Anderson Hall Course: ASI 150
    Name: Raghava Gowda Office: 142 Anderson Hall Course: CPS 310
    Name: James P. Buckley Office: 146 Anderson Hall Course: CPS 430/530
    Name: Dale Courte Office: 144 Anderson Hall Course: CPS 387
    Name: Saverio Perugini Office: 145 Anderson Hall Course: CPS 444/544
    Name: Zhongmei Yao Office: 150 Anderson Hall Course: CPS 341
    
Examples:
    sed -n '/CPS/p' faculty.details # same as grep CPS faculty.details
    
    sed '/CPS/!d' faculty.details # same as above
    
    sed -n '/[/]/p' faculty.details # prints lines with a cross-listed course; same as sed -n '/\//p' or grep '\/' faculty.details
    
    sed '/\//d' faculty.details # print lines containing a non-cross-listed course; same as grep -v '\/' faculty.details
    
    sed 's/^Name:[ ]//' faculty.details # removes "Name: " from file faculty.details
    
    sed 's/^Name:[ ]//' faculty.details | sed 's/Office:[ ]//' # removes "Name: " & "Office: " from faculty.details
    
    # how can we purge all attribute labels (i.e., "Name: ", "Office: ", "Course: ")? multiple ways:
    
    sed 's/[A-Za-z][A-Za-z]*: //g' faculty.details
    
    sed 's/[A-Za-z]+: //g' faculty.details # will not work, since sed uses basic regular expressions and not full REs
    
    sed 's/[A-Za-z]\{1,\}: //g' faculty.details
    
    sed 's/^Name:[ ]//' faculty.details | sed 's/Office:[ ]//' | sed 's/Course:[ ]//' # purges all attribute labels
    
    sed 's/^Name:[ ]//;
         s/Office:[ ]//;
         s/Course:[ ]//' faculty.details
    
    cat sedfile
    s/^Name:[ ]//
    s/Office:[ ]//
    s/Course:[ ]//
    
    sed -f sedfile faculty.details
    
    sed 's/^Name:[ ]\(.*\)Office:[ ]\(.*\)Course:[ ]\(.*\)$/\1\2\3/' faculty.details
    
    sed 's/[A-Za-z][A-Za-z]*:[ ]//g' faculty.details
    


d for delete

  • delete lines from the output stream, not original file

  • examples:
    • sed 'd' faculty.details reads in one line at a time into a buffer (work space), deletes it, and prints the contents of the buffer (in this case, empty)
    • sed '1d' faculty.details reads in one line at a time into the buffer, deletes it if it is line 1, and prints the buffer contents onto output (in this case, all lines except 1 would be output)
    • sed '$d' faculty.details does the same, but for the last line
    • sed '2,4d' faculty.details deletes lines from 2 up to and including line 4
    • sed '/Yao/,/ran/d' faculty.details deletes lines starting from one which matches Yao up to and including one which matches ran
    • sed '/Yao/,/ran/!d' faculty.details negates the address (i.e., do not delete these lines, and delete others)
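
A sketch of line-number and last-line addresses (the five-line file is invented):

```shell
printf 'l1\nl2\nl3\nl4\nl5\n' > /tmp/ddemo.txt

middle_gone=$(sed '2,4d' /tmp/ddemo.txt)     # delete lines 2 through 4
remaining=$(sed '$d' /tmp/ddemo.txt | wc -l) # $ addresses the last line

rm -f /tmp/ddemo.txt
```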


p for print

  • print lines from the buffer

  • examples:
    • sed 'p' faculty.details reads in one line at a time into the buffer and prints each. Notice that by default sed prints what is in the buffer. Therefore, you will get two copies of each line.
    • in sed -n 'p' faculty.details, the -n suppresses the default print action of sed. Therefore, this is the equivalent of doing a cat.
    • we can use the same addressing commands as before (e.g., sed -n '4,6p' faculty.details prints lines 4 through 6).


More sed jargon

  • = prints (just) the line number
  • a appends text after the current line; use it as a\ followed by the text you want to append
  • b branches past the remaining commands (i.e., stop attempting to make more matches on this line)


Exercises

Write sed commands/scripts to do the following:
  • delete all blank lines in the file: sed '/^$/d' faculty.details
  • print the lines pertaining to faculty who have offices in Anderson Hall: sed -n '/Anderson Hall/p' faculty.details
  • find the line numbers of the lines describing faculty who teach cross-listed courses: sed -n '/[/]/=' faculty.details
  • You are told that Perugini is an assistant professor, and all other professors are associate professors. Print each professor's rank on a separate line, after the given line, in the form Rank: <rank>.
      /Perugini/ {
         a\
         Rank: Assistant Professor
         p
         b
      }
      {
         a\
         Rank: Associate Professor
         p
      }
      
    Put the editing commands above in a file rank.f and invoke it as: sed -n -f rank.f faculty.details. Note that the b command is important: otherwise, since the last command block applies to all lines (note the lack of an address), Perugini would also be listed as an associate professor. Also note that you can append multiple lines; each must be followed by a \ except the last (observe the \ after the a command). The braces { and } must be where they are (i.e., the { must end the first line and the } must be on a line by itself).

  • print the lines in the format <name>:<office>:<course> (i.e., strip the labels Name:, Office:, and Course:): sed 's/Name: \(.*\) Office: \(.*\) Course: \(.*\)/\1:\2:\3/' faculty.details
  • print the lines in the format <course>:<office>:<name>: sed 's/Name: \(.*\) Office: \(.*\) Course: \(.*\)/\3:\2:\1/' faculty.details
  • break down every entry onto three lines: sed 's/Name: \(.*\) Office: \(.*\) Course: \(.*\)/\1\n\2\n\3/' faculty.details


A tale of two buffers

Normally, sed reads one line at a time into its main buffer (sometimes called the pattern buffer). There is another buffer (called the hold buffer) available for use. Some commands to work with this buffer include:
  • h copies the contents of the main buffer into the hold buffer, thus overwriting whatever it was that was already in the hold buffer
  • g copies the contents of the hold buffer into the main buffer, overwriting it
  • H does the same as h, except it appends the contents of the main buffer after the last line in the hold buffer
  • G does the same as g, again in the `append' sense
  • x exchanges the contents of the two buffers; what was in the hold buffer is now in the pattern buffer, and vice versa
  • N reads in an additional line and appends it to the contents of the pattern buffer; in between the original line and the newly added line, N will insert a newline (\n) character; useful for reading in multiple lines at a time (see flip example below)
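
These commands come together in the classic sed idiom for reversing a file (a non-interactive tac): 1!G appends the hold buffer after every line but the first, h saves the growing result, and $p prints it once at the end.

```shell
# Reverse the lines of the input using the hold buffer.
reversed=$(printf 'a\nb\nc\n' | sed -n '1!G;h;$p')
```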


More exercises

Write sed commands/scripts to (put the solutions in a separate file, and invoke using sed -n -f option):
  • Suppose the department is moving: move faculty in Anderson Hall to the Science Center, and move those in the Science Center to Miriam Hall. Let faculty keep their old office numbers because they believe their numbers are lucky.
      s/Science Center/Miriam Hall/
      s/Anderson Hall/Science Center/
      
    Notice that you have to do the transformations in this order, else everybody gets assigned to Miriam Hall! This example shows that sed reads in one line at a time, applies all the commands sequentially, then picks the next line, and so on. This is in contrast to reading all lines at once, applying the first command, then reading all again, applying the second command, and so on.

  • Pretty print the file so that each line has one line before it describing what it is about (e.g., "The next line is about Zhongmei Yao").
      {
         h  # hold buffer now contains what was matched
         s/Name: \(.*\) Office: .* Course: .*/The next line is about: \1/
         G # appends hold buffer to pattern buffer
         p
      }
      
  • Completely capitalize the names of faculty.
      {
          h # save the current line in hold buffer
          s/Name: \(.*\) Office: .* Course: .*/\1/
          y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
          G # current buffer contains a capital name, newline, old line
          s/\(.*\)\nName: \(.*\) Office: \(.*\)/Name: \1 Office: \3/
          p
      }
      
  • Flip alternate lines
      $p
      {
         N # read the next line, we now have two lines
         s/\(.*\)\n\(.*\)/\2\n\1/ # flip the two lines
         p # print it
      }
      
  • Delete all the blank lines.
      /^$/ {
         d
         b
      }
      p
      
  • Replace multiple blank lines wherever they occur with just one blank line.
      /^$/ {
         N
         /^\n$/D
      }
      p
      
    Notice that this uses a new command, namely D. D is just like d, it deletes the contents of the pattern (main) buffer. However, while d deletes the entire buffer, D deletes only until the first embedded newline.
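
The same N/D idea also works without -n and p; a sketch (input invented) that squeezes a run of blank lines down to one:

```shell
# /^$/ finds a blank line; N pulls in the next line; if the pair is two
# blanks (pattern space is just "\n"), D deletes up to the first newline
# and restarts the cycle, collapsing the run.
squeezed=$(printf 'a\n\n\n\nb\n' | sed '/^$/{N;/^\n$/D;}')
```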


Filters


tr (transliterate)

  • only reads from standard input
  • syntax: tr <string1> <string2>
  • converts characters in <string1> to those, respectively, in <string2>
  • tr A-Z a-z < myfile
  • options:
    • tr -d (delete character(s) in <string1>)
    • tr -c (act on complement of <string1>)
    • tr -s (squeeze strings of repeated characters)
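
A sketch of the basic use and the three options:

```shell
lower=$(printf 'Hello World\n' | tr A-Z a-z) # case translation
nodigits=$(printf 'abc123\n' | tr -d '0-9')  # -d deletes characters in the set
squeezed=$(printf 'aaabbbc\n' | tr -s 'ab')  # -s squeezes runs of a and b
```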


sort

  • can be fine-tuned to sort columns in a variety of ways
  • sort -n (numeric-sort: compare according to string numerical value)
  • sort -g (general-numeric-sort: compare according to general numerical value)
  • sort -r (reverse sort: reverse the result of comparisons)
  • sort -rn (reverse numeric-sort)
  • sort -d (dictionary order: consider only blanks and alphanumeric characters)
  • sort -b (ignore leading blanks)
  • sort -f (ignore-case: fold lower case to upper case characters)
  • sort -k2 (sort on field 2)
  • sort -t":" -k2 (sort on field 2 using colon-delimited fields)
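
A sketch of keyed sorting on colon-delimited fields (the data is invented):

```shell
printf 'beta:2\nalpha:3\ngamma:1\n' > /tmp/sortdemo.txt

first=$(sort -t: -k2 /tmp/sortdemo.txt | head -1)   # smallest field-2 key first
top=$(sort -t: -k2 -rn /tmp/sortdemo.txt | head -1) # largest numeric key first

rm -f /tmp/sortdemo.txt
```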


uniq

  • purges duplicate consecutive lines (must be adjacent)
  • fast (linear time)
  • options:
    • uniq -d (only prints the lines which are repeated)
    • uniq -u (only prints the lines which are not repeated)
    • uniq -c (count)
  • hello
    hi
    hi
    hello
    
    exercise: give output of following command lines on above input stream:
    uniq
    uniq -u
    uniq -d
    uniq -c
    
  • to purge duplicates, first sort and then apply uniq, e.g., sort names | uniq = sort -u names
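
A sketch answering the exercise above on the hello/hi input stream:

```shell
printf 'hello\nhi\nhi\nhello\n' > /tmp/uniqdemo.txt

passthru=$(uniq /tmp/uniqdemo.txt)        # only ADJACENT duplicates collapse
repeated=$(uniq -d /tmp/uniqdemo.txt)     # lines which are repeated
distinct=$(sort /tmp/uniqdemo.txt | uniq) # sort first to purge all duplicates

rm -f /tmp/uniqdemo.txt
```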


Spellers

  • spell
  • ispell (interactive spell)
  • aspell
  • add following line to your .vimrc to invoke aspell on the current file in vim using <ctrl-t>:
    map ^T <CR>:!aspell --dont-backup check %<CR>:e! %<CR>


Pipeline of filters

    (recall UNIX model of computation;
    communication mechanism setup for free by the shell)
    $ spell uist2003.tex | sort | uniq
    $ spell uist2003.tex | sort | uniq | wc -l
    $ spell uist2003.tex | sort -u
    $ spell uist2003.tex | sort -u | wc -l
    $ detex 20100115/20100115.tex | nroff
    


cut and paste

  • extract or merge fields or columns from lines
  • $ who | cut -d" " -f1 | paste - -
  • join (relational database operator)


Uses of paste

  • as the vertical analog of cat (e.g., paste a b)
  • to concatenate multiple lines of one file into a single line (e.g., paste -s a)
  • using different delimiters (e.g., paste -s -d ":;|" a)
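
A sketch of both uses (the files are invented; the default join character is a TAB):

```shell
printf '1\n2\n3\n' > /tmp/a.txt
printf 'x\ny\nz\n' > /tmp/b.txt

firstrow=$(paste /tmp/a.txt /tmp/b.txt | head -1) # vertical cat: columns joined by TAB
oneline=$(paste -s -d: /tmp/a.txt)                # serial: one line, : between fields

rm -f /tmp/a.txt /tmp/b.txt
```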


File comparison utilities

  • comm
    • syntax: comm <file1> <file2>
    • meaningful if <file1> and <file2> are sorted
    • merges 2 files and prints each line in one of 3 columns
      1. line(s) only in <file1>
      2. line(s) only in <file2>
      3. line(s) in both <file1> and <file2>
    • example: if the first file contains an apple, both ideas, elephants and the second contains both ideas, cat, dog, then comm merges them into three indented columns:
      an apple
                      both ideas
              cat
              dog
      elephants
      
    • options: which columns to suppress

  • cmp

  • diff (find and output differences between two files or two directories)
    $ diff file1 file2
    $ diff dir1 dir2
    
  • sdiff
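
The comm example above can be checked by suppressing columns (-23 keeps only column 1, -12 only column 3); the two sorted files are invented:

```shell
printf 'an apple\nboth ideas\nelephants\n' > /tmp/f1.txt
printf 'both ideas\ncat\ndog\n' > /tmp/f2.txt

only_first=$(comm -23 /tmp/f1.txt /tmp/f2.txt) # lines only in the first file
in_both=$(comm -12 /tmp/f1.txt /tmp/f2.txt)    # lines in both files

rm -f /tmp/f1.txt /tmp/f2.txt
```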


Printing utilities

  • script
  • lpr
  • lpd
  • lpq
  • a2ps (ascii to postscript)
  • enscript
  • nenscript
  • ghostview
  • gv
  • ggv
  • xpdf
  • acroread
  • ps2pdf
  • pdf2ps
  • latex
  • detex
  • dvips
  • xdvi
  • groff
  • troff
  • nroff
  • expand (converts tabs to spaces)
  • unexpand
  • iconv
  • indent (a pretty printer)
    $ cat .indent.pro # resource file for indent
    -br -nce -cdw -npcs -ncs -bs -brs -brf -i3
    
  • dos2unix
  • unix2dos
  • xfig
  • ppds


awk


Introduction

  • a more powerful sed
  • named after inventors: Aho, Weinberger, and Kernighan
  • follows sed style, but uses C syntax to specify commands
  • powerful for table manipulation and data summarization
  • helpful for processing columns (i.e., extracting, manipulating, or printing columns from input streams using specified delimiters)
  • mini relational database management system
  • awk (like sed) is Turing complete


Execution model

BEGIN {commands executed once before any input is read}
{main input loop executed for each line of input}
END {commands executed once after all input is read}


Simple awking

Consider the following input stream (student.grades):
    Lucy    45      55      60       90
    Linus   70      75      88      100
    Larry   75      80      85      100
    Lucia   80      70      70       95
    
  • the following awk script just cats a file; run it as you would run sed: awk -f <awk script name>:
      { print }
      
    Note that the curly braces contain commands, just as in sed. Since there is nothing before {, these commands are applied to all lines. The only difference is that instead of p in sed, we have print.

  • awk has two special patterns, BEGIN and END, where you can put commands which are executed before any line is read, and after all lines are read, respectively. For example:
      BEGIN {
         print "I am going to start reading a file. Whoopie!"
      } 
      { print }
      END {
         print "I have finished reading the file. Sigh."
      }
      
  • when awk reads a line, it automatically parses the line and puts pieces of the line into defined variables such as $1 (first field), $2 (second field), and so on. The default field separator is a tab (or space). Therefore, the awk script
      { print $1 }
      
    will just print the names. $0 stores the entire line.

  • we can also declare and manipulate variables, just like we would in a C program. The following demonstrates how you will calculate the average value of scores in the first column of numbers (which is actually the second column of the file).
      BEGIN {
         total = 0
         lc = 0
      } {
         total = total + $2
         ++lc
      } END {
         avg = total/lc
         print total, avg
      }
      
  • awk also has system variables to modify the output format (e.g., OFS stands for output field separator); we can set it in the BEGIN part by:
      BEGIN {
         total = 0
         lc = 0
         OFS = "---"
      }
      
    this will affect all subsequent outputs written using the print command; in between two variables (listed in comma separated format), awk will insert the output field separator; similarly, there is a FS which is an input field separator variable which can be used to set the input field separator to a character other than the default whitespace.

  • it is good practice to put one awk command on each line. If you use multiple commands, you will need to use a ; to separate them.
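
The averaging script above can be run as a one-shot command line; the input is a two-column sketch in the spirit of student.grades:

```shell
printf 'Lucy 45\nLinus 70\nLarry 75\nLucia 80\n' > /tmp/grades.txt

avg=$(awk 'BEGIN { total = 0; lc = 0 }
           { total = total + $2; ++lc }
           END { print total/lc }' /tmp/grades.txt)

rm -f /tmp/grades.txt
```

Here (45 + 70 + 75 + 80) / 4 = 67.5.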


Fine tuning awk

  • character following a -F on the command line specifies the field delimiter (whitespace by default)
    awk -F: '{print $0}' faculty.details
    awk -F: '{print $1" "$2}' faculty.details
    
  • FS variable: the field separator, can be assigned a value
  • OFS variable: the output field separator, can be assigned a value
  • NF variable: stores number of fields in record
  • NR variable: the total number of input records seen so far
  • can use C statements for formatted output (e.g., printf ("%d\n", $1);)


Simple example command lines

    who | awk '{print $1}' # to see who is logged in
    
    who | awk '{print $5}' # to see from where users are logged in
    
    echo "$(hostname) has been up for $(uptime | awk '{print $3}') days."
    
    awk '{print}' faculty.details # works like cat
    
    awk -F, '{print $2 " " $1}' guestlist
    
    awk -F, '{print $2, " ", $1}' guestlist # why three spaces between fields in output?
    
    awk -F, '{print $2 " " $1}' guestlist | sort # sorts by first name
    
    awk 'BEGIN {FS=":"} {print NF}' faculty.details
    
    awk 'BEGIN {FS=","; OFS=":"} {print $2, $1}' guestlist
    


Gradebook example

    awk 'BEGIN {
       ns = 0
       total = 0
    } {
       sum = $2 + $3 + $4
       avg = sum / 3
       ns++
       total += avg
       printf ("%d %s: %.2f\n", ns, $1, avg)
    } END { printf ("%d students: %.2f\n", ns, total/ns) }' scores
    
    Assuming that the file scores contains
    Peter 85 90 95
    Paul  25 25 50
    Mary 100 80 60
    
    this awk command generates the output
    1 Peter: 90.00
    2 Paul: 33.33
    3 Mary: 80.00
    3 students: 67.78
    
    


More examples

    $ cat ouruniq
    BEGIN {
       prevline = ""
    } {
       if (NR == 1 || $0 != prevline) {
          print $0
          prevline = $0
      }
    } 
    
    $ cat uniq1line
    BEGIN {
       prevline = ""
    } {
       if (NR == 1 || $0 != prevline) {
          printf ("%s ", $0);
          prevline = $0
       }
    } END {
         printf ("\n");
      }
    
    $ sort names | awk -f ouruniq
    $ sort names | awk -f uniq1line
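
ouruniq can also be condensed to a pattern-action one-liner; a sketch comparing it against uniq on invented input:

```shell
printf 'apple\napple\nbanana\napple\n' > /tmp/names.txt

# Print a line when it is the first line or differs from the previous one.
awkout=$(awk 'NR == 1 || $0 != prevline { print; prevline = $0 }' /tmp/names.txt)
uniqout=$(uniq /tmp/names.txt)

rm -f /tmp/names.txt
```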
    


References

    [UIAN] A. Robbins. UNIX in a Nutshell. O'Reilly, Beijing, Third edition, 1999.
    [UPE] B.W. Kernighan and R. Pike. The UNIX Programming Environment. Prentice Hall, Upper Saddle River, NJ, Second edition, 1984.

Return Home