CS 497C – Introduction to UNIX
Lecture 29: Filters Using Regular Expressions – grep and sed
Chin-Chih Chang
chang@cs.twsu.edu

Regular Expressions

* egrep’s extended set includes two special characters: + and ?. They are often used in place of * to restrict the matching scope.
* + matches one or more occurrences of the previous character.
* ? matches zero or one occurrence of the previous character. The following pattern matches both truman and trueman:

    $ egrep "true?man" emp.lst

* The |, ( and ) characters can be used to search for multiple patterns.

    $ egrep 'wood(house|cock)' emp.lst

sed: The Stream Editor

* sed is a multipurpose tool which combines the work of several filters.
* Designed by Lee McMahon, it is derived from the ed line editor.
* sed is used to perform noninteractive operations.
* sed has numerous features, almost bordering on a programming language, but its functions have been taken over by perl.
* Everything in sed is an instruction. An instruction combines an address for selecting lines with an action to be taken on them:

    sed options 'address action' file(s)

* The address and action are enclosed within single quotes.
* The components of a sed instruction are shown below:

    sed '1,$ s/^bold/BOLD/g' foo
        address: 1,$
        action:  s/^bold/BOLD/g

* You can have multiple instructions in a single sed command, each with its own address and action components.
* Addressing in sed is done in two ways:
  - By line number (like 3,7p).
  - By specifying a pattern (like /From:/p).

Line Addressing

* In the first form, the address specifies either a single line number or a pair of line numbers (3,7) that selects a group of contiguous lines.
* The second form uses one or two patterns.
* In either case, the action (here p, the print command) is appended to the address.
* You can simulate head -3 with the 3q instruction, in which 3 is the address and q is the quit action.

    $ sed '3q' emp.lst

* sed uses the p (print) command to print the output.
    $ sed '1,2p' emp.lst

* By default, sed prints all lines on the standard output in addition to the lines affected by the action, so the addressed lines are printed twice.
* To avoid printing duplicate lines, use the -n option whenever you use the p command. The second command below prints only the last line.

    $ sed -n '1,2p' emp.lst
    $ sed -n '$p' emp.lst

* To reverse the line selection criteria, use !. The following command prints every line except lines 3 through the last, that is, only the first two lines:

    $ sed -n '3,$!p' emp.lst

* To select lines from the middle of the file:

    $ sed -n '9,11p' emp.lst

* You can select multiple sections as follows:

    $ sed -n -e '1,2p' -e '7,9p' -e '$p' emp.lst

Context Addressing

* The second form of addressing lets you specify a pattern (or two) rather than line numbers.
* This is known as context addressing, where the pattern has a / on either side.
* You can locate the senders in your mailbox:

    $ sed -n '/From: /p' /var/mail/cs497c
    $ sed -n '/^From: /p' /var/mail/cs497c
    $ sed -n '/wilco[cx]k*s*/p' emp.lst

* You can also specify a comma-separated pair of context addresses to select a group of contiguous lines:

    $ sed -n '/johnson/,/lightfoot/p' emp.lst

* To list files which have write permission for the group:

    $ ls -l | sed -n '/^.....w/p'
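
The addressing forms above can be tried out on a small file. The following is only a sketch: the six-line sample file is a hypothetical stand-in for emp.lst (its names and layout are invented for illustration), and the function names are likewise made up here, not part of sed.

```shell
#!/bin/sh
# Hypothetical six-line stand-in for emp.lst; the records are invented.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
1001|alice wood|clerk
1002|bob johnson|manager
1003|carol wilcox|director
1004|dave lightfoot|engineer
1005|erin truman|clerk
1006|frank woodhouse|manager
EOF

# Without -n, sed prints every line once plus the addressed lines again.
doubled() { sed '1,2p' "$1"; }

# Line addressing with -n: only lines 1 and 2 are printed.
first_two() { sed -n '1,2p' "$1"; }

# Negation: print every line outside the range 3,$ (again lines 1 and 2).
not_three_to_end() { sed -n '3,$!p' "$1"; }

# Multiple instructions: one -e per address/action pair.
first_and_last() { sed -n -e '1p' -e '$p' "$1"; }

# Context addressing: from the first line matching johnson through
# the next line matching lightfoot.
johnson_to_lightfoot() { sed -n '/johnson/,/lightfoot/p' "$1"; }

doubled "$sample"
first_two "$sample"
not_three_to_end "$sample"
first_and_last "$sample"
johnson_to_lightfoot "$sample"
```

With this sample, doubled prints eight lines (six originals plus two duplicates), while first_two and not_three_to_end print the same two lines, which shows why -n normally accompanies p.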