CS 497C - Introduction to UNIX
Lecture 28: Filters Using Regular Expressions - grep and sed
Chin-Chih Chang
chang@cs.twsu.edu

grep: Searching for a Pattern

* The sample database shown on Page 434 can be found at http://www.mhhe.com/engcs/compsci/das/data.mhtml
* grep is a filter used to search a file for a pattern.
* It scans a file for occurrences of a pattern and, depending on the options used, displays:
  - Lines containing the selected pattern.
  - Lines not containing the selected pattern. (-v)
  - Line numbers where the pattern occurs. (-n)
  - Number of lines containing the pattern. (-c)
  - Filenames where the pattern occurs. (-l)
* Its syntax treats the first argument as the pattern and the rest as filenames:
    grep options pattern filename(s)
    $ grep sales emp.lst
* When you use a multiword string as the pattern, you must quote the pattern.
    $ grep "neil o'bryan" emp.lst

grep Options

* The -c (count) option counts the lines containing the pattern, not the individual occurrences. With several files, it prints one count per filename.
    $ grep -c directory emp*.lst
* The -n (number) option displays the line numbers of the lines containing the pattern.
    $ grep -n 'marketing' emp.lst
* The -l (list) option displays only the names of files where a pattern has been found.
    $ grep -l 'manager' *.lst
* The -i (ignore) option makes the match case-insensitive.
* To look for a pattern that begins with a hyphen, use the -e option.
    $ grep -e "-mtime" /var/spool/cron/crontabs/*
* In Linux, you can repeat the -e option to match multiple patterns.
    $ grep -e woodhouse -e wood emp.lst
* A regular expression is an ambiguous expression formed with some special and ordinary characters, which is interpreted by a command to match more than one string.
* grep uses a regular expression to match a group of similar patterns.

Regular Expressions

* A regular expression uses a character class that encloses a group of characters within a pair of square brackets []. The class matches any single character in the group.
    $ grep "wo[od][de]house" emp.lst
* Regular expressions use the ^ (caret) to negate the character class, while the shell uses ! (bang).
* A single nonalphabetic character is represented by [^a-zA-Z].
* The * (asterisk) matches zero or more occurrences of the preceding character.
    $ grep "wilco[cx]k*s*" emp.lst
* A . (dot) matches a single character. The shell uses the ? character for that.
* The dot along with the * (.*) signifies any number of characters, or none.
    $ grep "p.*woodhouse" emp.lst
* A pattern can be matched at the beginning of a line with a ^, and at the end with a $.
* Because it is the command, not the shell, that interprets these characters, a regular expression should be quoted to prevent the shell from interfering.
    $ grep "^2" emp.lst
* The . and * lose their special meanings when placed inside a character class. Outside a class, escape them with a backslash (\. and \*) to match them literally.

egrep and fgrep: The Other Members

* egrep extends grep's capabilities. It uses | to delimit multiple patterns.
    $ egrep 'woodhouse|woodcock' emp.lst
* fgrep accepts only fixed strings and is faster than the other two.
* Both commands support the -f (file) option to take such patterns from a file.
    $ fgrep -f pat.lst emp.lst
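
The options and patterns above can be tried end to end with a short sketch like the one below. It is an illustration under stated assumptions, not the textbook's exercise: the five records are invented here and are not the emp.lst sample database, and the variable names are hypothetical. egrep and fgrep are written as grep -E and grep -F, the equivalent forms defined by POSIX.

```shell
# Build a small, made-up data file (NOT the textbook's emp.lst).
tmp=$(mktemp -d)
cat > "$tmp/emp.lst" <<'EOF'
1001|p.j. woodhouse|manager|sales
1002|barry wood|chairman|admin
2003|bill wilcocks|clerk|accounts
2004|sam wilcox|manager|marketing
3005|jane wodehouse|director|sales
EOF

# -c counts matching lines; -v inverts the selection.
count_sales=$(grep -c sales "$tmp/emp.lst")
count_not_sales=$(grep -vc sales "$tmp/emp.lst")

# -n prefixes each matching line with its line number.
line_hit=$(grep -n 'marketing' "$tmp/emp.lst")

# Repeating -e supplies several patterns; a line matching either is selected.
count_multi=$(grep -c -e woodhouse -e wood "$tmp/emp.lst")

# Character classes: matches both "woodhouse" and "wodehouse".
count_class=$(grep -c "wo[od][de]house" "$tmp/emp.lst")

# * applies to the preceding character: matches "wilcocks" and "wilcox".
count_star=$(grep -c "wilco[cx]k*s*" "$tmp/emp.lst")

# ^ anchors the pattern to the start of the line.
count_anchor=$(grep -c "^2" "$tmp/emp.lst")

# Extended regular expressions: | separates alternative patterns.
count_alt=$(grep -Ec 'woodhouse|wilcox' "$tmp/emp.lst")

# As a regular expression the dot in b.ll matches any character ("bill");
# as a fixed string (-F) it must be a literal dot, so nothing matches.
# grep exits with status 1 when nothing matches, hence the "|| true".
count_regex=$(grep -c 'b.ll' "$tmp/emp.lst")
count_fixed=$(grep -Fc 'b.ll' "$tmp/emp.lst" || true)

echo "sales=$count_sales not_sales=$count_not_sales multi=$count_multi"
echo "class=$count_class star=$count_star anchor=$count_anchor alt=$count_alt"
echo "regex=$count_regex fixed=$count_fixed"
echo "$line_hit"

rm -rf "$tmp"
```

Every pattern is quoted so that the shell passes characters such as [, *, ^ and | to grep untouched.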