CS 497C - Introduction to UNIX
Lecture 25: Simple Filters
Chin-Chih Chang
chang@cs.twsu.edu

sort: Ordering a File

* The sort command orders the lines of a file. It can sort on individual fields, and on columns within those fields.
* When sort is invoked without options, lines are ordered by comparing the entire line in ASCII collating sequence.
* Using the -t option to specify the field delimiter, you can sort the file on any field.
* To sort the file on the fifth field:
    sort -t: +4 /etc/passwd
* In this notation, +4 skips the first four fields, and a following -5 ends the key after the fifth field. Newer versions of sort express the same keys with the -k option (for example, -k5,5).
* You can sort on more than one field.
* To use the fifth field as the primary key and the first field as the secondary key:
    sort -t: +4 -5 +0 /etc/passwd
* With the -n (numeric) option, you can sort in numeric sequence.
    sort -t: +2 -3 -n group
* The -u (unique) option lets you purge duplicate lines from a file.
    cut -d':' -f3 shortlist | sort -u | tee des.lst
* sort uses the -o (output) option to write the result to a file.
    sort -o sortedlist +3 list
* You can check whether a file has actually been sorted with the -c (check) option.
* To merge two or more sorted files, use the -m option.
    sort -m foo1 foo2 foo3

tr: Translating Characters

* The tr (translate) command translates characters and can be used to change the case of letters. The syntax is:
    tr options expression1 expression2 < standard input
* You can use tr to replace the : with a | (pipe), and the / with a \.
    tr ':/' '|\\' < /etc/group
* We can change the case of the first three lines from lower to upper:
    head -3 /etc/group | tr '[a-z]' '[A-Z]'
* The -d (delete) option is used to delete characters.
* The -s (squeeze) option compresses multiple consecutive occurrences of a character into a single one.
    tr -s ' ' < shortlist
* The -c (complement) option complements (negates) the set of characters in the expression.
* You can also use octal values in tr. Here \012 is the newline character:
    tr '|' '\012' < shortlist

uniq: Locate Repeated and Nonrepeated Lines

* uniq removes duplicate lines, but only when they are adjacent.
* The file is therefore usually sorted first and the output piped to uniq:
    sort dept.lst | uniq -
* The -u (unique) option selects only the nonduplicated lines.
* The -d (duplicate) option selects only the repeated ones.
* The -c (count) option prefixes each line with the number of times it occurs.

nl: Line Numbering

* The nl command numbers only logical lines, that is, lines that are not empty.
* nl uses the tab as the default delimiter between the number and the line, but we can change it to the : with the -s option.
* You can set the width (-w) of the number format.
    nl -w1 -s: calc.lst

dos2unix and unix2dos: DOS and UNIX Files

* UNIX and DOS files differ in structure. Lines in DOS are terminated by the carriage return and linefeed characters, while a UNIX line ends with only a linefeed.
* Some UNIX systems feature two utilities, dos2unix and unix2dos, for converting files between DOS and UNIX.
    unix2dos catalog.html catalog.html
    cat *.html | unix2dos > combined.html

Spell (ispell): Check Your Spellings

* spell is used to spell-check a document. The command reads a file and generates a list of all words that it recognizes as misspelled.
* The -b (british) option uses the British dictionary.
* Linux has an interactive spell-checking program, ispell.
* When used with the -l option, ispell works noninteractively like spell.

Applying the Filters

* A three-stage operation is shown below:
  - Cut out the third field with cut -d'|' -f3 shortlist.
  - Sort it next with sort.
  - Finally, run uniq -c on the sorted output.
* These steps can be combined in a single pipeline:
    cut -d'|' -f3 shortlist | sort | uniq -c
* To save a manual page in plain-text format:
    man ls | col -b > ls.man
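
The cut, sort and uniq -c pipeline from "Applying the Filters" can be tried on a small sample file. The sketch below is only an illustration: the lecture does not show the contents of shortlist, so the records here are hypothetical, with the designation assumed to be the third |-delimited field. The awk stage is an addition that trims the padding uniq -c places before each count.

```shell
# Hypothetical sample data (not from the lecture): '|'-delimited records
# with the designation assumed to be in the third field.
cat > shortlist <<'EOF'
2233|charles harris|g.m.|sales
9876|bill johnson|director|production
5678|robert dylan|d.g.m.|marketing
2365|john woodcock|director|personnel
5423|barry wood|chairman|admin
1006|gordon lightfoot|director|sales
EOF

# Stage 1: cut extracts the third field.
# Stage 2: sort brings identical designations together (LC_ALL=C fixes the order).
# Stage 3: uniq -c collapses adjacent duplicates and prefixes each with a count.
# awk then reprints "count designation" without uniq's leading padding.
counts=$(cut -d'|' -f3 shortlist | LC_ALL=C sort | uniq -c | awk '{print $1, $2}')
printf '%s\n' "$counts"
```

With this sample, director is reported with a count of 3 and every other designation with a count of 1. Leaving out the sort stage would give wrong counts, because uniq only compares adjacent lines.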