All the examples here are run from your terminal window if you are on a Mac, or from your Cygwin window if you are on a PC.
awk
The awk command is a powerful way to manipulate the contents of text files. We are only going to skim the surface of what awk can do right now. Let's start with a simple example. Say you have a text file that has some columns of numbers in it. With one awk command you can rearrange the columns in a different order, or perform some arithmetic on the columns.
Download the file1.txt text file in order to follow along with what I am doing. To do this, click the link to the filename and then choose Save As . . . from your browser's file menu. Save it somewhere on your computer, then in the terminal, navigate to that place. Remember how? You'll want to use cd.
The file "file1.txt" is simply a plain-text arrangement of numbers in three columns and six rows. Every entry in the first column is the number 1, every entry in the second column is the number 2, and every entry in the third column is the number 3.
Output certain columns of the file to the screen
Awk is great for quickly manipulating files that are arranged in columns, so it is a nice way to fiddle around with plain-text data files since those are frequently in columns or tables. It uses some peculiar syntax. Let's say we wanted to display just the first column from "file1.txt." Here's how to do it:
awk '{print $1}' file1.txt
The first thing you type is awk, followed by single quotes and curly braces. Inside the curly braces we wrote print $1, which is the command to print column #1. The name of the file we are extracting column #1 from goes last.
Let's say we wanted to display just the second column from "file1.txt." In that case we'd type:
awk '{print $2}' file1.txt
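If you want to try this without downloading anything, here is a minimal sketch. It assumes a small stand-in file called sample.txt (a hypothetical name with made-up numbers, not the file1.txt from this page), with different values in each column so you can tell the columns apart.

```shell
# Create a stand-in data file (sample.txt and its numbers are made up for illustration).
printf '10 20 30\n11 21 31\n12 22 32\n' > sample.txt

# Print only the first column: 10, 11, 12, one per line.
awk '{print $1}' sample.txt

# Print only the second column: 20, 21, 22, one per line.
awk '{print $2}' sample.txt
```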
Rearrange the columns, repeat columns, create other columns
You can output any number of columns and put them in whatever order you want. Let's say we want column 3, then column 1 but not column 2.
awk '{print $3, $1}' file1.txt
The comma between $3 and $1 tells awk to separate the two columns in the output, which by default means putting a single space between them.
Let's say you want to output column 1, substitute 4's in column 2, then output column 3 unchanged. That would be like this:
awk '{print $1, 4, $3}' file1.txt
The 4 inside the curly braces doesn't have a $ in front of it because it is the actual number 4; it is not referring to a 4th column.
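To see what the comma and the missing $ actually do, here is a small sketch you can paste into the terminal. It assumes a made-up stand-in file, sample.txt (a hypothetical name, not the file from this page).

```shell
# Create a stand-in data file (sample.txt and its numbers are made up for illustration).
printf '10 20 30\n11 21 31\n12 22 32\n' > sample.txt

# With a comma, the columns are separated by a space. First line: 30 10
awk '{print $3, $1}' sample.txt

# Without a comma, the two values are glued together. First line: 3010
awk '{print $3 $1}' sample.txt

# A bare 4 is the number 4; $4 would mean a fourth column, which sample.txt does not have.
# First line: 10 4 30
awk '{print $1, 4, $3}' sample.txt
```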
Arithmetic, text, special characters
You can do math inside the print statement of awk and you can also deal with columns that aren't numbers. Let's say I want to output the sum of columns 1 and 2 as the first column, my name as the second column, and the product of columns 2 and 3 as the third column, and then make a fourth column that is the number 25:
awk '{print $1+$2, "eliza", $2*$3, 25}' file1.txt
For "file1.txt", where every row is 1 2 3, each line of output from awk '{print $1+$2, "eliza", $2*$3, 25}' file1.txt is 3 eliza 6 25.
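The arithmetic is easier to see when the rows are not all the same. Here is a minimal sketch, assuming a made-up stand-in file called sample.txt (a hypothetical name and hypothetical numbers):

```shell
# Create a stand-in data file (sample.txt and its numbers are made up for illustration).
printf '10 20 30\n11 21 31\n12 22 32\n' > sample.txt

# Sum of columns 1 and 2, a fixed piece of text, product of columns 2 and 3, the number 25.
# Output:
#   30 eliza 600 25
#   32 eliza 651 25
#   34 eliza 704 25
awk '{print $1+$2, "eliza", $2*$3, 25}' sample.txt
```

The sum and the product change from row to row, while "eliza" and 25 are repeated on every line because they don't depend on any column.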
There are a few special characters. A useful one sometimes is "\t", which tells awk that you want a tab in between the columns.
awk '{print $1 "\t" $2 "\t" $3}' file1.txt
The command above will output file1.txt unchanged except for tabs in between the columns instead of just one space.
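If typing "\t" between every pair of columns gets tedious, another way to get the same result is to change awk's output field separator, OFS, which is what the comma inserts. This is a sketch of that alternative, again assuming a made-up stand-in file called sample.txt (a hypothetical name):

```shell
# Create a stand-in data file (sample.txt and its numbers are made up for illustration).
printf '10 20 30\n11 21 31\n12 22 32\n' > sample.txt

# Put the tabs in by hand between the columns.
awk '{print $1 "\t" $2 "\t" $3}' sample.txt

# Same result: set the output field separator (OFS) to a tab, then use commas.
awk -v OFS='\t' '{print $1, $2, $3}' sample.txt
```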
Redirect the output of awk
All the examples so far have output the results of the awk command to the screen. They have not altered the original file, and they haven't saved the results anywhere. To put the output of awk into a new file instead of showing it on the screen, use >. Let's say I want to make a new file that is the same as file1.txt but with the columns in reverse order:
awk '{print $3, $2, $1}' file1.txt > file2.txt
Now if I look in the folder where file1.txt is, there are two files. file1.txt is still there, but there is a new file called file2.txt as well.
The command
awk '{print $3, $2, $1}' file1.txt > file2.txt
leaves the original file1.txt as it was and creates the new file2.txt, in which each row reads 3 2 1 instead of 1 2 3.
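Here is a small sketch of the whole round trip that you can run without touching file1.txt. The file names sample.txt and reversed.txt are hypothetical stand-ins, and the numbers are made up.

```shell
# Create a stand-in data file (sample.txt and its numbers are made up for illustration).
printf '10 20 30\n11 21 31\n12 22 32\n' > sample.txt

# Write the columns in reverse order to a new file; nothing is printed to the screen.
awk '{print $3, $2, $1}' sample.txt > reversed.txt

# The original is unchanged. First line: 10 20 30
cat sample.txt

# The new file has the reversed columns. First line: 30 20 10
cat reversed.txt
```

One thing to watch: don't redirect to the same file you are reading from (for example, sample.txt > sample.txt). The shell empties that file before awk gets a chance to read it.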