Lesson 7: A Little Bit of Unix, More File I/O, Arrays

Lesson 7: Introduction

Overview

Finally we have gotten to the point where we are developing the necessary skills to do some data manipulation! In this lesson we'll learn just a few simple unix-based commands that can make your life easier and faster. We'll learn to read in a data file, and do some computation with its contents.

Learning Outcomes

By the end of this lesson, you should be able to:

Do some simple file manipulation from the terminal prompt

What is due for Lesson 7?

This lesson will take us one week to complete, 30 June - 6 July 2021. The deliverable for this lesson is only a reading discussion. The Unix exercises are all in a non-graded form ("Try This!", "Quiz Yourself", etc.) but you should take the time to work through them anyway because knowing some of these simple tricks will save you a lot of headaches in Lesson 8 and anytime you want to import a dataset into Processing to make a plot out of it.

Questions?

If you have any questions, please post them to our Questions? discussion forum (not e-mail) in Canvas. I will check that discussion forum daily to respond. While you are there, feel free to post your own responses if you, too, are able to help out a classmate.

Lesson 7: Reading Discussion

This week we will read and discuss two papers:

Tversky, B., J.B. Morrison, M. Betrancourt, 2002, Animation: can it facilitate?, International Journal of Human-Computer Studies, 57, p. 247-262.

Höffler, Tim N., and Detlev Leutner, 2007, Instructional animation versus static pictures: A meta-analysis, Learning and Instruction, 17, p. 722-738.

These papers try to compare learning outcomes from still visuals with those from animations to see if animations are really better, or not.

Questions for discussion

As you read, consider the following questions, which we will discuss as a class:

Who decides whether an animation or a still visual is "good"?
What are the strengths and limitations of each of these two studies?
What are the congruence and apprehension principles? What is cognitive load? Why are these important when considering whether to use static or animated graphics?

Submitting your work

Once you have finished the readings, engage in a class discussion that will take place during Lesson 7. This discussion will require you to participate multiple times over that period.

Enter the Lesson 7 - Papers Discussion Forum.
You will see postings already there, each containing one of the discussion questions above.
Post a response to each question. If you feel that your response has already been "said" by another student, then post a response to someone else's remarks that expands on what has already been said, asks for clarification, asks a follow-up question, or furthers the discussion in some other meaningful way. By the end of the activity, I would like you to post at least one original thought/opinion/question and at least one thoughtful response to someone else's post.

Grading criteria

You will be graded on the quality of your participation. See the grading rubric [1] for specifics.

Lesson 7: Just Enough Unix to Make You Dangerous: Mac Version

This page is intended for Mac users. If you work on a PC, skip to the next page.

This short tutorial is meant to provide you with familiarity using a small number of unix commands to manipulate big text files of data. It is not meant to substitute for a complete understanding of unix, or programming in general, or even an exhaustive listing of useful commands but I hope that if you follow along, you'll learn enough simple file editing skills to save you some time.

Part 1: navigating your file structure from the terminal

Commands in Part 1: whoami, pwd, ls, cd, mkdir, man

The first thing we'll want to do is open a terminal window. Go to the Utilities folder inside your Applications folder and open Terminal. A window appears with some text that should be similar to the following.

Last login: Tue Apr 20 09:43:13 on console
[rockfall:~] eliza%

The first line, "Last login: Tue Apr 20 09:43:13 on console", gives the date and time of the last time you logged into the system. On the second line, the word rockfall is the name of the machine that you are logged into; in this case, rockfall is my machine. The text after the : tells you which directory you are in. The ~ means that I am currently in my home directory. So, [rockfall:~] means that I am in my home directory on my machine. Next, "eliza" is my userid, and the % marks the end of the prompt: the terminal is waiting for your input.

An image of the window with the text referenced above is below.

Screenshot of the terminal window referenced above, pointing out the machine name and userid. If you need a transcription, contact Eliza.

Terminal Window

E. Richardson

Here are some commands to try:

Type whoami at the prompt. The response should be your username.
Type pwd at the prompt. This stands for "print working directory" and it should tell you where you are in your computer's file structure.
Type ls at the prompt (the first character in the ls command is a lowercase L, not the number one). This command should give you a listing of all the subfolders and files inside the folder where you are. Do you recognize the list? It probably looks slightly unfamiliar to you because it is just a plain text listing as opposed to a pretty list with little icons to tell you the type of folder or file each item is. Furthermore, alphabetizing in unix is case sensitive, so if some of your folders and files are capitalized and some aren't, you might find that your list is in a different order than you are used to seeing.
Now we'll move around the file structure a little bit. In order to visualize better what you are doing, open your Finder to the same location as your terminal window. This way you'll be able to double check what you are doing. See my example below. I'm in the directory called "earth591" which is inside the directory called MedES. Note that when you are in the Finder, you can check where you are by looking down at the bottom margin. In the terminal, if you forget where you are you can always type pwd.
Now, in your terminal window, go to one of the folders inside the folder where you are. To do so, type cd foldername at the prompt. For example, in my earth591 folder, there are two subfolders called "graphics" and "readings". If I want to go to "graphics" I type cd graphics at the prompt.
To move one directory upwards, type cd .. at the prompt. That's cd, then a space, then two periods without spaces between them.
You don't need to move along one directory at a time, either. When I first started up my terminal window I was in the folder called "/Users/eliza". If I wanted to go directly to the graphics folder inside the earth591 folder I could have typed cd MedES/earth591/graphics at the prompt and I would have gone straight there with zero mouse clicks.
If you type cd with no arguments after it, you will be returned to your home directory (the default location where you started when you opened up Terminal).
You can create new folders and files, too. Type mkdir junk at the prompt in your terminal window. You have just made a new folder called "junk". Notice in your finder window that this new folder appears right away. You can make more than one directory at a time with just one command. If you type mkdir junk1 junk2 junk3 at the prompt you will make three directories all at the same level with those names.
The man command prints the manual page for any command to the terminal screen. Try it with the other commands in this section by typing man whoami, man ls, etc. at the prompt. The man page will give you more details than are in my little tutorial about any and all unix commands, including the various options that go with each command.
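
The steps in the list above can be strung together into one short session. This is only a sketch, assuming a bash-like shell: the scratch folder made by mktemp and the folder names (borrowed from my earth591 example) are just for illustration, so nothing in your real home directory gets touched.

```shell
# Work in a throwaway folder so your real files are left alone.
scratch=$(mktemp -d)
cd "$scratch"

# Build a small nested structure (the names are illustrative only).
mkdir MedES
mkdir MedES/earth591
mkdir MedES/earth591/graphics MedES/earth591/readings

# Go down three levels with a single cd, then ask where we are.
cd MedES/earth591/graphics
here=$(pwd)
echo "$here"

# Move up one level and list what is inside.
cd ..
listing=$(ls)
echo "$listing"
```

The last command should list the two subfolders, graphics and readings, as plain text with no icons.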

Part 2: simple text file manipulation

Commands in part 2: cp, mv, rm, more, head, tail, cat, >

Now let's take a text file and mess around with it using unix commands in the terminal window. Here is a link to a plain text file of ten days of aftershocks [2] following the 4 April 2010 Baja California earthquake. Put it in the new directory you called earth801/data1. Go to earth801/data1 and type ls to verify the file is there. Type more baja_neic.txt (in which baja_neic.txt is the actual name of the text file). The file should look like the screenshot below. If your terminal window is too small to show the whole file at once, you will get a black bar at the bottom that tells you what percentage of the file you are seeing. Hit the spacebar and you'll see another chunk of the file. Continue to hit the spacebar until you've seen the whole file and you are back at the terminal prompt. Alternatively, if you type cat baja_neic.txt the entire file will scroll by and leave you at the prompt when it's done.

plain text file. if you need a transcription, ask eliza

The command head baja_neic.txt shows you exactly the first ten lines of the file. Try it. You can also modify the head command like this:

head -5 baja_neic.txt

The -5 tells head to show the first 5 lines. Showing ten lines is the default when head is not given a number, so the following two commands are equivalent:

head baja_neic.txt

head -10 baja_neic.txt

The command tail is similar to head but works on the end of the file instead of the beginning. The commands more, cat, head, and tail return their output to the screen by default but you can also have them create a new file and put their results in it instead. The way to do this is to redirect the output with the > symbol.

For example, do this:

head -5 baja_neic.txt > newfile

and you will create a new text file called "newfile" which contains exactly the first five lines of the original file baja_neic.txt. It is important to note here that performing this command has not changed the original file in any way. You can type ls to verify that you now have two files in your data1 directory. One of them is the original baja_neic.txt and the other one is called newfile and it is a copy of the first five lines of baja_neic.txt. Use the more command to look at your newfile file. Did you get what you were expecting? When the "head" command counts lines of a file, blank lines are counted just like lines that have text characters in them, so that's why newfile looks the way it does. At this point, if you have been following along, the following three commands should give you identical output:

head -5 baja_neic.txt

more newfile

cat newfile
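
If you want to see the blank-line counting in action without the earthquake file, here is a small sketch. The file sample.txt and its contents are made up for illustration, and I've written the line count as -n 5, which does the same thing as -5 and tends to be the more portable spelling.

```shell
# Make a small stand-in file in a scratch folder (contents are invented).
cd "$(mktemp -d)"
printf 'header\n\nrow 1\n\nrow 2\nrow 3\nrow 4\n' > sample.txt

# Send the first five lines to a new file; sample.txt itself is not changed.
head -n 5 sample.txt > newfile

# Blank lines count as lines, so only three of the five lines contain text.
cat newfile

# Both files are now in the folder.
ls
```

Here newfile holds "header", a blank line, "row 1", another blank line, and "row 2", which adds up to five lines even though only three of them have any text.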

Okay, on to the next command of interest. The cp command copies one file to another but instead of using > you just specify the other filename. So, these two commands are equivalent ways of copying the entire file baja_neic.txt to a new file called baja_neic_copy.txt:

cp baja_neic.txt baja_neic_copy.txt

cat baja_neic.txt > baja_neic_copy.txt

If you want to rename a file without changing its contents, use mv. Like cp, mv requires two filenames, the previous one and the new one.

mv newfile baja_neic_five.txt

The above command renames the file "newfile" to "baja_neic_five.txt". You can also use "mv" to change the location of a file. Try typing

mv baja_neic_copy.txt ../data2/baja.txt

This command takes the file "baja_neic_copy.txt" and moves it from the folder data1 to the folder data2 and renames it baja.txt. It works only if the folder data2 already exists next to data1. You can go to data2 (remember how?) and verify there is now a file in there called baja.txt and that it is a duplicate of baja_neic.txt.

Another cool use of the cat command is to stick two or more files together and make one file. So,

cat baja_neic.txt baja_neic_five.txt > baja2.txt

will make a file called baja2.txt which is a copy of baja_neic.txt plus a copy of baja_neic_five.txt (the file we renamed from "newfile" a moment ago) stuck together.
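
Here is a sketch of the copy, move, and stick-together steps run from start to finish. It assumes a bash-like shell and uses a three-line stand-in for baja_neic.txt with invented contents; first_two.txt is a hypothetical filename I picked just for this example.

```shell
# Scratch area with two sibling folders, data1 and data2.
cd "$(mktemp -d)"
mkdir data1 data2
cd data1
printf 'quake 1\nquake 2\nquake 3\n' > baja_neic.txt

# cp makes a full copy; the original stays where it is.
cp baja_neic.txt baja_neic_copy.txt

# mv with a path moves and renames in one step (data2 must already exist).
mv baja_neic_copy.txt ../data2/baja.txt

# Save the first two lines, then stick that file onto the end of the original.
head -n 2 baja_neic.txt > first_two.txt
cat baja_neic.txt first_two.txt > baja2.txt
cat baja2.txt
```

The final cat shows the three original lines followed by the first two lines again. The order of the filenames given to cat is the order in which they end up in the new file.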

Try this!

Lesson 7: Just Enough Unix to Make You Dangerous - PC Version

This page is intended for PC users. If you work on a Mac, skip this page.

This short tutorial is meant to provide you with familiarity using a small number of unix commands to manipulate big text files of data. It is not meant to substitute for a complete understanding of unix, or linux, or even an exhaustive listing of useful commands but I hope that if you follow along, you'll learn enough simple file manipulation skills to save you some time.

If you are on a PC running Windows, you can emulate a unix/linux command window environment by running "cygwin." To reiterate: you aren't running linux, but it looks like you are.

To download it, go to the home of the cygwin project. [3]

Alert!

I also need to mention the following caveat: I'm not a PC user! When I was in grad school I had a Sun workstation that I used for everything, including typesetting. I didn't even write in Word; I used LaTeX. Now I use a Mac. I also had to relax my principles on avoiding Word or face a lifetime of really cranky collaborators, but that's another story. The upshot here is that I am probably less of a complete doofus than your grandparents when it comes to PCs but . . . okay, you get the picture. So when I made the screencasts of me attempting to present a tutorial of cygwin, I borrowed a PC and figured it out on the fly. And it basically worked okay; I give cygwin my thumbs up.

Part 1: navigating your file structure from the terminal

Commands in Part 1: whoami pwd mkdir ls man

The first thing we'll want to do is open a terminal window. Double-click the cygwin icon to open a terminal window. When the window is active there will be a blinking cursor where you can start typing. Unix commands are all typed at the prompt and by default the output of any command you type goes to the screen in the terminal window where you are typing.

Here are some commands to try:

Type whoami at the prompt. The response should be your username.
Type pwd at the prompt. This stands for "print working directory" and it should tell you where you are in your computer's file structure.
You can create new folders and files, too. Type mkdir junk at the prompt in your terminal window. You have just made a new folder called "junk". You can make more than one directory at a time with just one command. If you type mkdir junk1 junk2 junk3 at the prompt you will make three directories all at the same level with those names.
Type ls at the prompt (the first character in the ls command is a lowercase L, not a capital i). This command should give you a listing of all the subfolders and files inside the folder where you are. Do you recognize the list? It probably looks slightly unfamiliar to you because it is just a plain text listing as opposed to a pretty list with little icons to tell you the type of folder or file each item is. Furthermore, alphabetizing in unix is case sensitive, so if some of your folders and files are capitalized and some aren't, you might find that your list is in a different order than you are used to seeing.
Now is a good time to introduce the man command. man prints the manual page for any command to the terminal screen. Try it with the other commands in this section by typing man whoami, man ls, etc. at the prompt. The man page will give you more details than are in my little tutorial about any and all unix commands, including the various options that go with each command. To page through the man page for a command, hit the space bar. To get out of the manual page and back to your terminal prompt, type q.

Part 2: navigating the file structure from the terminal

Commands in Part 2: cd

In Part 2, we'll see how to use unix commands to change our location in the computer's file structure. It's analogous to clicking through the various discs and folders from the windows launched when you double-click "my computer" except that it involves no mouse clicks, only typing.

The command of interest here is called cd. The way it works is that you type "cd pathname" at the prompt, and then you will go there (you have to type the actual path, not the word "pathname"). Nested folders have to be separated by forward slashes "/". You can verify that you are where you think you are by typing pwd or by navigating to the same address via windows and noting that the folder contents are the same.

To move one directory upwards, type cd .. at the prompt.
You don't need to move along one directory at a time, either. Instead of typing cd /cygdrive, hitting return, and then typing cd d at the next prompt, you can type cd /cygdrive/d to go straight to the place you want. No mouse clicks!
If you type cd with no arguments after it, you will be returned to your home directory (the default location where you started when you opened up cygwin).

Note that in order to move between the C and D drives, you'll have to start the address with "cygdrive". For example, the command cd /cygdrive/d will take you to the uppermost level of drive D and cd /cygdrive/c takes you to the uppermost level of drive C.
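
The /cygdrive addresses only exist inside cygwin, so as a sketch that runs anywhere, here is the same idea with ordinary folders: one cd with several folder names separated by forward slashes, and one cd that climbs back up two levels at once. The scratch folder from mktemp is just an assumption to keep the example away from your real files.

```shell
# Start in a throwaway folder and remember where it is.
cd "$(mktemp -d)"
start=$(pwd)

# Make a folder with another folder inside it.
mkdir earth801
mkdir earth801/data1

# One cd with a multi-part path replaces two separate cd commands.
cd earth801/data1
deep=$(pwd)
echo "$deep"

# Each .. means "one level up", so ../.. climbs two levels in one step.
cd ../..
top=$(pwd)
echo "$top"
```

After the last cd, pwd reports the same folder we started in.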

Try this!

Part 3: viewing portions of a file

Commands in Part 3: cat less

In the rest of this tutorial we'll do some simple things to files using unix commands.

First of all let's take a text file and mess around with it using unix commands in the terminal window. Here is a link to a plain text file of ten days of aftershocks [2] following the 4 April 2010 Baja California earthquake.

You should have done this already if you worked through the "Try this!" exercise, but in case you didn't: Make a directory called earth801 and make a directory inside of it called data1
Put the file in the new directory you called earth801/data1.
Go to earth801/data1 and type ls to verify the file is there.
Type less baja_neic.txt (in which baja_neic.txt is the actual name of the text file). The file should look like the screenshot below. If your terminal window is too small to show the whole file at once, you will get a bar at the bottom of the window telling you there is more to see. Hit the spacebar and you'll see another chunk of the file. Continue to hit the spacebar until you've seen the whole file, then type q to get back to the terminal prompt.
Alternatively, if you type cat baja_neic.txt the entire file will scroll by and leave you at the prompt when it's done.

plain text file. if you need a transcription, ask instructor

Part 4: viewing specific parts of a file, redirecting output

Part 4 commands: head tail >

The command head baja_neic.txt shows you exactly the first ten lines of the file. Try it. You can also modify the head command like this:

head -5 baja_neic.txt

The -5 tells head to show the first 5 lines. Showing ten lines is the default when head is not given a number, so the following two commands are equivalent:

head baja_neic.txt
head -10 baja_neic.txt

The command tail is similar to head but works on the end of the file instead of the beginning. The commands less, cat, head, and tail return their output to the screen by default but you can also have them create a new file and put their results in it instead. The way to do this is to redirect the output with the > symbol.

For example, do this:

head -5 baja_neic.txt > newfile.txt

and you will create a new text file called "newfile.txt" which contains exactly the first five lines of the original file baja_neic.txt. It is important to note here that performing this command has not changed the original file in any way. You can type ls to verify that you now have two files in your data1 directory. One of them is the original baja_neic.txt and the other one is called newfile.txt and it is a copy of the first five lines of baja_neic.txt. Use the less command to look at your newfile.txt file. Did you get what you were expecting? When the head command counts lines of a file, blank lines are counted just like lines that have text characters in them, so that's why newfile.txt looks the way it does. At this point, if you have been following along, the following three commands should give you identical output:

head -5 baja_neic.txt
less newfile.txt
cat newfile.txt
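
The same redirect trick works with tail. Below is a small sketch with a made-up six-line file; sample.txt and lastpart.txt are hypothetical names, and -n 2 is just another way of writing -2.

```shell
# A six-line stand-in file in a scratch folder (contents are invented).
cd "$(mktemp -d)"
printf 'line 1\nline 2\nline 3\nline 4\nline 5\nline 6\n' > sample.txt

# tail counts from the end of the file: keep only the last two lines.
tail -n 2 sample.txt > lastpart.txt

# Looking at the new file and re-running tail on the original give the same text.
from_file=$(cat lastpart.txt)
from_tail=$(tail -n 2 sample.txt)
echo "$from_file"
```

This prints "line 5" and "line 6", and sample.txt still has all six of its lines.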

Part 5: copying a file, renaming a file, moving a file to a different folder, deleting a file or folder

commands in Part 5: cp, mv, rm

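The cp command copies one file to another, but instead of using > you just specify the other filename. So, these two commands are equivalent ways of copying the entire file baja_neic.txt to a new file called baja_neic_copy.txt:

cp baja_neic.txt baja_neic_copy.txt
cat baja_neic.txt > baja_neic_copy.txt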

If you want to rename a file without changing its contents, use mv. Like cp, mv requires two filenames, the previous one and the new one.

mv newfile.txt baja_neic_five.txt

The above command renames the file "newfile.txt" to "baja_neic_five.txt". You can also use mv to change the location of a file. Try typing

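mv baja_neic_copy.txt ../data2/baja.txt

This command takes the file "baja_neic_copy.txt" and moves it from the folder data1 to the folder data2 and renames it baja.txt. It works only if the folder data2 already exists next to data1. You can go to data2 and verify there is now a file in there called baja.txt and that it is a duplicate of baja_neic.txt.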

Another cool use of the cat command is to stick two or more files together and make one file. So,

cat baja_neic.txt baja_neic_five.txt > baja2.txt

will make a file called baja2.txt which is a copy of baja_neic.txt with a copy of baja_neic_five.txt (the file we renamed from "newfile.txt" a moment ago) appended to the bottom.

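To delete a file, type rm filename. Careful here, because the file won't go into a trash folder where you can change your mind about it. It is really gone. To delete a folder, type rmdir foldername if the folder is empty, or rm -r foldername to delete the folder along with everything inside it.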
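
Deleting is best practiced on files you don't care about. The sketch below does everything inside a scratch folder with invented names. As far as I know, plain rm refuses to remove a folder, so the example uses rmdir for an empty folder and rm -r for one that still has something in it.

```shell
# Set up some junk in a scratch folder (all names here are made up).
cd "$(mktemp -d)"
mkdir junk junk1
printf 'scratch\n' > junk/notes.txt
printf 'scratch\n' > oldfile.txt

# rm deletes a file right away; there is no trash folder to rescue it from.
rm oldfile.txt

# rmdir removes a folder only if it is empty.
rmdir junk1

# rm -r removes a folder together with everything inside it.
rm -r junk

# Nothing is left, so ls prints nothing.
ls
```

Since none of these commands ask "are you sure?", it's worth typing pwd and ls first to double check where you are and what is there.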

Now try this!

Lesson 7: File Editing with awk

All the examples detailed here are accomplished from your terminal window if you are on a Mac, or from your cygwin window if you are on a PC.

awk

The awk command is a powerful way to manipulate the contents of text files. We are only going to skim the surface of what awk can do right now. Let's start with a simple example. Say you have a text file that has some columns of numbers in it. With one awk command you can rearrange the columns in a different order, or perform some arithmetic on the columns.


Download the file1.txt [4] text file in order to follow along with what I am doing. This is accomplished by clicking the link to the filename and then choosing Save As . . . from your browser's file menu. Save it somewhere on your computer, then in the terminal, navigate to that place. Remember how? You'll want to use cd.


This is what "file1.txt" contains. It is simply a plain text three-column six-row arrangement of numbers. The first column is the number 1, the second column is the number 2, the third column is the number 3.

Output certain columns of the file to the screen

Awk is great for quickly manipulating files that are arranged in columns, so it is a nice way to fiddle around with plain-text data files since those are frequently in columns or tables. It uses some peculiar syntax. Let's say we wanted to display just the first column from "file1.txt." Here's how to do it:

awk '{print $1}' file1.txt

The first thing you type is awk, followed by a pair of single quotes with curly braces inside them. Inside the curly braces we wrote print $1, which is the command to print column #1. The name of the file from which we are extracting column #1 goes last.

Let's say we wanted to display just the second column from "file1.txt." In that case we'd type:

awk '{print $2}' file1.txt

Quiz yourself!

Rearrange the columns, repeat columns, create other columns

You can output any number of columns and put them in whatever order you want. Let's say we want column 3, then column 1 but not column 2.

awk '{print $3, $1}' file1.txt

The comma between $3 and $1 tells awk to put a space between the columns.

Let's say you want to output column 1, substitute 4's in column 2, then output column 3 unchanged. That would be like this:

awk '{print $1, 4, $3}' file1.txt

The 4 inside the curly braces doesn't have a $ in front of it because it is the actual number 4; it is not referring to a 4th column.
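
If you don't have file1.txt handy, the two commands above can be tried on a stand-in. This sketch assumes file1.txt is six rows of "1 2 3", as described earlier, and builds it with printf in a scratch folder. The variable names swapped and replaced are just labels I made up to hold the output.

```shell
# Stand-in for file1.txt: six rows, three columns (assumed contents).
cd "$(mktemp -d)"
printf '1 2 3\n1 2 3\n1 2 3\n1 2 3\n1 2 3\n1 2 3\n' > file1.txt

# Column 3 first, then column 1; column 2 is simply left out.
swapped=$(awk '{print $3, $1}' file1.txt)
echo "$swapped"

# A bare 4 is the number four, while $3 is whatever sits in column 3.
replaced=$(awk '{print $1, 4, $3}' file1.txt)
echo "$replaced"

# Columns can be repeated too: this prints column 1 twice on every row.
awk '{print $1, $1}' file1.txt
```

With this stand-in file, every row of the first output reads "3 1" and every row of the second reads "1 4 3".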

Quiz Yourself!

Arithmetic, text, special characters

You can do math inside the print statement of awk and you can also deal with columns that aren't numbers. Let's say I want to output the sum of columns 1 and 2 as the first column, my name as the second column, and the product of columns 2 and 3 as the third column, and then make a fourth column that is the number 25:

awk '{print $1+$2, "eliza", $2*$3, 25}' file1.txt

screenshot of 4-column file. 1st column is 3, second column is eliza, third column is 6, 4th is 25

Here's what the output of awk '{print $1+$2, "eliza", $2*$3, 25}' file1.txt looks like.

There are a few special characters. A useful one sometimes is "\t" which tells awk that you want tab spaces in between the columns.

awk '{print $1 "\t" $2 "\t" $3}' file1.txt

The command above will output file1.txt unchanged except for tab spaces in between the columns instead of just one space.
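
Here is a sketch of the arithmetic example run on a two-row stand-in for file1.txt (the contents are assumed), plus a different way to get tabs that is not what I did above: setting awk's built-in OFS variable, the output field separator, so that every comma in print produces a tab.

```shell
# Two-row stand-in for file1.txt (assumed contents).
cd "$(mktemp -d)"
printf '1 2 3\n1 2 3\n' > file1.txt

# Sum of columns 1 and 2, a text label, product of columns 2 and 3, then 25.
mixed=$(awk '{print $1+$2, "eliza", $2*$3, 25}' file1.txt)
echo "$mixed"

# With OFS set to a tab, the commas in print give tabs instead of spaces.
tabbed=$(awk 'BEGIN {OFS="\t"} {print $1, $2, $3}' file1.txt)
echo "$tabbed"
```

Each row of the first output reads "3 eliza 6 25", because 1+2 is 3 and 2*3 is 6. Setting OFS once saves typing "\t" between every pair of columns when there are a lot of them.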

Redirect the output of awk

All the examples so far have output the results of the awk command to the screen. They have not altered the original file, and they haven't saved the results anywhere. To put the output of awk into a new file instead of showing it on the screen, use >. Let's say I want to make a new file that is the same as file1.txt but with the columns in reverse order:

awk '{print $3, $2, $1}' file1.txt > file2.txt

Now if I look in the folder where file1.txt is, there are two files. file1.txt is still there, but there is a new file called file2.txt as well.

screenshot of three-column file, columns are the numbers 1,2,3.

screenshot of three column file. column 1 is 3, column 2 is 2, column 3 is 1.

On the left is the original file1.txt.

The command

awk '{print $3, $2, $1}' file1.txt > file2.txt

creates the new file2.txt, seen at right.
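
To convince yourself that the original file really is untouched, here is a sketch that reverses the columns of a three-row stand-in for file1.txt (assumed contents) and then looks at both files.

```shell
# Three-row stand-in for file1.txt (assumed contents).
cd "$(mktemp -d)"
printf '1 2 3\n1 2 3\n1 2 3\n' > file1.txt

# Write the reversed columns to a new file instead of the screen.
awk '{print $3, $2, $1}' file1.txt > file2.txt

# Both files now exist, and file1.txt still has its original contents.
ls
head -n 1 file1.txt
head -n 1 file2.txt
```

The first line of file1.txt is still "1 2 3" and the first line of file2.txt is "3 2 1". One caution: don't redirect the output back into the same file awk is reading (for example, > file1.txt). The shell empties that file before awk gets a chance to read it, and you end up with nothing.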

Try This!

Lesson 7: File editing with vi

vi (also called vim; the two are basically the same) is a text editor that allows you to create and edit text inside a terminal window without popping up another window and without using the mouse. Truthfully, most people would never want to get rid of their mouse if they are used to using it all the time, but if you want to get into a file, do a simple thing to it, such as deleting the first 30 lines or adding one line to the bottom or something like that, then vi is handy. However, the functionality of vi does not lend itself well to making a screen capture "how-to" movie because most of the action takes place on the keyboard and you can't see my hands with a screen capture.

When you are using vi to edit a file, you will either be in "insert" mode or in "moving around" mode. While you are in "insert" mode, whatever you type becomes part of the file (just like whatever word processor/text editor you are used to). But when you are in "moving around" mode, you use keyboard keys to move the cursor around the file. To get started, type vi filename at the terminal prompt (in which "filename" is your actual filename, not the word "filename" unless that's the name of your file). vi opens the file in "moving around" mode, so nothing you type goes into the file until you use one of the insert commands below.

Here's an incomplete command list (the man page for vi will do better) but it's a start:

INSERT MODE

i to insert before the cursor, I to insert at the beginning of the current line

a to insert after the cursor, A to insert at the end of the current line

o to make a new blank line below the cursor and put the cursor at the beginning of it, O to make a new blank line above the cursor and put the cursor at the beginning of it

ESC to get out of insert mode and go into moving around mode.

MOVING AROUND MODE

h moves one space to the left

l moves one space to the right

j moves one line down

k moves one line up

dd deletes the current line

x deletes the current character

typing a number before a command repeats the command that many times, so 10dd (with no space between the number and the command) deletes 10 lines beginning with the current line.

:w saves your work

:q quits vi

you can do these together, so :wq saves your work and quits vi all in one step.

Try This!

A harder challenge!

Now do something useful

Let's put our skills to work. Download this file of a catalog of earthquakes [5] from the USGS. Use vi and awk to make a new file that contains just one column -- the earthquake magnitudes. This is the kind of thing that will be super useful for making a frequency magnitude diagram! Try it on your own and if you get stuck, check to see how I did it. Keep in mind that there is just about always more than one way to accomplish an editing task like this. The point is to get the end result without lots of work and cumbersome steps in a non-mathematical spreadsheet program that was not intended to handle a big dataset.
