Lesson 8: Arrays, File I/O

Lesson 8: Introduction

Overview

In this lesson we will build on the Unix skills from Lesson 7 and work with data files inside the Processing environment.

Learning Outcomes

By the end of this lesson, you should be able to:

Import an ASCII file into a program and use its contents for computation and display
Use arrays to manage long, related, similar lists of data
Output a text file

What is due for Lesson 8?

This lesson will take us one week to complete, 7 - 13 July 2021. The deliverable for this lesson is one programming exercise, detailed on the last page of the lesson.

Questions?

If you have any questions, please post them to our Questions? discussion forum (not e-mail) in Canvas. I will check that discussion forum daily to respond. While you are there, feel free to post your own responses if you, too, are able to help out a classmate.

Lesson 8: Arrays

New syntax discussed: [], ., new

Arrays

Now we are ready to work with a new variable type called an array. An array is like a numbered list (or a one-row matrix): a single variable that holds many values of the same type instead of just one. Arrays are useful for storing lists of similar things because they save you from having to declare a separate variable for each item. Declaring an array is similar to declaring other variables such as ints, floats, and Strings, but you have to say what kind of data the array will store and add square brackets [ ] after the type so Processing knows that you want an array of that type of data.

For example, if you want to declare three integer variables called "x1", "x2", and "x3" and assign them the values 0, 25, and 6, here is one way to do it (hopefully this is old hat by now):

int x1 = 0;
int x2 = 25;
int x3 = 6;

But you could make an array of integers instead. Let's do that and call the array "x." There are a few different ways to do this. Here they are:

Way #1: declare, create, and assign in three different steps

int[] x; //declare it here. I'm telling Processing 
//to make an array of integers called "x"
 
void setup() {
   size(400, 400);
   x = new int[3]; // Creating it with the "new" command.
   x[0] = 0;       // Assigning values to each element.
   x[1] = 25;
   x[2] = 6;
}
 
void draw() {
 
// The rest of the program
 
}

In the program above, we first told Processing we wanted to make an array of integers called "x". Then inside the setup() block we created the array using new; inside the square brackets we told Processing how big x will be, meaning how many integers it will hold. This allows your computer to allocate the right amount of memory to store the array. Then we assigned values to each element in x. Remember that in Processing you always start counting at zero, not one! That means the first element of the array is denoted x[0], and the second element of the array is denoted x[1]. The number inside the square brackets tells you which element it is, not the value of that element. For example, x[2] = 6; means that the third element of the integer array x is assigned the value 6.

Way #2: declare, create, and assign in two steps

int[] x = new int[3]; //declare and create it here
 
void setup() {
   size(400, 400);
   x[0] = 0; //assigning here
   x[1] = 25;
   x[2] = 6;
}
 
void draw() {
 
//the rest of the program
 
}

In the above example, we declared the array of integers at the top, before the setup() block, and in the same line we told Processing how many elements the array would have. So we saved a step compared to the first example. Since we did this at the top of the program before setup(), the array x is available inside both setup() and draw(). We fill the array with values later in the program. It is worth knowing what the array holds before we do. When we wrote the command

int[] x = new int[3];

this told Processing to make an array that has room for three integers, but we didn't say what those integers would be. Processing does not leave the spots truly empty: every element of a new int array starts out as 0 (a new float array starts out as 0.0, and a new String array starts out holding null, meaning "no String yet"). When we wrote

x[0] = 0;

we explicitly assigned the value zero to the first spot in the array. For an int array that happens to match the default, but assigning every element yourself makes your intent clear and does not depend on the default.

Way #3: declare, create, and assign all in one step

int[] x = {0, 25, 6};
 
void setup() {
   size(100, 100);
}
 
void draw() {
 
//rest of the program
 
}

In the example above, you don't have to use new because you do the declaration, creation, and assignment all at the same time. And you don't have to tell Processing how big the array is going to be because it counts the values you list between the curly braces.
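To convince yourself that the three ways really do build the same array, here is a minimal sketch. It is written in plain Java (the language Processing is built on) so it can run outside a sketch; the class name ArrayCreationDemo and the method names are made up for this illustration.

```java
import java.util.Arrays;

class ArrayCreationDemo {
    // Way #1: declare, create, and assign in three separate steps.
    static int[] wayOne() {
        int[] x;          // declare
        x = new int[3];   // create
        x[0] = 0;         // assign
        x[1] = 25;
        x[2] = 6;
        return x;
    }

    // Way #2: declare and create together, then assign.
    static int[] wayTwo() {
        int[] x = new int[3];
        x[0] = 0;
        x[1] = 25;
        x[2] = 6;
        return x;
    }

    // Way #3: declare, create, and assign all at once.
    static int[] wayThree() {
        int[] x = {0, 25, 6};
        return x;
    }

    public static void main(String[] args) {
        // Arrays.equals compares two arrays element by element.
        System.out.println(Arrays.equals(wayOne(), wayTwo()));
        System.out.println(Arrays.equals(wayTwo(), wayThree()));
        System.out.println(Arrays.toString(wayThree()));
    }
}
```

Which way you pick is mostly a matter of when you know the values: Way #3 is the shortest if you know them up front, while Ways #1 and #2 suit arrays that get filled in later.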

Using arrays to draw a shape

The examples above told you how to make an array, but those programs don't really do anything, do they? So here's an example of a program that uses an array to draw a shape:

//use two arrays to draw a star
 
int[] a= {20, 40, 20, 40, 50, 60, 80, 60, 80, 50};
int[] b = {20, 45, 60, 60, 80, 60, 60, 45, 20, 35};
 
void setup(){
  size(100,100);
  noLoop(); //just a static shape drawn once
}
 
void draw(){
  beginShape();
  for (int i = 0; i< a.length; i++){
    vertex(a[i], b[i]);
  }
  endShape(CLOSE);
}

Display window output of the star-drawing program

E. Richardson

In the program above, I declared, created, and assigned two arrays at the beginning of the program before setup() and draw(). Inside setup() I set the size of the display window and called noLoop() so that the shape is drawn only once. Inside draw() I made a beginShape()/endShape() pair and in between them I wrote a for loop that loops through the arrays and sets the vertices one by one. This is the first time you have seen the dot operator . in a program. The dot operator in Processing is used to access some attribute of the variable, and it is analogous to an apostrophe in English to designate possession. For instance, when I wrote a.length that means in English "the length of the array called a," or "a's length." For an array, its length is the number of elements in it, so a.length == 10 in the star-drawing program.
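The star program quietly assumes that a and b have the same number of elements, because the loop runs to a.length but also reads b[i]. Here is a small plain-Java sketch of that pairing, with the assumption checked explicitly. The class VertexPairs and the method describe are hypothetical names used only for this example.

```java
class VertexPairs {
    // Walks two coordinate arrays in step and lists the (x, y) pairs.
    static String describe(int[] xs, int[] ys) {
        if (xs.length != ys.length) {
            throw new IllegalArgumentException("x and y arrays must be the same length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < xs.length; i++) {  // xs.length is "xs's length"
            if (i > 0) {
                sb.append(' ');
            }
            sb.append('(').append(xs[i]).append(", ").append(ys[i]).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] a = {20, 40, 20, 40, 50, 60, 80, 60, 80, 50};
        int[] b = {20, 45, 60, 60, 80, 60, 60, 45, 20, 35};
        System.out.println("number of vertices: " + a.length);
        System.out.println(describe(a, b));
    }
}
```

If the two arrays ever got out of step, the check would stop the program with a clear message instead of an out-of-bounds error partway through the loop.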

2. Use a for loop to assign elements to an array

It is also useful to use a for loop to fill up an array with numbers instead of assigning each element by hand. Here's an example of a program that does this:

//five balls, each ball is faster than the one above it
int num = 5;
float[] xpos = new float[num];
float[] speed = new float[num];
float dy = 60;
 
void setup(){
  size(400,400);
  //fill both arrays with a for loop instead of by hand
  for (int i = 0; i<num; i++){
    xpos[i] = i;
    speed[i] = i+0.1;
  }
}
 
void draw(){
  background(0);
  for (int i = 0; i<num; i++){
    float y=(i+dy)*i;
    ellipse(xpos[i],y,20,20);
    xpos[i]+=speed[i];
    //send the ball back to the left edge when it reaches the right edge
    if (xpos[i]>width-10){
      xpos[i]=0;
    }
  }
}

Here is a 12 second video demonstrating the code above (video is silent).
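If you want to check the numbers without watching the animation, the fill-and-update logic can be pulled out on its own. The following is a plain-Java sketch under that assumption; the class BallArraysDemo and its method names are invented for the example, and the f after 0.1 is a Java requirement that Processing handles for you.

```java
class BallArraysDemo {
    // Starting x position of each ball: 0, 1, 2, ...
    static float[] initialPositions(int num) {
        float[] xpos = new float[num];
        for (int i = 0; i < num; i++) {
            xpos[i] = i;
        }
        return xpos;
    }

    // Speed of each ball: 0.1, 1.1, 2.1, ... so lower balls move faster.
    static float[] initialSpeeds(int num) {
        float[] speed = new float[num];
        for (int i = 0; i < num; i++) {
            speed[i] = i + 0.1f;
        }
        return speed;
    }

    // One pass through draw(): move every ball, wrap at the right edge.
    static void step(float[] xpos, float[] speed, float width) {
        for (int i = 0; i < xpos.length; i++) {
            xpos[i] += speed[i];
            if (xpos[i] > width - 10) {
                xpos[i] = 0;
            }
        }
    }

    public static void main(String[] args) {
        float[] xpos = initialPositions(5);
        float[] speed = initialSpeeds(5);
        step(xpos, speed, 400);
        for (int i = 0; i < xpos.length; i++) {
            System.out.println("ball " + i + " is at x = " + xpos[i]);
        }
    }
}
```

Because both arrays are sized from the same num, changing that one number changes how many balls there are everywhere in the program.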

3. Store mouseX and mouseY history in arrays

Another useful thing to do with arrays is to use them to store the history of mouseX and mouseY positions.

Here is an example of that:

//snake of circles follows the mouse
 
int num = 50;
int[] x = new int[num];
int[] y = new int[num];
 
void setup(){
   size(400, 400);
   noStroke();
   smooth();
   fill(255, 100);
 
}
 
void draw(){
   background(0);
   //go backwards through the loop
   //and shift all values to the right
   for (int i = num-1; i>0; i--){
      x[i] = x[i-1];
      y[i] = y[i-1];
   }
   //put current values of mouseX and mouseY at beginning
   x[0] = mouseX;
   y[0] = mouseY;
    
   //draw circles
   for (int i = 0; i < num; i++){
      ellipse(x[i],y[i], 20, 20);
   }
}

In the program above, I declare and create two arrays before setup() and draw(). I use setup() for the usual things. Inside draw() there is a for loop that steps backwards through the two arrays and shifts all the array values one spot to the right; then I put the current values of mouseX and mouseY at the beginning of each array. Think of this as a conveyor belt that runs from left to right. You are continuously adding a new value on the left side while the rightmost value falls off the other side and gets thrown away. Then I use another for loop that goes forwards through the arrays to draw all the circles. The overall effect is that a trail of num circles (50 in this case) follows the mouse around like a snake.
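The conveyor-belt step is easier to see on a tiny array. Here is a minimal plain-Java sketch of just that step; HistoryBuffer and pushFront are hypothetical names, not part of Processing.

```java
import java.util.Arrays;

class HistoryBuffer {
    // Shifts every value one spot to the right, drops the last value,
    // and puts the newest value at the front.
    static void pushFront(int[] history, int newest) {
        // Go backwards so each value is copied before it is overwritten.
        for (int i = history.length - 1; i > 0; i--) {
            history[i] = history[i - 1];
        }
        if (history.length > 0) {
            history[0] = newest;
        }
    }

    public static void main(String[] args) {
        int[] x = {1, 2, 3};
        pushFront(x, 9);
        System.out.println(Arrays.toString(x)); // [9, 1, 2]
    }
}
```

Going backwards matters: if the loop ran forwards, x[1] would be overwritten with x[0] before its old value had been copied into x[2], and the whole array would end up filled with the first value.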

Lesson 8: File Input/Output with Data: Baseball example

Three examples of reading in a file, doing something with its contents, and plotting the result.

New syntax: loadStrings, split, map, log

0. Make a scatter plot from data in a plain text file

The box below contains the contents of a plain text file named "cards_data.txt" that I created using vi and have dragged and dropped onto my sketch. There are three columns separated by tabs. The first column is a list of last names of 2002 St. Louis Cardinals position players (no pitchers). The second column lists the number of RBIs each of them earned that year and the final column is each of their salaries in millions of dollars. The format of this file looks kind of ugly because some of the names are too long for the tabbing to work out right, but Processing won't care about this! We are going to read this text file into Processing using loadStrings. Then we are going to make a plot out of it.

Cairo	23	0.85
Drew	56	3.6
Edmonds	83	8.33
Marrero	66	1.5
Martinez	75	7.5
Matheny	35	3.25
Palmeiro	31	0.7
Perez	26	0.5
Pujols	127	0.9
Renteria	83	6.5
Robinson	15	0.32
Rolen	110	7.625
Vina	54	5.33

Here's the program, and a screenshot of the plot I made.

//this data is the number of RBIs in 2002 for Cards position players 
//and their salaries (in millions of $)
//we will read the data in from a 3-column plain text file
 
String[] cards; //make the array and fill it with data later
 
void setup() {
   size(200, 200);
   background(255);
   PFont font1;
   font1 = loadFont("AbadiMT-CondensedLight-14.vlw");
   textFont(font1);
   smooth();
   cards = loadStrings("cards_data.txt"); //this is how we read in the file contents
   noLoop(); //just drawing a static plot once
}
 
void draw() {
   //make a grid for plotting. use translate to leave some blank space for labels
   translate(50, -50);
   stroke(200);
   for (int i = 0; i< 100; i=i+20) {
      line(i, 60, i, height); //vertical gridlines
      line(0, height-i, 140, height-i); //horizontal gridlines
   }
 
   //plot the data
   stroke(0);
   fill(75);
   println("number of lines in data file is " +cards.length);
 
   //go through the array called "cards" line by line
   for (int i = 0; i<cards.length; i++) {
 
      //split each line where there is a tab
      //create a new array of strings called "data" to hold this info
      String[] data = split(cards[i], '\t');
 
      String Name = (data[0]); //player name in first column
      int Rbi = int(data[1]); //Rbi in the second column
      float Salary = float(data[2]); //Salary in third column
 
      //make a scatter plot of Rbi v. salary
      //want the axes origin at lower left, so do (height - y data)
      ellipse(Rbi, height-Salary*10, 10, 10);
   }
    
   //label the axes
   //I did these by trial-and-error until I got them to look right
   fill(0);
   text("RBIs", (width/2)-50, height+30);
   text("Salary $ mil", -50, 100, 30, 100);
   text("20", 15, height+15);
   text("60", 55, height+15);
   text("100", 95, height+15);
   text("2", -10, height-15);
   text("6", -10, height-55);
   text("10", -15, height-95);
}

Screenshot of plot generated from code above.

The main purpose of this plot is to show you how easy it would be to ditch your job and start making millions as an agent. Look how well RBIs correlate to salary! Albert Pujols was glaringly underpaid (making $900,000 and accumulating 127 RBIs) according to this metric but he negotiated a new contract with the Cardinals after the 2003 season that paid him upwards of $10 million per year, which puts him right in line with where the RBI prediction says he should be.

The secondary purpose of this plot is to demonstrate a few new commands and how to deal with an external data file. Inside setup() we used loadStrings to read the file into the program. You want to do all the reading-in of external files in setup() because that block just runs once and you don't want your CPU hogged by reloading your files every time you run through draw(). The file is loaded as an array of Strings, one element per line. The first thing we want to do is tell Processing that we actually want three columns, not 13 lines. So, we go through the array line by line and split each line where there are tabs. The syntax '\t' tells Processing to look for a tab.

The fact that the data comes in as strings works out great for the player names because they are words. But if we want to do some arithmetic with the numbers, or otherwise treat them as numbers, then we have to convert them to other variable types. Inside the for loop where we run through the data file, we first split each line into three pieces, making a three-element array named data. Then we give each element of the data array its own name and convert it to another variable type if we want to. For example, we converted the RBI value to an int and the salary value to a float. The chunk of code that does all that is here:

String[] data = split(cards[i], '\t');
String Name = (data[0]);
int Rbi = int(data[1]);
float Salary = float(data[2]);

Then we plot Rbi v. Salary.

ellipse(Rbi, height-Salary*10, 10, 10);
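For comparison, here is roughly what the split-and-convert step looks like in plain Java, outside of Processing. This is a sketch under assumptions: Java's own String.split, Integer.parseInt, and Float.parseFloat stand in for Processing's split(), int(), and float(), and the class PlayerRecord is a made-up name for this example.

```java
class PlayerRecord {
    final String name;   // first column
    final int rbi;       // second column
    final float salary;  // third column, in millions of dollars

    PlayerRecord(String name, int rbi, float salary) {
        this.name = name;
        this.rbi = rbi;
        this.salary = salary;
    }

    // Splits one tab-separated line and converts the numeric columns.
    static PlayerRecord parse(String line) {
        String[] data = line.split("\t");
        if (data.length < 3) {
            throw new IllegalArgumentException("expected 3 tab-separated columns: " + line);
        }
        return new PlayerRecord(
            data[0],
            Integer.parseInt(data[1].trim()),
            Float.parseFloat(data[2].trim()));
    }

    public static void main(String[] args) {
        PlayerRecord p = parse("Pujols\t127\t0.9");
        System.out.println(p.name + " had " + p.rbi + " RBIs and earned $" + p.salary + " million");
    }
}
```

The length check is worth copying into your own programs: a stray blank line at the end of a data file is a common reason for a mysterious out-of-bounds error when you reach for data[1] or data[2].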

The rest of the program is devoted to doing the background work that spreadsheet and other canned plotting programs do for you. Here's where we make some gridlines:

//make a grid for plotting. use translate to leave some blank space for labels
translate(50, -50);
stroke(200);
for (int i = 0; i< 100; i=i+20) {
   line(i, 60, i, height); //vertical gridlines
   line(0, height-i, 140, height-i); //horizontal gridlines
}

Here's where we label the axes:

//label the axes
   //I did these by trial-and-error until I got them to look right
   fill(0);
   text("RBIs", (width/2)-50, height+30);
   text("Salary $ mil", -50, 100, 30, 100);
   text("20", 15, height+15);
   text("60", 55, height+15);
   text("100", 95, height+15);
   text("2", -10, height-15);
   text("6", -10, height-55);
   text("10", -15, height-95);

Of course this is a slightly more tedious way to make a simple plot -- you would probably rather just paste this little datafile into your favorite program and not spend time tinkering with the way the plotting grid looks, right? Sure, but the point is that you can do it this way and you have complete control over the way it looks, which is cool!!

Lesson 8: File Input/Output with Data: World map example

A World Map

I got this data file from the NOAA coastline extractor, which is now obsolete, but you can find a similar version of it at the CIA World Data Bank II [1]. I'm not giving you a screenshot of the data file this time because it is a 1.2 MB file with over 62,000 lines. And that's the low-res version! Try pasting that one into Excel! However, the program that makes this plot is quite simple:

//plotting a map of the world
 
String[] coast;
 
void setup() {
   size(600,300);
   coast = loadStrings("coastText.txt");
   noLoop();
 
}
 
void draw() {
   background(255);
   float[] coastLon = new float[coast.length];
   float[] coastLat = new float[coast.length];
   float[] newCoastLon = new float[coast.length];
   float[] newCoastLat = new float[coast.length];
    
   for (int i=0; i<coast.length; i++){
      String[] data = split(coast[i], ' ');
      coastLon[i] = float(data[0]);
      coastLat[i] = float(data[1]);
   }
    
   for (int i=0; i<coastLon.length; i++){
      newCoastLon[i] = map(coastLon[i],-180,180,0,width);
      newCoastLat[i] = map(coastLat[i],-90,90,height,0);
   }
    
   stroke(50);
   for(int i=0; i<coastLon.length; i++){
      point(newCoastLon[i],newCoastLat[i]);
   }
}

We read the file in. Then, because each line of the data file has two numbers, longitude and latitude, we split each line and populate two new arrays, one for longitude and one for latitude. In this data file there's just a blank space between the numbers, not a tab, so that's why the second argument to split is a space surrounded by single quotes. There's another for loop in which I use map to make the data plot in a way that exactly fills the display window. map takes five arguments: the value itself, the minimum and maximum of that value's original range, and then the minimum and maximum of the range you are changing it to. So for longitude, the "value" is whatever longitude is in the data file, and the original range is the whole Earth's longitude, -180 to +180. The range we are plotting to is the window size, so between zero and the width of the window. For latitude the target range is written backwards, from height to 0, because y in the display window increases downward while latitude increases northward. map is great because it saves you the work of figuring out the scale of things. Why would you want to spend time trying to calculate where 60 degrees east should go when map can do it for you?

Map of the world coastline. Display produced by program above.
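If you are curious what map is doing behind the scenes, the arithmetic is a straight-line rescaling. Here is a plain-Java sketch of it; mapRange is a stand-in name I made up so it isn't confused with Processing's built-in map, and I'm assuming the same 600 by 300 window as the world map program.

```java
class MapRangeDemo {
    // Rescales value from the range [inMin, inMax] to the range [outMin, outMax].
    static float mapRange(float value, float inMin, float inMax,
                          float outMin, float outMax) {
        // Fraction of the way through the input range (0 at inMin, 1 at inMax).
        float fraction = (value - inMin) / (inMax - inMin);
        return outMin + (outMax - outMin) * fraction;
    }

    public static void main(String[] args) {
        int width = 600;
        int height = 300;
        // Where does 60 degrees east go?
        System.out.println(mapRange(60, -180, 180, 0, width));
        // Where does the equator go? Note the flipped output range.
        System.out.println(mapRange(0, -90, 90, height, 0));
    }
}
```

Sixty degrees east is two-thirds of the way from -180 to +180, so it lands two-thirds of the way across the window, at about x = 400. The equator is halfway through the latitude range, so it lands halfway down, at y = 150.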

I should point out here that it is just a coincidence that I used map to make an actual map. In fact, map is handy anytime you have a variable with a natural range to it but you want it to be expanded or contracted proportionally to a different range. For example, here is a program where map is used to convert a horizontal position, which runs from 0 to the width of the window (500 here), into a greyscale [2] value, which runs from 0 to 255:

// demo use of "map"
 
float x;
float y;
 
void setup() {
   size(500,200);
}
 
void draw(){
   x=random(width);
   y=random(height);
   int a= int(x);
   color colr = int(map(a,0,width,0,255));
   fill(colr);
   ellipse(x,y,20,20);
}

Screen capture of image generated from code above.

Lesson 8: Self-check: using the map command

Quiz Yourself!

Further explanation: This is a for loop. The loop variable goes from zero to five by ones. Look at the two lines of code inside the for loop. One of them is the command to draw a line. Drawing a line has four arguments and they are x1, y1, x2, y2. In this program, the x1 is always 35 and the x2 is always 50. The y1 and y2 look like a mess but they are the same as each other, so this code draws six horizontal lines.

The lines are evenly spaced between the top and bottom of the display window. That is what map does for us. We do not have to calculate where each line will be. We just map the values onto the range we want in the display window. I mapped the six lines from height-1 to 1 instead of height to 0 because lines plotted right on the border of the display window would not have shown up.

The other command inside the for loop puts a text label next to each horizontal line. In fact it writes the value of i, which is a number. You can see that text is placed with its origin at the bottom left, so that's why the number 5 is cut off.

The lines are black and the text is white because we didn't set fill or stroke. Processing therefore uses the defaults: fill(255) and stroke(0).

Lesson 8: More tinkering with data and plots

Another take on the Cardinals RBI v. Salary plot

The program on this page demonstrates a lot of the skills we have learned this semester so I'm going to go through it piece by piece. First of all, here is the plot that it draws and the whole code.

Scatter plot of Runs Batted In vs. Salary ($million) for the 2002 St. Louis Cardinals.

plot by E. Richardson, data from baseballreference.com

//plot some data from an array
//this data is the number of RBIs in 2002 for Cards position players 
//and their salary (in millions of $)
//we will read the data in from a 3-column plain text file
 
 
String[] cards; //make the array and fill it with data later
int rbiMax=140; //actual maximum in the datafile is 127, this gives some room
int salMax=10; //actual maximum in the datafile is 8.33, this gives some room
int nudge = 30; //gives some border room for the plot
 
void setup() {
 size(500, 500);
 background(255);
 PFont font1;
 font1 = loadFont("AbadiMT-CondensedLight-14.vlw");
 textFont(font1);
 smooth();
 //this is how we read in the file contents
 cards = loadStrings("cards_data.txt");
 noLoop(); 
}
  
void draw() { 
 //make a grid for plotting. use map to put the grid where I want. 
 //horizontal gridlines and labels
 //SALARY data on y axis. data range is 0.5-8.33, make it 0 to 10
  
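 stroke(200); //light grey gridlines
 fill(0); //black text and dots; the default white fill would not show on a white background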
   
 for (int i = 0; i< salMax; i++) { 
  line(map(0,0,rbiMax,nudge*2,width),map(i,0,salMax,height-(2*nudge),0),
    map(rbiMax,0,rbiMax,nudge*2,width),map(i,0,salMax,height-(2*nudge),0)); 
  text(i,map(-5,0,rbiMax,nudge*2,width),map(i,0,salMax,height-(2*nudge),0)); 
 } 
 
 //vertical gridlines and labels
 //RBI data on x axis. data range is 15-127, make it 0-140. 
 
 for (int i = 0; i<rbiMax; i=i+10){ 
  line(map(i,0,rbiMax,nudge*2,width),map(0,0,salMax,height-(2*nudge),0),
    map(i,0,rbiMax,nudge*2,width),map(salMax,0,salMax,height-(2*nudge),0)); 
  text(i,map(i,0,rbiMax,nudge*2,width),map(-0.5,0,salMax,height-(2*nudge),0)); 
 } 
 
 //plot the data 
 //go through the array called "cards" line by line 
 //split each line where there is a tab 
 //create a new array of strings called "data" to hold this info
 //player name in first column
 //Rbi in the second column
 //Salary in third column
 
 for (int i = 0; i<cards.length; i++) {
  String[] data = split(cards[i], '\t'); 
  String Name = (data[0]); 
  float Rbi = map(float(data[1]),0,rbiMax,nudge*2,width); 
  float Salary = map(float(data[2]),0,salMax,height-(2*nudge),0); 
 
  //make a scatter plot of Rbi v. salary 
  ellipse(Rbi,Salary,2,2); 
  text(Name,Rbi,Salary); 
 } 
 
 //label the axes 
 fill(0); 
 textAlign(CENTER); 
 text("Runs Batted In, 2002", (width/2), height-20); 
 pushMatrix(); 
 translate(30,height/2); 
 rotate(-PI/2); 
 text("Salary ($ millions)",0,0); 
 popMatrix(); 
}
 
void mousePressed(){
save("cardsRbiData3.png");
}

Let's break it down:

Philosophy

Let's break down this program into chunks instead of trying to understand the whole thing at once. Think of this as how you would organize a paper, or a lab report. For example, in a scientific paper, you have to start with an introduction and some background knowledge or literature review, then explain your methods, then display your results, then interpret your results, and finally make some general conclusions. You can't tinker with this order too much or else your paper will not flow logically. You wouldn't want to jump right in with the interpretation of your results before you even explain what you were trying to find out and what measurements you made, right?

Similarly, there are some parts of a data-plotting program that have to go in order, as you already know. For example, if you want a shape to be outlined in blue, you have to set that color first and then draw the shape. If you want to plot some data from an external file, you first have to read the data into the program, then you can plot it. If you are making a scatter plot by hand on graph paper you first have to figure out where your origin will be, then figure out the range of the axes before you start plotting the points. Otherwise you won't know where your points should go.

If you use a software plotting application, the whole exercise of figuring out the range of the axes is done for you by the app. You can modify the axes after the fact, but you don't usually have to spend any time up front on that task. This is beneficial for saving time, but not beneficial if you want to teach your students the art of plotmaking. You want your students to look carefully at their data before just tossing it into a plotting program and hoping for the best.

Preamble

I usually use the first few lines of a program to write a note to my future self about what the program is supposed to do, and where I got the data. This is also the place to declare global variables. Here's the preamble from the RBI plotter:

//plot some data from an array
//this data is the number of RBIs in 2002 for Cards position players
//and their salary (in millions of $)
//we will read the data in from a 3-column plain text file 
String[] cards; //make the array and fill it with data later 
int rbiMax=140; //actual maximum in the datafile is 127, this gives some room 
int salMax=10; //actual maximum in the datafile is 8.33, this gives some room 
int nudge = 30; //gives some border room for the plot

The first four lines are just notes. Then I declare an array of strings which is going to hold the data that I read in later. I set three global integers. I already looked at my data and I know that the maximum number of RBIs in my file is 127 and the maximum salary is 8.33 million. This tells me approximately what range I ought to use for my x and y axes. It's useful to use a variable here instead of an actual number because what if I write a really long program that refers to the x or y axis range a bunch of times? If I go back and want to change the range for aesthetic reasons or whatever then I'll have to go back and find each place where that number appears. If instead I set that number to what it represents up at the top then if I want to change it I can just change it one time.

The setup() block

A program with a draw() block almost always has a setup() block as well. setup() runs exactly once, and its commands run in order. Variables declared in setup() are not available outside of setup(). Anything that does not need to change while the program is running can be put in setup() to save computation time. Here’s a list of things that are commonly in setup().

size() tells the display window how big to be, and whether you will use a 3D renderer.
Importing external files to be used: fonts, images, and data text files.
background() if you do not want the screen to be continuously refreshed

The setup() block for this program is:

void setup() {
 size(500, 500);
 background(255);
 PFont font1;
 font1 = loadFont("AbadiMT-CondensedLight-14.vlw");
 textFont(font1);
 smooth();
 //this is how we read in the file contents
 cards = loadStrings("cards_data.txt");
 noLoop(); 
}

I set the size and background, I load a font, and I read in the data from a plain text file. It's important that I already declared the array cards before setup() because now I can use that array to hold the information I'm reading in and I can also use it later in draw() when I want to do something with it.

The draw() block

A program with a setup() block usually has a draw() block too. draw() runs immediately after setup() and continues to run over and over again until you stop the program (or just once if, as in this program, setup() calls noLoop()). You can tell draw() to call other functions that come after it, but you can’t tell it to look in setup() for something. Here’s a list of what’s usually in draw():

background() if you do want the background to be continually refreshed.
Stuff that involves updating such as continuous movement and if tests.
Calls to functions that occur later in the program.

The draw() block for this program is:

void draw() { 
 //make a grid for plotting. use map to put the grid where I want. 
 //horizontal gridlines and labels
 //SALARY data on y axis. data range is 0.5-8.33, make it 0 to 10
  
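 stroke(200); //light grey gridlines
 fill(0); //black text and dots; the default white fill would not show on a white background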
   
 for (int i = 0; i< salMax; i++) { 
  line(map(0,0,rbiMax,nudge*2,width),map(i,0,salMax,height-(2*nudge),0),
    map(rbiMax,0,rbiMax,nudge*2,width),map(i,0,salMax,height-(2*nudge),0)); 
  text(i,map(-5,0,rbiMax,nudge*2,width),map(i,0,salMax,height-(2*nudge),0)); 
 } 
 
 //vertical gridlines and labels
 //RBI data on x axis. data range is 15-127, make it 0-140. 
 
 for (int i = 0; i<rbiMax; i=i+10){ 
  line(map(i,0,rbiMax,nudge*2,width),map(0,0,salMax,height-(2*nudge),0),
    map(i,0,rbiMax,nudge*2,width),map(salMax,0,salMax,height-(2*nudge),0)); 
  text(i,map(i,0,rbiMax,nudge*2,width),map(-0.5,0,salMax,height-(2*nudge),0)); 
 } 
 
 //plot the data 
 //go through the array called "cards" line by line 
 //split each line where there is a tab 
 //create a new array of strings called "data" to hold this info
 //player name in first column
 //Rbi in the second column
 //Salary in third column
 
 for (int i = 0; i<cards.length; i++) {
  String[] data = split(cards[i], '\t'); 
  String Name = (data[0]); 
  float Rbi = map(float(data[1]),0,rbiMax,nudge*2,width); 
  float Salary = map(float(data[2]),0,salMax,height-(2*nudge),0); 
 
  //make a scatter plot of Rbi v. salary 
  ellipse(Rbi,Salary,2,2); 
  text(Name,Rbi,Salary); 
 } 
 
 //label the axes 
 fill(0); 
 textAlign(CENTER); 
 text("Runs Batted In, 2002", (width/2), height-20); 
 pushMatrix(); 
 translate(30,height/2); 
 rotate(-PI/2); 
 text("Salary ($ millions)",0,0); 
 popMatrix(); 
}

First I use a for loop to make horizontal gridlines and label them with numbers. Then I use a for loop to make vertical gridlines and label them with numbers. Note use of map() and the global variables salMax and rbiMax to create the ranges for the axes. Next I use a for loop to go through the cards array. This array is holding the information from the external file I read in setup(). Note that I use map() to put the data inside the ranges that I set with salMax and rbiMax. I make a scatter plot with ellipse() and I also label each plotted point with the corresponding player's name using the text() command. At the end I give the axes titles. Notice the use of pushMatrix(), popMatrix(), translate(), and rotate() to make the title of the y axis appear sideways.

After the draw() block

This is where you put functions that are called in draw(). This is also where you put commands to save the results of computations or save the contents of the display window.

Here's what comes after draw() in this program:

void mousePressed(){
save("cardsRbiData3.png");
}

When I press the mouse inside the display window, an image file called cardsRbiData3.png is saved into this program's sketch folder. That's it! The whole program!

Lesson 8 Assignment: Make a frequency-magnitude diagram

Analyze Data with Processing

Remember in Earth 501 when I made you create a frequency-magnitude diagram of a year's worth of earthquakes around the world? I know you were all hating me as you wrestled with the huge dataset and how to do all the sorting and counting by hand. The example program below is a better way to make that plot. First I went to the USGS earthquake catalog search page and made a catalog of all earthquakes for the year 2012. I used vi to get rid of the header info cluttering up the top and bottom of the file, then I used awk to extract the 9th column, where the magnitudes are. Since that's the only data I care about for frequency-magnitude diagram purposes, why waste CPU time reading in a bigger file? I'll just read in a 1-column file that contains the magnitudes.

//plot some data from an array
//this data comes from a usgs catalog file of global earthquakes in 2012.
//we will read the data in from a 1-column plain text file
 
String[] mags; //make the array and fill it with data later
 
void setup() {
   size(400, 400);
   background(255);
   mags = loadStrings("mags.txt"); //this is how we read in the file contents
   noLoop(); //just drawing a static plot once
}
 
void draw() {
   
   //working with the data follows from here
   println("number of earthquakes in the data file is " +mags.length);
   //we want to draw a cumulative frequency-magnitude diagram
   //we will plot magnitude on the x axis and number of eq's >= mag on the y axis.
   //so we have to make an array to hold this information, then go through the data file and count.
    
   //I know my smallest value is 1.1 and my biggest is 8.6 but let's go from 0 to 9 by tenths. That's a 90-element array.
    
   float[] xValue = new float[90];
   float[] cumHist = new float[xValue.length];
   float[] newCumHist = new float[xValue.length];
    
   //cumulative histogram
   //I nested the for loops so that we go through the whole data 
//file each time we populate one spot in the cumHist array
   for (int i=0;i<cumHist.length;i++){
      for (int j=0;j<mags.length; j++){
         float magvalue = float(mags[j]); //convert string to float
         int newMagvalue = int(magvalue*10); // turns a number like 3.4 into 34 for example
         if (newMagvalue>=i){
            cumHist[i]++;
         }
      }
   }
 
   for (int i=0; i<cumHist.length; i++){
      newCumHist[i]=map(log10(cumHist[i]),0,log10(max(cumHist)),height,0);
      xValue[i]=map(i,0,xValue.length,0,width);
      line(xValue[i],newCumHist[i],xValue[i],height);
   }
 
}
// Calculates the base-10 logarithm of a number
float log10 (float x) {
   return (log(x) / log(10));
}
 
//save a plot when mouse clicked in display window
void mousePressed(){
  save("fmplot.jpg");
}
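The nested loops in the program above are the most direct way to do the counting, but they reread every magnitude 90 times. Here is one alternative, sketched in plain Java, that goes through the data only once and then adds up the bins from the top down. To be clear, this is a swapped-in technique and not what the program above does: it also uses Math.round instead of int() when turning 3.4 into 34, to sidestep floating-point values like 33.999998 being cut down to 33. The names CumulativeCounts and cumulativeCounts are made up for this example.

```java
class CumulativeCounts {
    // Returns an array where entry i is the number of magnitudes
    // whose value in tenths is greater than or equal to i.
    static int[] cumulativeCounts(float[] mags, int bins) {
        if (bins <= 0) {
            return new int[0];
        }
        // Pass 1: count how many events fall in each tenth-of-a-unit bin.
        int[] perBin = new int[bins];
        for (float m : mags) {
            int idx = Math.round(m * 10); // 3.4 becomes 34
            if (idx < 0) {
                continue;                 // below the first bin: never counted
            }
            if (idx >= bins) {
                idx = bins - 1;           // off the top: counts toward every bin
            }
            perBin[idx]++;
        }
        // Pass 2: running total from the largest magnitude down.
        int[] cumulative = new int[bins];
        int running = 0;
        for (int i = bins - 1; i >= 0; i--) {
            running += perBin[i];
            cumulative[i] = running;
        }
        return cumulative;
    }

    public static void main(String[] args) {
        float[] mags = {1.1f, 2.0f, 2.0f, 3.4f};
        int[] counts = cumulativeCounts(mags, 90);
        System.out.println("events >= 0.0: " + counts[0]);
        System.out.println("events >= 2.0: " + counts[20]);
        System.out.println("events >= 3.4: " + counts[34]);
    }
}
```

For a one-year catalog either approach finishes quickly, so the nested version is a perfectly good choice for the exercise; the single-pass version mainly pays off when the data file gets very large.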

The frequency-magnitude program above is more complicated than the Cardinals RBI plotter or the world map plotter because we are actually doing some calculations with the data we read in. To make the cumulative histogram, I created an array to hold the histogram values, then used a for loop to populate each entry. Each time I move to the next entry of the histogram array, I loop through the entire data file to see whether each magnitude is greater than or equal to the magnitude that entry stands for. If it is, I add one. At the end, I've populated the whole histogram. Remember that we need to plot the logarithm of the cumulative number of earthquakes to make the plot have a slope of -1. log() in Processing is the natural log (base e), so to get base 10, you take the natural log of the value and divide it by the natural log of 10. I do that in a function that comes after draw().

// Calculates the base-10 logarithm of a number
float log10 (float x) {
return (log(x) / log(10));
}

Note that instead of writing the log10 function with void at the beginning, I wrote float at the beginning instead. That's because this function is designed to output a number. It's much more like a function the way you learned it in your math classes. This function takes a number, calculates the log base 10 of it, and spits it out.
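Here is the same change-of-base idea as a plain-Java sketch you can run on its own, using Java's Math.log (also a natural log) in place of Processing's log(). The class name Log10Demo is made up for this example.

```java
class Log10Demo {
    // Base-10 logarithm by change of base: log10(x) = ln(x) / ln(10).
    static float log10(float x) {
        return (float) (Math.log(x) / Math.log(10));
    }

    public static void main(String[] args) {
        System.out.println(log10(1000f)); // close to 3
        System.out.println(log10(1f));    // 0.0
        System.out.println(log10(0f));    // -Infinity
    }
}
```

The last case is worth keeping in mind for the frequency-magnitude plot: any magnitude bin with zero earthquakes in it has no finite logarithm, so you may want to skip those bins rather than try to draw them.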

The plotting action of this program happens when I make a line at each place in the array whose height is log10 of the number of earthquakes greater than or equal to that magnitude. I could have made a scatter plot or whatever, but I thought a bar-graph-looking figure would be fun. That happens in this line:

line(xValue[i],newCumHist[i],xValue[i],height);

Bar graph. Display produced by program above.

Okay, okay! I know this plot violates all my own rules for good plot-making, such as, it doesn't have any labels anywhere, no title, etc. But the content is correct. The next step is to fix up the plot so that it has all of those things. Guess what! That's an exercise for you!

Exercise

Go back to the New Madrid frequency-magnitude problem set [3] from Earth 501, and pick one plot to recreate using your Processing skills. You can pick one year of earthquakes from New Madrid, or one year from the southern California catalog, or one year from the world (similar to what I did, above), or some combination of those.

What I'm looking for

In your program, I'd like to see correct use of loading a text file and working with array data. In your finished plot I'd like to see a correct cumulative frequency-magnitude diagram with nice-looking labels and a title.

Submit your program and all ancillary files needed to run your program to the Exercise 8.0 dropbox. Remember to zip your folder and submit that so that I have all the extra files needed.