In this lesson we will build on the Unix skills from Lesson 7 and work with data files inside the Processing environment.
By the end of this lesson, you should be able to:
This lesson will take us one week to complete, 7 - 13 July 2021. The deliverable for this lesson is one programming exercise, detailed on the last page of the lesson.
If you have any questions, please post them to our Questions? discussion forum (not e-mail) in Canvas. I will check that discussion forum daily to respond. While you are there, feel free to post your own responses if you, too, are able to help out a classmate.
Now we are ready to work with a new variable type called an array. An array is a variable that holds a list of values of the same type instead of just one value, a bit like a one-row matrix. Arrays are useful for storing lists of similar things because they save you from having to declare so many separate variables. Declaring an array is similar to declaring other variables such as ints, floats, and Strings, but you have to say what kind of data the array will store and use the square brackets [ ] so Processing knows that you want to make an array of that type of data.
For example, if you want to declare three integer variables called "x1", "x2", and "x3" and assign them the values 0, 25, and 6, here is one way to do it (hopefully this is old hat by now):
    int x1 = 0;
    int x2 = 25;
    int x3 = 6;
But you could make an array of integers instead. Let's do that and call the array "x." There are a few different ways to do this. Here they are:
    int[] x; //declare it here. I'm telling Processing
             //to make an array of integers called "x"

    void setup() {
      size(400, 400);
      x = new int[3]; // Creating it with the "new" command.
      x[0] = 0;       // Assigning values to each element.
      x[1] = 25;
      x[2] = 6;
    }

    void draw() {
      // The rest of the program
    }
In the program above, we first told Processing we wanted to make an array of integers called "x". Then inside the setup block we created the array using new and inside of the square brackets we tell Processing how big x will be, meaning how many integers it will hold. This allows your computer to allocate the right amount of memory to store the array. Then we assign values to each element in x. Remember that you always start counting at zero, not one in Processing! That means the first element of the array is denoted x[0], and the second element of the array is denoted x[1]. The number inside the square brackets tells you which element it is, not the value of that element. For example, x[2] = 6; means that the third element of the integer array x is assigned the value 6.
    int[] x = new int[3]; //declare and create it here

    void setup() {
      size(400, 400);
      x[0] = 0; //assigning here
      x[1] = 25;
      x[2] = 6;
    }

    void draw() {
      //the rest of the program
    }
In the above example, we declared the array of integers up at the top before the setup() block. When we declared it we also told Processing how many elements the array would have in it, so we saved a step compared to the first example. Since we did this up at the top of the program before setup(), the array x is available inside both setup() and draw(). We fill the array with our own values later in the program. It is important to note what the array holds before we do that. When we wrote the command
    int[] x = new int[3];
this told Processing to make an array that has room for three integers, but we didn't tell it what those integers would be. Until we do, Processing fills each element with a default value, which is 0 for an int array (0.0 for a float array, and null for a String array). When we wrote
    x[0] = 0;
we explicitly assigned the value zero to the first spot in the array. That spot already held the default zero, but writing the assignment out makes the intended starting value clear.
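If you want to check for yourself what a freshly created array holds, here is a small example in plain Java, the language Processing is built on. The class name ArrayDefaults and the helper freshInts are made up for this illustration and are not part of any lesson program.

```java
import java.util.Arrays;

// Plain-Java check of what "new" puts into a freshly created array.
// ArrayDefaults and freshInts are hypothetical names used only here.
class ArrayDefaults {
    // Returns a new int array of the given size without assigning any element.
    static int[] freshInts(int size) {
        return new int[size];
    }

    public static void main(String[] args) {
        int[] x = freshInts(3);
        System.out.println(Arrays.toString(x)); // prints [0, 0, 0]

        float[] f = new float[2];
        String[] s = new String[2];
        System.out.println(f[0]); // prints 0.0
        System.out.println(s[0]); // prints null
    }
}
```

The practical upshot is that reading x[0] before you assign it is not an error for an int array; you just get 0 back.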
    int[] x = {0, 25, 6};

    void setup() {
      size(100, 100);
    }

    void draw() {
      //rest of the program
    }
In the example above, you don't have to use new if you do the declaration, creation, and assignment all at the same time. And you don't have to tell Processing how big the array is going to be because you are assigning the spots in the array to numbers right away.
The examples above told you how to make an array, but those programs don't really do anything, do they? So here's an example of a program that uses an array to draw a shape:
    //use two arrays to draw a star
    int[] a = {20, 40, 20, 40, 50, 60, 80, 60, 80, 50};
    int[] b = {20, 45, 60, 60, 80, 60, 60, 45, 20, 35};

    void setup() {
      size(100, 100);
      noLoop(); //just a static shape drawn once
    }

    void draw() {
      beginShape();
      for (int i = 0; i < a.length; i++) {
        vertex(a[i], b[i]);
      }
      endShape(CLOSE);
    }
In the program above, I declared, created, and assigned two arrays at the beginning of the program before setup() and draw(). Inside setup() I just set the size of the display window. Inside draw() I made a beginShape()/endShape() pair and in between them I wrote a for loop that loops through the arrays and sets the vertices one by one. This is the first time you have seen the dot operator . in a program. The dot operator in Processing is used to access some attribute of the variable and it is analogous to an apostrophe in English to designate possession. For instance, when I wrote a.length that means in English "the length of the array called a," or "a's length." For an array, its length is the number of elements in it, so a.length == 10 in the star-drawing program.
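One thing to watch in a loop like this: it stops at a.length but also reads b[i], so it quietly assumes b is at least as long as a. Here is a small plain-Java example of one way to guard against a mismatch. The class VertexPairs and the helper pairCount are hypothetical names of my own, not something the star program uses.

```java
// Hypothetical helper, not part of the lesson's star program.
class VertexPairs {
    // Number of (a[i], b[i]) pairs that can be read safely from both arrays.
    static int pairCount(int[] a, int[] b) {
        return Math.min(a.length, b.length);
    }

    public static void main(String[] args) {
        int[] a = {20, 40, 20, 40, 50, 60, 80, 60, 80, 50};
        int[] b = {20, 45, 60, 60, 80, 60, 60, 45, 20, 35};
        // Loop only over indices that exist in both arrays.
        for (int i = 0; i < pairCount(a, b); i++) {
            System.out.println("vertex " + i + ": (" + a[i] + ", " + b[i] + ")");
        }
    }
}
```

In the star program both arrays have 10 elements, so the guard changes nothing there. It only matters if you later edit one array and forget the other.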
It is also useful to use a for loop to fill up an array with numbers instead of assigning each element by hand. Here's an example of a program that does this:
    //five balls, each ball is faster than the one above it
    int num = 5;
    float[] xpos = new float[num];
    float[] speed = new float[num];
    float dy = 60;

    void setup() {
      size(400, 400);
      for (int i = 0; i < num; i++) {
        xpos[i] = i;
        speed[i] = i + 0.1;
      }
    }

    void draw() {
      background(0);
      for (int i = 0; i < num; i++) {
        float y = (i + dy) * i;
        ellipse(xpos[i], y, 20, 20);
        xpos[i] += speed[i];
        //send the ball back to the left edge once it reaches the right edge
        if (xpos[i] > width - 10) {
          xpos[i] = 0;
        }
      }
    }
Here is a 12 second video demonstrating the code above (video is silent).
Another useful thing to do with arrays is to use them to store the history of mouseX and mouseY positions.
Here is an example of that:
    //snake of circles follows the mouse
    int num = 50;
    int[] x = new int[num];
    int[] y = new int[num];

    void setup() {
      size(400, 400);
      noStroke();
      smooth();
      fill(255, 100);
    }

    void draw() {
      background(0);
      //go backwards through the loop
      //and shift all values to the right
      for (int i = num - 1; i > 0; i--) {
        x[i] = x[i - 1];
        y[i] = y[i - 1];
      }
      //put current values of mouseX and mouseY at beginning
      x[0] = mouseX;
      y[0] = mouseY;
      //draw circles
      for (int i = 0; i < num; i++) {
        ellipse(x[i], y[i], 20, 20);
      }
    }
In the program above, I declare and create two arrays before setup() and draw(). I use setup() for the usual things. Inside draw() there is a for loop that steps backwards through the two arrays and shifts all the array values to the right, then adds the current values of mouseX and mouseY to the beginning of each array. Think of this as a conveyor belt that runs from left to right. You are continuously adding a new value to the left side while the rightmost value falls off the other side and gets thrown away. Then I use another for loop that goes forwards through the array to draw all the circles. The whole effect is that a trail of num circles (in this case 50) follows the mouse around like a snake.
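To see the conveyor belt idea on its own, here is a minimal plain-Java example of the shift step, using a short array so the result is easy to read. The class MouseTrail and the method push are names I made up for this illustration.

```java
import java.util.Arrays;

// Hypothetical stand-alone version of the shift step from the snake program.
class MouseTrail {
    // Shift every value one slot toward the end, then store the newest at index 0.
    // The loop runs backwards so each value is copied before it is overwritten.
    static void push(int[] history, int newest) {
        if (history.length == 0) {
            return; // nothing to store into
        }
        for (int i = history.length - 1; i > 0; i--) {
            history[i] = history[i - 1];
        }
        history[0] = newest;
    }

    public static void main(String[] args) {
        int[] xs = new int[4];
        push(xs, 10);
        push(xs, 20);
        push(xs, 30);
        System.out.println(Arrays.toString(xs)); // prints [30, 20, 10, 0]
    }
}
```

If the loop ran forwards instead, the value at index 0 would be copied into every slot, which is why the program steps backwards.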
Three examples of reading in a file, doing something with its contents, plotting the result.
The box below contains the contents of a plain text file named "cards_data.txt" that I created using vi and have dragged and dropped onto my sketch. There are three columns separated by tabs. The first column is a list of last names of 2002 St. Louis Cardinals position players (no pitchers). The second column lists the number of RBIs each of them earned that year and the final column is each of their salaries in millions of dollars. The format of this file looks kind of ugly because some of the names are too long for the tabbing to work out right, but Processing won't care about this! We are going to read this text file into Processing using loadStrings. Then we are going to make a plot out of it.
    Cairo	23	0.85
    Drew	56	3.6
    Edmonds	83	8.33
    Marrero	66	1.5
    Martinez	75	7.5
    Matheny	35	3.25
    Palmeiro	31	0.7
    Perez	26	0.5
    Pujols	127	0.9
    Renteria	83	6.5
    Robinson	15	0.32
    Rolen	110	7.625
    Vina	54	5.33
Here's the program, and a screenshot of the plot I made.
    //this data is the number of RBIs in 2002 for Cards position players
    //and their salaries (in millions of $)
    //we will read the data in from a 3-column plain text file
    String[] cards; //make the array and fill it with data later

    void setup() {
      size(200, 200);
      background(255);
      PFont font1;
      font1 = loadFont("AbadiMT-CondensedLight-14.vlw");
      textFont(font1);
      smooth();
      cards = loadStrings("cards_data.txt"); //this is how we read in the file contents
      noLoop(); //just drawing a static plot once
    }

    void draw() {
      //make a grid for plotting. use translate to leave some blank space for labels
      translate(50, -50);
      stroke(200);
      for (int i = 0; i < 100; i = i + 20) {
        line(i, 60, i, height); //vertical gridlines
        line(0, height - i, 140, height - i); //horizontal gridlines
      }
      //plot the data
      stroke(0);
      fill(75);
      println("number of lines in data file is " + cards.length);
      //go through the array called "cards" line by line
      for (int i = 0; i < cards.length; i++) {
        //split each line where there is a tab
        //create a new array of strings called "data" to hold this info
        String[] data = split(cards[i], '\t');
        String Name = (data[0]); //player name in first column
        int Rbi = int(data[1]); //Rbi in the second column
        float Salary = float(data[2]); //Salary in third column
        //make a scatter plot of Rbi v. salary
        //want the axes origin at lower left, so do (height - y data)
        ellipse(Rbi, height - Salary * 10, 10, 10);
      }
      //label the axes
      //I did these by trial-and-error until I got them to look right
      fill(0);
      text("RBIs", (width / 2) - 50, height + 30);
      text("Salary $ mil", -50, 100, 30, 100);
      text("20", 15, height + 15);
      text("60", 55, height + 15);
      text("100", 95, height + 15);
      text("2", -10, height - 15);
      text("6", -10, height - 55);
      text("10", -15, height - 95);
    }
The secondary purpose of this plot is to demonstrate a few new commands and how to deal with an external data file. Inside setup() we used loadStrings to read the file into the program. You want to do all the reading-in of external files in setup() because that block just runs once and you don't want your cpu hogged by re-loading your files every time you run through draw(). The file will be loaded in as lines of String variables. The first thing we want to do is tell Processing that we actually want three columns, not 13 lines. So, we go through the data file line by line and split each line where there are tabs. The syntax '\t' tells Processing to look for a tab.
The fact that the data comes in as strings works out great for the player names because they are words. But if we want to do some arithmetic with the numbers, or otherwise treat them as numbers, then we have to convert them to other variable types. Inside the for loop where we run through the data file, we first split each line into three pieces, making a three-element array named data. Then we give each element in the data array a descriptive name and convert it to another variable type if we want to. For example, we converted the RBI value to an int and the salary value to a float. The chunk of code that does all that is here:
    String[] data = split(cards[i], '\t');
    String Name = (data[0]);
    int Rbi = int(data[1]);
    float Salary = float(data[2]);
Then we plot Rbi v. Salary.
    ellipse(Rbi, height - Salary * 10, 10, 10);
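The split-and-convert step described above can also be tried outside Processing. Here is a minimal plain-Java example that parses one line of the cards file. The class CardsLine is hypothetical, and it uses Java's own String.split, Integer.parseInt, and Float.parseFloat in place of Processing's split(), int(), and float().

```java
// Hypothetical holder for one parsed line of cards_data.txt.
class CardsLine {
    final String name;
    final int rbi;
    final float salary; // in millions of dollars

    CardsLine(String name, int rbi, float salary) {
        this.name = name;
        this.rbi = rbi;
        this.salary = salary;
    }

    // Split a tab-separated line into three pieces and convert the numeric ones.
    static CardsLine parse(String line) {
        String[] data = line.split("\t");
        // trim() removes stray spaces that would make the Java parsers fail
        int rbi = Integer.parseInt(data[1].trim());
        float salary = Float.parseFloat(data[2].trim());
        return new CardsLine(data[0], rbi, salary);
    }

    public static void main(String[] args) {
        CardsLine p = parse("Pujols\t127\t0.9");
        System.out.println(p.name + " " + p.rbi + " " + p.salary); // prints Pujols 127 0.9
    }
}
```

Note that the Java parsers throw a NumberFormatException when a column does not hold a number, so a stray header line in the data file would stop this example.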
The rest of the program is devoted to doing the background work that spreadsheet and other canned plotting programs do for you. Here's where we make some gridlines:
    //make a grid for plotting. use translate to leave some blank space for labels
    translate(50, -50);
    stroke(200);
    for (int i = 0; i < 100; i = i + 20) {
      line(i, 60, i, height); //vertical gridlines
      line(0, height - i, 140, height - i); //horizontal gridlines
    }
Here's where we label the axes:
    //label the axes
    //I did these by trial-and-error until I got them to look right
    fill(0);
    text("RBIs", (width / 2) - 50, height + 30);
    text("Salary $ mil", -50, 100, 30, 100);
    text("20", 15, height + 15);
    text("60", 55, height + 15);
    text("100", 95, height + 15);
    text("2", -10, height - 15);
    text("6", -10, height - 55);
    text("10", -15, height - 95);
Of course this is a slightly more tedious way to make a simple plot -- you would probably rather just paste this little datafile into your favorite program and not spend time tinkering with the way the plotting grid looks, right? Sure, but the point is that you can do it this way and you have complete control over the way it looks, which is cool!!
I got this data file from the NOAA coastline extractor, which is now obsolete but you can find a similar version of it at the CIA World Data Bank II [1]. I'm not giving you a screenshot of the datafile this time because it is a 1.2 Mb file with over 62,000 lines. And that's the low-res version! Try pasting that one into Excel! However the program that makes this plot is quite simple:
    //plotting a map of the world
    String[] coast;

    void setup() {
      size(600, 300);
      coast = loadStrings("coastText.txt");
      noLoop();
    }

    void draw() {
      background(255);
      float[] coastLon = new float[coast.length];
      float[] coastLat = new float[coast.length];
      float[] newCoastLon = new float[coast.length];
      float[] newCoastLat = new float[coast.length];
      //split each line at the space into longitude and latitude
      for (int i = 0; i < coast.length; i++) {
        String[] data = split(coast[i], ' ');
        coastLon[i] = float(data[0]);
        coastLat[i] = float(data[1]);
      }
      //convert degrees to pixel positions in the display window
      for (int i = 0; i < coastLon.length; i++) {
        newCoastLon[i] = map(coastLon[i], -180, 180, 0, width);
        newCoastLat[i] = map(coastLat[i], -90, 90, height, 0);
      }
      stroke(50);
      for (int i = 0; i < coastLon.length; i++) {
        point(newCoastLon[i], newCoastLat[i]);
      }
    }
I should point out here that it is just a coincidence that I used map to make an actual map. In fact, map is handy anytime you have a variable with a natural range to it but you want it to be expanded or contracted proportionally to a different range. For example, here is a program where map is used to squeeze the x position of each circle, which runs from 0 to 500 (the width of the display window), into the greyscale [2] range, which goes from 0 to 255:
    // demo use of "map"
    float x;
    float y;

    void setup() {
      size(500, 200);
    }

    void draw() {
      x = random(width);
      y = random(height);
      int a = int(x);
      color colr = int(map(a, 0, width, 0, 255));
      fill(colr);
      ellipse(x, y, 20, 20);
    }
Further explanation: This is a for loop. The loop variable goes from zero to five by ones. Look at the two lines of code inside the for loop. One of them is the command to draw a line. Drawing a line has four arguments and they are x1, y1, x2, y2. In this program, the x1 is always 35 and the x2 is always 50. The y1 and y2 look like a mess but they are the same as each other, so this code draws six horizontal lines.
The lines are evenly spaced between the top and bottom of the display window. That is what map does for us. We do not have to calculate where each line will be. We just map the values onto the range we want in the display window. I mapped the six lines from height-1 to 1 instead of height to 0 because lines plotted right on the border of the display window would not have shown up.
The other command inside the for loop puts a text label next to each horizontal line. In fact it writes the value of i, which is a number. You can see that text is placed with its origin at the bottom left, so that's why the number 5 is cut off.
The lines are black and the text is white because we didn't set fill or stroke. Processing therefore uses the defaults: fill(255) and stroke(0).
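The even spacing described above can be checked with a little arithmetic. Here is a minimal plain-Java example of a linear remapping like the one map performs, applied to six lines mapped from height-1 down to 1. The class GridLines, the method names remap and lineYs, and the window height of 100 are all my own assumptions for illustration.

```java
// Hypothetical stand-alone version of the remapping that map() performs.
class GridLines {
    // Linearly rescale value from the range [start1, stop1] to [start2, stop2].
    static float remap(float value, float start1, float stop1,
                       float start2, float stop2) {
        return start2 + (stop2 - start2) * ((value - start1) / (stop1 - start1));
    }

    // y position of each of the six lines (i = 0..5) for a window of this height.
    static float[] lineYs(int height) {
        float[] ys = new float[6];
        for (int i = 0; i <= 5; i++) {
            ys[i] = remap(i, 0, 5, height - 1, 1);
        }
        return ys;
    }

    public static void main(String[] args) {
        // Assumed window height of 100 pixels, chosen only for this example.
        float[] ys = lineYs(100);
        for (int i = 0; i < ys.length; i++) {
            System.out.println("line " + i + " is drawn at y = " + ys[i]);
        }
    }
}
```

With a height of 100, line 0 lands at y = 99 near the bottom and line 5 lands at y = 1 near the top, with the other four spread evenly in between (19.6 pixels apart).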
The program on this page demonstrates a lot of the skills we have learned this semester so I'm going to go through it piece by piece. First of all, here is the plot that it draws and the whole code.
    //plot some data from an array
    //this data is the number of RBIs in 2002 for Cards position players
    //and their salary (in millions of $)
    //we will read the data in from a 3-column plain text file
    String[] cards; //make the array and fill it with data later
    int rbiMax = 140; //actual maximum in the datafile is 127, this gives some room
    int salMax = 10; //actual maximum in the datafile is 8.33, this gives some room
    int nudge = 30; //gives some border room for the plot

    void setup() {
      size(500, 500);
      background(255);
      PFont font1;
      font1 = loadFont("AbadiMT-CondensedLight-14.vlw");
      textFont(font1);
      smooth();
      //this is how we read in the file contents
      cards = loadStrings("cards_data.txt");
      noLoop();
    }

    void draw() {
      //make a grid for plotting. use map to put the grid where I want.
      //horizontal gridlines and labels
      //SALARY data on y axis. data range is 0.32-8.33, make it 0 to 10
      stroke(200);
      for (int i = 0; i < salMax; i++) {
        line(map(0, 0, rbiMax, nudge*2, width), map(i, 0, salMax, height-(2*nudge), 0),
             map(rbiMax, 0, rbiMax, nudge*2, width), map(i, 0, salMax, height-(2*nudge), 0));
        text(i, map(-5, 0, rbiMax, nudge*2, width), map(i, 0, salMax, height-(2*nudge), 0));
      }
      //vertical gridlines and labels
      //RBI data on x axis. data range is 15-127, make it 0-140.
      for (int i = 0; i < rbiMax; i = i + 10) {
        line(map(i, 0, rbiMax, nudge*2, width), map(0, 0, salMax, height-(2*nudge), 0),
             map(i, 0, rbiMax, nudge*2, width), map(salMax, 0, salMax, height-(2*nudge), 0));
        text(i, map(i, 0, rbiMax, nudge*2, width), map(-0.5, 0, salMax, height-(2*nudge), 0));
      }
      //plot the data
      //go through the array called "cards" line by line
      //split each line where there is a tab
      //create a new array of strings called "data" to hold this info
      //player name in first column
      //Rbi in the second column
      //Salary in third column
      for (int i = 0; i < cards.length; i++) {
        String[] data = split(cards[i], '\t');
        String Name = (data[0]);
        float Rbi = map(float(data[1]), 0, rbiMax, nudge*2, width);
        float Salary = map(float(data[2]), 0, salMax, height-(2*nudge), 0);
        //make a scatter plot of Rbi v. salary
        ellipse(Rbi, Salary, 2, 2);
        text(Name, Rbi, Salary);
      }
      //label the axes
      fill(0);
      textAlign(CENTER);
      text("Runs Batted In, 2002", (width / 2), height - 20);
      pushMatrix();
      translate(30, height / 2);
      rotate(-PI / 2);
      text("Salary ($ millions)", 0, 0);
      popMatrix();
    }

    void mousePressed() {
      save("cardsRbiData3.png");
    }
Let's break down this program into chunks instead of trying to understand the whole thing at once. Think of this as how you would organize a paper, or a lab report. For example, in a scientific paper, you have to start with an introduction and some background knowledge or literature review, then explain your methods, then display your results, then interpret your results, and finally make some general conclusions. You can't tinker with this order too much or else your paper will not flow logically. You wouldn't want to jump right in with the interpretation of your results before you even explain what you were trying to find out and what measurements you made, right?
Similarly, there are some parts of a data-plotting program that have to go in order, as you already know. For example, if you want a shape to be outlined in blue, you have to set that color first and then draw the shape. If you want to plot some data from an external file, you first have to read the data into the program, then you can plot it. If you are making a scatter plot by hand on graph paper you first have to figure out where your origin will be, then figure out the range of the axes before you start plotting the points. Otherwise you won't know where your points should go.
If you use a software plotting application, the whole exercise of figuring out the range of the axes is done for you by the app. You can modify the axes after the fact, but you don't usually have to spend any time up front on that task. This is beneficial for saving time, but not beneficial if you want to teach your students the art of plotmaking. You want your students to look carefully at their data before just tossing it into a plotting program and hoping for the best.
I usually use the first few lines of a program to write a note to my future self about what the program is supposed to do, and where I got the data. This is also the place to declare global variables. Here's the preamble from the RBI plotter:
    //plot some data from an array
    //this data is the number of RBIs in 2002 for Cards position players
    //and their salary (in millions of $)
    //we will read the data in from a 3-column plain text file
    String[] cards; //make the array and fill it with data later
    int rbiMax = 140; //actual maximum in the datafile is 127, this gives some room
    int salMax = 10; //actual maximum in the datafile is 8.33, this gives some room
    int nudge = 30; //gives some border room for the plot
The first four lines are just notes. Then I declare an array of strings which is going to hold the data that I read in later. I set three global integers. I already looked at my data and I know that the maximum number of RBIs in my file is 127 and the maximum salary is 8.33 million. This tells me approximately what range I ought to use for my x and y axes. It's useful to use a variable here instead of an actual number because what if I write a really long program that refers to the x or y axis range a bunch of times? If I go back and want to change the range for aesthetic reasons or whatever then I'll have to go back and find each place where that number appears. If instead I set that number to what it represents up at the top then if I want to change it I can just change it one time.
If there is a draw(), there has to be a setup(). setup() runs exactly once and the commands are run in order. Variables declared in setup() are not available outside of setup(). Anything that does not need to be changed while the program is running can be put in setup() to save computation time. Here’s a list of things that are commonly in setup().
The setup() block for this program is:
    void setup() {
      size(500, 500);
      background(255);
      PFont font1;
      font1 = loadFont("AbadiMT-CondensedLight-14.vlw");
      textFont(font1);
      smooth();
      //this is how we read in the file contents
      cards = loadStrings("cards_data.txt");
      noLoop();
    }
I set the size and background, I load a font, and I read in the data from a plain text file. It's important that I already declared the array cards before setup() because now I can use that array to hold the information I'm reading in and I can also use it later in draw() when I want to do something with it.
If there is a setup() there has to be a draw(). draw() runs immediately after setup() and continues to run over and over again until you stop the program. You can tell draw() to go looking for other functions that come after it but you can’t tell it to look in setup() for something. Here’s a list of what’s usually in draw():
The draw() block for this program is:
    void draw() {
      //make a grid for plotting. use map to put the grid where I want.
      //horizontal gridlines and labels
      //SALARY data on y axis. data range is 0.32-8.33, make it 0 to 10
      stroke(200);
      for (int i = 0; i < salMax; i++) {
        line(map(0, 0, rbiMax, nudge*2, width), map(i, 0, salMax, height-(2*nudge), 0),
             map(rbiMax, 0, rbiMax, nudge*2, width), map(i, 0, salMax, height-(2*nudge), 0));
        text(i, map(-5, 0, rbiMax, nudge*2, width), map(i, 0, salMax, height-(2*nudge), 0));
      }
      //vertical gridlines and labels
      //RBI data on x axis. data range is 15-127, make it 0-140.
      for (int i = 0; i < rbiMax; i = i + 10) {
        line(map(i, 0, rbiMax, nudge*2, width), map(0, 0, salMax, height-(2*nudge), 0),
             map(i, 0, rbiMax, nudge*2, width), map(salMax, 0, salMax, height-(2*nudge), 0));
        text(i, map(i, 0, rbiMax, nudge*2, width), map(-0.5, 0, salMax, height-(2*nudge), 0));
      }
      //plot the data
      //go through the array called "cards" line by line
      //split each line where there is a tab
      //create a new array of strings called "data" to hold this info
      //player name in first column
      //Rbi in the second column
      //Salary in third column
      for (int i = 0; i < cards.length; i++) {
        String[] data = split(cards[i], '\t');
        String Name = (data[0]);
        float Rbi = map(float(data[1]), 0, rbiMax, nudge*2, width);
        float Salary = map(float(data[2]), 0, salMax, height-(2*nudge), 0);
        //make a scatter plot of Rbi v. salary
        ellipse(Rbi, Salary, 2, 2);
        text(Name, Rbi, Salary);
      }
      //label the axes
      fill(0);
      textAlign(CENTER);
      text("Runs Batted In, 2002", (width / 2), height - 20);
      pushMatrix();
      translate(30, height / 2);
      rotate(-PI / 2);
      text("Salary ($ millions)", 0, 0);
      popMatrix();
    }
First I use a for loop to make horizontal gridlines and label them with numbers. Then I use a for loop to make vertical gridlines and label them with numbers. Note use of map() and the global variables salMax and rbiMax to create the ranges for the axes. Next I use a for loop to go through the cards array. This array is holding the information from the external file I read in setup(). Note that I use map() to put the data inside the ranges that I set with salMax and rbiMax. I make a scatter plot with ellipse() and I also label each plotted point with the corresponding player's name using the text() command. At the end I give the axes titles. Notice the use of pushMatrix(), popMatrix(), translate(), and rotate() to make the title of the y axis appear sideways.
This is where you put functions that are called in draw(). This is also where you put commands to save the results of computations or save the contents of the display window.
Here's what comes after draw() in this program:
    void mousePressed() {
      save("cardsRbiData3.png");
    }
When I press the mouse inside the display window, an image file called cardsRbiData3.png is saved into this program's sketch folder. That's it! The whole program!
Remember in Earth 501 when I made you create a frequency-magnitude diagram of a year's worth of earthquakes around the world? I know you were all hating me as you wrestled with the huge dataset and how to do all the sorting and counting by hand. The example program below is a better way to make that plot. First I went to the USGS earthquake catalog search page and made a catalog of all earthquakes for the year 2012. I used vi to get rid of the header info cluttering up the top and bottom of the file, then I used awk to extract the 9th column where the magnitudes are. Since that's the only data I care about for frequency-magnitude diagram purposes, why waste cpu reading in a bigger file? I'll just read in a 1-column file that contains the magnitudes.
    //plot some data from an array
    //this data comes from a usgs catalog file of global earthquakes in 2012.
    //we will read the data in from a 1-column plain text file
    String[] mags; //make the array and fill it with data later

    void setup() {
      size(400, 400);
      background(255);
      mags = loadStrings("mags.txt"); //this is how we read in the file contents
      noLoop(); //just drawing a static plot once
    }

    void draw() {
      //working with the data follows from here
      println("number of earthquakes in the data file is " + mags.length);
      //we want to draw a cumulative frequency-magnitude diagram
      //we will plot magnitude on the x axis and number of eq's >= mag on the y axis.
      //so we have to make an array to hold this information, then go through the data file and count.
      //I know my smallest value is 1.1 and my biggest is 8.6 but let's go from 0 to 9 by tenths.
      //That's a 90-element array.
      float[] xValue = new float[90];
      float[] cumHist = new float[xValue.length];
      float[] newCumHist = new float[xValue.length];
      //cumulative histogram
      //I nested the for loops so that we go through the whole data
      //file each time we populate one spot in the cumHist array
      for (int i = 0; i < cumHist.length; i++) {
        for (int j = 0; j < mags.length; j++) {
          float magvalue = float(mags[j]); //convert string to float
          int newMagvalue = int(magvalue * 10); // turns a number like 3.4 into 34 for example
          if (newMagvalue >= i) {
            cumHist[i]++;
          }
        }
      }
      for (int i = 0; i < cumHist.length; i++) {
        newCumHist[i] = map(log10(cumHist[i]), 0, log10(max(cumHist)), height, 0);
        xValue[i] = map(i, 0, xValue.length, 0, width);
        line(xValue[i], newCumHist[i], xValue[i], height);
      }
    }

    // Calculates the base-10 logarithm of a number
    float log10(float x) {
      return (log(x) / log(10));
    }

    //save a plot when mouse clicked in display window
    void mousePressed() {
      save("fmplot.jpg");
    }
So this program is more complicated than the Cardinals RBI plotter or the world map plotter because we are actually doing some calculations with the data we are reading in. In order to make a histogram, I created an array to hold the histogram values, then I used a for loop to populate each entry in the histogram array. Each time I go to the next entry of the histogram array I loop through the entire data file to see if each magnitude is greater than or equal to the place where I am in the histogram. If it is, I add one. At the end, I've populated the whole histogram. Remember that we need to plot the logarithm of the cumulative number of earthquakes to make the plot have a slope of -1. log() in Processing is the natural log (base e), so to get base 10, you have to take the natural log of the value and divide that by the natural log of 10. I do that in a function that comes after draw().
    // Calculates the base-10 logarithm of a number
    float log10(float x) {
      return (log(x) / log(10));
    }
Note that instead of writing the log10 function with void at the beginning, I wrote float at the beginning instead. That's because this function is designed to output a number. It's much more like a function the way you learned it in your math classes. This function takes a number, calculates the log base 10 of it, and spits it out.
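Here is the same change-of-base idea as a small plain-Java example, in case you want to test it away from the plot. The class name Log10Demo is made up for this illustration. The example also shows what happens when the input is 0, which matters for any histogram bin that ends up with no earthquakes in it.

```java
// Hypothetical stand-alone version of the log10 helper.
class Log10Demo {
    // Change of base: log10(x) = ln(x) / ln(10). Returns a value, so it is not void.
    static float log10(float x) {
        return (float) (Math.log(x) / Math.log(10));
    }

    public static void main(String[] args) {
        System.out.println(log10(1000f)); // close to 3
        System.out.println(log10(1f));    // log of 1 is 0 in any base
        System.out.println(log10(0f));    // prints -Infinity, not a plottable height
    }
}
```

Plain Java also provides Math.log10, so the hand-written version is only needed to mirror the lesson's function. Either way, the log of 0 is negative infinity, so an empty bin cannot be placed on a log axis without some special handling.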
The plotting action of this program happens when I make a line at each place in the array whose height is log10 of the number of earthquakes greater than or equal to that magnitude. I could have made a scatter plot or whatever, but I thought a bar-graph-looking figure would be fun. That happens in this line:
    line(xValue[i], newCumHist[i], xValue[i], height);
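For reference, the counting step that produces the height of each of these lines can be tried on a tiny made-up catalog. This is a minimal plain-Java example of the same nested-loop count. The class CumulativeCounts, the method count, and the four sample magnitudes are invented for illustration and are not taken from the USGS file.

```java
// Hypothetical stand-alone version of the cumulative counting step.
class CumulativeCounts {
    // counts[i] = number of magnitudes whose value in tenths is >= i.
    // The whole list of magnitudes is scanned once per bin, as in the lesson.
    static int[] count(float[] mags, int bins) {
        int[] counts = new int[bins];
        for (int i = 0; i < bins; i++) {
            for (float m : mags) {
                int tenths = (int) (m * 10); // turns 2.5 into 25, for example
                if (tenths >= i) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up magnitudes, chosen only to keep the output easy to check.
        float[] mags = {1.5f, 2.5f, 2.5f, 6.0f};
        int[] counts = count(mags, 90);
        System.out.println("events >= 1.5: " + counts[15]); // prints 4
        System.out.println("events >= 2.5: " + counts[25]); // prints 3
        System.out.println("events >= 6.1: " + counts[61]); // prints 0
    }
}
```

Because every bin counts the events at or above it, the counts can only stay the same or drop as the magnitude goes up, which is what gives the diagram its downward slope.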
Okay, okay! I know this plot violates all my own rules for good plot-making, such as, it doesn't have any labels anywhere, no title, etc. But the content is correct. The next step is to fix up the plot so that it has all of those things. Guess what! That's an exercise for you!
Go back to the New Madrid frequency-magnitude problem set [3] from Earth 501, and pick one plot to recreate using Processing skills to make a plot. You can pick one year of earthquakes from New Madrid, or one year from the southern California catalog, or one year from the world (similar to what I did, above), or some combination of those.
In your program, I'd like to see correct use of loading a text file and working with array data. In your finished plot I'd like to see a correct cumulative frequency-magnitude diagram with nice-looking labels and a title.
Submit your program and all ancillary files needed to run your program to the Exercise 8.0 dropbox. Remember to zip your folder and submit that so that I have all the extra files needed.