NOTE: For this assignment, you will need to record your work in a word-processing document. Your work must be submitted in Word (.doc or .docx) or PDF (.pdf) format. A formatted answer sheet is available on CANVAS as a convenience for students enrolled in the course.
Each problem (#2 through #6) is equally weighted for grading and will be graded on a quality scale from 1 to 10 using the general rubric as a guideline. Thus, a score as high as 50 is possible, and that score will be recorded in the grade book.
The objective of this problem set is for you to work with some of the data analysis and statistics concepts and mechanics covered in Lesson 2, namely the coefficient of determination (R²) and multi-variate regression. You are welcome to use any software or programming environment you wish to complete this problem set, but should you need help, the instructor can support only Excel and the online tool introduced in this problem set. The instructions also assume that you are using Excel and that online tool.
Now, in CANVAS, in the menu on the left-hand side for this course, there is an option called Files. Navigate to it, and then navigate to the folder called Problem Sets, which contains another folder called PS#2. Download the data file PS2.xlsx in that folder to your computer, and open the file in Excel. You should see five time series: HURDAT [1] (“unadjusted”) tropical-cyclone (TC) count for the Atlantic basin, Vecchi-Knutson (2008) [2] (“adjusted”) TC count, August-October Main Development Region (MDR) sea-surface temperature (SST), December-March North Atlantic Oscillation (NAO) index, and December-February Niño3.4 index (which measures ENSO phase). All five series cover 1878 to 2019.
In contrast to the two TC count series, which serve as dependent variables (predictands), the MDR SST, NAO, and Niño3.4 time series should be thought of as independent variables, or predictor variables. These predictors were chosen because three factors are thought to be related to TC activity in the Atlantic basin: the warmth of the sea surface in the Main Development Region [3] (an area of the Atlantic Ocean roughly east of the Caribbean Sea), measured by MDR SST; pressure patterns in the North Atlantic region, measured by the NAO index; and sea-surface temperature anomalies in the equatorial Pacific, which indicate ENSO phase and are measured by the Niño3.4 index.
Treating the “unadjusted” TC count as the predictand, calculate the linear trend line equation (i.e., the single-variable regression equation) for each of the three predictor variables. Also calculate the correlation coefficient r and the coefficient of determination R² for each of the three regression equations. For each regression, state how much (as a percentage) of the variation in the predictand is explained by the predictor. Report your results on the answer sheet.
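If you would like to check your Excel numbers, the calculation behind a single-variable trend line can be sketched in plain Python. This is a minimal sketch, not a required method for this course: the function name simple_regression is hypothetical, and the sample numbers are made up rather than taken from PS2.xlsx. In Excel, the worksheet functions INTERCEPT, SLOPE, CORREL, and RSQ compute the same four quantities. Multiplying R² by 100 gives the percentage of the variation in the predictand that the predictor explains.

```python
from math import sqrt


def simple_regression(x, y):
    """Least-squares fit y = a + b*x; returns (a, b, r, R^2).

    Hypothetical helper for illustration. Comparable to Excel's
    INTERCEPT, SLOPE, CORREL, and RSQ worksheet functions.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-products of deviations
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    # With a single predictor, R^2 is simply r squared
    return intercept, slope, r, r * r


# Made-up numbers for illustration only (not the PS2.xlsx data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b, r, r2 = simple_regression(x, y)
print(f"y = {a:.4f} + {b:.4f}x, r = {r:.4f}, R^2 = {r2:.4f}")
# y = 2.2000 + 0.6000x, r = 0.7746, R^2 = 0.6000
print(f"{100 * r2:.1f}% of the variation in y is explained by x")
# 60.0% of the variation in y is explained by x
```

Here R² = 0.6 would be read as "the predictor explains 60% of the variation in the predictand."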
A sidebar: Excel can be used to construct a multi-variate regression, but it is not the best tool for this task. In the real world, you might use your computer programming skills to accomplish this task, but knowledge of programming is not a prerequisite for this course. More likely, you might have a statistical software package available for the task, but many such packages are proprietary and cost money, and access to one is not required for this course. In past offerings of this course, we asked students to use the regression tool demonstrated in the Lesson 3 videos for this problem set, but that tool has technical problems in some Web browsers. Therefore, we suggest using this online multi-variate regression calculator [4], but you may use any tool you wish.
Using this online multi-variate regression calculator [4] (or any tool you wish), calculate two multi-variate regressions using all three predictor variables: one to predict the “unadjusted” TC count and one to predict the “adjusted” TC count. Give each equation in the form
$$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$$
where $y$ is the predictand, $b_0$ is a constant, and $b_1$, $b_2$, and $b_3$ are the coefficients for predictor variables $x_1$, $x_2$, and $x_3$, respectively. If you are using the online calculator, display output to four decimal places, and include no interactions. Find the coefficient of determination $R^2$ for each regression, interpret it, and compare it to the $R^2$ values for the single-variable regressions you calculated in #3 and #4. Note that $R^2$ is an output of the online calculator. Report your results and discussion on the answer sheet.
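To show what a multi-variate regression tool computes, the sketch below fits $y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$ by ordinary least squares (solving the normal equations) and then computes $R^2$. It is a minimal, illustrative sketch under stated assumptions: the function names multiple_regression and solve are hypothetical, the data are made up, and no interaction terms are included. With the same inputs and no interactions, its coefficients should agree, up to rounding, with those reported by an ordinary-least-squares tool such as the calculator in [4].

```python
def solve(M, v):
    """Solve the square linear system M b = v (Gaussian elimination, partial pivoting)."""
    p = len(v)
    # Work on an augmented copy so the caller's lists are not modified
    aug = [list(M[i]) + [v[i]] for i in range(p)]
    for col in range(p):
        # Swap in the row with the largest entry in this column for stability
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from the rows below the pivot
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system
    b = [0.0] * p
    for r in range(p - 1, -1, -1):
        s = sum(aug[r][c] * b[c] for c in range(r + 1, p))
        b[r] = (aug[r][p] - s) / aug[r][r]
    return b


def multiple_regression(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk (hypothetical helper).

    X holds one row per year listing the k predictor values
    (e.g. MDR SST, NAO, Nino3.4); y holds the matching TC counts.
    Returns ([b0, b1, ..., bk], R^2). No interaction terms are included.
    """
    n = len(y)
    # Design matrix: a leading 1 in every row estimates the constant b0
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Normal equations (A^T A) b = A^T y
    ata = [[sum(A[i][j] * A[i][k] for i in range(n)) for k in range(p)]
           for j in range(p)]
    aty = [sum(A[i][j] * y[i] for i in range(n)) for j in range(p)]
    b = solve(ata, aty)
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    fitted = [sum(bj * aj for bj, aj in zip(b, row)) for row in A]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return b, 1.0 - ss_res / ss_tot


# Made-up numbers for illustration only (not the PS2.xlsx data):
# y is built exactly as 1 + 2*x1 - 0.5*x2 + 3*x3, so the fit recovers it.
X = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [2, 1, 0]]
y = [1 + 2 * x1 - 0.5 * x2 + 3 * x3 for x1, x2, x3 in X]
coeffs, r2 = multiple_regression(X, y)
print([round(c, 4) for c in coeffs], round(r2, 4))
# [1.0, 2.0, -0.5, 3.0] 1.0
```

When comparing with the single-variable results, keep in mind that for least-squares fits that include a constant term, adding predictors can never lower $R^2$, so the three-predictor $R^2$ will be at least as large as the largest single-predictor $R^2$ for the same predictand.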
Links
[1] https://www.aoml.noaa.gov/hrd/hurdat/Data_Storm.html
[2] https://www.gfdl.noaa.gov/bibliography/related_files/gav0802.pdf
[3] https://www.researchgate.net/figure/The-MDR-for-tropical-cyclones-hurricanes-in-the-tropical-North-Atlantic-between-9-and_fig11_249611398
[4] https://stats.blue/Stats_Suite/multiple_linear_regression_calculator.html