METEO 815
Applied Atmospheric Data Analysis

Best Fit-Regression Example: Precip, Tmax, & PZI

Prioritize...

After finishing this page, you should be able to set up data for a regression analysis, execute the analysis, and interpret the results.

Read...

In the previous lesson, we saw a strong linear relationship between precipitation and drought. Here we continue our study of the relationships among temperature, precipitation, and drought.

Precipitation and Drought

Let’s begin by loading the dataset and extracting the precipitation and drought (PZI) variables.

Your script should look something like this:

# load in all annual meteorological variables available
mydata <- read.csv("annual_US_Variables.csv")
precip <- mydata$Precip
drought <- mydata$PZI

We know the data is matched up, we already saw that the relationship was linear, and we’ve already calculated the correlation coefficient, so let’s go right into creating the linear model. Use the following code to create a linear model between precipitation and drought. 

Note that the only parameter I used was the formula, which tells the function how I want to model the data. That is, I want to use precipitation to estimate drought. The 'summary' command provides the coefficient estimates, standard error estimates for the coefficients, the statistical significance of each coefficient, and the R-squared values. 
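A minimal sketch of the model fit described above. The data here are simulated stand-ins (not the course dataset) so the block runs on its own, and the name lm_droughtPrecip is a hypothetical choice rather than the name used in the original exercise.

```r
# Stand-in data: simulated values, NOT the course dataset, so the sketch runs
# on its own. With the real data, use the precip and drought vectors extracted
# from annual_US_Variables.csv instead.
set.seed(1)
precip  <- runif(60, min = 25, max = 35)
drought <- -12 + 0.4 * precip + rnorm(60, sd = 0.4)

# The formula drought ~ precip reads "model drought as a function of precip"
lm_droughtPrecip <- lm(drought ~ precip)   # hypothetical variable name

# Coefficient estimates, their standard errors and significance, and R-squared
print(summary(lm_droughtPrecip))
```

The response variable goes on the left of the tilde and the predictor on the right, which is why drought comes first.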

Let’s pull out the R-squared value and estimate the MSE. We do this by using the output variable ‘residuals’.
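One way this step could look, again on simulated stand-in data. The MSE is taken here as the mean of the squared residuals (dividing by n), which is an assumption about the convention used in the lesson; the variable names are hypothetical.

```r
# Stand-in data: simulated values, NOT the course dataset
set.seed(1)
precip  <- runif(60, min = 25, max = 35)
drought <- -12 + 0.4 * precip + rnorm(60, sd = 0.4)

lm_droughtPrecip <- lm(drought ~ precip)

# R-squared is stored in the summary object
Rsquared_droughtPrecip <- summary(lm_droughtPrecip)$r.squared

# MSE from the 'residuals' output: mean of the squared residuals
# (assumption: divisor n rather than the residual degrees of freedom)
MSE_droughtPrecip <- mean(lm_droughtPrecip$residuals^2)

print(c(R2 = Rsquared_droughtPrecip, MSE = MSE_droughtPrecip))
```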

The result is an MSE of 0.17 and an R-squared of 0.80, which means that the model captures 80% of the variability in the drought index. The last step is to plot our results. Use the code below to overlay the linear model on the observations.
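The overlay could be sketched as follows, using simulated stand-in data. The axis labels and colors are illustrative choices, not taken from the original figure.

```r
# Stand-in data: simulated values, NOT the course dataset
set.seed(1)
precip  <- runif(60, min = 25, max = 35)
drought <- -12 + 0.4 * precip + rnorm(60, sd = 0.4)

lm_droughtPrecip <- lm(drought ~ precip)

# Scatter plot of the observations
plot(precip, drought,
     xlab = "Annual precipitation",
     ylab = "Annual drought index (PZI)",
     main = "Precipitation vs. drought index")

# Overlay the fitted line: a is the intercept, b is the slope
intercept <- coef(lm_droughtPrecip)[[1]]
slope     <- coef(lm_droughtPrecip)[[2]]
abline(a = intercept, b = slope, col = "red", lwd = 2)
```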

Here is a version of the figure:

Best fit linear model: Annual Precipitation vs. Annual Drought Index (PZI) for the US
The figure above is the annual precipitation versus the drought index, PZI, for the United States with the best fit linear model. 
Credit: J. Roman

Visually and quantitatively, the linear model looks good. Let’s look at maximum temperature next and see how it compares as a predictor of drought. 

Maximum Temperature and Drought

Let’s look at the linear model for maximum temperature and drought. I have already extracted the maximum temperature (Tmax). Fill in the missing code below to create a linear model.

I've pulled out the R-squared value. Fill in the missing code below to estimate the MSE and save it to the variable 'MSE_droughtTmax'.
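One possible completion of this exercise, shown on simulated stand-in data. The vector name tmax, the model name lm_droughtTmax, and the negative slope in the simulation are assumptions for illustration; only MSE_droughtTmax is a name given in the text.

```r
# Stand-in data: simulated values, NOT the course dataset
set.seed(2)
tmax    <- runif(60, min = 62, max = 68)
drought <- 29 - 0.45 * tmax + rnorm(60, sd = 0.8)

# Use maximum temperature to estimate drought
lm_droughtTmax <- lm(drought ~ tmax)

# R-squared from the summary object
Rsquared_droughtTmax <- summary(lm_droughtTmax)$r.squared

# MSE as the mean of the squared residuals (assumption: divisor n)
MSE_droughtTmax <- mean(lm_droughtTmax$residuals^2)

print(c(R2 = Rsquared_droughtTmax, MSE = MSE_droughtTmax))
```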

You should get an MSE of 0.56 and an R-squared of 0.37. Quantitatively, the fit is not as good as the fit for precipitation. Let’s confirm this visually. I've set up most of the plot. You need to add the correct coefficient index (1 or 2) for the 'b' argument of the plotting call.
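A sketch of the plotting step on simulated stand-in data, assuming the line is drawn with abline(a = ..., b = ...). In the coefficient vector of a simple linear model the first entry is the intercept and the second is the slope, so 'b' takes index 2.

```r
# Stand-in data: simulated values, NOT the course dataset
set.seed(2)
tmax    <- runif(60, min = 62, max = 68)
drought <- 29 - 0.45 * tmax + rnorm(60, sd = 0.8)

lm_droughtTmax <- lm(drought ~ tmax)

plot(tmax, drought,
     xlab = "Annual maximum temperature",
     ylab = "Annual drought index (PZI)",
     main = "Maximum temperature vs. drought index")

# coefficients[1] is the intercept (a); coefficients[2] is the slope (b)
abline(a = lm_droughtTmax$coefficients[[1]],
       b = lm_droughtTmax$coefficients[[2]],
       col = "red", lwd = 2)
```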

Here is a larger version of the figure:

Best fit linear model: Annual Maximum Temperature vs. Annual Drought Index (PZI) for the US
The figure above is the annual maximum temperature versus the drought index, PZI, for the United States with the best fit linear model. 
Credit: J. Roman

You can see that the line doesn't fit these data as well. Compared to the precipitation model (with an R^2 of 0.80), the maximum temperature does not do as well at predicting the drought index (R^2 of 0.37). 

Application

We use linear models for prediction. Let’s predict the 2016 PZI value based on the observed 2016 precipitation and maximum temperature values. Since we are applying the model, we should first check that its residuals meet the assumption of normality. To do this, I suggest plotting a histogram of the residuals. If you judge that the model meets this requirement, continue on.
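The residual check could be sketched like this on simulated stand-in data. The Shapiro-Wilk test at the end is an optional supplement that the lesson itself does not call for.

```r
# Stand-in data: simulated values, NOT the course dataset
set.seed(1)
precip  <- runif(60, min = 25, max = 35)
drought <- -12 + 0.4 * precip + rnorm(60, sd = 0.4)

lm_droughtPrecip <- lm(drought ~ precip)
res <- lm_droughtPrecip$residuals

# A roughly symmetric, bell-shaped histogram centered on zero supports
# the normality assumption
hist(res, breaks = 10,
     xlab = "Residual (PZI)",
     main = "Residuals: drought ~ precip")

# Optional supplement (not part of the lesson): Shapiro-Wilk normality test.
# A small p-value would be evidence against normality.
sw <- shapiro.test(res)
print(sw$p.value)
```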

I've already set up the PZI prediction using precipitation. Fill in the missing code to predict PZI using temperature.
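A sketch of both predictions using predict() with a one-row data frame. The data are simulated stand-ins, and precip2016 and tmax2016 are hypothetical values, not the observed 2016 values.

```r
# Stand-in data: simulated values, NOT the course dataset
set.seed(3)
precip  <- runif(60, min = 25, max = 35)
tmax    <- runif(60, min = 62, max = 68)
drought <- 0.4 * (precip - 30) - 0.3 * (tmax - 65) + rnorm(60, sd = 0.4)

lm_droughtPrecip <- lm(drought ~ precip)
lm_droughtTmax   <- lm(drought ~ tmax)

# Hypothetical 2016 inputs; replace with the observed 2016 values
precip2016 <- 31
tmax2016   <- 66

# The column name in newdata must match the predictor name in the formula
PZI_fromPrecip <- predict(lm_droughtPrecip, newdata = data.frame(precip = precip2016))
PZI_fromTmax   <- predict(lm_droughtTmax,   newdata = data.frame(tmax = tmax2016))

print(c(precip = unname(PZI_fromPrecip), tmax = unname(PZI_fromTmax)))
```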

The PZI is predicted to be 0.76 using precipitation and -1.45 using maximum temperature. We can provide a range by using the RMSE from each model: 0.42 for precipitation and 0.75 for temperature. At the 95% confidence level (the prediction plus or minus roughly two RMSE), the PZI is estimated to lie between -0.08 and 1.59 using precipitation and between -2.94 and 0.05 using maximum temperature. The actual 2016 PZI value was -0.2, which is within the range predicted by the maximum temperature model but falls just below the lower bound of the precipitation range.
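The ranges can be checked with a few lines of arithmetic. This sketch assumes each range is the prediction plus or minus two RMSE, which reproduces the quoted bounds to within rounding of the inputs; all numbers are the rounded values quoted in the text.

```r
# Predictions and RMSE values quoted in the text (rounded)
pred <- c(precip = 0.76, tmax = -1.45)
rmse <- c(precip = 0.42, tmax = 0.75)

# Assumption: approximate 95% range = prediction +/- 2 * RMSE
lower <- pred - 2 * rmse
upper <- pred + 2 * rmse
print(round(rbind(lower, upper), 2))

# Check whether the observed 2016 PZI falls inside each range
observed <- -0.2
inside <- observed >= lower & observed <= upper
print(inside)
```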

One thing to note: the 2016 observed values for precipitation and maximum temperature were within the range of observations used to create the linear models. This means applying the linear models to the 2016 values was reasonable, because we did not extrapolate to values outside the data range.
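A small helper can make this extrapolation check explicit before predicting. The function name within_observed_range and the example values are hypothetical.

```r
# Returns TRUE when a new predictor value lies inside the range of the
# values used to fit the model (hypothetical helper, for illustration)
within_observed_range <- function(x_new, x_obs) {
  x_new >= min(x_obs) & x_new <= max(x_obs)
}

# Made-up predictor values standing in for the fitting data
precip_obs <- c(25.3, 27.8, 30.1, 32.6, 34.9)

print(within_observed_range(31, precip_obs))   # inside the fitted range
print(within_observed_range(40, precip_obs))   # would be an extrapolation
```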