Prioritize...
In this section, we will again use R's ability to loop over a list of data. Such loops are important in automating tasks such as retrieving data from a list of model runs.
Read...
The NOMADS database archives as many as 50 past model runs. For example, at any one time, you are able to retrieve any GFS model output for the past 14 days. This means that we can do some fairly interesting data retrievals including looking at the initialization state of a model variable over time and studying how a predicted variable changes in time (sometimes referred to as "D-model, D-T"). The problem arises in that each model run is stored separately in the database. Unlike retrieving all of a single model's forecasts in a single call, we now need to retrieve all of the models' single forecasts. This will require some automation on R's part. Let's start a new script.
First, we start with a modified rNOMADS script...
# check to see if rNOMADS has been installed # if not, install it if (!require("rNOMADS")) { install.packages("rNOMADS") } library(rNOMADS) #get the available dates and urls for this model (14 day archive) model.urls <- GetDODSDates("gfs_0p50") model.url.list <- model.urls$url # Here's the point I want my_lat <- 40.793313 my_lon <- -77.866796 # Define our retrieval parameters variable <- "tmp2m" time <- c(0, 0) # Analysis run, index starts at 0 lon <- c(560, 580) # Just a rough estimate of lat <- c(260, 270) # where the point is located lev<-NULL # DO NOT include level if variable is non-level type # This will be our placeholder for the specific point data we want model.timeseries <- data.frame()
Notice that in this script, we do not worry about the latest model date, nor do we care what the latest model run is. The reason is that we are going to retrieve them all and build a time series of the result. Therefore, we need to define two functions. The first takes specific model runs on a specific day, retrieves the partial grid of data, and extracts the value from the grid point nearest my location. This result is then added as a row to the model.timeseries
data frame.
retrieve_data <- function (run, model) { # get the partial grid model.data <- DODSGrab(model, run, variable, time, levels=lev, lon, lat) # reorder the lat/lon model.data$lon <- ifelse(model.data$lon>180,model.data$lon-360,model.data$lon) # select the closest point model.value <- model.data$value[which(abs(my_lon-model.data$lon)<0.5/2 & abs(my_lat-model.data$lat)<0.5/2)] model.fdate <- model.data$forecast.date[which(abs(my_lon-model.data$lon)<0.5/2 & abs(my_lat-model.data$lat)<0.5/2)] #convert to degrees F model.fvalue <- (model.value-273.15)*9/5+32 # add the row to the data frame # remember that we have to use the <<- operator # to pass the value outside the function row <- data.frame(Date=model.fdate, Value=model.fvalue) model.timeseries <<- rbind(model.timeseries,row) }
Now, we need a function that will loop over all of the daily runs for a given model. Notice that in this function we use sapply
which says, "take all of the data in model.runs$model.run
and apply it to the function retrieve_data
. The parameter "model
" is an extra variable that gets passed to the retrieve_data
function as well. If you think in terms of programming "loops", this is the inner loop.
loop_over_runs <- function (y) { # Get the model runs for a date model.runs <- GetDODSModelRuns(y) # run the loop_over_runs function for each time in model.runs sapply(model.runs$model.run, retrieve_data, model=y) }
Now that we have our two functions, we can add the rest of the code. It's relatively simple. First, there's the primary sapply call that sends the list of models to the function loop_over_runs. This command is the outer loop of the retrieval process. Next, there is just the standard plotting routine that we have been using in the last few tutorials.
# for each model in model.url.list, loop over all of the runs sapply(model.url.list, loop_over_runs) # Display times in GMT attr(model.timeseries$Date, "tzone") <- "GMT" # Make a plot plot(model.timeseries$Date,model.timeseries$Value, type="n", xlab = "Date/Time (GMT)", ylab = "Temperature (F)", main = paste0("GFS Initialized 2-m Temperature\n", format(head(model.timeseries$Date,1), "%m-%d-%Y %HZ")," to ", format(head(model.timeseries$Date,1), "%m-%d-%Y %HZ")) ) lines(model.timeseries$Date,model.timeseries$Value,lwd=2,lty=1,col="red")
Voila! Here is the plot I generated. Can you think of other interesting things we might want to look at using this approach? What about how the temperature forecast for a certain time is changing as the forecast interval decreases? What would such a plot tell you about the stability of that temperature forecast?