It is an exciting time for Cloud GIS. There has been a huge upswing in interest in geospatial data, and the computing infrastructure to support the visualization and analysis of this data has been developed as well. While the infrastructure to support cloud GIS is now accessible to everyone, the exact forms it will actually take is still unknown.
The purpose of this course is twofold: first, to give you practice with using a variety of cloud GIS services, and second, to give you an understanding of what cloud computing is more broadly, and how it should and should not be applied in various GIS problem contexts.
You will be creating a variety of cloud GIS services during this course. Most of these will fall under the heading of server GIS. The platforms for this will include GIS infrastructure as a service, GIS platforms as a service, and GIS software as a service. These platforms will be defined in subsequent lessons, along with the five essential aspects of cloud computing.
By the time this course is over, you should feel comfortable setting up and using server GIS using cloud computing, and you should have an understanding of how cloud computing can help solve GIS problems.
The first lesson begins with a discussion of the definition of cloud computing followed by the use of the Amazon Web Services (EC2) to create your own cloud by starting and stopping a virtual machine. Finally, we have our first cloud computing discussion, on the overall topic of Cloud GIS.
At the successful completion of this lesson you should be able to:
Cloud computing is a concept and a phrase that has become increasingly popular. However, there are a number of competing definitions for what "cloud computing" entails. One very effective definition of cloud computing, which consists of five essential characteristics, three service models, and four deployment models, comes from the National Institute of Standards and Technology (NIST). The final version of this definition [1] was published in October of 2011, and is available from NIST. During this course, we will explore how the essential characteristics and service models in particular can be used in a GIS context. Let's consider them now in turn.
The five essential characteristics can be hard to recall at times. During one sleepless night, I came up with the mnemonic NO-REM as a way to remember them.
N stands for Network access. Cloud computing services can be accessed from a variety of networked devices, such as workstations, mobile phones, and other servers. A GIS example is a geospatial information service that allows access from browsers and from other servers.
O stands for On-demand self-service. Cloud computing services should be accessible at will, without having to consult and get permission from a human being. A GIS example is the ability to start multiple map servers by using a browser interface.
R stands for Resource pooling. Cloud computing resources such as processing power, storage, and input-output (to use the Von Neumann architecture [2]) are provisioned for different clients from a common set of physical assets. Clients need not know (and often cannot know) exactly where the physical assets are. A GIS example is sharing computers owned and administered by Esri, Amazon, or Microsoft, without knowing or caring how these computers are being provisioned (as long as they stay up!)
E stands for Elasticity. Cloud computing services can be scaled up and down to meet demand and decrease waste. A GIS example is processing a large spatial data set quickly using many cloud computers, which are then discarded when the task is done.
M stands for Measured service. Cloud computing services are paid for by resources used (such as processing power, storage capacity, or number of user accounts). A GIS example is paying for a map server only for the hours it is up and the bandwidth it uses, rather than a whole computer.
All of the GIS examples of the essential characteristics are ones that you will experience during this course.
The three service models have a definite order to them. Infrastructure as a service (IaaS) is the more fundamental layer, followed by Platform as a service (PaaS) and Software as a Service (SaaS). I was quite impressed with the diagram explaining IaaS, PaaS, and SaaS [3] at venturebeat.com, so I slightly changed it and re-made it below:
As you can see, in the traditional, computer-under-your-desk model, you manage everything. Moving to a Cloud Infrastructure as a Service (IaaS) means that the von Neumann troika of IO, storage, and processors, are managed by the provider, and you bring everything else. In a GIS context, this would mean that you rent computing power from a cloud provider, and use it to solve GIS problems. We'll use the Amazon Elastic Compute Cloud (EC2) running ArcGIS Server to study IaaS.
A Cloud Platform as a Service means that the vendor provides the physical layer, the operating system, and the runtime. You are still able to add your own data and write software that runs on the cloud platform. Facebook, from a developer's perspective, offers a PaaS because APIs (programming tools) are available to write programs that run in Facebook. Google App Engine is another example: Google gives you the computing power and you write the apps.
Software as a Service (SaaS) is a bit easier to understand. You just use it without having to install anything or write any code. Online email is a very good example of SaaS. So is Google Maps. Many GIS and mapping companies are offering mapping and spatial data processing through a SaaS model.
Next, we will get started on the leading IaaS: Amazon's EC2.
The Amazon Elastic Compute Cloud (EC2) is an infrastructure as a service (IaaS) cloud. This means that it provides computing power and resources that you can use for a fee. You take care of running the software; Amazon EC2 provides the hardware.
To understand Amazon EC2, it’s important to understand the concept of virtualization. When you use your computer at home, it’s very likely that you have one physical “box” sitting on or below your desk, with a power button, disk drives, a video card, and so on. The relationship between the physical machine and the machine you log into is 1 to 1. Virtualization, however, is the idea of hosting multiple “virtual machines” on a single physical box. These virtual machines share some hardware resources, but they appear to the end user as distinct machines that can be logged into and administered separately.
You may have used virtual machines at your place of employment; many companies are using them in the workplace because they are more flexible and cost efficient. Most often, an IT administrator will purchase or choose a powerful machine and configure it to be a “virtual server”, which is a physical machine that hosts multiple virtual machines. Obviously, it takes a powerful computer to act as a virtual server, and it takes a fair amount of IT administration skill to set one up.
Enter Amazon EC2. When you work with Amazon EC2, you create and run virtual machines in Amazon’s data centers. You don’t have to know too much about the details of the virtual servers (nor does Amazon want to reveal this). The idea is that you can focus on the software on your server and let Amazon take care of the hardware needs.
Of course, there is a cost for using these resources. You are charged hourly fees for the computing power used, and for the amount of data that you store on Amazon EC2. Most of the things you can do or use on Amazon EC2 have some sort of fee associated with them, but unless you are running a high-traffic site with many gigabytes of data, computing power and disk space are the two biggest cost concerns.
The benefits of Amazon EC2 can be enormous in some situations. Here are a few of the immediate advantages:
Before going forward, there are two important vocabulary terms that you should understand regarding Amazon EC2:
Esri has created an AMI that has ArcGIS Server installed and configured. You will use these AMIs to create EC2 instances, thereby getting the server software running on Amazon EC2. Once you get the instance running, you can log into it using an application called Windows Remote Desktop. This is the same way that you would remotely log in to any other computer in your network, except this time the machine is outside your network, running on Amazon EC2.
You can perform all of these steps on your own home computer as long as it has an Internet connection. In fact, it's recommended that you use your home computer because some workplace IT departments have placed restrictions on accessing computers outside the firewall (like Amazon EC2 instances) using Remote Desktop. Please note that you cannot use a personal hotspot through a mobile phone to log in to your EC2 instances.
Security is one of the biggest issues that causes organizations to hesitate when they consider cloud computing. It is a natural reaction, after all for most modern organizations, their data is their lifeblood. How could we entrust that to people we don't know and have little control over?
It may surprise you to know that in the eyes of many security experts, using cloud computing can make your data more secure, not less secure, as long as the correct procedures are followed. Which makes sense when you think about it, it's another aspect of the benefits of scale. I have a lot of confidence in the computer security folks here at Penn State; they are excellent. I'm also pretty confident that Amazon and Google employ even better computer security experts. However, no experts can protect you from yourself; if you start up a server, expose it to the world, and fail to patch it, it will get hacked. Therefore, the key is to follow the correct procedures.
This page describes how to safely complete this course and gives some general guidance and pointers to further information on how to safely administer a server in production.
In this course, we will be learning about and experimenting with a variety of cloud computing technologies. The most important method you should follow to use them safely is simply to follow the directions in the course in their entirety. Don't skip steps. Further general guidelines for security in this course can be divided into two parts. We will first use Infrastructure as a Service, by starting server instances on Amazon's compute cloud. Later in the course, we will use Platform as a Service and Software as a Service services like ArcGIS Online, Carto, and Mapbox. Good security practices for the second part of the course (platform and software as services) are easily described; use a strong, unique password for each service. If managing multiple strong passwords is an issue for you (I think it is for most humans), consider using software like 1Password [4]. The rest of this section gives some guidelines for safe usage of Amazon server instances for the Infrastructure as a Service portion of the course.
Even if you are not going to be sharing copies of your instances, you may need to think about how to safely put your server into production.
A production server is one that is serving your business content, live, to end users in a highly available and highly secure manner. A complete guide to production server security is unfortunately outside the scope of this course. We are not setting up servers that are ready to go into production. However, we will be covering some aspects of server security as we go along in the course. In addition, here are some general principles that you should know:
A good resource for how to safely run a server in production is the National Institute for Standards and Technology's "General Guide to Server Security [7]," accessible at csrc.nist.gov/publications/nistpubs/800-123/SP800-123.pdf. It describes the necessary planning steps, how to secure your operating system, how to secure your server software, and how to maintain your security. It also describes the multiple personnel roles that are involved in good security practices. If you will be involved in helping to administer a server in production, I suggest you read this guide and follow its recommendations.
OK, enough of the heavy stuff; it's time to start your first cloud computer in this course, an Amazon instance!
Let's create an EC2 instance that is running Windows. The purpose of this exercise is to get you familiar with the basics of Amazon EC2 using some familiar software. Before you attempt this part of the lesson, you need to make sure you've obtained an Amazon account and enabled it for use with Amazon EC2. This should have been covered during the course orientation.
If you have any doubt about the above, contact the course instructor.
Here are the steps for getting Windows running on Amazon EC2. Since Amazon can potentially update their site at any given time, some minor adjustments may be required for these steps. Contact the instructor if you have questions, or, if you find an issue that you are able to work around, please mention it in a comment in the Technical Discussion forum.
Once you launch an instance, the instance starts automatically and your Amazon bill begins accruing. It's very important to understand that you begin amassing charges right away; Amazon does not wait until you log in to your instance to begin charging you. In order to control costs, you need to stop your instance whenever you aren't using it. Before you take a break, please immediately continue reading the next section of the lesson to understand how to properly stop and start your instance.
Fortunately, you don't have to repeat all the previous steps to complete the Launch Instance wizard every time you want to use Amazon EC2. Once you have an instance created, it's fairly easy to log in, start, and stop it. Before we talk about logging in, let's cover the basics of how to stop and start the instance. You'll need to begin using these techniques immediately, every time you use your instance, in order to keep costs down.
When people begin using Amazon EC2, they often ask about the difference between logging out, stopping, or terminating an instance.
If you fail to stop your instances after you have finished working, you will quickly use up the Amazon Free-Tier credits and start seeing charges to your credit card.
Below are some reference instructions that you can use to stop and start your instance (Do not stop your instance for at least 10 minutes after you first launch it. It needs time to configure Windows for the first time.)
You can return to this page throughout the course if you need help remembering how to stop and start your instance.
Use the instructions below to stop a Windows instance like the one you created in Lesson 1. Do not use these instructions for ArcGIS Server instances.
This stops the clock on the charges for running your instance. When an instance is stopped, no one can use your server and you cannot log in.
Use the instructions below to start a Windows instance like the one you created in Lesson 1. When you start your instance, it takes a few minutes to boot up, but you shouldn't have to wait the full 10 minutes that you waited when you first launched the instance. Always follow the instructions below when you start your instance:
After a few minutes your instance will be ready to use with its Elastic IP. After enough times of repeating this action you should have these instructions memorized.
To view your accrued charges at any time, go to the AWS Management Console Billing page [11]. You can see detailed reports of your usage of each part of the service by clicking the Bill Details button at the upper right of the main Billing page.
I recommend that you view your credits after every lesson so that you understand whether you are in danger of excessive charges. If you consistently stop your instances after you are finished working, your costs will remain small.
After you've given your instance about 10 minutes to configure Windows, you can get ready to log in to the instance and start working with your software. The first thing you need to do is get the Administrator password so you can log in.
If something goes wrong with your instance, you can terminate it and create a new instance.
Typically when you log in to your instance, you'll open Remote Desktop Connection and type the user name Administrator, followed by the new password that you set above. To end your session, you can just close the remote desktop window. If you are going away for more than an hour, also make sure to stop your instance in the AWS Management Console.
The first few lessons of this course have relatively lengthy technical walkthroughs with lots of moving parts, therefore your assignments will largely consist of showing evidence via screen captures that you were able to complete these steps. I'll also ask one or two reflection questions that you will answer to accompany these images. In Lesson 4, you'll have the opportunity to complete a more complex walkthrough and make a video demonstration of your work.
Please create a new document and put the following in it:
Submit your document to the lesson drop box on Canvas.
Cloud GIS is an emerging set of technologies, concepts, and work practices, rather than a settled set of services. This course is intended to provide you with both practical skills that you need to use cloud computing in your everyday work, and also the background and critical thinking you need to make decisions about how cloud computing should be used in your organization.
Most lessons feature a cloud computing discussion page that presents an important aspect of cloud computing and encourages you to envision its potential impact on GIS systems. Many of these will be accompanied by required readings that will guide you toward the discussion. We'll use several free online chapters of Rosenburg and Mateos' The Cloud At Your Service. Buying the entire book is optional, but I will list other chapters from it that will be helpful in some of the discussions.
The trends we will cover this term are:
I'll ask you to participate in threaded discussions with your classmates based on several prompts that I will provide on the topics above. These constitute an important part of your participation grade.
On the next page, you'll find your first cloud computing assignment. In this assignment, we will examine what cloud computing is, how we are using, and how we plan to use cloud computing in GIS.
First, please read Chapter 1 in Rosenberg and Mateos's The Cloud at Your Service [13]. The book is available as a free preview from the publisher's website. Purchasing the entire book is optional for this class, and not required.
Also, please read this paper:
and this one:
Cloud Computing: A Solution to Geographical Information Systems (GIS) [15].
Please note that I am not endorsing the opinions expressed in these papers, instead, I am hopeful that these papers will stimulate a good discussion. Please feel free to agree or disagree with the authors.
Links
[1] http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
[2] https://en.wikipedia.org/wiki/Von_Neumann_architecture
[3] http://venturebeat.com/2011/11/14/cloud-iaas-paas-saas/
[4] http://1password.com
[5] http://alestic.com/2009/09/ec2-public-ebs-danger
[6] https://docs.rightscale.com/cm/dashboard/clouds/generic/security_groups_concepts.html
[7] https://csrc.nist.gov/publications/detail/sp/800-123/final
[8] http://aws.amazon.com/console
[9] http://aws.amazon.com/pricing/ec2
[10] https://docs.aws.amazon.com/vpc/latest/userguide/vpc-ip-addressing.html
[11] https://console.aws.amazon.com/billing/home
[12] http://windows.microsoft.com/en-US/windows-vista/Remote-Desktop-Connection-frequently-asked-questions
[13] https://www.manning.com/books/the-cloud-at-your-service
[14] https://doi.org/10.1080/17538947.2011.587547
[15] http://www.enggjournals.com/ijcse/doc/IJCSE11-03-02-006.pdf