The links below provide an outline of the material for this lesson. Be sure to carefully read through the entire lesson before returning to Canvas to submit your assignments.
Spatial data is special and can be problematic. However, as we will see later in this course, there are instances when this can be useful, so understanding how we deal with it is important.
By the end of this lesson, you should be able to:
Lesson 1 is one week in length. (See the Calendar in Canvas for specific due dates.) The following items must be completed by the end of the week. You may find it useful to print this page out first so that you can follow along with the directions.
Step | Activity | Access/Directions |
---|---|---|
1 | Work through Lesson 1 | You are in the Lesson 1 online content now. Be sure to carefully read through the online lesson material. |
2 | Reading Assignment | Before we go any further, you need to complete the reading from the course text plus an additional reading from another book by the same author. |
3 | Weekly Assignment | Examine the Modifiable Areal Unit Problem while analyzing voting results |
4 | Term Project | Post term-long project topic idea to the project topic discussion forum in Canvas. |
5 | Lesson 1 Deliverables | |
Please use the 'Discussion Forum' to ask for clarification on any of these concepts and ideas. Hopefully, some of your classmates will be able to help with answering your questions, and I will also provide further commentary where appropriate.
Read the course textbook, Chapter 1: pages 1-21.
Also read: Chapter 3, The Modifiable Areal Unit Problem, pages 29-44 in Lloyd, C. D. (2014). Exploring Spatial Scale in Geography. West Sussex, UK: Wiley Blackwell. This text is available electronically through the PSU library catalog.
The source of all the problems with applying conventional statistical methods to spatial data is spatial autocorrelation [3]. This is a big word for a very obvious phenomenon: things that are near each other tend to be more related than things that are far apart. If this were not true, the world would be a very strange and rather scary place. For example, if land elevation were not spatially autocorrelated, huge cliffs would be everywhere. Turning the next corner, we would be as likely to face a 1000-meter cliff (up or down, take your pick!), as a piece of ground just a little higher or a little lower than where we are now. An uncorrelated, or random, landscape would be extremely disorienting.
The problem this creates for statistical analysis is that much of statistical theory is based on samples of independent observations, that is, observations that do not depend on one another in any way. In geography, once we pick a study area, we are immediately dealing with a set of observations that are interrelated in all sorts of ways (in fact, that's what we are interested in understanding more about).
Having identified the problem, what can we do about it? Depending on how deeply you want to go into it, quite a lot. At the level of this course, we don't go much beyond acknowledging the problem and developing some methods for assessing the degree of autocorrelation (Lesson 4). Having said that, some methods recognize the problem and even take advantage of the presence of spatial autocorrelation to improve the analysis. These include point pattern analysis (Lesson 3) as well as interpolation and some related methods (Lesson 6).
The ecological fallacy may seem obvious, but it is routinely ignored. It is always worth keeping in mind that statistical relations are meaningless unless you can explain them. Until you can develop a plausible explanation for a statistical relationship, it is unsafe to assume that it is anything more than a coincidence. Of course, as more and more statistical evidence accumulates, the urgency of finding an explanation increases, so statistics remain useful.
In Lesson 1's reading we learned about some of the reasons why spatial data is special, including spatial autocorrelation, spatial dependence, spatial scale, and the ecological fallacy.
This week in our project we will look closely at another pitfall, the Modifiable Areal Unit Problem (MAUP).
Often, MAUP is considered to consist of two separate effects: a shape (zoning) effect and a scale (aggregation) effect.
Both effects are evident in the example in Figure 1.2 and further emphasized in Figure 1.3. The shape effect refers to the difference that may be observed in a statistic as a result of different zoning schemes at the same geographic scale. This is the difference between the 'north-south' and 'east-west' schemes. The scale or aggregation effect is observable in the difference between the original data and either of the two aggregation schemes.
MAUP is, if anything, more problematic than spatial autocorrelation. It is worth emphasizing just how serious the MAUP effect can be: in a 1979 paper, Openshaw and Taylor demonstrated by simulation that different aggregation (i.e., zoning) schemes could lead to variation in the apparent correlation between two variables from -1 to +1, in other words, the total range of variation possible in the correlation between two variables.
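To make the two effects concrete, here is a minimal sketch in plain Python. The 2 x 4 grid and its values are invented purely for illustration (they are not the data behind Figures 1.2 and 1.3), and the two zoning schemes are simply "merge vertically adjacent cells" and "merge horizontally adjacent cells". Both schemes produce four zones of equal size, yet the correlation between the two variables differs between them (a shape effect) and from the correlation in the unaggregated cells (a scale effect).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def flatten(grid):
    """Original data: every cell is its own unit."""
    return [value for row in grid for value in row]


def merge_vertical(grid):
    """Zoning scheme A: each zone is the mean of a top cell and the cell below it."""
    return [(grid[0][c] + grid[1][c]) / 2 for c in range(len(grid[0]))]


def merge_horizontal(grid):
    """Zoning scheme B: each zone is the mean of two side-by-side cells in a row."""
    return [(row[c] + row[c + 1]) / 2 for row in grid for c in (0, 2)]


# Hypothetical values for two variables measured on the same 2 x 4 grid.
x = [[1, 3, 5, 7],
     [2, 4, 6, 8]]
y = [[9, 1, 6, 2],
     [2, 8, 3, 7]]

r_cells = pearson(flatten(x), flatten(y))
r_vertical = pearson(merge_vertical(x), merge_vertical(y))
r_horizontal = pearson(merge_horizontal(x), merge_horizontal(y))

print(f"original cells:      r = {r_cells:.3f}")
print(f"vertical merging:    r = {r_vertical:.3f}")
print(f"horizontal merging:  r = {r_horizontal:.3f}")
```

Nothing about the underlying values changes between the three calculations; only the boundaries of the reporting units do.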
In practice, very little research has been done on how to cope with MAUP, even though the problem is very real. MAUP is familiar to politicians, who often seek to redistrict areas to their spatial advantage in a practice commonly referred to as "gerrymandering." In the practical work associated with this lesson, you will take a closer look at this issue in the context of redistricting in the United States.
In this week's project, we use an example from American electoral politics to revisit the modifiable areal unit problem (introduced in the reading for Lesson 1) and also as a reintroduction to ArcGIS, in case you've gotten rusty. This lesson's project is based on a real dataset. You will begin using the Spatial Analyst extension and learn to convert data between different spatial types. The ease with which you can do this should convince you that many of the distinctions made between different spatial data types are less important than they may at first appear.
As you complete certain tasks in Project 1, you will be asked to submit them to your instructor for grading.
The final page of the lesson's project instructions gives a description of the form the weekly project reports should take and content that we expect to see [5] in these reports. In this course you will not only practice conducting geographical analysis but also learn about how to communicate analytical results.
To give you an idea of the work that will be required for this project, here is a summary of the minimum items you will create for Project 1. You should also get involved in discussions on the course Discussion Forum about which approach of the three described in this lesson (polygon to point, KDE, or uniform distribution) is most appropriate, before choosing one.
Please use the 'Week 1 lesson discussion' forum to ask for clarification on any of these concepts and ideas. Hopefully, some of your classmates will be able to help with answering your questions, and I will also provide further commentary there, where appropriate.
Put this map in your write-up, along with a brief commentary (a few sentences or short paragraph will suffice) on what it shows: Are Republican districts more rural or more urban? What other patterns do you observe, if any?
Note that counties and congressional districts are not a precise fit inside one another, so many of the units in tx_voting108data are parts of counties that were subdivided among two or more districts.
Comment on the redistricting plan. What would you expect it to do to the balance of the electoral outcome? Can you tell, just by examining this map? Put your comments in your write-up.
In the next few pages, the steps required to estimate possible outcomes of the 2004 election based on the new districting plan by three different methods (Polygon to point, KDE, and uniform distribution) are described, along with an explanation of what each method will do.
After reviewing these methods, you should get involved in the discussions on the course Discussion Forum for this week's project, and then choose one of these methods and proceed to complete the project by producing a map of the estimated 2004 election result. Completion of the project also requires you to comment on your choice of method.
Before using any of the methods, you should check that the Spatial Analyst extension is enabled. You can do this in Project - Licensing. You should see that it says 'Yes' in the Licensed column in the Esri Extensions table.
Once Spatial Analyst is enabled, you should also select the following settings from the Analysis - Environments... menu:
With these settings completed, you are ready to try the alternative methods for generating voter population surfaces.
The first option is to make a set of points, one for each polygon in our 2002 voting data, and to use these to represent the distribution of the vote that might be expected in 2004. This approach assumes that it is close enough to assign all the voters in each polygon to a single point in the middle of that polygon.
This is a two-step process, creating a point layer, and converting the point layer to a raster.
To make the point layer:
NOTE: this is a step that requires an Advanced level license. Your student license should be an Advanced level license.
HOWEVER... because this is Lesson 1, we have provided the results of this step (with the 'force inside' option not selected) in the tx_voting108_centers layer.
By whatever means you arrive at a point layer, make sure you understand what is going on here. In particular, check to see if all polygons have an associated center point. Are all the 'centers' inside their associated polygons? (If not, why not?)
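As a hint for thinking about that last question, the sketch below computes the area-weighted centroid of a simple polygon and then tests whether the centroid falls inside it. It is plain Python, not the ArcGIS Pro tool, and the U-shaped polygon and function names are hypothetical, chosen only to show how a geometric center can land outside a concave shape.

```python
def centroid(polygon):
    """Area-weighted centroid of a simple polygon given as a list of (x, y) vertices."""
    twice_area = 0.0
    cx = cy = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        cross = x1 * y2 - x2 * y1          # shoelace term for this edge
        twice_area += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


def contains(polygon, px, py):
    """Ray-casting test: True if the point (px, py) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):         # edge straddles the horizontal ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


# Hypothetical shapes: a unit square and a U-shaped (concave) polygon.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
u_shape = [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]

for name, shape in (("square", square), ("u_shape", u_shape)):
    cx, cy = centroid(shape)
    print(name, round(cx, 3), round(cy, 3), "inside:", contains(shape, cx, cy))
```

For the square the center is inside; for the U shape it falls in the notch, which is the kind of case the 'force inside' option mentioned above is meant to handle.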
Once you have the point layer, you can make raster layers (one for Republican voters, and one for Democratic voters) as follows:
You will get a raster layer with No Data values in most places, and higher values at each location where there was a point in the centroids layer.
The second option is to use kernel density estimation (which we will look at in more detail in Lesson Three) to create smooth surfaces representing the voter distribution across space. This method requires you to choose a radius that specifies how much smoothing is applied to the density surface that is produced.
The steps required are as follows:
The search radius value here specifies how far to 'spread' the point data to obtain a smoothed surface. The higher the value, the smoother the density surface that is produced.
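The effect of the radius can be sketched in a few lines of plain Python. This is a one-dimensional simplification (a transect rather than a map), it assumes a quartic kernel purely for illustration, and the point positions and voter counts are made up. It is not the Kernel Density tool itself.

```python
def quartic_kernel(u):
    """Quartic (biweight) kernel on [-1, 1]; it integrates to 1."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0


def density_profile(points, radius, locations):
    """Kernel density along a line.

    points    -- list of (position, weight) pairs, e.g. (km, voters)
    radius    -- search radius in the same units as position
    locations -- positions at which to evaluate the density
    """
    return [
        sum(w * quartic_kernel((s - x) / radius) / radius for x, w in points)
        for s in locations
    ]


# Hypothetical data: 100 voters at km 20 and 50 voters at km 40.
voters = [(20.0, 100.0), (40.0, 50.0)]
transect = [float(s) for s in range(61)]   # km 0 to km 60 at 1 km spacing

narrow = density_profile(voters, 3.0, transect)    # small search radius
wide = density_profile(voters, 12.0, transect)     # large search radius

print("peak density, radius 3: ", round(max(narrow), 2))
print("peak density, radius 12:", round(max(wide), 2))
print("total voters, radius 3: ", round(sum(narrow), 1))
print("total voters, radius 12:", round(sum(wide), 1))
```

The larger radius gives a lower, broader profile, while the total number of voters under the curve stays approximately the same. Only the way they are spread out changes.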
If you encounter problems, post a message to the boards, and also check that the map projection units you are using are meters (the easiest way to check this is to look at the coordinate positions reported at the bottom of the window as you move the cursor around the map view).
When processing is done, ArcGIS Pro adds a new layer to the map, which is a field of density estimates for voters of the requested type. You should repeat steps 1 and 2 to get a second field layer for the other political party, making sure that you calculate both fields with the same parameters set in the Kernel Density tool.
NOTE: If you have changed the Analysis Environment Cell Size setting from the suggested 1000 meters, then the density values you get are correct, but when it comes to summing them (in a couple more steps' time), they will not produce correct estimates of the total number of votes cast for each party. This is because the density values are per sq. km, but there is no longer one density estimate for every sq. km. For example, if you set the resolution to 5000 meters, then there will be one density estimate for every 25 sq. km. To correct for this, you need to use the raster calculator to multiply the density surface by an appropriate correction factor: in this case, you would multiply all the estimates by 25.
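The arithmetic behind the correction factor is just the area of one cell expressed in sq. km. A minimal sketch (the function name is hypothetical, and it assumes density values are per sq. km and cells are square):

```python
def density_correction_factor(cell_size_m):
    """Area of one square raster cell in sq. km, for densities given per sq. km."""
    cell_size_km = cell_size_m / 1000.0
    return cell_size_km ** 2


for size in (1000, 2000, 5000):
    print(size, "m cells -> multiply densities by", density_correction_factor(size))
```

With the suggested 1000 meter cells the factor is 1, which is why no correction is needed in that case.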
NOTE 2: If you are running ArcGIS Pro 2.8 or later, there has been a change to how the KDE tool works. To produce the expected result, you will need to change the processing extent in the Environments tab to either the same as the 'tx_voting108' layer or to the 'union of inputs'. Otherwise, you'll see a result that appears to include only North Texas.
The third option is to assume that voters are evenly distributed across the areas in which they have been counted. We can build a surface under that assumption and base the final estimated votes in the new districts on it. This method takes a couple of steps and creates two intermediate raster layers using the Spatial Analyst extension on the way to the final estimate.
A number of steps are required:
ArcGIS Pro will think about things for a while and should eventually produce a new layer (in this case called dems_sqkm). This layer contains in each cell an estimate of the number of voters of the specified party in that cell.
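The calculation behind the uniform-distribution assumption can be sketched as follows. The vote count, polygon area, and function name are invented for illustration, and the sketch ignores cells that straddle a polygon boundary.

```python
def votes_per_cell(votes, polygon_area_sqkm, cell_size_m=1000.0):
    """Voters assigned to each cell if votes are spread evenly over the polygon."""
    density_per_sqkm = votes / polygon_area_sqkm
    cell_area_sqkm = (cell_size_m / 1000.0) ** 2
    return density_per_sqkm * cell_area_sqkm


# Hypothetical polygon: 12,000 votes for one party counted in a 400 sq. km unit.
per_cell = votes_per_cell(12000, 400.0)            # 1 km cells
cells_in_polygon = 400                             # 400 sq. km / 1 sq. km per cell
recovered_total = per_cell * cells_in_polygon

print("votes per 1 km cell:", per_cell)
print("sum over the polygon:", recovered_total)
```

Summing the cell values over the whole polygon returns the original count, which is a useful sanity check on a layer such as dems_sqkm.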
Whichever approach you have chosen to make voter population surfaces, it is the difference between the votes for each party that will determine the estimated election results; so, at this point, it is necessary to combine the two estimated surfaces in a 'map calculation'.
You should get an output surface that is positive in some areas (Republican majority) and negative in others (Democratic majority).
NOTE: If you are interested in comparing the results of the three methods, you need to repeat both the party-difference calculation described above and the step described on the next page for each of the methods (i.e., three times). The comparison between the three methods is optional.
Whichever approach you chose to make the Republican majority surface, the final step is to sum the estimated majorities that fall inside each new Congressional District in the newDistricts2003 layer to get a predicted outcome for the 2004 elections:
This will make a table of values, one for each new district, which is the SUM of the Republican majority surface values inside that district.
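The map calculation and the summation by district can be illustrated with tiny made-up grids in plain Python. The cell values and district IDs below are hypothetical, and the sketch stands in for the raster operations rather than reproducing the ArcGIS Pro tools.

```python
# Hypothetical 2 x 3 voter surfaces (voters per cell) and a district ID per cell.
republican = [[30, 30, 10],
              [30, 20, 10]]
democratic = [[10, 20, 40],
              [10, 30, 20]]
districts = [[1, 1, 2],
             [1, 2, 2]]

# Map calculation: Republican majority surface (positive = Republican lead).
majority = [
    [rep - dem for rep, dem in zip(rep_row, dem_row)]
    for rep_row, dem_row in zip(republican, democratic)
]

# Summation by zone: total majority inside each district.
majority_by_district = {}
for zone_row, value_row in zip(districts, majority):
    for zone, value in zip(zone_row, value_row):
        majority_by_district[zone] = majority_by_district.get(zone, 0) + value

for district, total in sorted(majority_by_district.items()):
    winner = "Republican" if total > 0 else "Democratic" if total < 0 else "Tie"
    print(f"district {district}: SUM = {total:+d} -> {winner}")
```

The resulting dictionary plays the role of the output table: one SUM value per district, keyed by the district identifier.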
Once you've done this summation, you should be able to join the table produced to the newDistricts2003 layer via the DISTRICT field, so that there is now an estimated Republican majority for each of the new districts. Using this new attribute, you can make a map of the 'predicted' electoral outcome for the new districts similar to the original map for the 2002 election, but based on the estimated Republican majority results.
You should insert this new map into your write-up. Feel free to provide additional commentary on this topic. Points to consider include:
Here is a summary of the minimal deliverables for Project 1. Note that this summary does not supersede elements requested in the main text of this project (refer back to those for full details). Also, you should include discussions of issues important to the lesson material in your write-up, even if they are not explicitly mentioned here.
Your report should present a coherent narrative about your analysis. You should structure your submission as a report, rather than as a bullet list of answers to questions.
Part of the learning in this course relates to how to write up the results of a statistical analysis, and the weekly project reports are an opportunity to do this and to get feedback before you have to report on your term-long projects.
In your report, you should include:
Please put one of the following into the assignment dropbox for this lesson:
Make sure you have completed each item!
That's it for Project 1!
Throughout this course, a major activity is a personal GIS project that you will develop and research on your own (with some input from everyone else taking the course). To ensure that you make regular progress toward completion of the term project, I will assign project activities for you to complete each week.
The topic of the project is completely up to you, but you will have to get the topic approved by me. Pick a topic of interest, and use the different methods applied during this class to better understand the topic.
This week, the project activity is to become familiar with the weekly term project activities and to think about possible topics and post an idea you have in mind. Each week, the project activity requirements for that week will be spelled out in more detail on a page labeled 'Term Project', located in the regular course menu.
The breakdown of activities and points is as follows:
Below is an outline of the weekly project activities for the term-long projects. You should refer back to this page periodically as a handy guide to the project 'milestones'.
Week | Detailed description of weekly activity on term project |
---|---|
1 | Read this overview! Identify and briefly describe a possible project topic (or topics). Post this information to the 'Term Project: Project Idea' discussion forum as a new message. This posting should be a single paragraph (1 page max!, 250 words max!, single spaced, and 11pt or 12pt sized font). |
2 | Submit a more detailed project proposal (2 pages max!, 600 words max!, single spaced, and 11pt or 12pt sized font) to the 'Term Project: Preliminary Proposal' discussion forum. This week, you should research your topic a bit more and start to obtain the data you will need for your project. Do not underestimate the amount of time you will need to devote to formatting and manipulating your data. The proposal must identify at least two (preferably more) data sources. Inspect your data sources carefully. It's important to get started on finding and examining your data early. You do not want to find out in Week 8 that your dataset is not viable or will take you two weeks just to format your data for use in the software! Over the next few weeks, you will be further developing your proposal, which will be reviewed by other students and by me, and revised to a more complete form due in Week 6. |
3 | This is a busy week, so no term project activity is due. Start getting your interactive peer review meeting date and time organized with your group. |
4 | Refine your project proposal and post it to the 'Term Project: Revised Proposal' discussion forum for peer review in Week 5. (2 pages max!, 800 words max!, single spaced, and 11pt or 12pt sized font) |
5 | Interactive peer review of term project proposals. You will meet with your group and provide interactive feedback. These reviews are intended to help you further refine your project idea and plans. |
6 | A final project proposal is due this week. This will commit you to some targets in your project and will be used as a basis for assessment of how well you have done. The final proposal should be submitted through the 'Term Project: Final Project Proposal' dropbox. (3 pages max!, 1,000 words max!, single spaced, and 11pt or 12pt sized font) |
7 | You should aim to make steady progress on the project this week. |
8 | You should aim to make steady progress on the project this week. |
9 | This week, you should complete your project work and post it as a PDF attachment on the 'Term Project: Final Discussion' discussion forum and let the class know that you are finished. The report should be suitable for anyone involved with the course to read and understand. Note that there are no other course activities at all this week, to give you plenty of time to work on completion of the project. You should also submit the final term project to the 'Term Project: Final Project Submission' dropbox. (20 pages max! inclusive of all required elements, approximately 10,000 words, single spaced, and 11pt or 12pt sized font) |
10 | Finally, the whole class, including the instructor, will use the posted project reports as a basis for reviewing what we have all learned (hopefully!) from the course. Contributions to discussions of one another's projects will be evaluated, as well as the projects themselves. Think of this as a virtual version of an in-class presentation of your project with an opportunity for members of the class (and the instructor) to ask questions, make suggestions, share experiences, review ideas, and so on. |
In addition to the weekly project, it is also time to start to think about your term project.
Deliverable: Post your topic ideas to the 'Term Project: Project Topic' Discussion Forum. One new topic for each student, please! Even at this early stage, if you have constructive suggestions to make for other students, then by all means make them by posting comments in reply to their topics.
Questions? Please use the General Issues discussion forum to ask any questions now or at any point during this project.
Submit a brief project proposal (2 pages max!, 600 words max!, single spaced, and 11pt or 12pt sized font) to the 'Term Project: Preliminary Proposal' discussion forum. This week, you should start to obtain the data you will need for your project. The proposal must identify at least two (preferably more) likely data sources for the project work, since this will be critical to success in the final project. Inspect your data sources carefully. It's important to get started on this early. You do not want to find out in Week 8 that your dataset is not viable! Over the next few weeks, you will be refining your proposal. During Week 5, you will receive feedback from other students. This will help you revise your final proposal which will be due in Week 6.
This week, you must organize your thinking about the term project by developing your topic/scope from last week into a short proposal.
Your proposal should include the following section headers and content for each section:
some background on the topic, particularly why it is interesting or a worthwhile research pursuit;
research question(s). What, specifically, do you hope to find out?
Data: list and discuss the data required to answer the question(s). Be sure to clearly explain the role each dataset will play.
Analysis Methods: What sort of statistical analysis and spatial analysis do you intend to carry out? I realize, at this point, that you may feel that your knowledge is too limited for this. Review Figure 1.2 and skim through the lessons to identify the methods you will be using. If you don't know the technical names for the types of analysis you would like to do, then at least try to describe the types of things you would like to be able to say after finishing the analysis (e.g., one distribution is more clustered than another). This will give me and other students a firmer basis for making constructive suggestions about the options available to you. Also, look through the course topics for ideas.
what sort of maps or outputs you will create
references to papers you may have cited in the background or methods section. Include URLs to data sources here (if you didn't include the URLs in the Data section).
The proposal does not have to be detailed at this stage. Your proposal should be no longer than 2 pages max!, 600 words max!, single spaced, and 11pt or 12pt sized font. Make sure that your proposal covers all the above points, so that I (Lesson 3 & 4) and others (Lesson 5 – peer review) evaluating the proposal can make constructive suggestions about additions, changes, other sources of data, and so on.
Additional writing and formatting guidelines are provided in the document (TermProjectGuidelines.pdf) in 'Term Project Overview' in Canvas.
No set deliverable this week. Read through other proposals and make comments. Continue to refine your project proposal.
Project Proposal: I will be providing each of you with feedback this week on the Preliminary Project Proposals you submitted last week (Week 2).
Peer-review Groups: I will be assigning you groups this week so that you have plenty of time to set up a meeting time during Week 5.
Revising and finalizing your project proposal. Over the next few weeks, you will be refining and extending your term project proposal and receiving feedback from me and your peers. To make this task less daunting and more manageable, we have broken down the process into a series of steps that allows you to evaluate new methods and their applicability to your project as well as receive feedback. Below is a quick overview of the steps each week.
Refine your project proposal and post the proposal to the 'Term Project: Peer-Review' discussion forum so that your peer review group can access the proposal.
Your revised proposal should take into account the feedback provided by the instructor in Week 3. Keep the revised proposal to 2 pages max!, 800 words max!, single spaced, and 11pt or 12pt sized font.
Deliverable: Post your project proposal to the 'Term Project: Revised Proposal' discussion forum and share it with your group.
This week, you will be meeting with your group to discuss your proposed project idea.
You should consider the following aspects:
Remember... you will be receiving reviews of your own proposal from the other students in the group, so you should include the types of useful feedback that you would like to see in those commentaries. Criticism is fine, provided that it includes constructive inputs and suggestions. If something is wrong, how can it be fixed?
Now, you will complete peer reviews. You will be reviewing the other group members' proposals for this assignment. Your instructor will divide the class into groups. The peer reviews will take place using Zoom. You should have arranged the time of the meeting with your group in Week 3 or 4.
Zoom: As a PSU student, you should have access to Zoom [7]. Once you have been assigned a group, work with your group to set up a mutually agreed upon date and time to meet via Zoom. One team member should agree to be "host". If you have not used Zoom yet, then use the following instructions to set up a meeting [8].
Deliverable: Post a summary of the comments and feedback you received from others about your term-long project in your group to the 'Term Project: Peer Review' discussion forum. Your peer review comments are due by the end of week 5.
Based on the feedback that you received from other students and from the instructor, revise your project proposal and submit a final version this week. Note that you may lose points if your proposal suggests that you haven't been developing your thinking about your project.
In your final proposal, you should respond to as many of the comments made by your reviewers as possible. However, it is OK to stick to your guns! You don't have to adjust every aspect of the proposal to accommodate reviewer concerns, but you should consider every point seriously, not just ignore them.
Your final proposal should be between 600 and 800 words in length (3 pages max!, 1,000 words max!, single spaced, and 11pt or 12pt sized font). The maximum number of words you can use is 800. You will lose points if your word count exceeds 800. Make sure to include the same items as before:
Additional writing and formatting guidelines are provided in the document (TermProjectGuidelines.pdf) in 'Term Project Overview' in Canvas.
Deliverable: Post your final project proposal to the Term Project: Final Proposal dropbox.
There is no specific deliverable required this week, but you really should be aiming to make some progress on your project this week!
There is no specific deliverable required this week, but you really should be aiming to make some progress on your project this week!
Your report should describe your progress on the project with respect to the objectives you set for yourself in the final version of your proposal. The final paper should be no more than 20 pages max! inclusive of all required elements specified in the list below, approximately 10,000 words, single spaced, and 11pt or 12pt sized font. As a reminder, the overall sequence and organization of the report should adhere to the following section headers and their content:
Paper Title, Name, and Abstract -
This information can be placed on a separate page and does not count toward the 20 page maximum.
Make sure your title is descriptive of your research, including reference to the general data being used, geographic location, and time interval.
Don’t forget to include your name!
The abstract should be the revised version of your proposal (and any last-minute additions or corrections based on the results of your analysis). The abstract should be no longer than 300 words.
Introduction - one or two paragraphs introducing the overall topic and scope, with some discussion of why the issues you are researching are worth exploring.
Previous Research - provide some context on others who have looked at this same problem and report on their conclusions that helped you intellectually frame your research.
Methodology
Data - describe the data sources, any data preparation/formatting that you performed, and any issues/limitations with the data that you encountered.
Methods - discuss in detail the statistical methods you used, and the steps performed to carry out any statistical test. Make sure to specify what data was used for each test.
Results - individually discuss the results with respect to each research objective. Be sure to reflect back on your intended research objectives, linking the results of your analysis to whether or not those objectives were met. This discussion should include any maps, tables, and charts, along with relevant interpretations of the evidence presented by each.
Reflection - reflect on how things went. What went well? What didn't work out as you had hoped? How would you do things differently if you were doing it again? What extensions of this work would be useful, time and space permitting?
References - include a listing of all sources cited in your paper (this page does not count toward the 20 page maximum).
Next week, the whole class will be involved in a peer review where you will discuss each other's work. You will be reviewing the work of the members of your group from the initial peer-review session in Week 5. It is important that you meet this deadline to give everyone a clear opportunity to look at what you have achieved.
Think of this as a virtual version of an in-class presentation of your project with an opportunity for members of the class (and the instructor) to reflect on each other's work.
In order to earn points for this deliverable, you should read through the term papers of those who were in your peer-review Zoom session during Week 5. Then, post your comments on the papers written by the members of your peer-review session in the discussion forum. Here are a few things to consider as you review your group members' write-ups.
These comments can include, but are not limited to, feedback on interpreting the results, suggestions regarding the methodology, experiences with the writing process, other ideas on the research topic, and so on.
Contributions to discussions of one another's projects will be evaluated, as well as the projects themselves.
In addition to the weekly project, it is also time to start to think about your term project.
Again, we are looking for a "big picture" description of your term project for this deliverable.
Post your topic idea to the 'Term Project: Topic Idea' Discussion Forum. One new topic for each student, please!
Even at this early stage, if you have constructive suggestions to make for other students, then by all means make them by posting comments in reply to the topic.
Please use the Discussion - General Questions and Technical Help discussion forum to ask any questions now or at any point during this project.
NOTE: When you have completed this week's project, please submit it to the Canvas drop box for this lesson.
You have reached the end of Lesson 1! Double-check the to-do list on the Lesson 1 Overview page [10] to make sure you have completed all of the activities listed there before you begin Lesson 2.
For those of you who work with environmental data, this article might be of interest:
Dark, S. J. & D. Bram. (2007). The modifiable areal unit problem (MAUP) in physical geography. Progress in Physical Geography, 31(5): 471-479.
Links
[1] https://creativecommons.org/licenses/by-nc-sa/4.0/
[2] https://catalog.libraries.psu.edu/catalog/29321250
[3] https://www.e-education.psu.edu/geog586/node/873#spatial_autocorrelation
[4] https://sites.psu.edu/psugis/software/
[5] https://www.e-education.psu.edu/geog586/667
[6] https://pro.arcgis.com/en/pro-app/latest/get-started/pro-quickstart-tutorials.htm
[7] https://psu.zoom.us/
[8] https://agsci.psu.edu/it/how-to/create-a-zoom-meeting
[9] https://www.e-education.psu.edu/geog586/828
[10] https://www.e-education.psu.edu/geog586/809