3.2 Installing the required packages for this lesson
This lesson will require quite a few different Python packages. We will take care of this task right away so that you then won't have to stop for installations when working through the lesson content. We will use our Anaconda installation from Lesson 2 and create a fresh Python environment within it. In principle, you could perform all the installations with a number of conda installation commands from the command line. However, there are a lot of dependencies between the packages, and it is easy to run into some conflicts that are difficult to resolve. Therefore, we provide a YAML (.yml) file that lists all the packages we want in the environment with the exact version and build numbers we need. We create two new environments by importing the .yml files using conda in the command line interface ("Anaconda Prompt").
The first environment (ACP3_NBK) will be for the Assignment and will contain the packages needed for it. The second environment (ACP3_WLKTGH) is optional and will contain the packages needed for the species distribution walkthrough, in case you want to follow along. For reference, we also provide the conda commands used to create the environment at the end of this section, and a separate section with some troubleshooting steps you can try if you are unable to get a working environment.
One of the packages we will be working with in this lesson is the ESRI ArcGIS for Python API, which will require a special approach to authenticate with your PSU login. You will see this approach further down below, and it will then be explained further in Section 3.10.
Creating the ACP3_NBK Anaconda Python environment
Please follow the steps below and let us know if you run into issues.
1) Download the .zip file containing the .yml files from this link: ACP3_YML.zip, then extract the .yml files it contains. The .zip includes four files: one (ACP3_NBK) for the Assignment, one (ACP3_WLKTGH) with the packages needed for the species distribution walkthrough, and two files with "_Clean" at the end of their names that are provided as alternatives in case the more explicit .yml does not create a working environment. You may want to have a quick look at the content of the ACP3_NBK.yml text file to see how, among other things, it lists the names of all packages for this environment with version and build numbers. Using a YAML file greatly speeds up the creation of the environment as the files are downloaded and dependencies don't need to be resolved on the fly by conda.
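To make the structure of those package entries easier to read, here is a minimal Python sketch that splits a conda-style dependency entry of the form name=version=build into its parts. The function name parse_spec and the sample entries are hypothetical and only for illustration; the version and build strings are not copied from ACP3_NBK.yml, and pip-style entries (which use ==) are not handled.

```python
# Sketch: split conda dependency entries of the form name=version=build.
# The sample entries are made up for illustration and are not taken
# from ACP3_NBK.yml.

def parse_spec(spec):
    """Return (name, version, build) for one conda dependency entry.

    Parts that are not present come back as None.
    """
    parts = spec.strip().split("=")
    # Pad with None so that name-only or name=version entries also work.
    parts += [None] * (3 - len(parts))
    name, version, build = parts[:3]
    return name, version, build


sample_entries = [
    "python=3.11.8=h2628c8c_0",  # hypothetical version and build string
    "arcgis=2.3.1",              # version only
    "geopandas",                 # name only
]

for entry in sample_entries:
    print(parse_spec(entry))
```

Entries that pin both version and build leave conda very little freedom, which is why the "_Clean" files with fewer pinned entries can succeed where the explicit file fails.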
2) Open the program called "Anaconda Prompt" located in the start menu Anaconda Program folder, which was created during the Anaconda installation from Lesson 2.
3) Make sure you have at least 5GB of free space on your C: drive (the environment will require around 3.5-4GB). Then type in and run the following conda command to create a new environment called ACP311_NBK (for Anaconda Python 3.11 Notebook) from the downloaded .yml file. You will have to replace the ... to match the name of the .yml file, and adapt the path to the .yml file depending on where you have stored it on your hard disk.
conda env create --name ACP311_NBK -f "C:\489\...\ACP3_NBK.yml"
Conda will now create the environment called ACP3x_NBK (x being 11) according to the package list in the YAML file. This can take quite a lot of time; in particular, it will just say "Solving environment" for quite a while before anything starts to happen. If you want, you can work through the next few sections of the lesson while the installation is running. The first section that will require this new Python environment is Section 3.6. Everything before that can still be done in the ArcGIS environment you used for the first two lessons. When the installation is done, the ACP3x_NBK environment will show up in the environments list in the Anaconda Navigator and will be located at C:\Users\<user name>\Anaconda3\envs\ACP3x_NBK.
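The free-space requirement from step 3 can also be checked from Python using the standard library. This is only a sketch: the helper names free_space_gb and has_enough_space are hypothetical, and the drive path is an assumption (C:\ on Windows, the root directory elsewhere) that you may need to adapt.

```python
import os
import shutil

REQUIRED_GB = 5  # free space recommended in step 3


def free_space_gb(path):
    """Return the free space on the drive that contains `path`, in gigabytes."""
    usage = shutil.disk_usage(path)
    return usage.free / (1024 ** 3)


def has_enough_space(path, required_gb=REQUIRED_GB):
    """True if the drive containing `path` has at least `required_gb` GB free."""
    return free_space_gb(path) >= required_gb


# Assumption: Anaconda lives on C: under Windows; fall back to "/" elsewhere.
drive = "C:\\" if os.name == "nt" else "/"
print(f"Free space on {drive}: {free_space_gb(drive):.1f} GB")
print("Enough space for the environment:", has_enough_space(drive))
```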
4) Let's now do a quick test to see if the new environment works as intended. In the Anaconda Prompt, activate the new environment with the following command (you'll need to activate your environment every time you want to use it):
activate ACP311_NBK
Then type in python and in Python run the following commands; all the modules should import without any error messages:
import pandas
import cartopy
import matplotlib
from osgeo import gdal
import geopandas
import shapely
import arcgis
from arcgis.gis import GIS
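If one of the imports fails, it can help to see at a glance which modules load and which do not. The following sketch tries each module from the test above in turn and reports the result instead of stopping at the first error. The helper name check_imports is hypothetical; "osgeo.gdal" and "arcgis.gis" stand in for the two from ... import statements.

```python
import importlib

# Modules from the import test; "osgeo.gdal" corresponds to
# "from osgeo import gdal" and "arcgis.gis" to "from arcgis.gis import GIS".
MODULES = ["pandas", "cartopy", "matplotlib", "osgeo.gdal",
           "geopandas", "shapely", "arcgis", "arcgis.gis"]


def check_imports(module_names):
    """Try to import each module; map its name to None or an error message."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # usually ImportError, e.g. a missing DLL
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


for name, error in check_imports(MODULES).items():
    status = "OK" if error is None else "FAILED - " + error
    print(f"{name:12s} {status}")
```

Run it inside the activated environment; any line marked FAILED tells you which package to look at first.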
As a last step, let's test connecting to ArcGIS Online with the ArcGIS for Python API mentioned at the beginning. Run the following Python command:
gis = GIS('https://pennstate.maps.arcgis.com', client_id='fuFmRsy8iyntv3s2')
Now a browser window should open up where you have to authenticate with your PSU login credentials (unless you are already logged in to Penn State). After authenticating successfully, you will get a window saying "OAuth2 Approval" and a box with a very long token at the bottom. In the Anaconda Prompt window, you will see a prompt saying "Enter token obtained on signing in using SAML:". Use CTRL+A and CTRL+C to copy the entire code, and then do a right-click with the mouse to paste the code into the Anaconda Prompt window. The code won't show up, so just continue by pressing Enter.
If you are having trouble with this step, Figure 3.18 in Section 3.10 illustrates the steps. You may get a short warning message (InsecureRequestWarning), but as long as you don't get a long error message, everything should be fine. You can test this by running this final command:
print(gis.users.me)
This should produce an output string that includes your pennstate ArcGIS Online username, so e.g., <User username:xyz12_pennstate>. More details on this way of connecting with ArcGIS Online will be provided in Section 3.10.
If creating the environment from the .yml file did NOT work:
As we wrote above, importing the .yml file with the complete package and version number list is probably the most reliable method to set up an exact clone of a Python environment for this lesson, but there have been cases in the past where this approach failed on some systems. Sometimes conda fails to resolve the dependencies, and removing some of the more obscure packages from the .yml file can help. Repeat the steps from above, starting at step 2, using the version of the .yml file that has "_Clean" in its name. This file only contains the main packages and sets the required version only for the most important ones. If that does not work, you can try the troubleshooting steps outlined in Section 3.2.1 or try creating the environment from scratch. Let your instructor know if you have reached this point and have not been able to get a working environment.
Building from the Command Line
Maybe you are interested in the steps that were taken to create the environment from scratch. We therefore list the conda commands used from the Anaconda Prompt for reference below.
1) Create a new conda Python 3.11 environment called ACP311_NBK with some of the most critical packages:
conda create -n ACP311_NBK -c conda-forge -c esri python=3.11 nodejs arcgis=2.3.1 gdal geopandas cartopy matplotlib jupyter ipywidgets
2) As we did in Lesson 2, we activate the new environment using:
activate ACP311_NBK
3) Then we add the remaining packages:
conda install -c rpy2 maptools geopandas cartopy
4) Once we have made sure that everything is working correctly in this new environment, we can export a YAML file similar to the one we have been using in the first part above using the command:
conda env export > "C:\<output path>\ACP311_NBK.yml"
Creating the ACP311_WLKTGH Anaconda Python environment
As stated above, this environment will contain the packages used for the species distribution walkthrough in Section 3.6. Be sure to get the ACP3_NBK environment working first before attempting this one. This environment contains more packages, which makes it harder for conda to resolve the dependencies.
1) Using the ACP3_WLKTGH.yml file from the .zip file that we downloaded, follow steps 2 and 3 from above but replace ACP3_NBK with ACP3_WLKTGH in the file name and use ACP311_WLKTGH as the environment name.
Let's now do a quick test to see if the new environment works as intended. In the Anaconda Prompt, activate the new environment with the following command (you'll need to activate your environment every time you want to use it):
activate ACP311_WLKTGH
Then type in python and in Python run the following commands; all the modules should import without any error messages:
import arcgis
import bs4
import pandas
import cartopy
import matplotlib
from osgeo import gdal
import geopandas
import rpy2
import shapely
If creating the environment from the .yml file did NOT work:
NOTE that this environment is only for the walkthrough and is not required for the assignment. It is only needed if you want to execute the walkthrough code on your own. Before creating this environment, ensure that your Notebook environment for the assignment from above is working.
As described above for the Notebook environment, repeat the steps starting at step 2 using the .yml file with the suffix _Clean. If that does not work, you can try the troubleshooting steps outlined in Section 3.2.1 or try creating the environment from scratch. Let your instructor know if you have reached this point and have not been able to get a working environment.
Potential issues
There is a small chance that from osgeo import gdal will throw an error about DLLs not being found on the path, which looks like the following:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\professor\anaconda3\envs\AC3x\lib\site-packages\osgeo\__init__.py", line 46, in <module>
_gdal = swig_import_helper()
File "C:\Users\professor\anaconda3\envs\AC3x\lib\site-packages\osgeo\__init__.py", line 42, in swig_import_helper
raise ImportError(traceback_string + '\n' + msg)
ImportError: Traceback (most recent call last):
File "C:\Users\professor\anaconda3\envs\AC3x\lib\site-packages\osgeo\__init__.py", line 30, in swig_import_helper
return importlib.import_module(mname)
File "C:\Users\professor\anaconda3\envs\AC3P\lib\importlib\__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 657, in _load_unlocked
File "<frozen importlib._bootstrap>", line 556, in module_from_spec
File "<frozen importlib._bootstrap_external>", line 1166, in create_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: DLL load failed while importing _gdal: The specified module could not be found.
On Windows, with Python >= 3.8, DLLs are no longer imported from the PATH. If gdalXXX.dll is in the PATH, then set the USE_PATH_FOR_GDAL_PYTHON=YES environment variable to feed the PATH into os.add_dll_directory().
If this happens, the fix is to set the environment variable mentioned in the message before importing gdal (you would need to do this every time you want to import gdal):
import os
os.environ["USE_PATH_FOR_GDAL_PYTHON"]="YES"
from osgeo import gdal
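Since the variable has to be set before every gdal import, one option is to wrap the three lines above in a small helper. This is a sketch only: the function name import_gdal is hypothetical, and the try/except around the call is there so that a remaining DLL problem is reported rather than ending the session with a long traceback.

```python
import os


def import_gdal():
    """Set USE_PATH_FOR_GDAL_PYTHON, then import and return the gdal module."""
    # The variable must be set before osgeo is imported for the first time.
    os.environ["USE_PATH_FOR_GDAL_PYTHON"] = "YES"
    from osgeo import gdal
    return gdal


try:
    gdal = import_gdal()
    print("GDAL version:", gdal.VersionInfo("RELEASE_NAME"))
except ImportError as exc:
    # Still failing: continue with the PATH checks described in this section.
    print("GDAL import still fails:", exc)
```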
It's possible that the above fix doesn't work and the error is still thrown. In that case, check the PATH environment variable in the Anaconda Prompt by typing "path" and verifying that c:\osgeo4w\bin or osgeo4w64\bin is in the list. If it is not, either add it using
set path=%PATH%;c:\osgeo4w\bin
or go to System Properties -> Environment Variables -> System Variables -> Path and add it to the list. Sometimes paths to these \bin directories reference old versions or versions that were uninstalled, and this causes import errors. Keeping the paths in your System Environment up to date can help avoid errors like these. Verify that the paths listed there are valid by navigating to them in Windows Explorer. If you have (or had) mixed 32-bit and 64-bit QGIS applications on your PC at some point, it may be worth removing the older, unused versions from your PC.
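Checking each PATH entry by hand in Windows Explorer can be tedious, so here is a small Python sketch that does the same check: it lists every directory on PATH, flags the ones that no longer exist, and points out entries that mention osgeo4w. The helper name audit_path is hypothetical, and the script only reports; it does not change your PATH.

```python
import os


def audit_path(path_value=None):
    """Return a list of (directory, exists) pairs, one per PATH entry."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # os.pathsep is ";" on Windows and ":" on other systems.
    entries = [p.strip() for p in path_value.split(os.pathsep) if p.strip()]
    return [(entry, os.path.isdir(entry)) for entry in entries]


for directory, exists in audit_path():
    if not exists:
        print("Missing:", directory)
    elif "osgeo4w" in directory.lower():
        print("OSGeo4W entry:", directory)
```

Entries reported as missing are candidates for removal from the Path list in the System Environment dialog.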