Python Scripting with GDAL

Daniel Clement
7 min readApr 20, 2022

A tutorial on running GDAL processes within a Python script.

The Geospatial Data Abstraction Library (GDAL) is an extremely powerful, efficient, and ubiquitous Python library for processing both raster and vector geospatial data. This library is distributed as Free and Open-Source Software (FOSS). As such, GDAL has been widely adopted by the geospatial and remote sensing industries, underlying processes in pervasive GIS software such as ESRI’s ArcGIS, QGIS, as well as over +100 other software programs.

GDAL is most commonly used through the command line, which while useful for one-off processes, is not optimal when you want to string together more complex processes in a Python script or include functionality in a larger piece of software.

When I was first starting out in my coding journey, I would frequently come across this library, but was still lacking the command line skills necessary to utilize it, and therefore tended to avoid it. There are certainly other packages such as Rasterio, which depend on GDAL for their processing, that are more user friendly when it comes to using their capabilities in a python script directly versus the command line. I may cover some of these packages in future articles, however, this article will focus on GDAL.

GDAL contains a large number of functions, allowing for complex manipulation of raster data. You can project, mosaic, and clip raster datasets using GDAL Warp, convert raster data from one format to another using GDAL Translate, get detailed characteristic information on a raster dataset using GDAL Info, or create virtual raster mosaics in GDAL’s XML driven file format called VRT’s using GDAL Build VRT.

GDAL also has several functions to process and manipulate vector data through OGR. You can convert feature data from one format to another using OGR 2 OGR, or combine multiple datasets into one using ORG Merge.

For this tutorial, we will explore how to use GDAL Warp to clip a raster dataset to the extent of a specific area of interest (AOI) using a vector polygon shapefile.

As mentioned previously, GDAL processes are typically performed in the command line, and in order to access GDAL’s functionality inside of a Python script, we will employ Pythons built in os module. This module allows access to operating system specific operations, including things such as opening programs, creating/deleting files and folders, path manipulations, and much more.

Of interest to us, is the os.system() function. This function, when called on a windows machine, runs the given command (in string format) on the command line. Therefore, we can build our GDAL commands within the Python script, and then execute those commands using the os.system() function. Lets put this into practice!

Install and Import

The first step would be to set up an environment and install GDAL. Instructions on how to do so can be found here. For this tutorial, we will assume you have an environment with GDAL already installed. I use Anaconda as my package manager of choice. Installing GDAL in a conda environment is as easy as activating the environment and entering:
conda install -c conda-forge gdal

Once GDAL is installed in your environment, we must import the packages we will use for our script. Here were importing Python’s os module.

Creating Parameter Variables

Running a GDAL command can be done in as little as 1 or 2 lines of code. However, to keep our script extensible and to illustrate good object-oriented programming methodologies, we will create a function that will run our GDAL command.

First we need to think about what input we will need when running this process. These inputs will be made into several parameters in the form of variables. We will need the following:

  • in_raster: An input raster which you want to clip
  • clip_feature: A polygon shapefile representing the geometry you wish to clip the input raster to
  • out_raster: The resulting clipped raster file

Example Data

I will be using a Sentinel-2 satellite image of the Sardoba Dam in Uzbekistan. This image was captured after the dam breached and flooded the surrounding area in April of 2020. I will be clipping the image to a smaller extent in order to conduct some further processing and analysis. The image and AOI (red polygon) can be seen below.

Sardoba Dam, Uzbekistan | Sentinel-2 Natural Color Image | May 5th, 2020

Building a GDAL Warp Command

We now need to formulate our GDAL Warp command. To do this we will create a variable named command and set this variable to a string, which will will fill in with the required inputs. To do this, we will use a relatively new, but very cool string formatting option which debuted in Python 3.6 called f-strings. This is a method of inserting values into a string, and is much cleaner than using Pythons .format() or the % methods. If you are using a Python version earlier than 3.6, you can simply use the .format or % method instead. That being said we need to discuss the actual make up of a GDAL Warp command.

GDAL Warp takes a number of different arguments however, in order to keep this relatively simple, we will focus on a few key options. First will be the -dstnodata option. This is us telling GDAL Warp what value we want the areas outside of our AOI extent to have in the resulting raster. In our case, we want them to have a value of NoData. To do this, we will set the following option:
-dstnodata NoData.

We also need to specify our clipping feature, that is the polygon extent which we want to clip the raster to. GDAL Warp option to do specify this is called the -cutline , which is then followed by the path to the shapefile we want to use.
Example: -cutline C:\data\aoi_polygon.shp

The last two required inputs for the GDAL Warp command are going to be the input raster, and the resulting output raster, in that order. These are represented by strings corresponding to their path on the hard drive.

When you put all of these options together, you would get a command that looks like this:
gdalwarp -dstnodata NoData -cutline C:\data\aoi_polygon.shp C:\data\example.tif C:\data\example_clipped.tif

This command is great, if you only want to do this once. However, if this is a process you perform frequently, or need to perform on a large number of files, then having a dynamically formatted command where you can swap in and out the input raster, clipping feature, and output raster is extremely valuable. As such, we will use the f-strings we mentioned before to insert the values of the script parameters into the command string on the fly. F-strings are simple to format and are done so by simply adding the letter f in front of your string quotation mark, and then adding the variable name you want to insert inside of curly brackets, at the location in the string you want to insert it at: {variable}. For example, you could create a variable my_name, and set it equal to your name: my_name = “Dan” . Next, if you want to print what your name is you could simply do this print(f“My name is {my_name}”) . This would result in the following string being printed to the console My name is Dan .

We will also create the command variable, setting it equal to the the command string. This should look like the the code below:

Creating & Calling Our Function

Next, we will use the input variables created earlier, as arguments for our function. We will define our function def , naming it clip_raster, and set the clip_feature, in_raster, and out_raster variables as the arguments:
def clip_raster(clip_feature, in_raster, out_raster):

Finally, we utilize the os.system() function we mentioned earlier. This function will take the string command we just created and run it in a command line using the python environment you’re using. Doing so is extremely simple. All you need to do is put the variable containing the command string inside the parenthesis of the os.system() function: os.system(command).

When we put this all together we get the following function:

To actually call the function within the script you just need to write the name of the function, followed by the script input parameter variables inside parenthesis like this:
clip_raster(clip_feature, in_raster, out_raster)

Reaping the Rewards!

Running our newly created clip_raster function on our example image will produce the following result:

Clipped image with NoData values outside the AOI polygon.

This same method can be used to run any GDAL command, or any code in the command line from within Python. I hope this tutorial may be helpful for you, and keep you from struggling as I once did. Now, go forth and write code!

Lets Connect!

If you have enjoyed this tutorial, please follow me on Medium. And if you would like to connect, you can find me on LinkedIn or my GitHub page.

--

--