Python

Python is a fantastic tool for analyzing geospatial data and building applications. It has an incredibly rich and active community for all topics in computing.

Tutorials

Relevant Python Packages

Here we'll focus on some popular and useful packages for processing geospatial data. There is also a huge wealth of other packages out there waiting to be used! Don't be afraid to search for packages tailored to your specific use case.

Rasterio & GDAL

tip

GDAL can be difficult to install on Windows systems! One easy way around this problem is to install it with conda (for example through Anaconda) instead of pip, the regular Python package installer.

Rasterio is a Python library for reading, writing, and manipulating geospatial raster data. It is built on top of the popular GDAL library, which is a powerful tool for working with geospatial data of all kinds. Rasterio makes it easy to read and write raster data in various formats, and provides a range of functions and methods for manipulating and analyzing raster data. You can use Rasterio to perform operations like cropping, resampling, and reprojecting raster data, as well as to extract metadata and other information from raster datasets. Rasterio is a valuable tool for anyone working with geospatial raster data, whether you are a GIS specialist, data scientist, or just want to perform some simple raster analysis tasks.

NumPy

NumPy is a powerful Python library for working with large, multi-dimensional arrays and matrices of numerical data (exactly what geospatial rasters are!). It is a key component of many scientific computing projects and is widely used in machine learning and data analysis. NumPy provides a range of functions and methods for creating and manipulating arrays, as well as performing mathematical operations on them. It is known for its efficiency and performance, making it a popular choice for working with large datasets.

info

NumPy can compute spectral indices such as NDVI, NDWI, and VREI2 as element-wise array operations! Check out the article on Index Calculation and Plant Health Indices for more details on calculating indices.
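To make the index calculation concrete, here is a small sketch of NDVI, defined as (NIR - Red) / (NIR + Red), computed element-wise with NumPy. The function name ndvi and the 2x2 reflectance values are hypothetical; in practice the two arrays would come from the near-infrared and red bands of a raster.

```python
import numpy as np


def ndvi(nir, red):
    """Return NDVI = (NIR - Red) / (NIR + Red), element-wise."""
    nir = np.asarray(nir, dtype="float64")
    red = np.asarray(red, dtype="float64")
    denom = nir + red
    # NDVI is undefined where both bands are zero, so those pixels stay NaN.
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


# Hypothetical reflectance values for a 2x2 image.
nir = np.array([[0.5, 0.6], [0.0, 0.3]])
red = np.array([[0.1, 0.2], [0.0, 0.3]])

result = ndvi(nir, red)
print(result)
```

Using the where argument avoids a divide-by-zero warning on empty pixels, which are common around the edges of real scenes.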

GeoPandas

GeoPandas is a Python library for working with geospatial data, such as maps and geographical information. It is built on top of the popular data manipulation library pandas (and, through it, NumPy), and makes it easy to work with geospatial data in a pandas-like way. GeoPandas allows you to perform spatial operations, such as merging and joining spatial data (vector data especially), as well as visualize and plot spatial data on maps. It is an essential part of many data analysis projects.
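The spatial join mentioned above can be sketched as follows. The field polygons, sample points, and column names are invented for the example, and the code assumes a GeoPandas version that accepts the predicate argument of sjoin.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two hypothetical field polygons, given as (minx, miny, maxx, maxy) boxes.
fields = gpd.GeoDataFrame(
    {"field": ["north", "south"]},
    geometry=[box(0, 1, 2, 2), box(0, 0, 2, 1)],
    crs="EPSG:4326",
)

# Three hypothetical sample points; the last one lies outside both fields.
samples = gpd.GeoDataFrame(
    {"sample_id": [1, 2, 3]},
    geometry=[Point(0.5, 1.5), Point(1.5, 0.5), Point(5, 5)],
    crs="EPSG:4326",
)

# Attach field attributes to every sample that falls within a field.
joined = gpd.sjoin(samples, fields, how="inner", predicate="within")
print(joined[["sample_id", "field"]])
```

With how="inner", points that match no polygon are dropped; how="left" would keep them with missing field values instead.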

Xarray

Xarray is a Python library for working with multi-dimensional arrays and labeled data. It is particularly useful for working with large and complex datasets, such as those found in the Earth sciences, and offers a range of tools for manipulating and analyzing such data. Xarray introduces labeled dimensions and coordinates to NumPy arrays, which allows you to work with data that has explicit meaning beyond just its numerical values. This makes it easier to understand and work with your data, and also enables you to perform operations that preserve the metadata and structure of your data. Xarray is built on top of NumPy and other scientific computing libraries, and is a valuable tool for anyone working with multi-dimensional data.
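A short sketch of what labeled dimensions and coordinates look like in practice is shown below. The band names, coordinate values, and units attribute are assumptions made for the example, not part of any particular dataset.

```python
import numpy as np
import xarray as xr

# Hypothetical 2-band image cube with named dimensions and coordinates.
cube = xr.DataArray(
    np.arange(24, dtype="float64").reshape(2, 3, 4),
    dims=("band", "y", "x"),
    coords={
        "band": ["red", "nir"],
        "y": [30.0, 20.0, 10.0],
        "x": [0.0, 10.0, 20.0, 30.0],
    },
    attrs={"units": "reflectance"},
)

# Select by label instead of by integer position.
nir = cube.sel(band="nir")
pixel = cube.sel(band="nir", x=10.0, y=20.0)

# Reduce over the spatial dimensions; the band coordinate is kept.
band_means = cube.mean(dim=("y", "x"))
print(band_means)
```

Selecting by label means the code keeps working if the band order in the underlying array changes.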

Dask

Dask is a flexible parallel computing library for Python that allows you to scale your computations beyond the limits of a single machine. It is particularly useful for working with large and complex datasets that don't fit in memory. Its array and dataframe collections mirror much of the NumPy and pandas APIs, so existing code can often be adapted with few changes. Dask provides a range of tools for parallelizing your code and distributing your computations across multiple cores or across a cluster of machines. It is a powerful tool for data scientists and analysts working with large datasets, and is often used in combination with other libraries like NumPy and pandas.
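As a rough illustration of the chunked, lazy model, the sketch below wraps a NumPy array in a Dask array and standardizes it. The array contents and the 250x250 chunk size are arbitrary choices for the example; a real workload would typically load chunks from disk rather than from an in-memory array.

```python
import numpy as np
import dask.array as da

# Hypothetical in-memory array standing in for a large raster.
data = np.arange(1_000_000, dtype="float64").reshape(1000, 1000)

# Split the array into 250x250 chunks that can be processed independently.
lazy = da.from_array(data, chunks=(250, 250))

# This only builds a task graph; nothing is computed yet.
scaled = (lazy - lazy.mean()) / lazy.std()

# compute() runs the graph, in parallel where possible.
result = scaled.max().compute()
print(lazy.numblocks, result)
```

The expression on the lazy array reads exactly like NumPy code, which is the main appeal when adapting existing analysis scripts.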

Jupyter Notebooks

Jupyter notebooks are useful for hyperspectral analysis with Python because they provide a flexible platform for working with geospatial data and libraries. For example, notebooks can be used to load, manipulate, and visualize geospatial data using libraries such as GeoPandas and Folium. This allows users to explore and analyze their data interactively, one cell at a time. Additionally, notebooks mix code, text, and visualizations in a single document, making it easy to communicate and share geospatial analysis and results.

Spectral Python (SPy)

Spectral Python is a Python library for interacting with hyperspectral imagery. It is able to read a number of file formats, including ENVI and AVIRIS. SPy implements a number of algorithms, including dimensionality reduction and classification using supervised and unsupervised methods. SPy can also read spectral libraries for materials and build ENVI spectral libraries.

PySptools

PySptools is also a Python library for interacting with hyperspectral data. It contains a number of different functions, ranging from machine learning to endmember extraction algorithms. PySptools also contains a scikit-learn interface (in alpha!), which lets users integrate PySptools into scikit-learn pipelines for easy ML model development.

More Resources

python-geospatial (list compiled by Qiusheng Wu)