A segment of the RGB satellite map of Budapest based on the 10 m resolution Sentinel data.
How to acquire, analyze, and visualize satellite images using Python
All images in this article were created by the author.
The European Space Agency has been running its Sentinel missions supporting Copernicus, the Earth Observation component of the European Union’s space program, with various types of remote sensing data, such as radar and multi-spectral imaging instruments for land, ocean and atmospheric monitoring.
There are six ongoing Sentinel missions, three of which can be easily accessed via their Python API. These are, quoting the official source:
Sentinel-1 is a polar-orbiting, all-weather, day-and-night radar imaging mission for land and ocean services. Sentinel-1A was launched on 3 April 2014, and Sentinel-1B on 25 April 2016. Both were taken into orbit on a Soyuz rocket from Europe's Spaceport in French Guiana. The mission ended for Sentinel-1B in 2022 and plans are in force to launch Sentinel-1C as soon as possible.

Sentinel-2 is a polar-orbiting, multispectral high-resolution imaging mission for land monitoring to provide, for example, imagery of vegetation, soil and water cover, inland waterways and coastal areas. Sentinel-2 can also deliver information for emergency services. Sentinel-2A was launched on 23 June 2015 and Sentinel-2B followed on 7 March 2017.

Sentinel-3 is a multi-instrument mission to measure sea-surface topography, sea- and land-surface temperature, ocean colour and land colour with high-end accuracy and reliability. The mission supports ocean forecasting systems, as well as environmental and climate monitoring. Sentinel-3A was launched on 16 February 2016 and Sentinel-3B joined its twin in orbit on 25 April 2018.
After some additional digging, we can learn that Sentinel-1 data goes down to a few meters when it comes to spatial resolution. The highest resolution for visual data from Sentinel-2 is 10 meters, while Sentinel-3 operates at a much coarser resolution, on the order of hundreds of meters to about a kilometer, depending on the sensor type.
Alright, so we know where to get the satellite data, and it also looks like there is a huge selection of sources (sensors) and spatial resolutions. One may point out that this is still the tip of the iceberg, as this list of satellite data sources also outlines. And then, what do we use these different types of satellite data for? Well, for starters, here is a selection of 50+ use cases.
As a general rule of thumb, I would say that the use case, the specifics of the problem and the geospatial characteristics and terrain of the target area are all important factors in determining the right data source for you. However, in daily practice, in my experience, these are the main driving factors:
price (preferably free for exploration, which applies to Sentinel)a spatial resolution of a few meters, so that even smaller urban structures can be captturedat least a few spectral bands, such as visible and near-infraredtemporal frequency
These aspects make Sentinel-2 probably the most widely used satellite data source in the geospatial data community. Building on these components, in this article, I will show you how to get Sentinel data and what you should expect when downloading it. I will also thoroughly explore the different possibilities and the temporal evolution of the image records and the information stored within.
1. Data acquisition
First, I will set up the API connection following the official documentation and sample code. Additionally, I will need a target area from which I download images. For easy debugging, I picked my hometown, Budapest. I will use OSMNx to download its admin boundaries.
import osmnx as ox # version: 1.0.1
import matplotlib.pyplot as plt # version: 3.7.1
city = 'Budapest'
admin = ox.geocode_to_gdf(city)
admin.plot()

The admin boundaries of Budapest.
Now let's tap into the Sentinel API:
from sentinelsat import SentinelAPI, read_geojson, geojson_to_wkt # version 0.14
# to get an account, sign up here: https://apihub.copernicus.eu/apihub
user = '<add your user name>'
password = '<add your password>'
api = SentinelAPI(user, password, 'https://apihub.copernicus.eu/apihub')
To do a query, it’s better to have a smooth polygon that specifies the location. For this purpose, I created the convex hull of the admin area of Budapest:
# to simplify the query, I extract the convex hull of the input polygon
admin_polygon = admin.convex_hull.geometry.to_list()[0]
admin_polygon
Output:
The convex hull of Budapest.
Search for satellite images at our specific location, within a given time frame, and from a platform of our choosing. The latter shall be Sentinel-2. Besides, we can also filter based on cloud coverage, meaning that if an image is too cloudy, we just discard it right away.
# here we can specify the location (based on a polygon)
# the time frame
# the space probe
# and the level of cloud coverage accepted
products = api.query(admin_polygon,
                     date=('20150623', '20231006'),
                     platformname='Sentinel-2',
                     cloudcoverpercentage=(0, 100))
len(products)
As the output of these cells shows, following the Sentinel docs, it turns out that between 2015-06-23 (the beginning of the mission) and 2023-10-06 (when I was writing this article), there were altogether 3,876 satellite images recorded that overlap with the admin boundaries of Budapest. I set the cloud cover percentage threshold to 100, meaning no filter was applied based on cloud coverage. Therefore, we should have all the image identifiers from the past eight years.
I also note here that the resulting list of products contains the identifiers and metadata of all the satellite images, but not the images themselves. Additionally, if I repeated the same query with Sentinel-3, I would get nearly 20k image records as a result, although at a much lower resolution.
2. Explore the metadata
Let’s transform the list of products into a Pandas DataFrame and start analyzing it!
import pandas as pd # version: 1.4.2
products_gdf = api.to_geodataframe(products)
products_gdf = products_gdf.sort_values(['beginposition'], ascending=[True])
print(products_gdf.keys())
print(len(products_gdf.keys()))
products_gdf.head(3)
Result of this block:
Preview on a query result.
After counting the number of keys in the table, indexed by the satellite image identifiers, one may already get a sense of how rich this data is with the 41 feature columns in it.
While there is a lot of technical information in these fields, I would like to take a closer look at just a few features. On the one hand, the space and time dimensions are encoded in generation date and begin position (as date-time information) and geometry (as a polygon GIS data type). On the other hand, there are a couple of interesting metrics characterizing the land-cover type based on the images: cloudcoverpercentage, which we have already seen in the queries, vegetationpercentage, waterpercentage, and snowicepercentage. These environmental indices are derived from the different spectral properties of different materials. Note: these values are all aggregated scores capturing the grand average of the entire tile. More on this topic here.
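To get a quick feel for these land-cover metrics, one can summarize the relevant columns directly. Here is a minimal sketch on a hypothetical stand-in table; the column names match the Sentinel metadata fields above, but the values are made up for illustration:

```python
import pandas as pd

# hypothetical stand-in for the queried product metadata table
products = pd.DataFrame({
    "beginposition": pd.to_datetime(["2020-06-01", "2020-12-01"]),
    "cloudcoverpercentage": [12.5, 80.0],
    "vegetationpercentage": [55.0, 5.0],
    "waterpercentage": [0.8, 0.7],
    "snowicepercentage": [0.0, 3.2],
})

env_cols = ["cloudcoverpercentage", "vegetationpercentage",
            "waterpercentage", "snowicepercentage"]
# aggregate view of the tile-level environmental scores
summary = products[env_cols].describe().loc[["mean", "min", "max"]]
print(summary)
```

On the real table, the same two lines give an instant sanity check of the ranges of each environmental score before any time-series work.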
3. The spatial dimension
As we have a geometry dimension, let's take a look at how this looks on a map! I will do so by visualizing a set of random tiles, which, after a few re-runs, seem fairly representative. For the visualization, I used Folium with the CartoDB Dark_Matter base map.
import folium
import geopandas as gpd
x, y = admin_polygon.centroid.xy
m = folium.Map(location=[y[0], x[0]], zoom_start=8, tiles='CartoDB Dark_Matter')
# visualize a set of random tiles
polygon_style = {'fillColor': '#39FF14', 'color': 'black', 'weight': 3, 'opacity': 0}
geojson_data = products_gdf[['geometry']].sample(10).to_json()
folium.GeoJson(
geojson_data,
style_function=lambda feature: polygon_style
).add_to(m)
# add the admin boundaries on top
admin_style = {'fillColor': '#00FFFF', 'color': 'black', 'weight': 3, 'opacity': 100.0}
admin_geojson_data = admin[['geometry']].to_json()
folium.GeoJson(
admin_geojson_data,
style_function=lambda feature: admin_style
).add_to(m)
# show the map
m
The output of this code block:
A random sample of satellite tiles overlapping or intersecting with the admin area of Budapest.
Looking at this visual, it becomes clear that a few sections of tiles keep getting repeated. It is also apparent that, for some reason, a few of these tiles split the city's admin boundaries in two. This can lead to the unfortunate situation where you would like to analyze data entirely covering your desired target area, just to realize it got split in half. One possible workaround is to filter out the tiles that do not fully overlap with the desired admin area:
def compute_overlapping_area(tile, admin_polygon):
    return tile.intersection(admin_polygon).area / admin_polygon.area

products_gdf['overlapping_area_fraction'] = products_gdf.geometry.apply(lambda x: compute_overlapping_area(x, admin_polygon))
products_gdf_f = products_gdf[products_gdf.overlapping_area_fraction == 1]
print(len(products_gdf))
print(len(products_gdf_f))
products_gdf_f.head(3)
The result of this cell:
Preview of the filtered Satellite image product dataset
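The fraction computed above is simply the intersection area divided by the admin area. Shapely handles arbitrary polygons, but for intuition, the same idea can be sketched for axis-aligned boxes in pure Python (the helper below is illustrative, not part of the pipeline):

```python
def overlap_fraction(tile, admin):
    """Fraction of the admin box covered by a tile.

    Boxes are (minx, miny, maxx, maxy) tuples; this mirrors
    tile.intersection(admin).area / admin.area for rectangles.
    """
    ix = max(0.0, min(tile[2], admin[2]) - max(tile[0], admin[0]))
    iy = max(0.0, min(tile[3], admin[3]) - max(tile[1], admin[1]))
    admin_area = (admin[2] - admin[0]) * (admin[3] - admin[1])
    return (ix * iy) / admin_area

print(overlap_fraction((0, 0, 10, 10), (2, 2, 4, 4)))  # 1.0: tile fully covers the area
print(overlap_fraction((0, 0, 3, 4), (2, 2, 4, 4)))    # 0.5: the tile splits the area
```

Keeping the filter at exactly 1 keeps only tiles that fully contain the target polygon, which is what we want here.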
By applying this filter, I dropped about half of the tiles. Let’s take a look at the map now and see how well they all overlap with the city’s boundaries without any tiles splitting the city in half:
import folium
import geopandas as gpd
x, y = admin_polygon.centroid.xy
m = folium.Map(location=[y[0], x[0]], zoom_start=8, tiles='CartoDB Dark_Matter')
# visualize a set of random tiles
polygon_style = {'fillColor': '#39FF14', 'color': 'black', 'weight': 3, 'opacity': 0}
geojson_data = products_gdf_f[['geometry']].sample(10).to_json()
folium.GeoJson(
geojson_data,
style_function=lambda feature: polygon_style
).add_to(m)
# add the admin boundaries on top
admin_style = {'fillColor': '#00FFFF', 'color': 'black', 'weight': 3, 'opacity': 100.0}
admin_geojson_data = admin[['geometry']].to_json()
folium.GeoJson(
admin_geojson_data,
style_function=lambda feature: admin_style
).add_to(m)
# show the map
m

A random sample of satellite tiles completely overlapping with the admin area of Budapest.
4. The temporal dimension of the data set
First, let’s see the number of images we have per day, week, and month, covering Budapest. To measure time, I will rely on the field beginposition.
# Assuming 'beginposition' is a Timestamp column in your GeoDataFrame
# You can convert it to a DateTime index
products_gdf_f_cntr = products_gdf_f.copy()
products_gdf_f_cntr['beginposition'] = pd.to_datetime(products_gdf_f_cntr['beginposition'])
products_gdf_f_cntr.set_index('beginposition', inplace=True)
# Resample the data to count rows per day, week, and month
daily_counts = products_gdf_f_cntr.resample('D').count()
weekly_counts = products_gdf_f_cntr.resample('W').count()
monthly_counts = products_gdf_f_cntr.resample('M').count()
fig, ax = plt.subplots(1, 3, figsize=(15, 5))
for idx, (count_name, count_val) in enumerate([('Daily Counts', daily_counts), ('Weekly Counts', weekly_counts), ('Monthly Counts', monthly_counts)]):
    ax[idx].plot(count_val.index[0:250], count_val['geometry'].to_list()[0:250])
    ax[idx].set_xlabel('Date')
    ax[idx].set_ylabel('Count')
    ax[idx].set_title(count_name)
plt.tight_layout()
plt.suptitle('Number of satellite images taken in various time-frames', fontsize=20, y=1.15)
plt.show()

The number of satellite images captured in the Budapest target area per day, week, and month.
These figures show the first 250 days, the first 250 weeks, and the first 250 months (the entire span) of the Sentinel-2 mission. The first figure shows that roughly every other day, one snapshot is taken. The second graph, taking the weekly average of the previous one, shows that during the first two years, the satellites captured Budapest once or twice per week, and then, from 2017 to 2018, this increased to 5–6 images per week. The last graph, covering the entire time span, shows the same trends, and how, after three years in operation, about 25 images per month became the standard level.
5. The temporal dimension of the land-cover variables
Now, let's look at the temporal evolution of vegetationpercentage, waterpercentage, snowicepercentage, and cloudcoverpercentage. As the previous figure shows, the early years may contain different, probably noisy, results, so let's stay cautious. Still, I wouldn't drop those years of data here: we have eight years altogether, and cutting three out of that would lose too much information. First, let's see the raw values over time with a weekly aggregation:
import pandas as pd
import matplotlib.pyplot as plt
# Assuming 'beginposition' is a Timestamp column in your GeoDataFrame
# You can convert it to a DateTime index
products_gdf_f_cntr = products_gdf_f.copy()
products_gdf_f_cntr['beginposition'] = pd.to_datetime(products_gdf_f_cntr['beginposition'])
products_gdf_f_cntr.set_index('beginposition', inplace=True)
# Resample the data to calculate weekly averages
weekly_averages = products_gdf_f_cntr[['vegetationpercentage', 'waterpercentage', 'snowicepercentage', 'cloudcoverpercentage']].resample('W').mean()
# Create a multi-plot figure with four subplots, one per metric
fig, axes = plt.subplots(4, 1, figsize=(10, 15))
for ax, (col, color, title) in zip(axes, [
        ('vegetationpercentage', 'green', 'Weekly Average Vegetation Percentage'),
        ('waterpercentage', 'blue', 'Weekly Average Water Percentage'),
        ('snowicepercentage', 'cyan', 'Weekly Average Snow/Ice Percentage'),
        ('cloudcoverpercentage', 'gray', 'Weekly Average Cloud Cover Percentage')]):
    ax.plot(weekly_averages.index, weekly_averages[col], color=color, label=title)
    ax.set_xlabel('Date')
    ax.set_ylabel('Percentage')
    ax.set_title(title)
    ax.legend()
plt.tight_layout()
plt.show()
The temporal evolution of vegetation, water, snow, and cloud-cover percentages over time, with weekly aggregation.
And the same with monthly aggregation:
The temporal evolution of vegetation, water, snow, and cloud-cover percentages over time, with monthly aggregation.
These time series are telling us a couple of interesting things:
The vegetation percentage clearly shows seasonality: each spring everything turns green, and this greenery fades away during fall, dropping from 50–60% to near zero.Compared to that, the water percentage fluctuates around 0.8% throughout the year and the whole observation period, because the amount of surface water is pretty tiny in the studied area. Still, drops seem to come more often during wintertime, implying that some of the freshwater bodies freeze.Regarding snow, the most prominent peaks, about 4–8%, come during the winters. Still, as I have been around for most of these winters, I can tell from personal experience that we didn't have much snow. To that end, values measuring just 1–2%, especially outside of winter, may come from noise or even the misclassification of clouds.When it comes to clouds, we see that they mostly go hand-in-hand with the vegetation, following seasonal patterns.
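One way to make this seasonality explicit is to average each metric by calendar month. A minimal sketch on a synthetic weekly vegetation series (the real input would be the weekly_averages table computed above):

```python
import numpy as np
import pandas as pd

# synthetic weekly vegetation percentages peaking in summer,
# mimicking the 50-60% summer highs and near-zero winter lows
idx = pd.date_range("2016-01-01", "2019-12-31", freq="W")
veg = 30 + 25 * np.sin(2 * np.pi * (idx.dayofyear - 80) / 365)
series = pd.Series(veg, index=idx)

# average by calendar month to expose the seasonal cycle
seasonal = series.groupby(series.index.month).mean()
print(seasonal.round(1))
```

Applying the same groupby to the real weekly_averages columns gives a 12-row seasonal profile per metric, which is often easier to interpret than the raw time series.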
Some of these observations are also visible in the correlation of these measures:
products_gdf_f_cntr[['vegetationpercentage', 'waterpercentage', 'snowicepercentage', 'cloudcoverpercentage']].corr()

The correlation of the environmental variables over time.
6. Download satellite images
First, let's run a query for Sentinel-2, going for a week in August this year and limiting cloud coverage as much as possible, and see the number of snapshots available:
# query tile product ids
products_sent = api.query(admin_polygon, date=('20230806', '20230813'), platformname='Sentinel-2', cloudcoverpercentage=(0, 1))
products_sent = api.to_geodataframe(products_sent)
f, ax = plt.subplots(1, 1, figsize=(6, 4))
admin.plot(ax=ax, color='none', edgecolor='k')
ax.set_title('Sentinel-2, number of tiles = ' + str(len(products_sent)))
products_sent.plot(ax=ax, alpha=0.3)
# filter out the tiles not fully overlapping with Budapest
products_sent['overlapping_area_fraction'] = products_sent.geometry.apply(lambda x: compute_overlapping_area(x, admin_polygon))
products_sent = products_sent[products_sent.overlapping_area_fraction == 1]
f, ax = plt.subplots(1, 1, figsize=(6, 4))
admin.plot(ax=ax, color='none', edgecolor='k')
ax.set_title('Sentinel-2, number of tiles = ' + str(len(products_sent)))
products_sent.plot(ax=ax, alpha=0.3)
len(products_sent)
The result of this code block:
The queried tiles.
Now download the image data based on the product identifiers:
# download the first tiles as sat images
product_ids = products_sent.index.to_list()
for prod in product_ids:
    api.download(prod)
Note: here you may get the following notification; in this case, just wait a few hours and then run the downloader again.
Product a3c61497-d77d-48da-9a4d-394986d2fe1d is not online. Triggering retrieval from long term archive.
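Rather than re-running the downloader by hand, the polling logic can be wrapped in a small helper. The sketch below assumes a sentinelsat-like interface where an is_online check and a download call are passed in as callables; the function name and retry settings are illustrative, and in practice wait_seconds would be on the order of hours:

```python
import time

def download_when_online(product_ids, is_online, download,
                         wait_seconds=0, max_rounds=3):
    """Download online products; keep retrying the ones still archived."""
    pending = list(product_ids)
    for _ in range(max_rounds):
        still_offline = []
        for pid in pending:
            if is_online(pid):
                download(pid)
            else:
                # asking for an offline product triggers LTA retrieval,
                # so it should come online in a later round
                still_offline.append(pid)
        pending = still_offline
        if not pending:
            break
        time.sleep(wait_seconds)
    return pending  # ids that never became available

# stub demo: product 'b' comes online only after the first poll
state = {"a": True, "b": False}
def fake_is_online(pid):
    online = state[pid]
    state[pid] = True  # pretend LTA retrieval completes after one poll
    return online

downloaded = []
leftover = download_when_online(["a", "b"], fake_is_online, downloaded.append)
print(downloaded, leftover)  # ['a', 'b'] []
```

With the real API, the callables would be api.is_online and api.download, and any ids still returned after the final round can be retried the next day.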
7. Open and visualize the downloaded image
Here you will find a detailed description of the data format, with really nice visuals of the folder structure. Once you open the image directory, you will find the different bands. The meaning of each band, as well as its spatial resolution, is well-summarized in this article; the spatial resolution of the 13 bands ranges from 10 to 60 meters. A few highlights:
The blue (B2), green (B3), red (B4), and near-infrared or NIR (B8) channels have a 10-meter resolution.Next, the vegetation red edge (B5), narrow NIR (B6, B7, and B8A), and short-wave infrared SWIR (B11 and B12) bands have a 20-meter resolution.Finally, the coastal aerosol (B1) and SWIR cirrus (B10) bands have a 60-meter pixel size.
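For reference, this band-to-resolution mapping can be kept as a small lookup table, which comes in handy when deciding which .jp2 files to load (values in meters, following the ESA band definitions):

```python
# Sentinel-2 band -> native pixel size in meters
BAND_RESOLUTION_M = {
    "B01": 60, "B02": 10, "B03": 10, "B04": 10,
    "B05": 20, "B06": 20, "B07": 20, "B08": 10,
    "B8A": 20, "B09": 60, "B10": 60, "B11": 20, "B12": 20,
}

# the four 10 m bands are the ones used for RGB + NIR composites
ten_meter_bands = sorted(b for b, r in BAND_RESOLUTION_M.items() if r == 10)
print(ten_meter_bands)  # ['B02', 'B03', 'B04', 'B08']
```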
This is how it looks:
# after unzipping the downloaded folder:
import os
image_path = 'S2B_MSIL1C_20230810T094549_N0509_R079_T34TCT_20230810T124346.SAFE/GRANULE/L1C_T34TCT_A033567_20230810T095651/IMG_DATA'
sorted(os.listdir(image_path))

The list of satellite image bands stored in .jp2 format.
This is how an entire tile looks when visualizing B4, the red band, using rasterio:
import rasterio
from rasterio.plot import show
image_file = 'T34TCT_20230810T094549_B04.jp2'
with rasterio.open(image_path + '/' + image_file) as src:
    image = src.read(1)  # Change the band index as needed
plt.figure(figsize=(10, 10))
plt.imshow(image, cmap='Reds')  # You can change the colormap
plt.title(image_file)
plt.colorbar()
plt.show()
Output:
The red band of the tile containing Budapest.
Now focus on Budapest and mask the full raster tile by the admin boundaries of the city, visualizing the R, G, and B bands separately:
from rasterio import mask
f, ax = plt.subplots(1, 3, figsize=(15, 5))
for idx, (band_name, band_num, color_map) in enumerate([('Blue', 'B02', 'Blues'), ('Green', 'B03', 'Greens'), ('Red', 'B04', 'Reds')]):
    raster_path = image_path + '/T34TCT_20230810T094549_' + band_num + '.jp2'
    with rasterio.open(raster_path) as src:
        polygons = admin.copy().to_crs(src.crs)
        geom = polygons.geometry.iloc[0]
        masked_image, _ = mask.mask(src, [geom], crop=True)
    ax[idx].imshow(masked_image[0], cmap=color_map)
    ax[idx].set_title('Budapest Sentinel 2 - ' + band_name + ' band')

Three visualized satellite bands of Budapest.
Finally, let’s assemble these to obtain an RGB image of Budapest. First, I assemble the full tile into an RGB image, then read it, conduct a histogram equalization following official instructions, and then arrive at the final plot.
# Get the band locations
band_blue = 'T34TCT_20230810T094549_B02.jp2'
band_green = 'T34TCT_20230810T094549_B03.jp2'
band_red = 'T34TCT_20230810T094549_B04.jp2'
# Read in the bands and create the full RGB tile
b2 = rasterio.open(image_path + '/' + band_blue)
b3 = rasterio.open(image_path + '/' + band_green)
b4 = rasterio.open(image_path + '/' + band_red)
# export the full tile as a tif file
meta = b4.meta
meta.update({"count": 3})
prefire_rgb_path = 'budapest_rgb.tif'
with rasterio.open(prefire_rgb_path, 'w', **meta) as dest:
    dest.write(b2.read(1), 1)
    dest.write(b3.read(1), 2)
    dest.write(b4.read(1), 3)
# read and show the cropped version
# ('budapest_rgb_cropped.tif' is the full RGB tile masked by the admin
# boundary, produced the same way as the per-band masking above)
import numpy as np
import rasterio.plot
from skimage import exposure
img = rasterio.open('budapest_rgb_cropped.tif')
image = np.array([img.read(3), img.read(2), img.read(1)])
image = image.transpose(1, 2, 0)
# do the histogram equalization
p2, p98 = np.percentile(image, (2, 98))
image = exposure.rescale_intensity(image, in_range=(p2, p98)) / 100000
f, ax = plt.subplots(1, 1, figsize=(15, 15))
rasterio.plot.show(image.transpose(2, 0, 1), transform=img.transform, ax=ax)
ax.axis('off')
plt.savefig('budapest_rgb_cropped.png', dpi=700, bbox_inches='tight')
Output:
The RGB satellite map of Budapest based on the 10 m resolution Sentinel data.
Concluding remarks
To quickly wrap up, see what happened in this article:
A quick overview of the Sentinel satellite platformsAn example of how to query many image identifiers, including their metadataHow to conduct temporal analytics simply based on the aggregated information of the tiles, stored in the metadataHow to download, store, and visualize single images
All these steps aim to point towards adding satellite image processing and analytics to your daily stack of geospatial data science tools, which could cover a great many use-cases from urban planning to environmental monitoring and agriculture.
Deep Dive into ESA's Sentinel API was originally published in Towards Data Science on Medium.