Tutorial A: Download Data from Copernicus and ISIMIP¶
heiplanet-data Python package - data download and visualization of the raw data
Authors: Scientific Software Center
Date: October 2025
Version: 1.0
Overview¶
This tutorial demonstrates how to download data files through the Copernicus and ISIMIP APIs and inspect the data using xarray. You will learn about:
- Data Sources and APIs: using the Copernicus Climate Data Store (CDS) and ISIMIP APIs to download data
- Xarray: working with multi-dimensional scientific datasets
- Data Visualization: creating plots to verify data integrity
1. Downloading the data from the resources¶
Let's start by importing the necessary libraries.
# if running on google colab
# flake8-noqa-cell
if "google.colab" in str(get_ipython()):
    # install packages
    %pip install git+https://github.com/ssciwr/heiplanet-data.git -qqq
# Import required libraries
from heiplanet_data import inout # Our custom module for data I/O operations
from pathlib import Path # For cross-platform file path handling
from matplotlib import pyplot as plt # For creating plots and visualizations
import xarray as xr # For working with labeled multi-dimensional arrays
from isimip_client.client import ISIMIPClient # For downloading ISIMIP data
We now need to set up a folder structure in which to save the downloaded data.
# Set up data directories
# We use pathlib.Path for cross-platform compatibility
data_root = Path("data/") # Navigate to the data directory from docs/source/notebooks/
data_folder = data_root / "in" # Raw input data goes in the 'in' subfolder
print(f"Data root directory: {data_root.absolute()}")
print(f"Input data directory: {data_folder.absolute()}")
print(f"Directory exists: {data_folder.exists()}")
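The cell above only reports whether the folders exist. If they do not exist yet, you can create the whole tree with `Path.mkdir` — a minimal sketch, assuming the same `data/in` layout as above:

```python
from pathlib import Path

data_folder = Path("data/") / "in"
# create the folder tree if it is missing; no error if it already exists
data_folder.mkdir(parents=True, exist_ok=True)
print(f"Directory exists: {data_folder.exists()}")
```

`parents=True` creates intermediate directories (here `data/`) as needed, and `exist_ok=True` makes the call idempotent, so it is safe to re-run the cell.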
Download ERA5-Land data¶
ERA5-Land is a reanalysis dataset providing a comprehensive record of land variables from 1950 to present. It's produced by the European Centre for Medium-Range Weather Forecasts (ECMWF).
What is reanalysis data?¶
Reanalysis combines:
- Observations: From weather stations, satellites, radiosondes, etc.
- Numerical weather models: Physics-based atmospheric models
- Data assimilation: Mathematical techniques to optimally combine observations with model forecasts
This creates a spatially and temporally consistent dataset that's invaluable for climate research.
Key ERA5-Land variables we'll download¶
- 2m temperature (t2m): Air temperature at 2 meters above ground
- Total precipitation (tp): Accumulated precipitation
This tutorial uses the CDS ERA5-Land monthly dataset. Please set up the CDS API as outlined below and take note of the naming convention used for the downloaded files.
Set up CDS API¶
To use the CDS API for downloading data, you first need to create an account on CDS to obtain your personal access token.
Create a .cdsapirc file containing your personal access token by following these instructions.
General API requests¶
- Select the target dataset, e.g. ERA5-Land monthly averaged data from 1950 to present
- Go to the `Download` tab of the dataset and select the data variables, time range, geographical area, etc. that you want to download
- At the end of the page, click on `Show API request code` and take note of the following information:
  - `dataset`: name of the dataset
  - `request`: a dictionary summarizing your download request
- Replace the values of `dataset` and `request` in the cell below accordingly
# replace dataset and request with your own values
dataset = "reanalysis-era5-land-monthly-means"
request = {
    "product_type": ["monthly_averaged_reanalysis"],
    "variable": ["2m_temperature", "total_precipitation"],
    "year": ["2016", "2017"],
    "month": [
        "01", "02", "03", "04", "05", "06",
        "07", "08", "09", "10", "11", "12",
    ],
    "time": ["00:00"],
    "data_format": "netcdf",
    "download_format": "unarchived",
}
Understanding the request parameters¶
- `dataset`: the specific ERA5-Land dataset identifier from CDS
- `product_type`: type of product (monthly averaged reanalysis)
- `variable`: the climate variables we want (temperature and precipitation)
- `year` and `month`: time range for our data (2016-2017, all months)
- `time`: specific time of day (00:00 for monthly averages)
- `data_format`: output format (NetCDF is standard for scientific data)
- `download_format`: whether to compress files (unarchived = no compression)
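If you only need a spatial subset, the request can additionally carry an `area` key; the CDS convention is [North, West, South, East] in degrees. The request below is a hypothetical, reduced variant for illustration (one variable, one month, roughly Europe):

```python
# minimal request sketch with a bounding box (hypothetical example values)
request_europe = {
    "product_type": ["monthly_averaged_reanalysis"],
    "variable": ["2m_temperature"],
    "year": ["2016"],
    "month": ["01"],
    "time": ["00:00"],
    "data_format": "netcdf",
    "download_format": "unarchived",
    "area": [70, -10, 35, 40],  # North, West, South, East
}
```

Restricting the area keeps files small, and the naming convention described later records the subsetting via the `area` part of the filename.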
# Generate a descriptive filename for our downloaded data
data_format = request.get("data_format")
# The inout.get_filename() function creates standardized filenames
# that include key metadata about the dataset
era5_fname = inout.get_filename(
    ds_name=dataset,  # Dataset identifier
    data_format=data_format,  # File format (netcdf)
    years=request["year"],  # Years included (2016-2017)
    months=request["month"],  # Months included (all 12)
    has_area=bool("area" in request),  # Whether spatial subsetting was used
    base_name="era5_data",  # Base prefix for the filename
    variables=request["variable"],  # Variables included (t2m, tp)
)
# Create the full file path
era5_fpath = data_folder / era5_fname
print(f"Generated filename: {era5_fname}")
print(f"Full file path: {era5_fpath}")
# Download the ERA5 data (this may take several minutes)
# We first check if the file already exists to avoid unnecessary downloads
if not era5_fpath.exists():
    print("Downloading ERA5 data from Copernicus Climate Data Store...")
    print("This may take several minutes depending on file size and server load...")
    # The inout.download_data() function handles the CDS API authentication and download
    inout.download_data(era5_fpath, dataset, request)
    print(f"Download complete! File saved to: {era5_fpath}")
else:
    print(f"ERA5 data already exists at {era5_fpath}")
    print("Skipping download to save time.")
Special download for total precipitation data from ERA5-Land Hourly dataset¶
Some models may require total precipitation data downloaded from the dataset ERA5-Land hourly data from 1950 to present.
Due to the nature of this dataset, the value at 00:00 is the total precipitation of the previous day (see here).
To get the correct precipitation values from 01.01.2016 to 31.12.2017, we need to download the data from 02.01.2016 to 01.01.2018. The current CDS request API does not support specifying a continuous date range that does not share the same days for each month and the same months for each year.
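The one-day shift can be illustrated with a tiny synthetic dataset (hypothetical values, not real ERA5 data): timestamps are downloaded from 2016-01-02 onward, then the time coordinate is shifted back one day so each value is labeled with the day the precipitation actually fell on.

```python
import pandas as pd
import xarray as xr

# accumulated precipitation reported at 00:00 of the *next* day
times = pd.date_range("2016-01-02", periods=3, freq="D")
ds = xr.Dataset(
    {"tp": ("valid_time", [0.1, 0.2, 0.3])},
    coords={"valid_time": times},
)
# shift the time coordinate back by one day
ds = ds.assign_coords(valid_time=ds.valid_time - pd.Timedelta(days=1))
print(ds.valid_time.values[0])  # the first label is now 2016-01-01
```

This is the same adjustment the helper function below applies after downloading the shifted date range.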
We implemented a special function for this case.
def download_total_precipitation_from_hourly_era5_land(
    start_date: str,
    end_date: str,
    area: List[float] | None = None,
    out_dir: Path = Path("."),
    base_name: str = "era5_data",
    data_format: str = "netcdf",
    ds_name: str = "reanalysis-era5-land",
    coord_name: str = "valid_time",
    var_name: str = "total_precipitation",
    clean_tmp_files: bool = False,
) -> str: ...
Input for this function includes:
- `start_date` and `end_date` in the format "YYYY-MM-DD"
- `area` indicates the area to download; `None` means the entire globe
- `out_dir`: output directory to store the downloaded file
- `base_name`: base string used to name the output file. The file name is described in Naming convention - Special case
- `data_format`: can be `netcdf` or `grib`
- `ds_name`, `coord_name`, and `var_name` represent the dataset name, coordinate name, and data variable name in the dataset. Please only change these values when CDS changes the corresponding names.
- `clean_tmp_files` can be set to `False` to retain the downloaded temporary files, which store data for smaller sub-ranges derived from the overall date range. For example, the range `2016-01-01` to `2017-12-31` would be split into sub-ranges `2016-01-02` to `2016-12-31`, `2017-01-01` to `2017-12-31`, and `2018-01-01` to `2018-01-01`, because the timestamps are shifted one day forward.
The function handles time shifting, downloads the data, adjusts the time coordinate back to the target range, and returns the output file path.
# download total precipitation data from ERA5-Land Hourly dataset
# from 2016-01-01 to 2017-12-31
start_time = "2016-01-01"
end_time = "2017-12-31"
tp_era5_hourly_file = inout.download_total_precipitation_from_hourly_era5_land(
    start_date=start_time,
    end_date=end_time,
    area=None,
    out_dir=data_folder,
    base_name="era5_data",
    data_format="netcdf",
    ds_name="reanalysis-era5-land",
    coord_name="valid_time",
    var_name="total_precipitation",
    clean_tmp_files=False,  # keep temporary files for checking
)
tp_era5_hourly_file
tp_era5_hourly_ds = xr.open_dataset(tp_era5_hourly_file)
tp_era5_hourly_ds["valid_time"]
Naming convention¶
The filenames of the downloaded netCDF files follow this structure:
{base_name}_{year_str}_{month_str}_{day_str}_{time_str}_{var_str}_{ds_type}_{area_str}_raw.{ext}
- `base_name` is `"era5_data"`
- For lists of numbers, i.e. years/months/days/times, the rules below are applied:
  - If the values are continuous, the string representation is a concatenation of the `min` and `max` values, separated by `-`
  - Otherwise, the string is a join of all values, separated by `_`
  - However, if there are more than 5 values, we only keep the first 5 and replace the rest by `"_etc"`
  - If the values are empty (e.g. no days or times in the download request), their string representation and the corresponding separator (i.e. `"_"`) are omitted from the file name.
- `year_str` is the string representation of the list of years using the rules above.
- Similarly for `month_str`. However, if the download requests all 12 months, `month_str` would be `"allm"`.
- `day_str` and `time_str` follow the same pattern, assuming that a month has at most 31 days (`"alld"`) and a day has at most 24 hours (`"allt"`).
  - Special case: if data is downloaded at time `00:00` per day only, `time_str` would be `"midnight"` (e.g. precipitation data for the P-model)
- For `var_str`, each variable has an abbreviation derived from the first letter of each word in the variable name (e.g. `tp` for total precipitation).
  - All abbreviations are then concatenated by `_`
  - If this concatenated string is longer than 30 characters, we only keep the first 2 characters and replace the rest by `"_etc"`
- As for `ds_type`:
  - If the file was downloaded from a monthly dataset, `ds_type` is set to `"monthly"`. This means the data is recorded only on the first day of each month.
  - For other datasets, when data is downloaded only at midnight (`time_str` = `"midnight"`), `ds_type` is `"daily"`, meaning one data record per day of each month.
  - `ds_type` would be an empty string in other cases, i.e. multiple data records for each day of a month.
- For `area_str`, if the downloaded data covers only an area of the grid (instead of the whole map), `area_str` would be `"area"`.
- If the part before `"_raw"` is longer than 100 characters, only the first 100 characters are kept and the rest is replaced by `"_etc"`
- `"_raw"` is added at the end to indicate that the file is raw data
- The extension `ext` of the file can be `.nc` or `.grib`
- If any of these fields (from `year_str` to `area_str`) are missing from the download request, the corresponding string and the preceding `_` are removed from the file name.
Special case¶
For total precipitation data downloaded from the dataset ERA5-Land hourly data from 1950 to present, the file name is structured as:
{base_name}_{start_date}-{end_date}_{time_str}_{var_str}_{ds_type}_{area_str}_raw.{ext}
In this case, `time_str` is `"midnight"` and `ds_type` is `"daily"`.
Download ISIMIP data (population data)¶
ISIMIP (Inter-Sectoral Impact Model Intercomparison Project) provides a framework for comparing and improving impact models across different sectors.
About population data¶
- Source: Historical population data covering 1901 to 2021
- Resolution: 30 arc-minutes (approximately 50km at the equator)
- Format: NetCDF with population counts per grid cell
- Units: Number of people per grid cell
- Use case: Essential for calculating population exposure to climate hazards
Download ISIMIP data manually¶
To download population data manually, please perform the following steps:
- go to the ISIMIP website
- search for `population` in the search bar
- choose simulation round `ISIMIP3a`
- click `Input Data` -> `Direct human forcing` -> `Population data` -> `histsoc`
- choose `population_histsoc_30arcmin_annual`
- download the file `population_histsoc_30arcmin_annual_1901_2021.nc`
Download ISIMIP data using ISIMIP's API¶
# initialize ISIMIP client
client = ISIMIPClient()
# search for population data
response = client.datasets(
    path="ISIMIP3a/InputData/socioeconomic/pop/histsoc/population"
)  # this path mirrors the one on ISIMIP's website
# use `ds_info` as loop variable to avoid shadowing the `dataset` string above
for ds_info in response["results"]:
    print("Dataset found: {}".format(ds_info["path"]))
# download the population data file covering 1901-2021
for ds_info in response["results"]:
    for file in ds_info["files"]:
        if "1901_2021" in file["name"]:
            isimip_fpath = data_folder / file["name"]
            if isimip_fpath.exists():
                print(f"Population data file already exists: {file['name']}")
            else:
                print(f"Downloading population data file: {file['name']}")
                client.download(file["file_url"], path=data_folder)
                print("Download complete!")
            break  # exit the file loop after the first match
2. Inspect the data using xarray¶
Xarray is a powerful Python library for working with labeled multi-dimensional arrays. It's particularly well-suited for scientific datasets like climate data because it:
- Labels dimensions: Instead of working with raw indices, you can use meaningful names like 'time', 'latitude', 'longitude'
- Handles metadata: Stores attributes, coordinate information, and units alongside your data
- Integrates with pandas: Provides similar functionality for N-dimensional data as pandas does for 2D data
- Works with NetCDF: Native support for the NetCDF format commonly used in climate science
- Enables easy operations: Broadcasting, grouping, resampling, and mathematical operations across dimensions
Key Xarray concepts¶
- Dataset: A dictionary-like container of data variables with shared coordinates
- DataArray: A labeled N-dimensional array (similar to a pandas Series but for N dimensions)
- Coordinates: Arrays that provide labels for each dimension
- Attributes: Metadata stored as key-value pairs
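These concepts can be seen in a toy Dataset built from scratch. The values below are synthetic and only for illustration:

```python
import numpy as np
import xarray as xr

# a Dataset with one data variable, two labeled dimensions, and attributes
toy = xr.Dataset(
    data_vars={
        "t2m": (("time", "latitude"), np.array([[280.0, 281.0], [282.0, 283.0]])),
    },
    coords={
        "time": np.array(["2016-01-01", "2016-02-01"], dtype="datetime64[ns]"),
        "latitude": [52.5, 53.0],
    },
    attrs={"description": "toy example"},
)
# select a DataArray by coordinate label rather than integer index
print(toy.t2m.sel(latitude=52.5).values)  # [280. 282.]
```

The ERA5 and ISIMIP files we open next have exactly this structure, just with more variables, more coordinates, and real data.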
# load netCDF files
ds_era5 = xr.open_dataset(era5_fpath)
ds_isimip = xr.open_dataset(isimip_fpath)
Explore the xarray dataset structure¶
Let's examine our datasets to understand their structure. Xarray provides excellent methods for inspecting data.
# Examine the ERA5 dataset structure
print("=== ERA5 DATASET OVERVIEW ===")
print(ds_era5)
print("\n" + "=" * 50)
# Explore specific data variables in the ERA5 dataset
print("=== ERA5 DATA VARIABLES ===")
for var_name, var in ds_era5.data_vars.items():
    print(f"\n📊 Variable: {var_name}")
    print(f" Shape: {var.shape}")
    print(f" Dimensions: {var.dims}")
    print(f" Data type: {var.dtype}")
    if hasattr(var, "long_name"):
        print(f" Description: {var.long_name}")
    if hasattr(var, "units"):
        print(f" Units: {var.units}")
# Examine coordinates in the ERA5 dataset
print("=== ERA5 COORDINATES ===")
for coord_name, coord in ds_era5.coords.items():
    print(f"\n🌐 Coordinate: {coord_name}")
    print(f" Shape: {coord.shape}")
    print(f" Range: {coord.min().values} to {coord.max().values}")
    if coord_name == "valid_time":
        print(f" First date: {coord.values[0]}")
        print(f" Last date: {coord.values[-1]}")
        print(f" Total time steps: {len(coord)}")
    elif coord_name in ["latitude", "longitude"]:
        print(f" Resolution: ~{abs(coord[1].values - coord[0].values):.3f} degrees")
Understanding xarray data selection¶
Xarray provides powerful methods for selecting and indexing data. Let's explore some common operations:
# Demonstrate xarray data selection methods
print("=== XARRAY DATA SELECTION EXAMPLES ===")
# 1. Select data by coordinate values (not indices!)
print("\n1. Select temperature data for January 2016:")
jan_2016_temp = ds_era5.t2m.sel(valid_time="2016-01")
print(f" Shape: {jan_2016_temp.shape}")
print(f" Selected time: {jan_2016_temp.valid_time.values}")
# 2. Select a specific geographic location
print("\n2. Select data for a specific location (52.5°N, 13.4°E):")
location_data = ds_era5.sel(latitude=52.5, longitude=13.4, method="nearest")
print(f" Shape: {location_data.t2m.shape}")
print(
f" Actual coordinates: lat={location_data.latitude.values:.2f}, lon={location_data.longitude.values:.2f}"
)
# 3. Select a geographic region
print("\n3. Select data for an area:")
area_data = ds_era5.sel(latitude=slice(70, 35), longitude=slice(0, 40))
print(f" Shape: {area_data.t2m.shape}")
print(
f" Lat range: {area_data.latitude.min().values:.1f} to {area_data.latitude.max().values:.1f}"
)
print(
f" Lon range: {area_data.longitude.min().values:.1f} to {area_data.longitude.max().values:.1f}"
)
# Demonstrate common xarray operations
print("=== COMMON XARRAY OPERATIONS ===")
# 1. Statistical operations across dimensions
print("\n1. Calculate statistics across time dimension:")
temp_mean = ds_era5.t2m.mean(dim="valid_time")
temp_std = ds_era5.t2m.std(dim="valid_time")
print(f" Mean temperature shape: {temp_mean.shape} (time dimension removed)")
print(f" Global mean temperature: {temp_mean.mean().values:.2f} K")
print(f" Standard deviation shape: {temp_std.shape} (time dimension removed)")
print(f" Global standard deviation: {temp_std.mean().values:.2f} K")
# 2. GroupBy operations (seasonal means)
print("\n2. Calculate seasonal means using groupby:")
seasonal_temp = ds_era5.t2m.groupby("valid_time.season").mean()
print(f" Seasonal data shape: {seasonal_temp.shape}")
print(f" Seasons available: {seasonal_temp.season.values}")
# 3. Mathematical operations
print("\n3. Convert temperature from Kelvin to Celsius:")
temp_celsius = ds_era5.t2m - 273.15 # Simple arithmetic on the entire array
print(
f" Original range: {ds_era5.t2m.min().values:.1f} to {ds_era5.t2m.max().values:.1f} K"
)
print(
f" Converted range: {temp_celsius.min().values:.1f} to {temp_celsius.max().values:.1f} °C"
)
Plot the datasets¶
# plot the cartesian grid data of t2m and tp for 2016-2017, all months
ds_era5.t2m.plot.pcolormesh(
col="valid_time", col_wrap=4, cmap="coolwarm", robust=True, figsize=(15, 10)
)
plt.savefig("era5_2016_2017_plots.png", dpi=300)
plt.show()
# Plot precipitation data
ds_era5.tp.plot.pcolormesh(
col="valid_time",
col_wrap=4,
cmap="Blues", # Use a sequential colormap for precipitation
robust=True,
figsize=(15, 10),
add_colorbar=True,
)
plt.suptitle(
"ERA5 Total Precipitation (mm) - 2016-2017 Monthly Data", fontsize=16, y=1.02
)
plt.savefig("era5_2016_2017_plots_tp.png", dpi=300, bbox_inches="tight")
plt.show()
print("✓ Precipitation plot saved as 'era5_2016_2017_plots_tp.png'")
# Plot population data for specific years
print("Creating population plots...")
# Select population data for 2016 and 2017
pop_2016_2017 = ds_isimip.sel(time=slice("2016", "2017"))
# Create population plots
pop_2016_2017["total-population"].plot.pcolormesh(
col="time",
col_wrap=2,
cmap="viridis", # Use viridis colormap for population
robust=True,
figsize=(12, 5),
add_colorbar=True,
)
plt.suptitle("Total population", fontsize=16, y=1.02)
plt.savefig("population_2016_2017_plots.png", dpi=300, bbox_inches="tight")
plt.show()
print("✓ Population plot saved as 'population_2016_2017_plots.png'")