heiplanet_data.preprocess module
Functions:
-
adjust_longitude_360_to_180–Adjust longitude from 0-360 to -180-180.
-
aggregate_data_by_nuts–Aggregate data from a NetCDF file by NUTS regions, data variable names, and time.
-
align_lon_lat_with_popu_data–Align longitude and latitude coordinates with population data of the same resolution.
-
convert_360_to_180–Convert longitude from 0-360 to -180-180.
-
convert_m_to_mm–Convert precipitation from meters to millimeters.
-
convert_m_to_mm_with_attributes–Convert precipitation from meters to millimeters and keep attributes.
-
convert_to_celsius–Convert temperature from Kelvin to Celsius.
-
convert_to_celsius_with_attributes–Convert temperature from Kelvin to Celsius and keep attributes.
-
downsample_resolution–Downsample the resolution of a dataset.
-
preprocess_data_file–Preprocess the dataset based on provided settings.
-
rename_coords–Rename coordinates in the dataset based on a mapping.
-
resample_resolution–Resample the grid of a dataset to a new resolution.
-
shift_time–Shift the time coordinate of a dataset by a specified timedelta.
-
truncate_data_by_time–Truncate data from a specific start date to an end date. Both dates are inclusive.
-
upsample_resolution–Upsample the resolution of a dataset.
Attributes:
-
CRS–
-
T–
-
warn_positive_resolution–
warn_positive_resolution
module-attribute
adjust_longitude_360_to_180
Adjust longitude from 0-360 to -180-180.
Parameters:
-
dataset(Dataset) –Dataset with longitude in 0-360 range.
-
limited_area(bool, default:False) –Flag indicating if the dataset is a limited area. Default is False.
-
lon_name(str, default:'longitude') –Name of the longitude variable in the dataset. Default is "longitude".
Returns:
-
Dataset–xr.Dataset: Dataset with longitude adjusted to -180-180 range.
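The range conversion can be illustrated with plain Python. This is a minimal sketch of the usual wrap-around arithmetic, not the library's implementation; `wrap_longitude` is a hypothetical helper, and the handling of the boundary value 180 (mapped to -180 here) and of the `limited_area` flag are assumptions that this page does not specify.

```python
def wrap_longitude(lon: float) -> float:
    """Map a longitude from the 0 to 360 range into the -180 to 180 range."""
    # Shift by 180, wrap with modulo, shift back. Note that 180 maps to -180 here.
    return ((lon + 180.0) % 360.0) - 180.0


lons_0_360 = [0.0, 90.0, 180.0, 270.0, 359.75]

# Sort afterwards so the coordinate stays monotonically increasing.
lons_180 = sorted(wrap_longitude(lon) for lon in lons_0_360)
print(lons_180)
```

Sorting after the wrap matters because values above 180 become negative and would otherwise sit at the end of the axis.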
aggregate_data_by_nuts
Aggregate data from a NetCDF file by NUTS regions, data variable names, and time. The aggregated data is saved to a NetCDF file with the coordinates "NUTS_ID" and "time"; its data variables are the aggregated data variables.
Parameters:
-
netcdf_files(dict[str, tuple[Path, dict | None]]) –Dictionary of NetCDF files. Keys are dataset names and values are tuples of (file path, agg_dict). The agg_dict can contain aggregation options for each data variable. For example, {"t2m": "mean", "tp": "sum"}. If agg_dict is None, default aggregation (i.e. mean) is used. NetCDF files must contain "latitude", "longitude", and "time" coordinates.
-
nuts_file(Path) –Path to the NUTS regions shape file. The shape file has columns such as "NUTS_ID" and "geometry".
-
normalize_time(bool, default:True) –If True, normalize time to the beginning of the day. e.g. 2025-10-01T12:00:00 becomes 2025-10-01T00:00:00. Default is True.
-
output_dir(Path | None, default:None) –Directory to save the aggregated NetCDF file. If None, the output file is saved in the same directory as the NUTS file. Default is None.
Returns:
-
Path(Path) –Path to the aggregated NetCDF file.
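The layout of `netcdf_files` and the effect of `agg_dict` can be sketched without any NetCDF tooling. The file names below are hypothetical, and `aggregate_region` is an illustrative helper that reduces the grid-cell values of a single region; it only mirrors the documented rule that a missing `agg_dict` (or a variable without an entry) falls back to the mean.

```python
from pathlib import Path
from statistics import mean

# Hypothetical file names; the dict layout follows the parameter description.
netcdf_files = {
    "era5": (Path("era5_sample.nc"), {"t2m": "mean", "tp": "sum"}),
    "population": (Path("population_sample.nc"), None),  # None -> default mean
}

# Only the two aggregation names mentioned in the documentation are mapped here.
AGGREGATORS = {"mean": mean, "sum": sum}


def aggregate_region(values_by_var, agg_dict=None):
    """Reduce the grid-cell values of one region to one value per variable."""
    agg_dict = agg_dict or {}
    return {
        var: AGGREGATORS[agg_dict.get(var, "mean")](values)
        for var, values in values_by_var.items()
    }


# Made-up cell values that fall inside one NUTS region at one time step.
cells_in_region = {"t2m": [280.0, 282.0], "tp": [1.0, 2.0]}

_, era5_agg = netcdf_files["era5"]
result = aggregate_region(cells_in_region, era5_agg)
default_result = aggregate_region(cells_in_region, None)
print(result, default_result)
```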
align_lon_lat_with_popu_data
align_lon_lat_with_popu_data(dataset, expected_longitude_max=float64(179.75), lat_name='latitude', lon_name='longitude')
Align longitude and latitude coordinates with population data of the same resolution. This function is specifically designed to ensure that the longitude and latitude coordinates in the dataset match the expected values used in population data, which are:
- Longitude: -179.75 to 179.75, 720 points
- Latitude: 89.75 to -89.75, 360 points
Parameters:
-
dataset(Dataset) –Dataset with longitude and latitude coordinates.
-
expected_longitude_max(float64, default:float64(179.75)) –Expected maximum longitude after adjustment. Default is np.float64(179.75).
-
lat_name(str, default:'latitude') –Name of the latitude coordinate. Default is "latitude".
-
lon_name(str, default:'longitude') –Name of the longitude coordinate. Default is "longitude".
Returns:
-
Dataset–xr.Dataset: Dataset with adjusted longitude and latitude coordinates.
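The expected population grid described above corresponds to cell centres spaced 0.5 degrees apart. The sketch below only reconstructs those two axes to make the numbers concrete; `regular_axis` is a hypothetical helper, not part of the module.

```python
def regular_axis(start: float, step: float, count: int) -> list[float]:
    """Build an evenly spaced axis of `count` points beginning at `start`."""
    return [start + step * i for i in range(count)]


# Longitude runs west to east, latitude runs north to south.
expected_lon = regular_axis(-179.75, 0.5, 720)
expected_lat = regular_axis(89.75, -0.5, 360)

print(expected_lon[0], expected_lon[-1], len(expected_lon))
print(expected_lat[0], expected_lat[-1], len(expected_lat))
```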
convert_360_to_180
Convert longitude from 0-360 to -180-180.
convert_m_to_mm
Convert precipitation from meters to millimeters.
convert_m_to_mm_with_attributes
Convert precipitation from meters to millimeters and keep attributes.
Parameters:
-
dataset(Dataset) –Dataset containing precipitation in meters.
-
inplace(bool, default:False) –If True, modify the original dataset. If False, return a new dataset. Default is False.
-
var_name(str, default:'tp') –Name of the precipitation variable in the dataset. Default is "tp".
Returns:
-
Dataset–xr.Dataset: Dataset with precipitation converted to millimeters.
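As a rough illustration of "convert and keep attributes", the sketch below models one variable as a plain dict with `values` and `attrs`. This stand-in structure and the `m_to_mm` helper are assumptions for illustration only; whether the library also rewrites a units attribute is not stated on this page.

```python
def m_to_mm(variable: dict) -> dict:
    """Return a new variable with values scaled by 1000 and attrs carried over."""
    return {
        "values": [v * 1000.0 for v in variable["values"]],
        "attrs": dict(variable["attrs"]),  # copy so metadata survives the arithmetic
    }


tp_m = {"values": [0.0, 0.25, 0.5], "attrs": {"long_name": "Total precipitation"}}
tp_mm = m_to_mm(tp_m)
print(tp_mm)
```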
convert_to_celsius
Convert temperature from Kelvin to Celsius.
convert_to_celsius_with_attributes
Convert temperature from Kelvin to Celsius and keep attributes.
Parameters:
-
dataset(Dataset) –Dataset containing temperature in Kelvin.
-
inplace(bool, default:False) –If True, modify the original dataset. If False, return a new dataset. Default is False.
-
var_name(str, default:'t2m') –Name of the temperature variable in the dataset. Default is "t2m".
Returns:
-
Dataset–xr.Dataset: Dataset with temperature converted to Celsius.
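The difference between `inplace=False` and `inplace=True` can be shown with a small stand-in. The nested dict used as a "dataset" and the `to_celsius` helper are hypothetical; the sketch only mirrors the documented semantics (return a new object versus modify the original) and the standard 273.15 offset.

```python
import copy

KELVIN_OFFSET = 273.15


def to_celsius(dataset: dict, var_name: str = "t2m", inplace: bool = False) -> dict:
    """Subtract 273.15 from one variable; attrs are left untouched."""
    target = dataset if inplace else copy.deepcopy(dataset)
    var = target[var_name]
    var["values"] = [kelvin - KELVIN_OFFSET for kelvin in var["values"]]
    return target


ds = {"t2m": {"values": [273.15, 283.15], "attrs": {"long_name": "2 metre temperature"}}}

fresh = to_celsius(ds)               # inplace=False: ds still holds Kelvin here
kelvin_kept = list(ds["t2m"]["values"])

same = to_celsius(ds, inplace=True)  # inplace=True: ds itself now holds Celsius
print(fresh is ds, same is ds)
```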
downsample_resolution
downsample_resolution(dataset, new_resolution=0.5, lat_name='latitude', lon_name='longitude', agg_funcs=None, agg_map=None)
Downsample the resolution of a dataset.
Parameters:
-
dataset(Dataset) –Dataset to change resolution.
-
new_resolution(float, default:0.5) –New resolution in degrees. Default is 0.5.
-
lat_name(str, default:'latitude') –Name of the latitude coordinate. Default is "latitude".
-
lon_name(str, default:'longitude') –Name of the longitude coordinate. Default is "longitude".
-
agg_funcs(Dict[str, str] | None, default:None) –Aggregation functions for each variable. If None, default aggregation (i.e. mean) is used. Default is None.
-
agg_map(Dict[str, Callable[[Any], float]] | None, default:None) –Mapping of string to aggregation functions. If None, default mapping is used. Default is None.
Returns:
-
Dataset–xr.Dataset: Dataset with changed resolution.
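Downsampling with a per-variable aggregation can be pictured as block aggregation over the grid. The following sketch assumes the new resolution is an integer multiple of the old one and that the grid dimensions divide evenly; `downsample_grid` and the contents of `AGG_MAP` are illustrative, since the module's default mapping is not listed here.

```python
from statistics import mean

# Illustrative string-to-function mapping, analogous in spirit to agg_map.
AGG_MAP = {"mean": mean, "sum": sum}


def downsample_grid(grid, factor, agg="mean"):
    """Aggregate non-overlapping factor x factor blocks of a 2D grid."""
    func = AGG_MAP[agg]
    rows, cols = len(grid), len(grid[0])
    return [
        [
            func([grid[r + dr][c + dc] for dr in range(factor) for dc in range(factor)])
            for c in range(0, cols, factor)
        ]
        for r in range(0, rows, factor)
    ]


# A made-up 4 x 4 grid at 0.25 degrees, reduced to 2 x 2 at 0.5 degrees.
grid_025 = [
    [1.0, 3.0, 5.0, 7.0],
    [1.0, 3.0, 5.0, 7.0],
    [2.0, 2.0, 4.0, 4.0],
    [2.0, 2.0, 4.0, 4.0],
]
factor = round(0.5 / 0.25)

grid_05_mean = downsample_grid(grid_025, factor, "mean")
grid_05_sum = downsample_grid(grid_025, factor, "sum")
print(grid_05_mean, grid_05_sum)
```

A mean suits intensive quantities such as temperature, while a sum suits accumulated quantities such as precipitation totals, which is why a per-variable choice is useful.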
preprocess_data_file
preprocess_data_file(netcdf_file, source='era5', settings='default', new_settings=None, unique_tag=None)
Preprocess the dataset based on the provided settings. If settings is "default", the default settings for the source are used. The settings and the preprocessed file are saved in the directory specified by the settings file and the unique number.
Parameters:
-
netcdf_file(Path) –Path to the NetCDF file to preprocess.
-
source(Literal['era5', 'isimip'], default:'era5') –Source of the data. Defaults to "era5".
-
settings(Path | str, default:'default') –Path to the settings file or "default" for default settings.
-
new_settings(Dict[str, Any] | None, default:None) –Additional settings to overwrite defaults. Defaults to None.
-
unique_tag(str | None, default:None) –Unique tag to append to the output file name and settings file. Defaults to None.
Returns:
-
Tuple[xr.Dataset, str]–Preprocessed dataset and the name of the preprocessed file.
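How `new_settings` might overwrite defaults can be sketched as a dictionary merge. The setting keys below are hypothetical placeholders, and a shallow merge is an assumption; the actual keys and merge behavior are defined by the library's settings files, not by this example.

```python
# Hypothetical default settings; real keys come from the source's settings file.
default_settings = {
    "convert_temperature": True,
    "target_resolution": 0.5,
    "truncate_start": None,
}

# Only the entries to override need to be supplied.
new_settings = {"target_resolution": 0.25}

# Later entries win, so new_settings overrides matching defaults.
effective_settings = {**default_settings, **new_settings}
print(effective_settings)
```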
rename_coords
Rename coordinates in the dataset based on a mapping.
Parameters:
-
dataset(Dataset) –Dataset with coordinates to rename.
-
coords_mapping(dict) –Mapping of old coordinate names to new names.
Returns:
-
Dataset–xr.Dataset: A new dataset with renamed coordinates.
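A `coords_mapping` is a plain old-name to new-name dictionary. The sketch below applies such a mapping to a dict of coordinates to show that unmapped names pass through and that a new object is returned; `rename_keys` and the sample coordinate names are illustrative assumptions.

```python
def rename_keys(coords: dict, coords_mapping: dict) -> dict:
    """Return a new dict with keys renamed according to coords_mapping."""
    return {coords_mapping.get(name, name): values for name, values in coords.items()}


coords = {"lat": [50.0, 49.5], "lon": [8.0, 8.5], "time": ["2025-10-01"]}
renamed = rename_keys(coords, {"lat": "latitude", "lon": "longitude"})
print(list(renamed))
```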
resample_resolution
resample_resolution(dataset, new_resolution=0.5, lat_name='latitude', lon_name='longitude', agg_funcs=None, agg_map=None, expected_longitude_max=float64(179.75), method_map=None)
Resample the grid of a dataset to a new resolution.
Parameters:
-
dataset(Dataset) –Dataset to resample.
-
new_resolution(float, default:0.5) –New resolution in degrees. Default is 0.5.
-
lat_name(str, default:'latitude') –Name of the latitude coordinate. Default is "latitude".
-
lon_name(str, default:'longitude') –Name of the longitude coordinate. Default is "longitude".
-
agg_funcs(Dict[str, str] | None, default:None) –Aggregation functions for each variable. If None, default aggregation (i.e. mean) is used. Default is None.
-
agg_map(Dict[str, Callable[[Any], float]] | None, default:None) –Mapping of string to aggregation functions. If None, default mapping is used. Default is None.
-
expected_longitude_max(float64, default:float64(179.75)) –Expected maximum longitude after adjustment. Default is np.float64(179.75).
-
method_map(Dict[str, str] | None, default:None) –Mapping of variable names to interpolation methods. If None, linear interpolation is used. Default is None.
Returns:
-
Dataset–xr.Dataset: Resampled dataset with changed resolution.
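The signature combines the aggregation arguments of `downsample_resolution` with the `method_map` of `upsample_resolution`, which suggests that the function picks one of the two depending on the current grid spacing. That reading is an inference, not something this page confirms. Under that assumption, the decision could look like the following sketch, where `grid_resolution` and `choose_resampling` are hypothetical helpers.

```python
import math


def grid_resolution(coord_values):
    """Spacing of an evenly spaced coordinate, taken from its first two points."""
    return abs(coord_values[1] - coord_values[0])


def choose_resampling(coord_values, new_resolution):
    """Decide which direction a resampling to new_resolution would go."""
    current = grid_resolution(coord_values)
    if math.isclose(current, new_resolution):
        return "unchanged"
    # A larger cell size means a coarser grid, so values must be aggregated.
    return "downsample" if new_resolution > current else "upsample"


print(choose_resampling([0.0, 0.25, 0.5], 0.5))
print(choose_resampling([0.0, 0.5, 1.0], 0.1))
```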
shift_time
Shift the time coordinate of a dataset by a specified timedelta. The dataset is overwritten with the shifted time values.
Parameters:
-
dataset(Dataset) –Dataset to shift.
-
offset(int, default:-1) –Amount to shift the time coordinate. Default is -1.
-
time_unit(Literal['W', 'D', 'h', 'm', 's', 'ms', 'ns'], default:'D') –Time unit for the shift. Default is "D".
-
var_name(str, default:'time') –Name of the time variable in the dataset. Default is "time".
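The effect of `offset` and `time_unit` can be illustrated with the standard library. This sketch assumes the unit codes follow NumPy's timedelta64 convention (so "m" means minutes); "ns" is left out because `datetime.timedelta` has no nanosecond resolution. `shift_times` is a hypothetical helper that returns a new list rather than overwriting anything.

```python
from datetime import datetime, timedelta

# Assumed meaning of the unit codes, following NumPy's timedelta64 convention.
UNIT_TO_KWARG = {
    "W": "weeks",
    "D": "days",
    "h": "hours",
    "m": "minutes",
    "s": "seconds",
    "ms": "milliseconds",
}


def shift_times(times, offset=-1, time_unit="D"):
    """Shift every timestamp by offset units of time_unit."""
    delta = timedelta(**{UNIT_TO_KWARG[time_unit]: offset})
    return [t + delta for t in times]


times = [datetime(2025, 10, 2), datetime(2025, 10, 3)]
shifted_default = shift_times(times)        # default: one day earlier
shifted_hours = shift_times(times, 6, "h")  # six hours later
print(shifted_default, shifted_hours)
```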
truncate_data_by_time
Truncate data from a specific start date to an end date. Both dates are inclusive.
Parameters:
-
dataset(Dataset) –Dataset to truncate.
-
start_date(Union[str, datetime64]) –Start date for truncation. Format as "YYYY-MM-DD" or as a numpy datetime64 object.
-
end_date(Union[str, datetime64, None], default:None) –End date for truncation. Format as "YYYY-MM-DD" or as a numpy datetime64 object. If None, truncate until the last date in the dataset. Default is None.
-
var_name(str, default:'time') –Name of the time variable in the dataset. Default is "time".
Returns:
-
Dataset–xr.Dataset: Dataset truncated to the range from the start date to the end date, both inclusive.
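The inclusive bounds and the open-ended case (`end_date=None`) can be sketched with plain dates. `truncate_by_time` is a hypothetical stand-in operating on a list of `datetime.date` values, not the library function.

```python
from datetime import date


def truncate_by_time(times, start_date, end_date=None):
    """Keep the dates between start_date and end_date, both inclusive."""
    start = date.fromisoformat(start_date)
    # With no end date, keep everything up to the last available date.
    end = date.fromisoformat(end_date) if end_date is not None else max(times)
    return [t for t in times if start <= t <= end]


times = [date(2025, 1, day) for day in range(1, 6)]  # 1 to 5 January 2025

kept = truncate_by_time(times, "2025-01-02", "2025-01-04")
open_ended = truncate_by_time(times, "2025-01-04")
print(kept, open_ended)
```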
upsample_resolution
upsample_resolution(dataset, new_resolution=0.1, lat_name='latitude', lon_name='longitude', method_map=None)
Upsample the resolution of a dataset.
Parameters:
-
dataset(Dataset) –Dataset to change resolution.
-
new_resolution(float, default:0.1) –New resolution in degrees. Default is 0.1.
-
lat_name(str, default:'latitude') –Name of the latitude coordinate. Default is "latitude".
-
lon_name(str, default:'longitude') –Name of the longitude coordinate. Default is "longitude".
-
method_map(Dict[str, str] | None, default:None) –Mapping of variable names to interpolation methods. If None, linear interpolation is used. Default is None.
Returns:
-
Dataset–xr.Dataset: Dataset with changed resolution.
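Linear interpolation, the documented default when `method_map` is None, can be shown on a single axis. The sketch assumes ascending, evenly spaced coordinates and a new resolution that divides the axis span; `linear_upsample` is a hypothetical helper and says nothing about how the library handles two-dimensional grids or other methods.

```python
def linear_upsample(coords, values, new_resolution):
    """Linearly interpolate values onto a finer, evenly spaced axis."""
    count = round((coords[-1] - coords[0]) / new_resolution) + 1
    new_coords = [coords[0] + i * new_resolution for i in range(count)]
    new_values = []
    seg = 0  # index of the left end of the current source interval
    for x in new_coords:
        # Advance to the source interval that contains x.
        while seg < len(coords) - 2 and x > coords[seg + 1]:
            seg += 1
        x0, x1 = coords[seg], coords[seg + 1]
        weight = (x - x0) / (x1 - x0)
        new_values.append(values[seg] + weight * (values[seg + 1] - values[seg]))
    return new_coords, new_values


coords = [0.0, 0.5, 1.0]     # 0.5 degree spacing
values = [10.0, 20.0, 40.0]
new_coords, new_values = linear_upsample(coords, values, 0.25)
print(new_coords, new_values)
```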