onehealth_db.preprocess module

T module-attribute

T = TypeVar('T', bound=Union[float64, DataArray])

warn_positive_resolution module-attribute

warn_positive_resolution = 'New resolution must be a positive number.'

adjust_longitude_360_to_180

adjust_longitude_360_to_180(dataset, limited_area=False, lon_name='longitude')

Adjust longitude from the 0 to 360 range to the -180 to 180 range.

Parameters:

  • dataset (Dataset) –

    Dataset with longitude in 0-360 range.

  • limited_area (bool, default: False ) –

    Flag indicating if the dataset is a limited area. Default is False.

  • lon_name (str, default: 'longitude' ) –

    Name of the longitude variable in the dataset. Default is "longitude".

Returns:

  • Dataset

    xr.Dataset: Dataset with longitude adjusted to the -180 to 180 range.

align_lon_lat_with_popu_data

align_lon_lat_with_popu_data(dataset, expected_longitude_max=float64(179.75), lat_name='latitude', lon_name='longitude')

Align longitude and latitude coordinates with population data of the same resolution. This function is specifically designed to ensure that the longitude and latitude coordinates in the dataset match the expected values used in population data, which are:

  • Longitude: -179.75 to 179.75, 720 points

  • Latitude: 89.75 to -89.75, 360 points

Parameters:

  • dataset (Dataset) –

    Dataset with longitude and latitude coordinates.

  • expected_longitude_max (float64, default: float64(179.75) ) –

    Expected maximum longitude after adjustment. Default is np.float64(179.75).

  • lat_name (str, default: 'latitude' ) –

    Name of the latitude coordinate. Default is "latitude".

  • lon_name (str, default: 'longitude' ) –

    Name of the longitude coordinate. Default is "longitude".

Returns:

  • Dataset

    xr.Dataset: Dataset with adjusted longitude and latitude coordinates.
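
The expected coordinate values listed above imply a regular 0.5 degree grid of cell centers. The following is a minimal, standalone sketch of that grid in plain Python; it does not call the package, and the helper name `cell_centers` is hypothetical.

```python
def cell_centers(start: float, stop: float, count: int) -> list[float]:
    """Return `count` evenly spaced values from `start` to `stop`, inclusive."""
    step = (stop - start) / (count - 1)
    return [start + i * step for i in range(count)]


# Grid described in the docstring: 720 longitudes and 360 latitudes.
longitudes = cell_centers(-179.75, 179.75, 720)
latitudes = cell_centers(89.75, -89.75, 360)  # latitude runs north to south

lon_spacing = longitudes[1] - longitudes[0]
lat_spacing = latitudes[1] - latitudes[0]
```

Both spacings have a magnitude of 0.5 degrees, which matches the default `new_resolution` of the resampling functions on this page.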

convert_360_to_180

convert_360_to_180(longitude)

Convert longitude from the 0 to 360 range to the -180 to 180 range.

Parameters:

  • longitude (T) –

    Longitude in 0-360 range.

Returns:

  • T ( T ) –

    Longitude in the -180 to 180 range.
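
A common way to express this conversion is a shift followed by a modulo. The sketch below assumes that formula; the package may implement the conversion differently (for example in how it treats exactly 180), and `wrap_longitude` is a hypothetical name used only for illustration.

```python
def wrap_longitude(longitude: float) -> float:
    """Map a longitude from the 0 to 360 range onto the -180 to 180 range.

    Assumed formula: shift by 180, wrap with modulo 360, shift back.
    """
    return (longitude + 180.0) % 360.0 - 180.0


east = wrap_longitude(90.0)          # stays at 90.0
west = wrap_longitude(270.0)         # becomes -90.0
near_zero = wrap_longitude(359.75)   # becomes -0.25
```

With this particular formula an input of 180 maps to -180, so the result lies in the half-open interval from -180 (inclusive) to 180 (exclusive).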

convert_m_to_mm

convert_m_to_mm(precipitation)

Convert precipitation from meters to millimeters.

Parameters:

  • precipitation (T) –

    Precipitation in meters.

Returns:

  • T ( T ) –

    Precipitation in millimeters.
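
The conversion itself is a multiplication by 1000. A minimal sketch on a plain float, using the hypothetical name `meters_to_millimeters` rather than the package function:

```python
MM_PER_M = 1000.0  # millimeters in one meter


def meters_to_millimeters(precipitation_m: float) -> float:
    """Convert a precipitation depth from meters to millimeters."""
    return precipitation_m * MM_PER_M


depth_mm = meters_to_millimeters(0.25)  # 250.0
```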

convert_m_to_mm_with_attributes

convert_m_to_mm_with_attributes(dataset, inplace=False, var_name='tp')

Convert precipitation from meters to millimeters and keep attributes.

Parameters:

  • dataset (Dataset) –

    Dataset containing precipitation in meters.

  • inplace (bool, default: False ) –

    If True, modify the original dataset. If False, return a new dataset. Default is False.

  • var_name (str, default: 'tp' ) –

    Name of the precipitation variable in the dataset. Default is "tp".

Returns:

  • Dataset

    xr.Dataset: Dataset with precipitation converted to millimeters.

convert_to_celsius

convert_to_celsius(temperature_kelvin)

Convert temperature from Kelvin to Celsius.

Parameters:

  • temperature_kelvin (T) –

    Temperature in Kelvin, accessed through the t2m variable in the dataset.

Returns:

  • T ( T ) –

    Temperature in Celsius.
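
Kelvin and Celsius differ by a fixed offset of 273.15. A minimal sketch on a plain float, using the hypothetical name `kelvin_to_celsius` rather than the package function:

```python
KELVIN_OFFSET = 273.15  # 0 degrees Celsius expressed in Kelvin


def kelvin_to_celsius(temperature_kelvin: float) -> float:
    """Convert a temperature from Kelvin to degrees Celsius."""
    return temperature_kelvin - KELVIN_OFFSET


freezing_point = kelvin_to_celsius(273.15)  # 0.0
warm_day = kelvin_to_celsius(300.0)         # roughly 26.85
```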

convert_to_celsius_with_attributes

convert_to_celsius_with_attributes(dataset, inplace=False, var_name='t2m')

Convert temperature from Kelvin to Celsius and keep attributes.

Parameters:

  • dataset (Dataset) –

    Dataset containing temperature in Kelvin.

  • inplace (bool, default: False ) –

    If True, modify the original dataset. If False, return a new dataset. Default is False.

  • var_name (str, default: 't2m' ) –

    Name of the temperature variable in the dataset. Default is "t2m".

Returns:

  • Dataset

    xr.Dataset: Dataset with temperature converted to Celsius.

downsample_resolution

downsample_resolution(dataset, new_resolution=0.5, lat_name='latitude', lon_name='longitude', agg_funcs=None, agg_map=None)

Downsample the resolution of a dataset.

Parameters:

  • dataset (Dataset) –

    Dataset whose resolution will be changed.

  • new_resolution (float, default: 0.5 ) –

    New resolution in degrees. Default is 0.5.

  • lat_name (str, default: 'latitude' ) –

    Name of the latitude coordinate. Default is "latitude".

  • lon_name (str, default: 'longitude' ) –

    Name of the longitude coordinate. Default is "longitude".

  • agg_funcs (Dict[str, str] | None, default: None ) –

    Aggregation functions for each variable. If None, default aggregation (i.e. mean) is used. Default is None.

  • agg_map (Dict[str, Callable[[Any], float]] | None, default: None ) –

    Mapping of aggregation names (strings) to aggregation functions. If None, default mapping is used. Default is None.

Returns:

  • Dataset

    xr.Dataset: Dataset with changed resolution.
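
To illustrate how the two mappings relate, the sketch below treats `agg_funcs` as a per-variable choice of aggregation name and `agg_map` as the lookup from that name to a callable, then aggregates non-overlapping blocks of a one-dimensional series. This is an assumption-laden illustration in plain Python, not the package's implementation; `AGG_MAP` and `coarsen_1d` are hypothetical names, and the variable names `t2m` and `tp` are taken from the defaults elsewhere on this page.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for `agg_map`: aggregation name -> callable.
AGG_MAP: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": mean,
    "sum": sum,
    "min": min,
    "max": max,
}


def coarsen_1d(values: Sequence[float], factor: int, agg: str = "mean") -> List[float]:
    """Aggregate consecutive, non-overlapping blocks of `factor` values."""
    if factor < 1 or len(values) % factor != 0:
        raise ValueError("factor must be >= 1 and divide the number of values")
    func = AGG_MAP[agg]
    return [func(values[i:i + factor]) for i in range(0, len(values), factor)]


# Going from 0.25 to 0.5 degrees merges 2 cells per axis, so factor = 2.
agg_funcs = {"t2m": "mean", "tp": "sum"}  # per-variable aggregation names
temperature = [10.0, 12.0, 14.0, 16.0]
precipitation = [1.0, 3.0, 0.0, 2.0]

coarse_t2m = coarsen_1d(temperature, 2, agg_funcs["t2m"])  # [11.0, 15.0]
coarse_tp = coarsen_1d(precipitation, 2, agg_funcs["tp"])  # [4.0, 2.0]
```

Averaging suits state variables such as temperature, while a sum may be preferable for accumulated quantities; which choice is appropriate depends on the data.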

preprocess_data_file

preprocess_data_file(netcdf_file, settings)

Preprocess the dataset based on the provided settings. The processed data is saved in the same directory under an updated filename defined by the settings.

Parameters:

  • netcdf_file (Path) –

    Path to the NetCDF file to preprocess.

  • settings (Dict[str, Any]) –

    Settings for preprocessing.

Returns:

  • Dataset

    xr.Dataset: Preprocessed dataset.

rename_coords

rename_coords(dataset, coords_mapping)

Rename coordinates in the dataset based on a mapping.

Parameters:

  • dataset (Dataset) –

    Dataset with coordinates to rename.

  • coords_mapping (dict) –

    Mapping of old coordinate names to new names.

Returns:

  • Dataset

    xr.Dataset: A new dataset with renamed coordinates.

resample_resolution

resample_resolution(dataset, new_resolution=0.5, lat_name='latitude', lon_name='longitude', agg_funcs=None, agg_map=None, expected_longitude_max=float64(179.75), method_map=None)

Resample the grid of a dataset to a new resolution.

Parameters:

  • dataset (Dataset) –

    Dataset to resample.

  • new_resolution (float, default: 0.5 ) –

    New resolution in degrees. Default is 0.5.

  • lat_name (str, default: 'latitude' ) –

    Name of the latitude coordinate. Default is "latitude".

  • lon_name (str, default: 'longitude' ) –

    Name of the longitude coordinate. Default is "longitude".

  • agg_funcs (Dict[str, str] | None, default: None ) –

    Aggregation functions for each variable. If None, default aggregation (i.e. mean) is used. Default is None.

  • agg_map (Dict[str, Callable[[Any], float]] | None, default: None ) –

    Mapping of aggregation names (strings) to aggregation functions. If None, default mapping is used. Default is None.

  • expected_longitude_max (float64, default: float64(179.75) ) –

    Expected maximum longitude after adjustment. Default is np.float64(179.75).

  • method_map (Dict[str, str] | None, default: None ) –

    Mapping of variable names to interpolation methods. If None, linear interpolation is used. Default is None.

Returns:

  • Dataset

    xr.Dataset: Resampled dataset with changed resolution.
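
Because this function accepts both the aggregation parameters of downsample_resolution and the `method_map` of upsample_resolution, one plausible reading is that it picks a direction by comparing the current grid spacing with `new_resolution`. The sketch below encodes that reading only as an assumption. The function name `choose_resampling`, the comparison logic, and the use of `ValueError` are all hypothetical; only the message text comes from the `warn_positive_resolution` attribute documented above.

```python
# Message text copied from the module attribute `warn_positive_resolution`.
WARN_POSITIVE_RESOLUTION = "New resolution must be a positive number."


def choose_resampling(current_resolution: float, new_resolution: float) -> str:
    """Decide which resampling direction a new grid spacing implies (assumed logic)."""
    if new_resolution <= 0:
        # Assumption: an invalid resolution is rejected with this message.
        raise ValueError(WARN_POSITIVE_RESOLUTION)
    if new_resolution > current_resolution:
        return "downsample"  # coarser grid, values are aggregated
    if new_resolution < current_resolution:
        return "upsample"    # finer grid, values are interpolated
    return "unchanged"


to_coarser = choose_resampling(0.25, 0.5)  # "downsample"
to_finer = choose_resampling(0.5, 0.1)     # "upsample"
```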

truncate_data_from_time

truncate_data_from_time(dataset, start_date, var_name='time')

Truncate data from a specific start date.

Parameters:

  • dataset (Dataset) –

    Dataset to truncate.

  • start_date (Union[str, datetime64]) –

    Start date for truncation. Format as "YYYY-MM-DD" or as a numpy datetime64 object.

  • var_name (str, default: 'time' ) –

    Name of the time variable in the dataset. Default is "time".

Returns:

  • Dataset

    xr.Dataset: Dataset truncated from the specified start date.
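
As a rough illustration of truncation by date, the sketch below keeps every timestamp on or after a start date given in "YYYY-MM-DD" format. It assumes that the start date itself is retained, which the description above does not state explicitly; `keep_from` is a hypothetical helper working on plain `datetime.date` values rather than on a dataset.

```python
from datetime import date
from typing import List


def keep_from(timestamps: List[date], start_date: str) -> List[date]:
    """Keep timestamps on or after `start_date` (assumed inclusive)."""
    start = date.fromisoformat(start_date)
    return [t for t in timestamps if t >= start]


monthly = [date(2019, 11, 1), date(2019, 12, 1), date(2020, 1, 1), date(2020, 2, 1)]
kept = keep_from(monthly, "2020-01-01")  # the two 2020 entries remain
```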

upsample_resolution

upsample_resolution(dataset, new_resolution=0.1, lat_name='latitude', lon_name='longitude', method_map=None)

Upsample the resolution of a dataset.

Parameters:

  • dataset (Dataset) –

    Dataset whose resolution will be changed.

  • new_resolution (float, default: 0.1 ) –

    New resolution in degrees. Default is 0.1.

  • lat_name (str, default: 'latitude' ) –

    Name of the latitude coordinate. Default is "latitude".

  • lon_name (str, default: 'longitude' ) –

    Name of the longitude coordinate. Default is "longitude".

  • method_map (Dict[str, str] | None, default: None ) –

    Mapping of variable names to interpolation methods. If None, linear interpolation is used. Default is None.

Returns:

  • Dataset

    xr.Dataset: Dataset with changed resolution.
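
Linear interpolation, the stated default when `method_map` is None, can be sketched in one dimension as follows. This is a plain Python illustration under the assumption of ascending coordinates, not the package's implementation; `interpolate_linear` is a hypothetical name, and a spacing of 0.25 is used instead of the default 0.1 so that the numbers stay exact.

```python
from typing import List, Sequence


def interpolate_linear(
    coords: Sequence[float], values: Sequence[float], new_coords: Sequence[float]
) -> List[float]:
    """Linearly interpolate `values` given at ascending `coords` onto `new_coords`."""
    result = []
    for x in new_coords:
        for i in range(len(coords) - 1):
            x0, x1 = coords[i], coords[i + 1]
            if x0 <= x <= x1:
                weight = (x - x0) / (x1 - x0)  # 0 at x0, 1 at x1
                result.append(values[i] * (1 - weight) + values[i + 1] * weight)
                break
        else:
            raise ValueError(f"{x} lies outside the original coordinate range")
    return result


coarse_lon = [0.0, 0.5, 1.0]              # 0.5 degree spacing
coarse_val = [10.0, 20.0, 40.0]
fine_lon = [0.0, 0.25, 0.5, 0.75, 1.0]    # 0.25 degree spacing

fine_val = interpolate_linear(coarse_lon, coarse_val, fine_lon)
# fine_val == [10.0, 15.0, 20.0, 30.0, 40.0]
```

Original grid points keep their values, and each new point between two neighbors receives their distance-weighted average.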