heiplanet_db.production module
Functions:
- check_paths – Check that the paths are not None.
- create_directories – Create directories if they do not exist.
- get_data_files
- get_engine – Get or initialize the database engine.
- get_production_data – Fetch data that is fed into the production database.
- get_var_types_from_config – Get the variable types from the configuration file and place them in a dictionary.
- insert_data
- insert_var_values
- insert_var_values_nuts
- load_data_with_optimization – Load data into the database with autovacuum disabled for performance.
- main – Main function to set up the production database and data lake.
- read_production_config – Read configuration of the production database.
- reset_production_data – Truncate production data tables while keeping the existing schema.
Attributes:
- parser (module attribute)
create_directories
Create directories if they do not exist.
Parameters:
- dir (str) – Path of the directory to create or use, given as a string.
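The "create if missing" behaviour described above can be sketched with the standard library. This is a minimal illustration, not the module's actual implementation; the function name `create_directories_sketch` and its return value are assumptions made for the example.

```python
from pathlib import Path


def create_directories_sketch(dir: str) -> Path:
    """Hypothetical stand-in: create `dir` and any missing parents."""
    path = Path(dir)
    # exist_ok=True turns the call into a no-op when the directory already exists.
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Calling the function twice with the same path is safe, which is what lets a setup script be rerun without special-casing existing directories.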
get_engine
Get or initialize the database engine.
Parameters:
- drop_tables (bool, default: False) – If True, drop all existing tables and recreate them. If False, only create tables if they don't exist.
Returns:
- Engine – SQLAlchemy engine object.
get_production_data
Fetch data that is fed into the production database.
Parameters:
- url (str) – URL to fetch the data from.
- filename (str) – Name of the file to be fetched.
- filehash (str) – SHA-256 hash of the file, used to verify its integrity.
- outputdir (Path) – Directory where the file will be saved.
Returns:
- completion_code (int) – Status code indicating the success or failure of the operation.
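The integrity check implied by filehash can be sketched as follows. This only illustrates verifying a downloaded file against a SHA-256 hex digest; the helper name `sha256_matches` and the chunk size are assumptions, and the module may perform the check differently (for example through a download library).

```python
import hashlib
from pathlib import Path


def sha256_matches(path: Path, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Hypothetical helper: compare a file's SHA-256 digest with an expected hex string."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large data files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

A caller could translate a False result into a non-zero completion code.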
get_var_types_from_config
Get the variable types from the configuration file and place them in a dictionary.
load_data_with_optimization
Load data into the database with autovacuum disabled for performance. Ensures autovacuum is re-enabled even if an error occurs.
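The "re-enabled even if an error occurs" guarantee is the classic try/finally pattern. The sketch below is an assumption about the general shape, not the module's code: `execute` stands for any callable that runs a SQL statement, `load` for the bulk-insert step, and the table name in the usage example is hypothetical. The `ALTER TABLE ... SET (autovacuum_enabled = ...)` statements are standard PostgreSQL storage-parameter syntax.

```python
from typing import Callable, Iterable


def load_with_autovacuum_disabled(
    execute: Callable[[str], None],
    tables: Iterable[str],
    load: Callable[[], None],
) -> None:
    """Hypothetical sketch: disable autovacuum per table, load, always re-enable."""
    tables = list(tables)
    try:
        for table in tables:
            execute(f"ALTER TABLE {table} SET (autovacuum_enabled = false)")
        load()
    finally:
        # Runs on success and on failure, so autovacuum is never left disabled.
        for table in tables:
            execute(f"ALTER TABLE {table} SET (autovacuum_enabled = true)")


# Usage with a recording stand-in for a database connection.
statements: list = []
load_with_autovacuum_disabled(statements.append, ["var_value"], lambda: None)
```

Any exception raised by `load` still propagates to the caller; the finally block only ensures the cleanup statements run first.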
main
Main function to set up the production database and data lake. This function reads the production configuration, creates the necessary directories, and fetches the data from the configured sources. It is intended to be run as a script.
Parameters:
- drop_tables (bool, default: False) – If True, drop all existing tables before inserting data. If False, keep the existing schema but truncate and reload all production data tables so the load is idempotent. Set to True only when initializing a fresh database and you also want tables to be recreated.
- config_path (str | None, default: None) – Path to the production configuration file. If None, uses the CONFIG_FILE environment variable or the default container path /heiplanet_db/production.yaml if available; otherwise falls back to the built-in default config.
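The lookup order for the configuration file can be sketched like this. The function name `resolve_config_path` is hypothetical, and the precedence between the CONFIG_FILE environment variable and the container path is an assumption (the description lists both without stating which wins). Returning None here stands for "use the built-in default config".

```python
import os
from pathlib import Path
from typing import Optional

# Default container path named in the parameter description.
DEFAULT_CONTAINER_CONFIG = Path("/heiplanet_db/production.yaml")


def resolve_config_path(config_path: Optional[str] = None) -> Optional[Path]:
    """Hypothetical sketch of the config lookup order."""
    if config_path is not None:
        return Path(config_path)  # an explicit argument always wins
    env_path = os.environ.get("CONFIG_FILE")
    if env_path:
        return Path(env_path)  # assumed to take precedence over the container path
    if DEFAULT_CONTAINER_CONFIG.is_file():
        return DEFAULT_CONTAINER_CONFIG
    return None  # caller falls back to the built-in default config
```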
read_production_config
Read configuration of the production database.
Parameters:
- dict_path (str | Traversable, default: None) – Path to the configuration dictionary. If None, the default path is used.
Returns:
- dict – Configuration details for the production database.
reset_production_data
Truncate production data tables while keeping the existing schema.
This makes reruns of the production load idempotent when tables are kept (i.e. when drop_tables=False), avoiding uniqueness constraint violations from bulk inserts into already-populated tables.
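Why clearing the tables first makes a rerun safe can be shown with a small self-contained example. It uses the standard-library sqlite3 module as a stand-in for the production database, `DELETE FROM` in place of `TRUNCATE` (which SQLite does not have), and a made-up table `var_type` with made-up rows; none of these are taken from the module itself.

```python
import sqlite3

# Hypothetical rows that a production load would insert.
rows = [(1, "t2m"), (2, "tp")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE var_type (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")


def reload_rows(conn: sqlite3.Connection, rows: list) -> None:
    """Empty the table, then bulk insert, inside one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM var_type")  # stand-in for TRUNCATE
        conn.executemany("INSERT INTO var_type (id, name) VALUES (?, ?)", rows)


reload_rows(conn, rows)
reload_rows(conn, rows)  # the rerun succeeds because the table was emptied first
count = conn.execute("SELECT COUNT(*) FROM var_type").fetchone()[0]
```

Without the reset step, the second bulk insert would hit the primary-key and uniqueness constraints on the already-populated table.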