Data Lake & Database Backend
Data Lake Architecture
The Heiplanet platform implements a medallion architecture data lake with three distinct layers for data processing and refinement:
Bronze Layer (Raw Data)
The bronze layer stores raw, unprocessed data exactly as downloaded from external sources. This includes:
- Original data files in their native formats (NetCDF, Shapefiles, etc.)
- Complete metadata and provenance information
- Data integrity verification through file hashing
Files are stored in the .data_heiplanet_db/bronze/ directory.
Silver Layer (Cleaned Data)
The silver layer contains validated and cleaned data that has undergone initial processing:
- Data format standardization
- Quality checks and validation
- Coordinate system transformations where needed
- Initial data filtering and cleaning
Files are stored in the .data_heiplanet_db/silver/ directory.
Gold Layer (Analysis-Ready Data)
The gold layer provides fully processed, analysis-ready datasets optimized for ingestion into the PostgreSQL/PostGIS database:
- Aggregated and summarized data products
- NUTS region-level statistics
- Optimized data formats for database insertion
- Final quality assurance and metadata enrichment
Files are stored in the .data_heiplanet_db/gold/ directory.
This multi-tiered approach ensures data lineage, reproducibility, and the ability to reprocess data from any stage if requirements change.
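As a concrete illustration, the three layer directories can be addressed with a small path helper. The Python sketch below is hypothetical: the directory layout matches the paths above, but the layer_path helper and the example filenames are invented for illustration and are not part of the platform's codebase.

from pathlib import Path

# Root of the data lake; layer subdirectories follow the medallion convention described above.
DATA_LAKE_ROOT = Path(".data_heiplanet_db")
LAYERS = ("bronze", "silver", "gold")

def layer_path(layer: str, filename: str) -> Path:
    """Return the full path of a file within the given data lake layer (hypothetical helper)."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer {layer!r}; expected one of {LAYERS}")
    return DATA_LAKE_ROOT / layer / filename

# The same dataset as it moves through the layers (filenames are invented examples).
raw = layer_path("bronze", "temperature_2023_01.nc")
clean = layer_path("silver", "temperature_2023_01_clean.nc")
ready = layer_path("gold", "temperature_nuts_2023_01.parquet")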
Data Ingestion and Processing
Data flows through the pipeline in the following stages:
- Download: Raw data files are fetched from configured sources (URLs or local files)
- Validation: File integrity is verified using SHA-256 checksums (see the sketch after this list)
- Bronze Storage: Original files are preserved unchanged
- Processing: Data is cleaned, transformed, and validated
- Silver Storage: Cleaned data is stored in standardized formats
- Aggregation: Data is aggregated to analysis-ready products
- Gold Storage: Final datasets are stored, ready for database insertion
- Database Ingestion: Data is loaded into PostgreSQL/PostGIS tables
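The validation step can be pictured with a short Python sketch. This is a minimal illustration of SHA-256 verification using the standard hashlib module; the verify_checksum helper and its error handling are assumptions, not the pipeline's actual code.

import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_checksum(path: Path, expected: str) -> None:
    """Raise ValueError if the file's checksum differs from the expected value (hypothetical helper)."""
    actual = sha256_of(path)
    if actual != expected.lower():
        raise ValueError(f"checksum mismatch for {path}: got {actual}, expected {expected}")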
Configuration Files
The data pipeline is controlled by YAML configuration files that specify which datasets to download and process. Multiple pre-configured options are available based on deployment size and data requirements:
- Small: Minimal dataset for testing (single month)
- Medium: Seasonal dataset for production (3 months)
- Large: Complete annual dataset for comprehensive analysis (12 months)
- Historical: Extended multi-decade dataset for trend analysis (45 years)
For detailed information on these configurations, see the Deployment Configuration Options.
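To make the shape of such a file concrete, the sketch below loads a hypothetical minimal configuration with PyYAML. The keys shown (datasets, name, source, start, end) are invented placeholders; the real schema is defined by the pipeline's own configuration files.

import yaml  # PyYAML

# Hypothetical example of a "small" configuration: one dataset, one month.
example_config = """
datasets:
  - name: temperature
    source: https://example.org/climate/temperature
    start: 2023-01
    end: 2023-01
"""

config = yaml.safe_load(example_config)
for dataset in config["datasets"]:
    print(f"would fetch {dataset['name']} covering {dataset['start']} to {dataset['end']}")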
Data Model
The database schema is designed to efficiently store and query spatiotemporal disease surveillance data. Downloaded and processed data is organized in the database as follows:

Key Database Tables
- NUTS Definitions: European geographic boundaries at multiple administrative levels
- Climate Variables: Globally gridded temperature and environmental data
- R0 Values: Disease transmission suitability metrics
- Metadata: Data provenance, timestamps, and quality indicators
These tables hold time-series data at multiple temporal resolutions and spatial data at both grid and regional aggregation levels.
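The structure of these tables can be sketched with SQLAlchemy and GeoAlchemy2 models, as below. Every table and column name here is a hypothetical illustration of the entities listed above, not the platform's actual schema.

from sqlalchemy import Column, Date, Float, Integer, String
from sqlalchemy.orm import declarative_base
from geoalchemy2 import Geometry

Base = declarative_base()

class NutsDefinition(Base):
    """One European administrative boundary (hypothetical table and column names)."""
    __tablename__ = "nuts_definitions"
    id = Column(Integer, primary_key=True)
    nuts_id = Column(String, unique=True, nullable=False)  # e.g. "DE71"
    level = Column(Integer, nullable=False)                # administrative level 0-3
    geometry = Column(Geometry("MULTIPOLYGON", srid=4326))

class R0Value(Base):
    """Transmission suitability for one region and date (hypothetical)."""
    __tablename__ = "r0_values"
    id = Column(Integer, primary_key=True)
    nuts_id = Column(String, nullable=False)
    date = Column(Date, nullable=False)
    r0 = Column(Float)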
Spatial Data Features
The PostGIS extension enables advanced geospatial queries:
- Spatial Indexing: Fast geographic searches and intersections
- Coordinate Transformations: Support for multiple coordinate reference systems
- Geometry Operations: Area calculations, buffer zones, spatial joins
- Raster Support: Efficient storage and querying of gridded data
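As one example of exercising these features from Python, GeoPandas can read a PostGIS table and reproject it for an area calculation. The connection string, table, and column names below are assumptions (the nuts_definitions table matches the hypothetical model above); adjust them to the deployed schema.

import geopandas as gpd
from sqlalchemy import create_engine

# Hypothetical connection details; match them to your docker-compose settings.
engine = create_engine("postgresql://user:password@localhost:5432/heiplanet")

# Fetch NUTS level-2 boundaries, then compute areas in an equal-area CRS (EPSG:3035).
sql = "SELECT nuts_id, geometry FROM nuts_definitions WHERE level = 2"
regions = gpd.read_postgis(sql, engine, geom_col="geometry")
regions["area_km2"] = regions.to_crs(epsg=3035).area / 1e6
print(regions[["nuts_id", "area_km2"]].head())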
Running the PostgreSQL Database with Docker
Quick Start
Start the database service using Docker Compose:
docker compose up -d db
The -d flag runs the service in detached mode (background).
Accessing the Database
Connect to the running database:
# Using psql command-line client
docker exec -it <db-container-name> psql -U <username> -d <database-name>
# Or expose the port and connect externally (requires port mapping in docker-compose.yaml)
psql -h localhost -p 5432 -U <username> -d <database-name>
Database Management
# View database logs
docker logs <db-container-name>
# Stop the database
docker compose stop db
# Restart the database
docker compose restart db
# Remove database and volumes (deletes all data)
docker compose down -v
For complete deployment instructions including data loading and API setup, see the Deployment Guide.