Hyperparameter Optimization with Optuna
The QuantumGrav Python package lets users customize hyperparameters when building, training, validating, and testing GNN models. Choosing the right values is crucial for model performance.
To accelerate this process, we developed QGTune, a subpackage that uses Optuna to automatically find optimal hyperparameters for specific objectives (e.g. minimizing loss or maximizing accuracy).
Define the Optuna search space
To use Optuna, we first need to define the hyperparameter search space with methods from `optuna.trial.Trial`, including:
- `suggest_categorical()`: suggest a value for a categorical parameter
- `suggest_float()`: suggest a value for a floating-point parameter
- `suggest_int()`: suggest a value for an integer parameter
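For illustration, this is how these methods are typically called inside an objective function (a minimal sketch of plain Optuna usage, not QGTune code; the parameter names mirror the examples below):

```python
import optuna

def objective(trial: optuna.trial.Trial) -> float:
    # categorical: pick one value from a fixed list
    layer_type = trial.suggest_categorical("gnn_layer_type", ["sage", "gcn", "gat"])
    # float: sample a dropout rate in [0.2, 0.5] with step 0.1
    dropout = trial.suggest_float("dropout", 0.2, 0.5, step=0.1)
    # int: sample an output dimension in [128, 256]
    out_dim = trial.suggest_int("out_dim", 128, 256)
    # dummy objective value; a real objective would train and evaluate a model here
    return dropout * out_dim
```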
To define the search space in QuantumGrav, users need three settings files:
- Base settings file: contains all configurations for using the QuantumGrav Python package (see the configuration `dict`). The hyperparameter values in this file serve as defaults when users want to enable only a subset of the search space (see details in Build an Optuna search space).
- Search space file: specifies the hyperparameters to optimize and their ranges.
- Dependency mapping file: defines dependencies between hyperparameters. A common case is in GNN layers, where the input dimension of one layer must match the output dimension of the previous layer.
Base settings vs. search space
The search space file follows the same structure as the base settings file but replaces hyperparameter values with their ranges. For example, given this base settings snippet:
model:
name: "QuantumGravBase"
gcn_net:
- in_dim: 12
out_dim: 128
dropout: 0.3
gnn_layer_type: "sage"
normalizer: "batch_norm"
activation: "relu"
the corresponding search space snippet is:

model:
name: "QuantumGravSearchSpace"
gcn_net:
- in_dim: 12 # number of node features
out_dim: [128, 256]
dropout:
type: tuple # to distinguish from categorical
value: [0.2, 0.5, 0.1] # range for dropout, min, max, step
gnn_layer_type: ["sage", "gcn", "gat", "gco"]
normalizer: ["batch_norm", "identity", "layer_norm"]
activation: ["relu", "leaky_relu", "sigmoid", "tanh", "identity"]
- Categorical parameters are defined by assigning the parameter name a list of possible values (`bool`, `string`, `float`, or `int`). In the example above, this applies to `out_dim`, `gnn_layer_type`, `normalizer`, and `activation`.
- Floating-point and integer parameters are specified as a list of three items:
  - [`float`, `float`, `float` or `bool`] for floats
  - [`int`, `int`, `int`] for integers
- To avoid confusion with categorical lists, the hyperparameter structure includes two sub-fields:
  - `type`: set to `"tuple"`
  - `value`: holds the 3-item tuple
- Another example for floats, where the third item is a boolean that enables log-scale sampling (see `learning_rate` in the full example below):
  - [1e-5, 1e-1, true]
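To make this convention concrete, here is an illustrative helper (not part of QGTune; the name `suggest_from_tuple` is hypothetical) showing how such a 3-item spec could map onto Optuna's suggestion methods:

```python
# hypothetical helper: translate a {type: tuple, value: [low, high, step_or_log]}
# spec into an Optuna suggestion
def suggest_from_tuple(trial, name, spec):
    low, high, third = spec["value"]
    # check bool before int: bool is a subclass of int in Python
    if isinstance(third, bool):
        # boolean third item: sample on a log scale (e.g. learning_rate)
        return trial.suggest_float(name, low, high, log=third)
    if isinstance(low, int) and isinstance(high, int):
        # integer bounds: integer parameter with a fixed step
        return trial.suggest_int(name, low, high, step=third)
    # float third item: fixed step size (e.g. dropout)
    return trial.suggest_float(name, low, high, step=third)
```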
A full example of the search space YAML file, built from the base settings in the configuration `dict`:
model:
name: "QuantumGravSearchSpace"
gcn_net:
- in_dim: 12 # number of node features
out_dim: [128, 256]
dropout:
type: tuple # to distinguish from categorical
value: [0.2, 0.5, 0.1] # range for dropout, min, max, step
gnn_layer_type: ["sage", "gcn", "gat", "gco"]
normalizer: ["batch_norm", "identity", "layer_norm"]
activation: ["relu", "leaky_relu", "sigmoid", "tanh", "identity"]
norm_args:
- 128 # should match out_dim, manually set later
gnn_layer_kwargs:
normalize: False
bias: True
project: False
root_weight: False
aggr: "mean"
- in_dim: 128 # should match previous layer's out_dim, manually set later
out_dim: [256, 512]
dropout:
type: tuple
value: [0.2, 0.5, 0.1]
gnn_layer_type: ["sage", "gcn", "gat", "gco"]
normalizer: ["batch_norm", "identity", "layer_norm"]
activation: ["relu", "leaky_relu", "sigmoid", "tanh", "identity"]
norm_args:
- 256 # should match out_dim, manually set later
gnn_layer_kwargs:
normalize: False
bias: True
project: False
root_weight: False
aggr: "mean"
- in_dim: 256 # should match previous layer's out_dim, manually set later
out_dim: [128, 256]
dropout:
type: tuple
value: [0.2, 0.5, 0.1]
gnn_layer_type: ["sage", "gcn", "gat", "gco"]
normalizer: ["batch_norm", "identity", "layer_norm"]
activation: ["relu", "leaky_relu", "sigmoid", "tanh", "identity"]
norm_args:
- 128 # should match out_dim, manually set later
gnn_layer_kwargs:
normalize: False
bias: True
project: False
root_weight: False
aggr: "mean"
pooling_layer: ["mean", "max", "sum"]
classifier:
input_dim: 128 # should match last gcn_net layer's out_dim, manually set later
output_dims:
- 2 # number of classes in classification task
hidden_dims:
- 48
- 18
activation: ["relu", "leaky_relu", "sigmoid", "tanh", "identity"]
backbone_kwargs: [{}, {}]
output_kwargs: [{}]
activation_kwargs: [{ "inplace": False }]
training:
seed: 42
# training loop
device: "cuda"
early_stopping_patience: 5
early_stopping_window: 7
early_stopping_tol: 0.001
early_stopping_metric: "f1_weighted"
checkpoint_at: 2
checkpoint_path: /path/to/where/the/intermediate/models/should/go
# optimizer
learning_rate:
type: tuple
value: [1e-5, 1e-1, true]
weight_decay:
type: tuple
value: [1e-6, 1e-2, true]
# training loader
batch_size: [32, 64]
num_workers: 12
pin_memory: False
drop_last: True
num_epochs: [50, 100, 200]
split: 0.8
validation: &valtest
batch_size: 32
num_workers: 12
pin_memory: False
drop_last: True
shuffle: True
persistent_workers: True
split: 0.1
testing: *valtest
Dependency mapping
Consider the following hyperparameters in the search space (unrelated lines are replaced with `...` for brevity):
model:
name: "QuantumGravSearchSpace"
gcn_net:
- in_dim: 12 # number of node features
out_dim: [128, 256]
...
norm_args:
- 128 # should match out_dim, manually set later
gnn_layer_kwargs:
...
- in_dim: 128 # should match previous layer's out_dim, manually set later
out_dim: [256, 512]
dropout:
...
In this example, the first argument of `norm_args` must match `out_dim`, and the `in_dim` of the second layer must match the `out_dim` of the first layer. YAML anchors (`&`) and aliases (`*`) would not help here, as they reference static values, while hyperparameters are assigned dynamically by Optuna at runtime.
To handle such cases, we introduce another YAML file with the same structure as the search space file:
model:
gcn_net:
# layer 0
- norm_args:
- "model.gcn_net[0].out_dim"
# layer 1
- in_dim: "model.gcn_net[0].out_dim"
norm_args:
...
Here, the first value of `norm_args` in the first layer is set to `"model.gcn_net[0].out_dim"`, which points to the `out_dim` of the first element in `model` -> `gcn_net`. The same approach is used for the `in_dim` of the second layer.
To use this mapping, users must understand the structure of the search space file and ensure the dependency mapping file mirrors it exactly.
A full example of the dependency mapping file for the search space above:
model:
gcn_net:
# layer 0
- norm_args:
- "model.gcn_net[0].out_dim"
# layer 1
- in_dim: "model.gcn_net[0].out_dim"
norm_args:
- "model.gcn_net[1].out_dim"
# layer 2
- in_dim: "model.gcn_net[1].out_dim"
norm_args:
- "model.gcn_net[2].out_dim"
classifier:
input_dim: "model.gcn_net[-1].out_dim"
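Conceptually, each dependency string is a path into the built configuration. The sketch below (an illustrative helper, not QGTune's implementation) shows how such a path could be resolved:

```python
import re

# illustrative sketch: resolve a dependency path such as
# "model.gcn_net[0].out_dim" against a built config dict
def resolve_path(config, path):
    value = config
    for key, index in re.findall(r"([A-Za-z_]\w*)|\[(-?\d+)\]", path):
        value = value[key] if key else value[int(index)]
    return value

config = {"model": {"gcn_net": [{"out_dim": 256}, {"out_dim": 512}]}}
assert resolve_path(config, "model.gcn_net[0].out_dim") == 256
assert resolve_path(config, "model.gcn_net[-1].out_dim") == 512
```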
QGTune subpackage
The main purposes of the QGTune subpackage are to:
- Create an Optuna study from a config file (preferably a YAML file)
- Build an Optuna search space from the three YAML files described above
- Save the hyperparameter values of the best trial
- Save the hyperparameter values of the best config
Create an Optuna study
The input for creating an Optuna study is a configuration dictionary that should include essential keys like `"storage"`, `"study_name"`, and `"direction"`, for example:
{
"study_name": "quantum_grav_study", # name of the Optuna study
"storage": "experiments/results.log", # only supports JournalStorage for multi-processing
"direction": "minimize" # direction of optimization ("minimize" or "maximize")
}
If `storage` is set to `None` (or `null` in a YAML file), the study will be saved with `optuna.storages.InMemoryStorage`, i.e. in RAM only until the Python session ends.
For simplicity when working with multi-processing, we only support persistent storage via Optuna's JournalStorage.
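For reference, creating such a study directly with Optuna's JournalStorage looks roughly like this (a sketch of plain Optuna usage; the exact signature of `QGTune.tune.create_study()` may differ):

```python
import optuna
from optuna.storages import JournalStorage
from optuna.storages.journal import JournalFileBackend  # JournalFileStorage in older Optuna

config = {
    "study_name": "quantum_grav_study",
    "storage": "experiments/results.log",
    "direction": "minimize",
}

# a file-backed JournalStorage can be shared safely across processes
storage = JournalStorage(JournalFileBackend(config["storage"]))
study = optuna.create_study(
    study_name=config["study_name"],
    storage=storage,
    direction=config["direction"],
    load_if_exists=True,  # resume the study if it already exists
)
```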
Build an Optuna search space
To build an Optuna search space with QGTune, users can use the `build_search_space_with_dependencies()` function, as in the following example:
from QGTune import tune

def objective(trial, tuning_config):
    search_space_file = tuning_config.get("search_space_path")
    depmap_file = tuning_config.get("dependency_mapping_path")
    tune_model = tuning_config.get("tune_model")
    tune_training = tuning_config.get("tune_training")
    base_config_file = tuning_config.get("base_settings_path")
    built_search_space_file = tuning_config.get("built_search_space_path")

    search_space = tune.build_search_space_with_dependencies(
        search_space_file,
        depmap_file,
        trial,
        tune_model=tune_model,
        tune_training=tune_training,
        base_settings_file=base_config_file,
        built_search_space_file=built_search_space_file,
    )
    ...
- `search_space` is a dictionary whose keys correspond to hyperparameter names.
- `objective` is the function that will be used later for optimization.
- `trial` is an object of `optuna.trial.Trial`.
- `tuning_config` serves as the configuration dictionary for QGTune, defined by users (see a full example at the end of the section Save best trial and best config).
- `search_space_file`, `depmap_file`, and `base_config_file` are paths to the search space file, dependency mapping file, and base config file, respectively. These paths can be specified in `tuning_config`.
- `tune_model`: whether to tune the hyperparameters associated with the `model` part of the search space.
- `tune_training`: whether to tune the hyperparameters associated with the `training` part of the search space.
- `built_search_space_file`: path to save the built search space. All hyperparameter values defined via `trial` suggestions are recorded in this file as their initial suggestions; these values do not represent the best trial. The file serves as a reference for generating the best configuration later.
Note that `base_config_file` is required if either `tune_model` or `tune_training` is `False`. In this case, hyperparameter values from the base settings will overwrite the corresponding part of the `search_space` dictionary.
Save best trial and best config
After running all trials, users can save the hyperparameter values of the best trial to a YAML file with the `save_best_trial(study, out_file)` function.
However, this file only covers the hyperparameters whose values were defined via `trial` suggestions, not those with fixed values (e.g. `model.gcn_net[0].in_dim`).
Therefore, to save the values of all parameters used, users can call the `save_best_config()` function:
def save_best_config(
    built_search_space_file: Path,
    best_trial_file: Path,
    depmap_file: Path,
    output_file: Path,
):
- `built_search_space_file` is the file created after running `build_search_space_with_dependencies()`.
- `best_trial_file` is the file created by `save_best_trial()`.
- `depmap_file` is needed again to make sure that all parameter dependencies are resolved.
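Putting these together, a typical post-optimization sequence might look like the following sketch (assuming both save functions live in `QGTune.tune` alongside `build_search_space_with_dependencies()`; the file names follow the tuning config below):

```python
from pathlib import Path
from QGTune import tune

# after study.optimize(...) has finished
tune.save_best_trial(study, Path("best_trial.yaml"))
tune.save_best_config(
    built_search_space_file=Path("built_search_space.yaml"),
    best_trial_file=Path("best_trial.yaml"),
    depmap_file=Path("depmap.yaml"),
    output_file=Path("best_params.yaml"),
)
```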
A full example of the tuning config YAML file used for QGTune:
tune_model: false # whether to use search space for model settings
tune_training: true # whether to use search space for training settings
base_settings_path: base_settings.yaml # path to the base settings file
search_space_path: search_space.yaml # path to the search space config file
dependency_mapping_path: depmap.yaml # path to the dependency mapping file
built_search_space_path: built_search_space.yaml # path to save the built search space with dependencies applied
study_name: quantum_grav_study # name of the Optuna study
storage: experiments/results.log # storage file for the Optuna study, only supports JournalStorage for multi-processing
direction: minimize # direction of optimization ("minimize" or "maximize")
n_trials: 20 # number of trials for hyperparameter tuning
timeout: 600 # timeout in seconds for the study
n_jobs: 1 # number of parallel jobs for multi-threading (set to 1 for single-threaded)
n_processes: 4 # number of parallel processes for multi-processing; each process runs n_trials * n_iterations / n_processes trials
n_iterations: 8 # number of iterations to run the tuning process (each iteration runs n_trials)
best_trial_path: best_trial.yaml # path to save the best trial information
best_param_path: best_params.yaml # path to save the best hyperparameters
An example of tuning with QGTune
We have provided an example in the tune_example.py file to demonstrate the functionality of QGTune.
In this example, we created sample configs for tuning, the search space, the dependency mapping, and the base settings. A small model is also defined, based on Optuna's PyTorch example.
The dataset used in this example is Fashion-MNIST. The task is to classify each 28×28 grayscale image into one of 10 classes.
To allow Optuna to track training progress, we need to call `trial.report` after each epoch:
def objective(trial, tuning_config):
    ...
    search_space = ...
    ...
    # prepare model
    ...
    # prepare optimizer
    ...
    # prepare data
    ...
    for epoch in range(epochs):
        # train the model
        ...
        # validate the model
        ...
        accuracy = ...
        trial.report(accuracy, epoch)
        if trial.should_prune():
            raise optuna.exceptions.TrialPruned()
We also used Optuna's multi-process optimization in this example; a minimal sketch follows.
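The sketch below shows one way to wire up multi-process optimization (illustrative only; `run_worker` and the `tuning_config.yaml` file name are hypothetical, not QGTune's API, and the `objective(trial, tuning_config)` defined above is assumed to exist at module level):

```python
import multiprocessing as mp

import optuna
import yaml
from optuna.storages import JournalStorage
from optuna.storages.journal import JournalFileBackend

def run_worker(tuning_config):
    # each process re-opens the shared JournalStorage and runs its share of trials;
    # the study itself is assumed to have been created beforehand
    storage = JournalStorage(JournalFileBackend(tuning_config["storage"]))
    study = optuna.load_study(study_name=tuning_config["study_name"], storage=storage)
    study.optimize(
        lambda trial: objective(trial, tuning_config),
        n_trials=tuning_config["n_trials"],
        n_jobs=tuning_config["n_jobs"],
    )

if __name__ == "__main__":
    with open("tuning_config.yaml") as f:
        tuning_config = yaml.safe_load(f)
    workers = [
        mp.Process(target=run_worker, args=(tuning_config,))
        for _ in range(tuning_config["n_processes"])
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```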
Notes on pruner and parallelization
We used `optuna.pruners.MedianPruner` when creating an Optuna study (`QGTune.tune.create_study()`). Support for additional pruners may be added in the future if required.
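For orientation, this is how a study with a median pruner is created in plain Optuna (the pruner arguments shown are Optuna's defaults made explicit, not necessarily QGTune's settings):

```python
import optuna

# prune a trial if its intermediate value is worse than the median of previous
# trials at the same step, once 5 startup trials have completed
pruner = optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=0)
study = optuna.create_study(direction="minimize", pruner=pruner)
```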
Although users can specify `n_jobs` (for multi-threading) when running a study optimization, we recommend keeping `n_jobs` set to `1`, following Optuna's guidance on multi-thread optimization.