Hyperparameter Tuning

Overview

Tuning a model often requires exploring the impact of changes to many hyperparameters. The best way to approach this is generally not by changing the source code of the training script as we did above, but instead by defining flags for key parameters then training over the combinations of those flags to determine which combination of flags yields the best model.

Training Flags

Here’s a declaration of 2 flags that control dropout rate within a model:

FLAGS <- flags(
  flag_numeric("dropout1", 0.4),
  flag_numeric("dropout2", 0.3)
)

These flags are then used in the definition of the model here:

model <- keras_model_sequential()
model %>%
  layer_dense(units = 128, activation = 'relu', input_shape = c(784)) %>%
  layer_dropout(rate = FLAGS$dropout1) %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = FLAGS$dropout2) %>%
  layer_dense(units = 10, activation = 'softmax')

Once we’ve defined flags, we can pass alternate flag values to training_run() as follows:

training_run('mnist_mlp.R', flags = list(dropout1 = 0.2, dropout2 = 0.2))

You aren’t required to specify all of the flags (any flags excluded will simply use their default value).

Flags make it very straightforward to systematically explore the impact of changes to hyperparameters on model performance, for example:

for (dropout1 in c(0.1, 0.2, 0.3))
  training_run('mnist_mlp.R', flags = list(dropout1 = dropout1))

Flag values are automatically included in run data with a “flag_” prefix (e.g. flag_dropout1, flag_dropout2).

See the article on training flags for additional documentation on using flags.

Tuning Runs

Above we demonstrated writing a loop to call training_run() with various different flag values. A better way to accomplish this is the tuning_run() function, which allows you to specify multiple values for each flag, and executes training runs for all combinations of the specified flags. For example:

# run various combinations of dropout1 and dropout2
runs <- tuning_run("mnist_mlp.R", flags = list(
  dropout1 = c(0.2, 0.3, 0.4),
  dropout2 = c(0.2, 0.3, 0.4)
))

# find the best evaluation accuracy
runs[order(runs$eval_acc, decreasing = TRUE), ]
Data frame: 9 x 28 
                    run_dir eval_loss eval_acc metric_loss metric_acc metric_val_loss metric_val_acc
9 runs/2018-01-26T13-21-03Z    0.1002   0.9817      0.0346     0.9900          0.1086         0.9794
6 runs/2018-01-26T13-23-26Z    0.1133   0.9799      0.0409     0.9880          0.1236         0.9778
5 runs/2018-01-26T13-24-11Z    0.1056   0.9796      0.0613     0.9826          0.1119         0.9777
4 runs/2018-01-26T13-24-57Z    0.1098   0.9788      0.0868     0.9770          0.1071         0.9771
2 runs/2018-01-26T13-26-28Z    0.1185   0.9783      0.0688     0.9819          0.1150         0.9783
3 runs/2018-01-26T13-25-43Z    0.1238   0.9782      0.0431     0.9883          0.1246         0.9779
8 runs/2018-01-26T13-21-53Z    0.1064   0.9781      0.0539     0.9843          0.1086         0.9795
7 runs/2018-01-26T13-22-40Z    0.1043   0.9778      0.0796     0.9772          0.1094         0.9777
1 runs/2018-01-26T13-27-14Z    0.1330   0.9769      0.0957     0.9744          0.1304         0.9751
# ... with 21 more columns:
#   flag_batch_size, flag_dropout1, flag_dropout2, samples, validation_samples, batch_size,
#   epochs, epochs_completed, metrics, model, loss_function, optimizer, learning_rate, script,
#   start, end, completed, output, source_code, context, type

Note that the tuning_run() function returns a data frame containing a summary of all of the executed training runs.

Experiment Scopes

By default all runs go into the “runs” sub-directory of the current working directory. For various types of ad-hoc experimentation this works well, but in some cases for a tuning run you may want to create a separate directory scope.

You can do this by specifying the runs_dir argument:

# run various combinations of dropout1 and dropout2
tuning_run("mnist_mlp.R", runs_dir = "dropout_tuning", flags = list(
  dropout1 = c(0.2, 0.3, 0.4),
  dropout2 = c(0.2, 0.3, 0.4)
))

# list runs witin the specified runs_dir
ls_runs(order = eval_acc, runs_dir = "dropout_tuning")
Data frame: 9 x 28 
                              run_dir eval_acc eval_loss metric_loss metric_acc metric_val_loss metric_val_acc
9 dropout_tuning/2018-01-26T13-38-02Z   0.9803    0.0980      0.0324     0.9902          0.1096         0.9789
6 dropout_tuning/2018-01-26T13-40-40Z   0.9795    0.1243      0.0396     0.9885          0.1341         0.9784
2 dropout_tuning/2018-01-26T13-43-55Z   0.9791    0.1138      0.0725     0.9813          0.1205         0.9773
7 dropout_tuning/2018-01-26T13-39-49Z   0.9786    0.1027      0.0796     0.9778          0.1053         0.9761
3 dropout_tuning/2018-01-26T13-43-08Z   0.9784    0.1206      0.0479     0.9871          0.1246         0.9775
4 dropout_tuning/2018-01-26T13-42-21Z   0.9784    0.1026      0.0869     0.9766          0.1108         0.9769
5 dropout_tuning/2018-01-26T13-41-31Z   0.9783    0.1086      0.0589     0.9832          0.1216         0.9764
8 dropout_tuning/2018-01-26T13-38-57Z   0.9780    0.1007      0.0511     0.9855          0.1100         0.9771
1 dropout_tuning/2018-01-26T13-44-41Z   0.9770    0.1178      0.1017     0.9734          0.1244         0.9757
# ... with 21 more columns:
#   flag_batch_size, flag_dropout1, flag_dropout2, samples, validation_samples, batch_size, epochs,
#   epochs_completed, metrics, model, loss_function, optimizer, learning_rate, script, start, end,
#   completed, output, source_code, context, type

Sampling Flag Combinations

If the number of flag combinations is very large, you can also specify that only a random sample of combinations should be tried using the sample parmaeter. For example:

# run random sample (0.3) of dropout1 and dropout2 combinations
runs <- tuning_run("mnist_mlp.R", sample = 0.3, flags = list(
  dropout1 = c(0.2, 0.3, 0.4),
  dropout2 = c(0.2, 0.3, 0.4)
))