Replicates a model on different GPUs.
Replicates a model on different GPUs.
multi_gpu_model(model, gpus = NULL, cpu_merge = TRUE,
cpu_relocation = FALSE)
Arguments
model | A Keras model instance. To avoid OOM errors, this model could have been built on CPU, for instance (see usage example below). |
gpus |
|
cpu_merge | A boolean value to identify whether to force merging model weights under the scope of the CPU or not. |
cpu_relocation | A boolean value to identify whether to create the model's weights under the scope of the CPU. If the model is not defined under any preceding device scope, you can still rescue it by activating this option. |
Value
A Keras model object which can be used just like the initial
model
argument, but which distributes its workload on multiple GPUs.
Details
Specifically, this function implements single-machine multi-GPU data parallelism. It works in the following way:
Divide the model's input(s) into multiple sub-batches.
Apply a model copy on each sub-batch. Every model copy is executed on a dedicated GPU.
Concatenate the results (on CPU) into one big batch.
E.g. if your batch_size
is 64 and you use gpus=2
,
then we will divide the input into 2 sub-batches of 32 samples,
process each sub-batch on one GPU, then return the full
batch of 64 processed samples.
This induces quasi-linear speedup on up to 8 GPUs.
This function is only available with the TensorFlow backend for the time being.
Model Saving
To save the multi-gpu model, use save_model_hdf5()
or
save_model_weights_hdf5()
with the template model (the argument you
passed to multi_gpu_model
), rather than the model returned
by multi_gpu_model
.
See also
Other model functions: compile
,
evaluate.keras.engine.training.Model
,
evaluate_generator
,
fit_generator
, fit
,
get_config
, get_layer
,
keras_model_sequential
,
keras_model
, pop_layer
,
predict.keras.engine.training.Model
,
predict_generator
,
predict_on_batch
,
predict_proba
,
summary.keras.engine.training.Model
,
train_on_batch
Examples
# NOT RUN {
library(keras)
library(tensorflow)
num_samples <- 1000
height <- 224
width <- 224
num_classes <- 1000
# Instantiate the base model (or "template" model).
# We recommend doing this with under a CPU device scope,
# so that the model's weights are hosted on CPU memory.
# Otherwise they may end up hosted on a GPU, which would
# complicate weight sharing.
with(tf$device("/cpu:0"), {
model <- application_xception(
weights = NULL,
input_shape = c(height, width, 3),
classes = num_classes
)
})
# Replicates the model on 8 GPUs.
# This assumes that your machine has 8 available GPUs.
parallel_model <- multi_gpu_model(model, gpus = 8)
parallel_model %>% compile(
loss = "categorical_crossentropy",
optimizer = "rmsprop"
)
# Generate dummy data.
x <- array(runif(num_samples * height * width*3),
dim = c(num_samples, height, width, 3))
y <- array(runif(num_samples * num_classes),
dim = c(num_samples, num_classes))
# This `fit` call will be distributed on 8 GPUs.
# Since the batch size is 256, each GPU will process 32 samples.
parallel_model %>% fit(x, y, epochs = 20, batch_size = 256)
# Save model via the template model (which shares the same weights):
model %>% save_model_hdf5("my_model.h5")
# }