TensorBoard: Visualizing Learning

The computations you’ll use TensorFlow for, such as training a massive deep neural network, can be complex and confusing. To make it easier to understand, debug, and optimize TensorFlow programs, we’ve included a suite of visualization tools called TensorBoard. You can use TensorBoard to visualize your TensorFlow graph, plot quantitative metrics about the execution of your graph, and show additional data, such as images, that pass through it. When TensorBoard is fully configured, it looks like this:

This tutorial is intended to get you started with simple TensorBoard usage. There are other resources available as well! The TensorBoard README has a lot more information on TensorBoard usage, including tips and tricks and debugging information.

Serializing the data

TensorBoard operates by reading TensorFlow events files, which contain summary data that you can generate when running TensorFlow. Here’s the general lifecycle for summary data within TensorBoard.

First, create the TensorFlow graph that you’d like to collect summary data from, and decide which nodes you would like to annotate with summary operations (https://www.tensorflow.org/api_docs/python/train.html#summary-operations).

For example, suppose you are training a convolutional neural network for recognizing MNIST digits. You’d like to record how the learning rate varies over time, and how the objective function is changing. Collect these by attaching tf$summary$scalar ops to the nodes that output the learning rate and loss, respectively. Then, give each tf$summary$scalar a meaningful tag, like 'learning_rate' or 'loss' (use underscores rather than spaces, since TensorFlow replaces characters that are illegal in summary names).

Perhaps you’d also like to visualize the distributions of activations coming off a particular layer, or the distribution of gradients or weights. Collect this data by attaching tf$summary$histogram ops to the gradient outputs and to the variable that holds your weights, respectively.

For details on all of the summary operations available, check out the docs on summary operations (https://www.tensorflow.org/api_docs/python/summary/).

Operations in TensorFlow don’t do anything until you run them, or run an op that depends on their output. The summary nodes that we’ve just created are peripheral to your graph: none of the ops you are currently running depend on them. So, to generate summaries, we need to run all of these summary nodes. Managing them by hand would be tedious, so use tf$summary$merge_all to combine them into a single op that generates all the summary data.

Then, you can just run the merged summary op, which will generate a serialized Summary protobuf object with all of your summary data at a given step. Finally, to write this summary data to disk, pass the summary protobuf to a tf$summary$FileWriter.

The tf$summary$FileWriter takes a logdir in its constructor. This logdir is important: it’s the directory where all of the events will be written out. The tf$summary$FileWriter can also optionally take a Graph in its constructor. If it receives a Graph object, TensorBoard will visualize your graph along with tensor shape information, which gives you a much better sense of what flows through the graph (see Tensor shape information).

Now that you’ve modified your graph and have a tf$summary$FileWriter, you’re ready to start running your network! If you want, you could run the merged summary op every single step, and record a ton of training data. That’s likely to be more data than you need, though. Instead, consider running the merged summary op every n steps.
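The every-n-steps schedule is just a modulo test on the step counter. Here is a minimal base-R sketch; `should_record()` and `record_every` are hypothetical names used only for illustration (the training loop later in this tutorial inlines the same test as `i %% 10 == 0`):

```r
# Decide which training steps should run the merged summary op.
# should_record() and record_every are hypothetical, illustrative names.
should_record <- function(step, record_every = 10L) {
  step %% record_every == 0
}

max_steps <- 100L
steps <- seq_len(max_steps)

# Steps at which summaries would be written
recorded <- steps[should_record(steps, 10L)]
print(recorded)

# Fraction of steps that pay the cost of running the summary op
print(length(recorded) / max_steps)
```

With `record_every = 10`, only one step in ten writes summaries, which keeps the events file small while still giving a readable curve in TensorBoard.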

The code example below is a modification of the simple MNIST tutorial, in which we have added some summary ops and run them every ten steps. If you run this and then launch TensorBoard with tensorboard(log_dir = "/tmp/mnist_logs"), you’ll be able to visualize statistics such as how the weights or accuracy varied during training. The code below is an excerpt; full source is here.

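# Excerpt: weight_variable(), bias_variable(), the input placeholders x and y_,
# sess, mnist and FLAGS are defined in the full source.

# Attach mean, stddev, max, min and histogram summaries to a Tensor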
variable_summaries <- function(var, name) {
  with(tf$name_scope("summaries"), {
    mean <- tf$reduce_mean(var)
    tf$summary$scalar(paste0("mean/", name), mean)
    with(tf$name_scope("stddev"), {
      stddev <- tf$sqrt(tf$reduce_mean(tf$square(var - mean)))
    })
    tf$summary$scalar(paste0("stddev/", name), stddev)
    tf$summary$scalar(paste0("max/", name), tf$reduce_max(var))
    tf$summary$scalar(paste0("min/", name), tf$reduce_min(var))
    tf$summary$histogram(name, var)
  })
}

# Reusable code for making a simple neural net layer.
#
# It does a matrix multiply, bias add, and then uses relu to nonlinearize.
# It also sets up name scoping so that the resultant graph is easy to read,
# and adds a number of summary ops.
#
nn_layer <- function(input_tensor, input_dim, output_dim,
                     layer_name, act=tf$nn$relu) {
  with(tf$name_scope(layer_name), {
    # This Variable will hold the state of the weights for the layer
    with(tf$name_scope("weights"), {
      weights <- weight_variable(shape(input_dim, output_dim))
      variable_summaries(weights, paste0(layer_name, "/weights"))
    })
    with(tf$name_scope("biases"), {
      biases <- bias_variable(shape(output_dim))
      variable_summaries(biases, paste0(layer_name, "/biases"))
    })
    with(tf$name_scope("Wx_plus_b"), {
      preactivate <- tf$matmul(input_tensor, weights) + biases
      tf$summary$histogram(paste0(layer_name, "/pre_activations"), preactivate)
    })
    activations <- act(preactivate, name = "activation")
    tf$summary$histogram(paste0(layer_name, "/activations"), activations)
  })
  activations
}

hidden1 <- nn_layer(x, 784L, 500L, "layer1")

with(tf$name_scope("dropout"), {
  keep_prob <- tf$placeholder(tf$float32)
  tf$summary$scalar("dropout_keep_probability", keep_prob)
  dropped <- tf$nn$dropout(hidden1, keep_prob)
})

y <- nn_layer(dropped, 500L, 10L, "layer2", act = tf$nn$softmax)

with(tf$name_scope("cross_entropy"), {
  diff <- y_ * tf$log(y)
  with(tf$name_scope("total"), {
    cross_entropy <- -tf$reduce_mean(diff)
  })
  tf$summary$scalar("cross_entropy", cross_entropy)
})

with(tf$name_scope("train"), {
  optimizer <- tf$train$AdamOptimizer(FLAGS$learning_rate)
  train_step <- optimizer$minimize(cross_entropy)
})

with(tf$name_scope("accuracy"), {
  with(tf$name_scope("correct_prediction"), {
    correct_prediction <- tf$equal(tf$argmax(y, 1L), tf$argmax(y_, 1L))
  })
  with(tf$name_scope("accuracy"), {
    accuracy <- tf$reduce_mean(tf$cast(correct_prediction, tf$float32))
  })
  tf$summary$scalar("accuracy", accuracy)
})

# Merge all the summaries and write them out to /tmp/mnist_logs (by default)
merged <- tf$summary$merge_all()
train_writer <- tf$summary$FileWriter(file.path(FLAGS$summaries_dir, "train"),
                                      sess$graph)
test_writer <- tf$summary$FileWriter(file.path(FLAGS$summaries_dir, "test"))
sess$run(tf$global_variables_initializer())
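To see what the scalars written by `variable_summaries()` actually measure, here is a plain-R sketch that applies the same four reductions to an ordinary numeric vector. No TensorFlow is involved, and `summary_stats()` is a hypothetical helper. Note that the stddev formula in the graph divides by N, so it will not match R's `sd()`, which divides by N - 1.

```r
# Plain-R mirror of the scalars that variable_summaries() logs.
# summary_stats() is a hypothetical helper; it works on a numeric vector,
# not on a TensorFlow tensor.
summary_stats <- function(var) {
  m <- mean(var)
  c(mean   = m,
    # Same formula as the graph: sqrt(mean((var - mean)^2)), i.e. the
    # population standard deviation (divides by N, not N - 1)
    stddev = sqrt(mean((var - m)^2)),
    max    = max(var),
    min    = min(var))
}

w <- c(2, 4, 4, 4, 5, 5, 7, 9)
stats <- summary_stats(w)
print(stats)   # mean 5, stddev 2, max 9, min 2

# R's sd() uses the N - 1 denominator, so it reports a larger value
print(sd(w))
```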

After we’ve initialized the FileWriters, we have to add summaries to the FileWriters as we train and test the model.

# Train the model, and also write summaries.
# Every 10th step, measure test-set accuracy, and write test summaries
# All other steps, run train_step on training data, & add training summaries

# Make a TensorFlow feed_dict: maps data onto Tensor placeholders.
feed_dict <- function(train) {
  if (train || FLAGS$fake_data) {
    batch <- mnist$train$next_batch(100L, fake_data = FLAGS$fake_data)
    xs <- batch[[1]]
    ys <- batch[[2]]
    k <- FLAGS$dropout
  } else {
    xs <- mnist$test$images
    ys <- mnist$test$labels
    k <- 1.0
  }
  dict(x = xs,
       y_ = ys,
       keep_prob = k)
}

for (i in 1:FLAGS$max_steps) {
  if (i %% 10 == 0) { # Record summaries and test-set accuracy
    result <- sess$run(list(merged, accuracy), feed_dict = feed_dict(FALSE))
    summary <- result[[1]]
    acc <- result[[2]]
    cat(sprintf("Accuracy at step %s: %s\n", i, acc))
    test_writer$add_summary(summary, i)
  } else { # Record train set summaries, and train
    result <- sess$run(list(merged, train_step), feed_dict = feed_dict(TRUE))
    summary <- result[[1]]
    train_writer$add_summary(summary, i)
  }
}
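The accuracy printed every tenth step and the `cross_entropy` scalar are simple reductions. This base-R sketch reproduces both on a hypothetical two-example, three-class batch; `probs` and `labels` stand in for `y` and `y_` (the real model has 10 classes), and nothing here calls TensorFlow.

```r
# Hypothetical toy batch: 2 examples, 3 classes (the tutorial's model has 10).
# Rows of `probs` play the role of the softmax output y;
# rows of `labels` play the role of the one-hot labels y_.
probs  <- rbind(c(0.7, 0.2, 0.1),
                c(0.1, 0.3, 0.6))
labels <- rbind(c(1, 0, 0),
                c(0, 1, 0))

# Same reduction as the graph's -tf$reduce_mean(y_ * tf$log(y)):
# the mean runs over every element (examples x classes)
cross_entropy <- -mean(labels * log(probs))

# Conventional per-example cross-entropy, summed over classes
per_example <- -rowSums(labels * log(probs))

# Accuracy: does the arg-max column of the prediction match the label's?
pred_class <- max.col(probs, ties.method = "first")
true_class <- max.col(labels, ties.method = "first")
accuracy <- mean(pred_class == true_class)

cat(sprintf("cross_entropy: %.4f  accuracy: %.2f\n", cross_entropy, accuracy))
```

Because the mean runs over the class columns as well as the batch, the logged cross_entropy equals the usual per-example value divided by the number of classes. The curve in TensorBoard has the same shape; only the scale differs.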

You’re now all set to visualize this data using TensorBoard.

Launching TensorBoard

To run TensorBoard, use the following:

tensorboard(log_dir = "path/to/log-directory")

where log_dir points to the directory where the tf$summary$FileWriter serialized its data. If log_dir contains subdirectories that hold serialized data from separate runs, TensorBoard will visualize the data from all of those runs.
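As a sketch of that layout, assuming two hypothetical run names, the following base-R snippet creates one subdirectory per run under a temporary root, each split into `train` and `test` the way the tutorial's FileWriters split theirs. Each leaf directory is where a `tf$summary$FileWriter` would write; pointing `tensorboard()` at the root then lets TensorBoard list every run.

```r
# Hypothetical layout: one root log directory, one subdirectory per run,
# each split into train/ and test/ like the tutorial's two FileWriters.
log_root <- file.path(tempdir(), "mnist_logs")
runs <- c("run1", "run2")   # hypothetical run names

# run1/train, run1/test, run2/train, run2/test
run_dirs <- file.path(log_root, rep(runs, each = 2), c("train", "test"))
for (d in run_dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
print(run_dirs)

# Each leaf would be passed to a writer, e.g.
#   tf$summary$FileWriter(run_dirs[1], sess$graph)
# and TensorBoard would be launched on the root:
#   tensorboard(log_dir = log_root)
```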

When looking at TensorBoard, you will see the navigation tabs in the top right corner. Each tab represents a set of serialized data that can be visualized.

For in-depth information on how to use the graph tab to visualize your graph, see TensorBoard: Graph Visualization.

For more usage information on TensorBoard in general, see the TensorBoard README.