Do Conv-nets Dream of Psychedelic Sheep?

triptych of bighorn sheep, with varying levels of abstraction per frame
Deep dreaming at successive layers of abstraction. From top to bottom: input image, conv2-3x3_reduce, inception_4c-1×1. Made using deepdreamgenerator and public domain image from Yellowstone National Park NPS

The Ubiquity of Deep Learning

The modern successes of deep learning are in part due to the universal approximation theorem developed by George Cybenko, Kurt Hornik, and others. The theorem in essence states that a neural network with at least one hidden layer and a non-linear activation function can generally approximate arbitrary continuous functions. In the past decade, the re-purposing of powerful GPUs for training deep networks unleashed the potential of the universal approximation theorem, fueling many new areas of business and research. In a deep learning model, many hidden layers can be stacked one on top of another like a skyward-reaching Tower of Babel, and internal representations can come to represent complicated abstractions and feature hierarchies. These representations can be part of a model predicting anything from the seemingly inconsequential such as nonsensical astrophysics acronyms, streaming video preferences, and dating matches; to the potentially grave: credit risk ratings, medical diagnoses, and dating matches.

Given their pervasiveness, it is more important than ever to understand how deep neural network models form internal representations of input data and make decisions. In practice, back-propagation through variants of stochastic gradient descent do learn to fit the training data pretty well, but there are no guarantees for global convergence and the input data and learning process can often interact in surprising ways.

The In/Scrutability of Deep Networks

The surprising ways deep networks learn has led many to refer to the full process of training and using neural nets as a “black box,” especially in the popular press. These descriptions can lead to poor performance and public frustration when models behave badly.

If you work with deep learning, it pays to try to understand what your models are thinking. Understanding models and predicting their behavior clearly can not only help to improve model performance, but societal concern over privacy and regulations such as Europe’s GDPR are appropriately encouraging increased explain-ability in public-facing machine learning.

Simple Models, Simple Visualizations

Once upon a time, neural network parameters numbered from in the hundreds to a few thousand, and direct inspection of neuron activations was feasible and potentially fruitful. A classical direct visualization of neural network activity is the Hinton diagram (who else?), which directly displays positive and negative activations by the size and shade of a grid of squares.

Hinton diagram example

Visualizing the activity of individual neurons (with a few other tricks) can still be insightful in some cases, such as OpenAI’s Unsupervised Sentiment Neuron work or Andrej Karpathy et al.’s work on visualizing recurrent neural networks (related blog post, about 3/4 down the page). However, even in those cases finding a meaningfully interpretable neuron is a fishing expedition, and most neurons remain fairly inscrutable individually.

Dreaming up Insights in Big Models

You’ve probably seen some iteration of Google’s psychedelic inceptionism/deep dream technique. Deep dreaming was an early (2015) stab at reducing the black-box mystique of neural networks. By using the network to increasingly enhance features related to the activation of a given neuron, an image can be generated to gain some idea of how representations and decisions are formed in the neural network structure. This could be anthropomorphized as the network looking at an image and asking, “what do the image contents remind me of, and how can I make it look more like that?” Applying the deep dream technique is what generated the banner image for this article. Applying the technique at different layers yields varying levels of abstraction. In the title image of a bighorn sheep dreaming in the conv2-3x3_reduce layer, which occurs very early in the network, yields textures like brushstrokes. These are your edge, point, and curve detectors at different orientations that tend to be learned by convolutional kernels in the first few layers of a deep conv-net. Dreaming a bit deeper in the inception_4c-1×1 layer, we see instead collections of features associated with the types of classes used to train the inception model eerily overlaid on the image.

Deep dreaming certainly garnered ample attention and has been used as a creative tool to make beautiful images, but deep dreaming doesn’t offer much in the way of actionable insights on its own. That’s aside, perhaps, from the revelation that arms must grow out of dumbbells.

Four image pane of Dumbbells sprouting beefy arms
Dumbbells sprouting beefy arms from CC BY 4.0 fromGoogle.

Other visualization techniques applied in neural network computer vision include the feature visualization toolbox by Jason Yosinski et al.(demo video), which you can try out yourself if you are prepared to set up a Python environment with Caffe. The authors demonstrate several observations that mainly fall under the category of “interesting” such as neurons that respond to faces, text, and wrinkled fabric. Connecting the dots from “interesting” realizations to actionable insights is where interpretability begins to return on the investment of time and tools. Indeed, it has become a sub-field of machine learning in its own right, feeding the demand for researchers and engineers with skills in interpretable machine learning. It’s a worthwhile addition to your personal or organizational toolbox.

From Visualization to Actionable Utility

Perhaps the most cohesive line of research into deep learning interpretability comes from the confluence of Chris Olah, Google Brain, and OpenAI. Stretching back several years and published mostly on Distill, this work built on the concept of deep dream and feature visualization to develop optimization-based visualizations, further developed to understand attribution with spatial activations, and with the recent publication of activation atlases the authors demonstrated insights into vulnerabilities to adversarial examples and strange feature correlations.

A series of progressing images: from simple visualizations to activation atlases
Activation atlases are the latest in a line of research progressing from the simple: feature visualizations of single neuron activations, to the complex: activation atlases that subsample from a manifold representing all probable images the network might encounter. Image modified under a CC BY 4.0 license from Shan Carter, Chris Olah, et al. 2019

Activation atlases partially resemble the related spatial activations with one crucial difference. While spatial activations represent a model’s feature extractions for a single image, activation atlases form a map of feature extractions for all probable images. Instead of arranging a feature visualization for floppy ears, adorable noise, and fluffy paws over an image of a puppy, for example, an activation atlas will arrange a wide variety of animal noses next to each other, and these will eventually blend into other related features such as fur, ears, and tails.

This holistic view of feature visualization for a neural network allows to investigate and predict model behavior in interesting ways. By observing the proximity of features distinguishing “scuba diver” from “snorkeling” to the features that looked like they belong to a locomotive engine, the authors guessed that they could incorporate a patch from an image of a train to trick the network into identifying a snorkeler as a scuba diver. Other adversarial examples the team investigated included disguising a frying pan as a wok by adding noodles and a grey whale as a shark by adding a baseball.

image of cookware with cells and percentages of recognition of image as wok or frying pan
All you need to make a frying pan a wok are a few noodles. Image used under a CC BY 4.0 license from Shan Carter, Chris Olah, et al. 2019

Interpretable Value Moving Forward

To summarize, the field of interpretability has grown a lot since deep dream, and its beginning to offer actionable insights. The types of adversarial examples generated with activation atlases relied on an artful combination of whole-network feature visualization and human intuition. As deep learning networks continue to deploy to new areas and find greater adoption in existing fields, they’ll be expected to be held more accountable and be more predictably reliable. In this landscape, the demand for and value generated by interpretability tools and practitioners can only be expected to increase.