Convolutional Neural Networks

What are Convolutional Neural Networks?

Originally, convolutional neural networks (CNNs) were a technique for analyzing images.
Applications have expanded to include analysis of text, video, and audio.
A CNN applies the same small sets of learned weights (filters) to many subsets of an image in order to identify parts of the image.

The idea behind CNNs

Recall the old joke about the blindfolded scientists trying to identify an elephant, each of whom can feel only one part of it.
A CNN works in a similar way. It breaks an image down into smaller parts and tests whether these parts match known features.
It also needs to check whether specific parts are close to one another. For example, the tusks are near the trunk and not near the tail.

Is the image on the left most like an X or an O?

Images borrowed from http://brohrer.github.io/how_convolutional_neural_networks_work.html

What features are in common?

Building blocks of a CNN

A CNN is built from a combination of layers

Convolution Layer
- This layer compares a feature with all subsets of the image
- It creates a map showing where the comparable features occur
Rectified Linear Units (ReLU) Layer
- This layer goes through the feature maps and replaces negative values with $0$
Pooling Layer
- This layer reduces the size of the rectified feature maps by keeping only the maximum value of each small subset (window)
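
The three layers above can be sketched in a few lines of NumPy. This is a minimal illustration, not how a deep learning library implements them. It assumes pixels coded as +1 (on) and -1 (off), scores a match as the mean of the element-wise product, and pools over non-overlapping 2x2 windows. The function names `convolve`, `relu`, and `max_pool`, the 5x5 "X" image, and the diagonal feature are all made up for this example.

```python
import numpy as np

def convolve(image, feature):
    """Compare `feature` with every same-sized patch of `image`.

    Each output entry is the mean of the element-wise product, so a
    perfect match of +1/-1 pixels scores 1.0 (assumed scoring rule).
    """
    fh, fw = feature.shape
    out_h = image.shape[0] - fh + 1
    out_w = image.shape[1] - fw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + fh, j:j + fw]
            out[i, j] = np.mean(patch * feature)
    return out

def relu(feature_map):
    """Replace every negative value with 0."""
    return np.maximum(feature_map, 0.0)

def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window.

    Windows at the right and bottom edges may be smaller than `size`.
    """
    h, w = feature_map.shape
    out_h = -(-h // size)  # ceiling division
    out_w = -(-w // size)
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * size:(i + 1) * size,
                                 j * size:(j + 1) * size]
            out[i, j] = window.max()
    return out

# Hypothetical 5x5 image of an "X" (+1 = on, -1 = off).
image = np.array([
    [ 1, -1, -1, -1,  1],
    [-1,  1, -1,  1, -1],
    [-1, -1,  1, -1, -1],
    [-1,  1, -1,  1, -1],
    [ 1, -1, -1, -1,  1],
], dtype=float)

# Hypothetical 3x3 feature: a diagonal stroke.
feature = np.array([
    [ 1, -1, -1],
    [-1,  1, -1],
    [-1, -1,  1],
], dtype=float)

feature_map = convolve(image, feature)  # shape (3, 3)
rectified = relu(feature_map)           # same shape, no negative values
pooled = max_pool(rectified, size=2)    # shape (2, 2)
```

The diagonal feature matches the top-left and bottom-right corners of the X exactly, so those positions score 1.0 in the feature map. Positions where the patch mostly disagrees with the feature score below zero and are set to 0 by the ReLU step. In a trained CNN the feature values are learned from data rather than written by hand.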

The CNN ends with a final layer

Classification (Fully-connected) Layer
- This combines the specific features to determine the classification of the image
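
One way to picture this final step: flatten the feature maps into a single vector, compute one weighted sum per class, and pick the class with the largest score. The sketch below assumes two hypothetical 2x2 feature maps and hand-set weights chosen only for illustration. The name `classify` and all of the numbers are invented here, and a real network would learn the weights during training.

```python
import numpy as np

def classify(feature_maps, weights, labels):
    """Flatten the feature maps, score each class, return the best label."""
    x = np.concatenate([fm.ravel() for fm in feature_maps])
    scores = weights @ x  # one weighted sum ("vote total") per class
    return labels[int(np.argmax(scores))], scores

# Two hypothetical pooled feature maps (illustrative values only).
diag_map = np.array([[1.0, 0.2],
                     [0.2, 1.0]])
ring_map = np.array([[0.1, 0.3],
                     [0.3, 0.1]])

# Hypothetical hand-set weights: row 0 scores "X", row 1 scores "O".
# Each row has one weight per flattened feature value (4 + 4 = 8).
weights = np.array([
    [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0],
])

label, scores = classify([diag_map, ring_map], weights, ["X", "O"])
```

Here the strong diagonal responses give the "X" class the higher score, so `label` is `"X"`.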

Convolution → Rectified Linear → Pooling

These layers can be repeated multiple times. The final layer converts the final feature map to the classification.
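
Repeating the layers shrinks the feature maps step by step. The sketch below tracks the side length of a square feature map through repeated convolution, ReLU, and pooling stages. It assumes a convolution with no padding and a stride of 1, and non-overlapping pooling windows. The helper names, the 28-pixel input, the 5x5 filters, and the 2x2 windows are illustrative choices, not values prescribed by the text.

```python
import math

def conv_output_size(size, kernel):
    """Side length after a convolution with no padding and stride 1."""
    return size - kernel + 1

def pool_output_size(size, window):
    """Side length after pooling with non-overlapping windows."""
    return math.ceil(size / window)

def stacked_output_sizes(size, blocks):
    """Side length after each (kernel, window) block, starting with the input."""
    sizes = [size]
    for kernel, window in blocks:
        size = conv_output_size(size, kernel)  # ReLU does not change the size
        size = pool_output_size(size, window)
        sizes.append(size)
    return sizes

# Two hypothetical blocks, each a 5x5 convolution followed by 2x2 pooling.
sizes = stacked_output_sizes(28, [(5, 2), (5, 2)])
# Side lengths go 28 -> 24 -> 12 in the first block, then 12 -> 8 -> 4.
```

With these settings a 28x28 input is reduced to 4x4 feature maps before the final classification layer.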

Example: MNIST Data

Image borrowed from Getting Started with TensorFlow by Giancarlo Zaccone

The MNIST data set is a collection of hand-written digits (0-9).
Each digit is captured as an image with 28x28 pixels.
The data set is already partitioned into a training set (60,000 images) and a test set (10,000 images).
The TensorFlow package includes tools for reading in the MNIST data set.
More details on the data are available at http://yann.lecun.com/exdb/mnist/
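
To show roughly what such a reading tool does, here is a sketch of a parser for the IDX image format described on the page linked above. It is not TensorFlow's loader. It assumes the raw bytes are already decompressed and in memory, and it is exercised on a small synthetic stand-in rather than the real files. The function name `parse_idx_images` is invented for this example.

```python
import struct
import numpy as np

IDX_IMAGE_MAGIC = 2051  # 0x00000803, the magic number of an IDX image file

def parse_idx_images(raw):
    """Decode IDX image bytes into a (count, rows, cols) uint8 array."""
    # Header: four big-endian 32-bit integers.
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != IDX_IMAGE_MAGIC:
        raise ValueError(f"unexpected magic number: {magic}")
    pixels = np.frombuffer(raw, dtype=np.uint8, offset=16,
                           count=count * rows * cols)
    return pixels.reshape(count, rows, cols)

# Synthetic stand-in for a real file: 2 images of 28x28 pixels.
fake_pixels = (np.arange(2 * 28 * 28) % 256).astype(np.uint8)
raw = struct.pack(">IIII", IDX_IMAGE_MAGIC, 2, 28, 28) + fake_pixels.tobytes()

images = parse_idx_images(raw)               # shape (2, 28, 28)
scaled = images.astype(np.float32) / 255.0   # pixel values rescaled to [0, 1]
```

Rescaling the 0-255 pixel values to the range 0 to 1 is a common preprocessing choice and is shown here only as an assumption about how the images might be prepared for training.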

Why Use GPUs?

Over time, larger models have been developed to handle more complex tasks, and larger models require more computation. Training involves hundreds of thousands of computations or more, many of which are independent of one another, so some form of parallelization is needed to speed up the process.

HPC systems can help meet this demand with specialized hardware, such as GPUs, which provide the needed parallelization.
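
For a rough sense of scale, the sketch below counts the multiply-and-add steps in the forward pass of a single convolution layer. The layer settings (32 filters of 5x5 pixels, no padding, stride 1, one input channel) are assumptions picked for illustration, and the function name `conv_layer_macs` is invented here. Only the 28x28 image size and the 60,000 training images come from the MNIST description above.

```python
def conv_layer_macs(image_size, kernel_size, num_filters, in_channels=1):
    """Multiply-and-add steps for one convolution layer on one square image."""
    out_size = image_size - kernel_size + 1        # no padding, stride 1
    per_output = kernel_size * kernel_size * in_channels
    return out_size * out_size * per_output * num_filters

# Hypothetical layer: 32 filters of 5x5 applied to one 28x28 image.
per_image = conv_layer_macs(28, 5, 32)   # 24 * 24 * 25 * 32 = 460,800

# One forward pass over all 60,000 training images, for this layer alone.
per_epoch = per_image * 60_000           # 27,648,000,000
```

Each output value of the layer can be computed without waiting for any other, which is the kind of work a GPU can spread across many cores at once.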

Last updated on Jun 6, 2024