An Introduction to the Concepts and Technology That Allow Computers to See and Learn

It may seem like many of the latest technological innovations rely on image recognition, and you’d be right. The technology behind facial recognition on our smartphones, autonomous modes in self-driving cars, and diagnostic imaging in healthcare has made massive strides in recent years. All of these rely on systems that make sense of the objects in front of them, which is why the field is often called “computer vision”. These computers are able to make accurate decisions based on what they “see”.

Curious as to how it’s possible? In this article, we’ll provide a high-level explanation of how image recognition works, along with the deep learning technology that powers it. It’s meant for readers who don’t have an advanced engineering background (there’s plenty of deep-dive information around the web) but still have an interest in image recognition technology.

Deep Learning and Neural Networks — Algorithms That Get Smarter With Time

Many of the modern innovations in image recognition rely on deep learning, an advanced type of machine learning and one of the modern wonders of artificial intelligence. Typical machine learning takes in data, pushes it through algorithms, and then makes a prediction; this gives the impression that a computer is “thinking” and coming to its own conclusion. Deep learning differs in that it works out for itself which features of the data matter: rather than relying on people to specify what to look for, it learns from many labeled examples, measuring how far off its predictions are and adjusting itself until they are mostly correct.
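To make “takes in data, pushes it through algorithms, and makes a prediction” concrete, here is a minimal Python sketch of a single artificial neuron, the basic building block of a neural network. The inputs, weights, and bias are made up for illustration; in a real model the weights are learned during training rather than picked by hand.

```python
import math


def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of the inputs plus a bias,
    squashed by the sigmoid function into a score between 0 and 1."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))


# Hypothetical inputs and hand-picked weights, purely for illustration.
score = neuron([0.5, 0.8], weights=[2.0, -1.0], bias=0.1)
print(f"prediction score: {score:.3f}")  # prediction score: 0.574
```

The score can be read as the neuron’s confidence; training is the process of finding weights that push that score toward the right answer.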

This is important for image recognition, because you’d want something like a self-driving car to be able to tell the difference between a signpost and a pedestrian. Deep learning achieves this through something called neural networks.

Neural networks are built from layers of simple algorithms (often called artificial neurons) stacked one after another. The output of each layer becomes the input to the next, so every layer depends on the results of the layers before it. This creates a process loosely inspired by how the human brain processes information (and part of why we call it “artificial intelligence”). For image recognition, the most common kind of neural network is the convolutional neural network.
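A rough sketch of the layering idea in plain Python, assuming a tiny hypothetical network with three inputs, a hidden layer of two neurons, and one output neuron. The function names (`dense_layer`, `forward`) and all weights are illustrative, not taken from any particular library.

```python
import math


def sigmoid(z):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases):
    """Each row of `weights` is one neuron; every neuron sees all the inputs."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the output of each layer into the next one."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hypothetical network: 3 inputs -> hidden layer of 2 neurons -> 1 output neuron.
network = [
    ([[0.2, -0.4, 0.1], [0.5, 0.3, -0.2]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                               # output layer
]
output = forward([1.0, 0.5, -1.0], network)
print(output)  # a single score between 0 and 1
```

The output neuron never sees the raw inputs directly; it only sees what the hidden layer produced, which is what makes each layer contingent on the ones before it.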

Convolutional Neural Networks — Breaking Images Into Numbers

When we see something, our brain makes sense of it by labeling, predicting, and recognizing specific patterns. A computer using a convolutional neural network (CNN) processes information in a similar way, but with numbers. Where we recognize patterns through our sense of sight (in conjunction with our other senses), a CNN breaks an image down into numbers, one or more per pixel, and looks for patterns in those numbers.
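A minimal Python sketch of what breaking an image into numbers looks like, using a hypothetical 5x5 grayscale image where each pixel is a brightness value from 0 (black) to 255 (white). A color image would store three such numbers per pixel, one each for red, green, and blue.

```python
# Hypothetical 5x5 grayscale image of a bright vertical line on a black
# background: each pixel is a brightness value from 0 (black) to 255 (white).
image = [[255 if col == 2 else 0 for col in range(5)] for row in range(5)]

# Neural networks are commonly fed scaled values, so map each pixel to 0.0-1.0.
normalized = [[pixel / 255 for pixel in row] for row in image]

for row in normalized:
    print(" ".join(f"{p:.1f}" for p in row))
```

Every row prints as `0.0 0.0 1.0 0.0 0.0`: to the computer, the line is nothing more than a column of 1.0 values surrounded by zeros.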

The inner workings of CNNs are far more intricate than number patterns alone, but it’s important to know what convolution is: an operation that combines two functions to produce a third. In a CNN, one function is the image and the other is a small grid of numbers called a filter (or kernel). The filter slides across the image, and at each position the overlapping values are multiplied and summed, producing a “feature map” that highlights patterns such as edges. A pooling step then shrinks each feature map by summarizing neighboring values, for example by keeping only the largest one. After several rounds of convolution and pooling, the image is described by a compact set of numbers that the network can use to predict what it shows. Computers can then apply that prediction to other tasks, like unlocking your phone or suggesting a friend to tag on Facebook.
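The convolution and pooling steps can be sketched in plain Python. This is a simplified illustration under a few assumptions: a single-channel image, a hand-made 3x3 vertical-edge filter (a real CNN learns its filter values during training), no padding, and 2x2 max pooling. Like the convolution layers in common deep learning libraries, it slides the filter without flipping it, which is technically cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` across `image` with no padding ("valid" mode).

    As in most deep learning libraries, the kernel is not flipped, so this is
    technically cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Keep only the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical 6x6 image: dark (0) in the two left columns, bright (1) elsewhere.
image = [[0, 0, 1, 1, 1, 1] for _ in range(6)]

# Hand-made vertical-edge filter; a real CNN learns these values.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

features = convolve2d(image, kernel)  # 4x4 feature map
pooled = max_pool(features)           # 2x2 summary

print(features)  # [[3, 3, 0, 0], [3, 3, 0, 0], [3, 3, 0, 0], [3, 3, 0, 0]]
print(pooled)    # [[3, 0], [3, 0]]
```

The large values in the feature map mark where dark meets bright; pooling keeps that signal (the 3s on the left) while shrinking the map from 16 numbers to 4.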

A neural network learns over time by checking whether its predictions are accurate and adjusting itself when they aren’t. As with anything else, it takes a lot of training for computers to get their predictions right; they don’t automatically know what objects are called in the real world.
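A minimal sketch of learning over time, assuming a single neuron, a made-up dataset that maps an image’s average brightness to a label (1 = “bright scene”, 0 = “dark scene”), and plain gradient descent on the cross-entropy loss. Real CNNs train millions of weights in the same basic loop (predict, measure the error, nudge the weights to reduce it), just with far more data and layers.

```python
import math


def predict(x, w, b):
    """Single-neuron prediction: probability that the example is labeled 1."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def avg_loss(data, w, b):
    """Average cross-entropy: small when predictions match the labels."""
    total = 0.0
    for x, y in data:
        p = predict(x, w, b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


# Hypothetical labeled examples: (average brightness, label).
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]

w, b = 0.0, 0.0       # untrained: every prediction starts at 0.5
learning_rate = 1.0
initial_loss = avg_loss(data, w, b)

for epoch in range(1000):
    # For a sigmoid neuron with cross-entropy loss, the error signal is
    # (prediction - label); scale it by the input to get the weight gradient.
    grad_w = sum((predict(x, w, b) - y) * x for x, y in data) / len(data)
    grad_b = sum(predict(x, w, b) - y for x, y in data) / len(data)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = avg_loss(data, w, b)
print(f"loss before training: {initial_loss:.3f}")  # 0.693
print(f"loss after training:  {final_loss:.3f}")
print([round(predict(x, w, b)) for x, _ in data])  # [0, 0, 0, 1, 1, 1]
```

Before training the neuron guesses 50/50 for everything; after many small corrections its rounded predictions match every label.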

(Convolutional Neural Network)

Image Datasets — Applying Experience to More Challenging Cases

Something many people don’t know about artificial intelligence is how much human work goes into making what’s called a dataset: a large collection of examples that people have labeled with the correct answers. This is how a deep learning model trains: it practices making predictions on the examples in a dataset and then applies that experience to real-world situations. Part of why image recognition is such a developed and widely used form of artificial intelligence is that its datasets are so well developed. A notable example is ImageNet, one of the first widely used image databases for artificial intelligence.
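To illustrate practicing on a dataset and then applying that experience, here is a sketch that splits a labeled dataset into a training portion and a held-out portion the model never sees during training, then measures accuracy on the held-out examples. For brevity it swaps the deep network for a much simpler nearest-centroid classifier, and the data (object height and width in meters for pedestrians and signposts) is randomly generated, not real.

```python
import random


def centroid(points):
    """Average of a list of equal-length feature vectors."""
    return [sum(values) / len(points) for values in zip(*points)]


def train(examples):
    """'Training' here just averages the feature vectors seen for each label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(points) for label, points in by_label.items()}


def classify(model, features):
    """Predict the label whose average (centroid) is closest to `features`."""
    def squared_distance(center):
        return sum((f - c) ** 2 for f, c in zip(features, center))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical, randomly generated data: [height_m, width_m] -> label.
random.seed(0)
dataset = [([random.gauss(1.7, 0.1), random.gauss(0.5, 0.1)], "pedestrian")
           for _ in range(50)]
dataset += [([random.gauss(2.5, 0.2), random.gauss(0.1, 0.03)], "signpost")
            for _ in range(50)]
random.shuffle(dataset)

# Hold out 20% of the examples; the model never sees them while training.
train_set, test_set = dataset[:80], dataset[80:]
model = train(train_set)

correct = sum(classify(model, features) == label for features, label in test_set)
accuracy = correct / len(test_set)
print(f"accuracy on held-out examples: {accuracy:.0%}")
```

Scoring on examples the model never trained on is what tells you whether its experience carries over to new cases, rather than just being memorized.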

The ImageNet project labeled 3.2 million images and, through its annual competition, challenged researchers to develop their own algorithmic models. Those labeled images gave AI models a dataset to practice on, letting them recognize images of growing complexity using more advanced convolutions. In 2012, a model called AlexNet, trained on ImageNet, won that competition by a wide margin using a deep convolutional neural network, and its architecture still influences image recognition models today.

Processing such a vast amount of information, and using it effectively within a deep learning model, requires very efficient computing power.
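A back-of-the-envelope sketch of why the processing demands add up: counting the multiply-add operations in a single convolutional layer. The numbers are assumptions for illustration: a 224x224 RGB input (a size commonly used with ImageNet models), 64 filters of size 3x3, stride 1, and padding that keeps the output the same size as the input.

```python
def conv_layer_macs(height, width, in_channels, out_channels, kernel_size):
    """Multiply-add operations for one convolutional layer on one image.

    Assumes stride 1 and padding that keeps the output height x width equal
    to the input's, so every output position costs in_channels * k * k
    multiply-adds per filter.
    """
    return height * width * out_channels * in_channels * kernel_size ** 2


per_image = conv_layer_macs(224, 224, in_channels=3, out_channels=64, kernel_size=3)
print(f"{per_image:,} multiply-adds per image")           # 86,704,128 multiply-adds per image
print(f"{per_image * 1_000_000:,} for a million images")  # 86,704,128,000,000 for a million images
```

That is one layer in one pass; real networks stack many layers and revisit the training data many times, which is why this work is usually handed to GPUs that can run huge numbers of these multiply-adds in parallel.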

GPU Solutions — Turnkey Setups from Exxact

Not sure where to begin? Our Deep Learning Workstations are a great place to start: they come preinstalled with the major deep learning frameworks and are powered by the latest NVIDIA GPUs.

Have any questions? Contact us directly here.