One of the areas of machine intelligence most dramatically disrupted by the deep learning revolution is computer vision. For decades, the field relied on carefully handcrafted features to improve the accuracy of its algorithms, developing a rich theory and thousands of highly domain-specific algorithms along the way. Deep learning has changed this: given the right conditions, many computer vision tasks no longer require such careful feature crafting. One such task is image classification: teaching a machine to recognize which category of a given taxonomy an image belongs to.

How hard is image classification, really? In 2013, Kaggle launched a competition to classify pictures of cats and dogs, providing 12,500 images of each. According to this paper, state-of-the-art algorithms were expected to reach an accuracy of around 80%. It turned out that, with deep learning, the accuracy was over 98%. But how is that even possible?

ImageNet: Where it all started

One of the earliest successes of deep learning came in the ImageNet challenge. The ImageNet data set is a huge image library with thousands of categories (the challenge uses 1000 of them), curated on the initiative of Fei-Fei Li, now at Stanford University. Launched in 2010, the ImageNet challenge is a competition in which researchers use this data set to evaluate the quality of their algorithms. Around 2011, the best error rate was about 25%. In 2012, a deep learning architecture known as AlexNet reduced the error rate to 16%. The architecture of this network has been reused over and over in different domains, as it has proven very successful. It is also possible to fine-tune the trained network to adapt it to your application, so that you don’t need to train a network from scratch every time!
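To make the fine-tuning idea concrete, here is a minimal sketch under loud assumptions: `frozen_features` is a hypothetical stand-in for the pretrained layers of a network such as AlexNet (it only averages pixel brightness), and the images are synthetic toys. It shows the simplest variant of the workflow, in which the pretrained part stays fixed and only a small new classifier is trained on top of its outputs.

```python
import math
import random


def frozen_features(image):
    """Hypothetical stand-in for the frozen layers of a pretrained network.

    A real backbone would output learned features; this toy version returns
    the mean brightness of the top and bottom halves of a small image.
    """
    half = len(image) // 2
    top = [p for row in image[:half] for p in row]
    bottom = [p for row in image[half:] for p in row]
    return [sum(top) / len(top), sum(bottom) / len(bottom)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(features, labels, epochs=500, lr=0.5):
    """Train only the new classification head (a logistic regression)."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = pred - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(weights, bias, x):
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


def make_image(label, rng, size=4):
    """Toy data: class 1 is bright in the top half, class 0 in the bottom half."""
    img = [[rng.random() * 0.3 for _ in range(size)] for _ in range(size)]
    rows = range(size // 2) if label == 1 else range(size // 2, size)
    for r in rows:
        img[r] = [0.7 + rng.random() * 0.3 for _ in range(size)]
    return img


rng = random.Random(0)
labels = [i % 2 for i in range(40)]
images = [make_image(y, rng) for y in labels]
feats = [frozen_features(img) for img in images]  # the backbone is never updated
weights, bias = train_head(feats, labels)
accuracy = sum(predict(weights, bias, x) == y for x, y in zip(feats, labels)) / len(labels)
print(f"training accuracy of the new head: {accuracy:.2f}")
```

In practice the frozen part would be a real pretrained network loaded from a deep learning library, and you might also unfreeze some of its later layers; the division of labor stays the same.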

Autonomous driving

One of the most fascinating applications of computer vision and deep learning is autonomous driving. In a recent article, NVIDIA researchers describe an end-to-end autonomous driving system. The network, a convolutional neural network (CNN) called PilotNet, is trained on data collected in a real vehicle driven by a human. The data consists of steering angles and video images of the road. The motivation was to eliminate the need for hand-coded driving rules, since the system can learn the necessary domain knowledge from the raw data. One striking feature is that the car is able to stay in the correct lane even when there are no lane markings. Training was done on an NVIDIA DevBox using Torch 7, and an NVIDIA DRIVE PX self-driving car computer handles the driving. Once the network is trained, the car computer takes an image from a video feed and returns the corresponding steering angle.
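For readers wondering what the "convolutional" part of a CNN does, the sketch below slides a small filter over a toy image. It is an illustration only, not PilotNet's code: the image is made up, and the filter values are hand-picked here, whereas a trained network learns them from the data.

```python
def conv2d(image, kernel):
    """Slide a kernel over an image without padding.

    Strictly speaking this is a cross-correlation, which is the operation
    deep learning libraries usually call "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 4x5 "road" image: dark on the left, bright on the right.
road = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hand-picked filter that responds to dark-to-bright vertical edges.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

response = conv2d(road, edge_kernel)
for row in response:
    print(row)
```

The response is largest exactly where the brightness jumps, which is the kind of low-level cue (an edge of the road, for instance) that later layers of a CNN can combine into higher-level features.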

Of course, NVIDIA is not alone. A startup founded by deep learning experts from Stanford University’s Artificial Intelligence Laboratory is also working on a completely autonomous vehicle, integrating deep learning into the design from the very beginning.

Autonomous driving for the poor man

You may not have a ton of data at hand, or even a car to run experiments on. But that does not mean you have to miss out on the fun. Udacity recently open sourced its autonomous car simulator, in which you can train your own car to drive! The simulator is built in Unity, so you need to install Unity first and be somewhat familiar with it to retrieve the data. Once this is done, though, it takes neither a lot of code nor a lot of time to start developing your own self-driving car, at least virtually. Alternatively, you can use training data from Grand Theft Auto V to create your own self-driving algorithm.
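To give a feeling for how little code the core idea needs, here is a toy sketch of learning to steer from recorded examples. Everything in it is an assumption made for illustration: the "frames" are 5-pixel rows with a single bright pixel marking the lane center, the steering angles are invented, and a plain linear model stands in for the neural network you would train on real simulator data.

```python
def predict(weights, bias, pixels):
    """Map raw pixels directly to a steering angle (linear stand-in for a CNN)."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias


def train(frames, angles, epochs=200, lr=0.1):
    """Fit the model by gradient descent on the squared steering error."""
    weights = [0.0] * len(frames[0])
    bias = 0.0
    for _ in range(epochs):
        for pixels, angle in zip(frames, angles):
            err = predict(weights, bias, pixels) - angle
            weights = [w - lr * err * p for w, p in zip(weights, pixels)]
            bias -= lr * err
    return weights, bias


def frame_with_lane_at(pos, width=5):
    """Hypothetical camera frame: one bright pixel where the lane center is."""
    return [1.0 if i == pos else 0.0 for i in range(width)]


# Invented "human driver" recordings: steer toward the lane center,
# negative angles meaning left and positive angles meaning right.
frames = [frame_with_lane_at(p) for p in range(5)]
angles = [(p - 2) * 0.5 for p in range(5)]

weights, bias = train(frames, angles)
steer = predict(weights, bias, frame_with_lane_at(4))
print(f"lane center far right -> steering angle {steer:.2f}")
```

With real data, the frames would be camera images saved by the simulator and the model a convolutional network, but the training loop keeps the same shape: predict an angle, compare it with what the human driver did, and adjust.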

Cucumber sorting

Around a year ago, a Japanese former embedded systems engineer decided to help out on his parents’ cucumber farm. Cucumbers are sorted into nine different classes according to several attributes, among them the curvature of the cucumber. He was surprised by how much tedious manual work the sorting process involved and decided to try something else. Using 7000 labeled images from his mother, he was able to reach 95% accuracy. In his original design, the heavy deep learning part takes place in the cloud (using Google Cloud Machine Learning API). However, progress in hardware and the increasing availability of high-quality, affordable (and tiny!) graphics cards could remove the cloud dependency, so that the classification algorithm runs on your phone or tablet. You can read the full story here.

Breast cancer detection

Breast cancer is one of the major threats to women’s health. It is estimated that 1 in 8 U.S. women will develop invasive breast cancer during their lifetime, and in 2017 alone, a bit over forty thousand deaths are expected from the disease. Traditionally, women over 50 are advised to have an X-ray check every year, and follow-up tests are scheduled if something does not seem quite right. The diagnosis is, however, quite subjective and depends on the experience of the physician.

Startups like iSonoHealth are working on making this process less invasive and more affordable. It would be no surprise if the secret sauce behind their solution turned out to be deep learning.

Other applications of image classification worth mentioning are pedestrian and traffic sign recognition (crucial for autonomous vehicles).

Of course, it all comes at a cost: deep learning algorithms are (more often than not) data hungry and require huge computing power, which might be a no-go for many simple applications. However, top researchers in the area are addressing this, and it might not take too long before we see many more deep learning applications in our everyday lives. The sky is the limit!