Nowadays, people seem to take for granted how sophisticated and powerful computers have become. You can talk to your phone or your Bluetooth speaker and it will respond with context-aware information; in certain cars, you can take your hands off the wheel and let electronics carry you down the road; and you can share pictures and messages anywhere in the world at the touch of a button.
But one aspect where your devices are still very much in their infancy is computer ‘vision’. While there are ever-better cameras on the market today, in terms of understanding the world these devices are quite dumb. They can see with superb precision, but they can’t yet understand what they are seeing.
For example, if you show a three-year-old child an image of a person standing next to an elephant, the child will have no trouble saying what they are seeing, but it would be extremely challenging for a computer to do the same.
However, things are changing: recently, a field of computing called ‘deep learning’ has dramatically enhanced the capacity of computers to understand what they see. Rather than relying on traditional image processing techniques, convolutional neural networks (CNNs) are beginning to make significant inroads into giving computers the ability to make sense of the world.
CNNs were first pioneered in the late 1980s, building on earlier work from the 1960s on Artificial Neural Networks (ANNs) and Multilayer Perceptrons (MLPs). They were designed to work, loosely, the way the human brain does. And much like a human brain, to do their job well they need loads of data on which to be trained.
Convolutional neural networks became more widely known and used around 2005 with the rise of modern GPUs, whose ability to process repetitive tasks at speed made it practical to train and run CNNs.
CNNs are neural networks used mainly to classify images (i.e., name what they see), cluster images by similarity (photo search), and perform object recognition within scenes. For instance, convolutional neural networks are used to identify individuals, faces, street signs, platypuses (platypi?), tumors, and many other facets of visual data.
The efficiency of convolutional nets in image recognition is one of the main reasons the world has woken up to the effectiveness of deep learning. In a way, CNNs made deep learning famous. CNNs are powering significant advances in computer vision (CV), with obvious applications in self-driving cars, robotics, security, drones, medical diagnosis, and treatments for the visually impaired.
Convolutional networks can also perform more profitable (and more banal) business-oriented tasks, such as optical character recognition (OCR) to digitize text and make natural-language processing possible on analog and handwritten documents, where the images are symbols to be transcribed.
CNNs are not limited to image recognition, however. They have been applied directly to text analytics. And they can be applied to sound when it is represented visually as a spectrogram, and graph data with graph convolutional networks.
The real-time systems that are around today have an increasingly tough task: processing all the data they receive and making intelligent decisions about what should be done next. CNNs are fundamental to processing all of this data very rapidly.
CNNs are used in a variety of areas, including image and pattern recognition, speech recognition, natural language processing, and video analysis. From smartphones and smartwatches to virtual-reality gaming consoles, advanced driver assistance systems (ADAS), drone control, and a host of security devices, the application areas that rely on high-resolution imaging (1080p, 4K, and beyond) keep growing.
High-resolution imaging is so sophisticated today that it is relied on for almost everything: from face recognition in security systems to face detection in smartphones, to traffic sign recognition in our vehicles and the autonomous car of the future. There is an excellent opportunity to use CNN techniques to further enhance computer vision applications and achieve superb accuracy.
A neural network is a system of interconnected artificial “neurons” that exchange messages with one another. The connections have numeric weights that are tuned during the training process, so that a properly trained network will respond correctly when presented with a pattern or image to recognize. The network comprises multiple layers of feature-detecting “neurons”; each layer is composed of many neurons that respond to different combinations of inputs from the previous layers.
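To make the idea of weighted connections concrete, here is a minimal sketch of a single artificial neuron. The weights and bias values are hypothetical, chosen by hand to stand in for what training would normally tune; real networks learn thousands or millions of such weights from data.

```python
import numpy as np

def neuron(inputs, weights, bias):
    # A single artificial "neuron": multiply each input by its numeric
    # weight, sum the results with a bias, and pass the total through
    # a simple threshold activation (fire if the total is positive).
    total = np.dot(inputs, weights) + bias
    return 1 if total > 0 else 0

# Hand-picked (hypothetical) weights standing in for trained values:
# this neuron fires only when both of its inputs are active.
weights = np.array([0.6, 0.6])
bias = -1.0

print(neuron(np.array([1, 1]), weights, bias))  # fires: 1
print(neuron(np.array([1, 0]), weights, bias))  # stays silent: 0
```

Training a network amounts to nudging values like `weights` and `bias` across many neurons until the whole system responds correctly to the patterns it is shown.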
Architectural Layers in Convolutional Neural Networks:
A ConvNet is an interconnection of layers. Every layer performs a simple function: it transforms an input 3D volume (neurons arranged along width, height, and depth) into an output 3D volume via some differentiable function that may or may not have parameters. There are three primary types of layers used to build ConvNet architectures: the Convolutional Layer, the Pooling Layer, and the Fully Connected Layer. A representation of a simple CNN looks like: Input -> Convolutional -> Rectified Linear Unit (ReLU) -> Pool -> Fully Connected.
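The pipeline above can be sketched as a forward pass in plain NumPy. This is an illustrative toy, not a production implementation: the image, the 3x3 filter, and the fully connected weights are random stand-ins for values a real network would learn, and only a single filter (depth 1) is used to keep the shapes easy to follow.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(image, kernel):
    # Convolutional layer ("valid" cross-correlation, as in most CNN
    # libraries): slide the kernel over the image, taking weighted sums.
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    # Rectified Linear Unit: zero out negative activations.
    return np.maximum(x, 0)

def max_pool(x, size=2):
    # Pooling layer: keep the strongest response in each size-x-size patch.
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# Input -> Convolutional -> ReLU -> Pool -> Fully Connected
image = rng.standard_normal((8, 8))        # toy 8x8 grayscale input
kernel = rng.standard_normal((3, 3))       # one 3x3 filter (random, not trained)
features = max_pool(relu(conv2d(image, kernel)))        # 6x6 map pooled to 3x3
fc_weights = rng.standard_normal((10, features.size))   # fully connected layer
scores = fc_weights @ features.ravel()     # one score per hypothetical class

print(scores.shape)  # (10,)
```

Each stage transforms one volume into the next, exactly as the layer-by-layer description above states: the 8x8 input becomes a 6x6 feature map after the valid 3x3 convolution, a 3x3 map after 2x2 pooling, and finally a vector of 10 class scores.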
In conclusion, CNNs are special cases of the neural network described above. A convolutional neural network consists of one or more convolutional layers, often with a subsampling layer, followed by one or more fully connected layers as in a standard neural network.
With innovations in semiconductor technology delivering low-power, highly parallel processors at ever-smaller geometries, there is an opportunity to provide convolutional neural network (CNN) based computer vision solutions.
About the Author
Mayur Rele is a cloud automation expert and cybersecurity leader with wide experience overseeing global technology, cloud infrastructure, and security in healthcare, e-commerce, and technology companies. Mayur graduated with an M.S. in Computer and Telecommunications Engineering from Stevens Institute of Technology and is an active IEEE researcher and contributor.