This decade has been defined by advances in data-based technologies and learning algorithms. Terms like AI and automation are used extensively to explain current trends in just about every industry. They have also sparked debates over the massive unemployment and increased consumerism that these technologies may bring. It is important that you, yes YOU, the reader, understand how these technologies may transform your way of life in the years to come. One that you may or may not have heard of is Computer Vision, which is likely to transform my current area of work. In this post I look at what this means for you and me.
Okay, Fancy Term, But What Is It?
Computer vision (CV) is a branch of computer science that deals with enabling computers to process digital visual data and make decisions based on it. In simpler terms, computers can see and respond to the images and videos provided to them, live or recorded, with a high level of accuracy and understanding.
Image processing algorithms are at the heart of this, analysing images and videos (videos are just images when taken frame by frame). However, computers can see more than just pictures of bananas. Image processing algorithms can also be applied to thermal (infrared) imaging, medical (x-ray and CT) scanning, satellite imaging and other forms that humans can’t detect unaided.
CV has proven incredibly important to some of the most talked about companies in the world. Tesla is using CV to control its driverless cars, while Google Photos has already categorized my photos by people and places. These are just some of the ways CV is being used, and the possibilities are endless.
Is Computer Vision Important?
A study by Cisco revealed that by 2019, 80% of all Internet traffic will be video. We are a year away from that reality. Hmm. Maybe I should be making videos instead of bloggi… I digress. According to that study, there is an ongoing explosion of video content. Without CV algorithms most of the data that can be generated from the content will be wasted.
In the media and entertainment world, useful information derived from videos can be used to position and time adverts more efficiently, with reasonable confidence that they will be seen or interacted with. Check out how TheTake is doing it in a very interesting way. CV has also been used extensively in sports broadcasting, especially for tracking fast-moving objects and for object identification. Post-match analysis of sports videos, Figure 1, gave rise to richer sports commentary, which is very useful for coaches and fans.
On the hardware side, CV is useful in the automotive industry (as discussed earlier with Tesla); in manufacturing, where it can aid quality assurance (check out Sight Machine, a company that uses CV and other AI techniques to improve manufacturing); in farming, to estimate crop yields (for this, check out Prospera); and in many more areas. These systems may work hand in hand with IoT devices to deliver decisions over the Internet.
One limitation of CV applications is poor quality images. However, that keeps improving year after year, with cameras capable of taking higher resolution images at a higher dynamic range. They even include processors that perform image stabilization, noise reduction and defect removal, all while being smaller and more robust.
What Really Happens?
So far I’ve mentioned image processing and algorithms, so let me explain further. An image fed to a computer can be broken down into individual pixels. Each pixel is defined by its color, or chromaticity. There are several ways to represent color, but a popular scheme is the RGB value, which defines the intensity of red, green and blue as three integers between 0 and 255, e.g. (201, 250, 100) represents tennis ball green. To perform a simple image analysis, you tell the computer which RGB value to track. In our example, you feed it the RGB value of tennis ball green and images of a tennis court with an ongoing match. The scene is analysed pixel by pixel, and the pixel whose RGB value differs least from the one provided is taken as the ball’s location.
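The pixel-by-pixel search can be sketched in a few lines of plain Python. This is only an illustration under some assumptions of mine: the image is a nested list of RGB tuples, the "difference" between two colors is the squared Euclidean distance, and the names `color_distance`, `find_closest_pixel` and the tiny `court` image are made up for the example.

```python
def color_distance(a, b):
    """Squared Euclidean distance between two RGB triples (an assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find_closest_pixel(image, target):
    """Return (row, col) of the pixel whose RGB value is closest to target."""
    best_pos, best_dist = None, None
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            d = color_distance(pixel, target)
            if best_dist is None or d < best_dist:
                best_pos, best_dist = (r, c), d
    return best_pos


TENNIS_BALL_GREEN = (201, 250, 100)

# A hypothetical 3x3 "court": blue surface, one white line pixel,
# and one pixel that is nearly tennis ball green.
court = [
    [(30, 90, 160), (30, 90, 160), (255, 255, 255)],
    [(30, 90, 160), (198, 245, 110), (30, 90, 160)],
    [(30, 90, 160), (30, 90, 160), (30, 90, 160)],
]

ball_position = find_closest_pixel(court, TENNIS_BALL_GREEN)
print(ball_position)  # (1, 1)
```

Note that this always returns some pixel, even if there is no ball in the frame, which is one reason real systems need more than a single color match.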
That covers the basics, but in reality things need to be more efficient than that. Analysis is better performed using kernels, which are small grids of weights that analyse a patch of pixels at a time and characterize it. Each pixel value in the patch is multiplied by the corresponding weight and the results are added together; sliding the kernel across the whole image in this way is called convolution. Different kernels pick out different features, such as edges, and kernels can be combined to characterize a combination of features, so that complex images can be detected. A convolutional neural network (CNN) goes one step further: rather than relying on hand-written kernels, it learns useful kernels from example images.
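To make the "multiply by a weight and add together" step concrete, here is a minimal sketch of a kernel sliding over a small grayscale image. The assumptions are mine: the image is a nested list of brightness values, there is no padding, the kernel is hand-picked (a CNN would learn it instead), and the kernel is not flipped, which is how most deep learning libraries define the operation.

```python
def apply_kernel(image, kernel):
    """Slide a kernel over a 2D grayscale image and return the weighted sums."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            # Multiply each pixel in the patch by its weight and add up.
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hand-picked kernel that responds where brightness increases left to right.
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

# Hypothetical image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

feature_map = apply_kernel(image, VERTICAL_EDGE)
print(feature_map)  # [[0, 27, 27, 0], [0, 27, 27, 0]]
```

The output is large only where the patch straddles the dark-to-bright boundary, so the kernel has "characterized" those patches as containing a vertical edge.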
Beyond this, CNNs process images in layers. Layer 1 may detect lines (1D), layer 2 may detect shapes (2D), layer 3 may detect shadows (3D) and so on. Generally, the more layers used, the better the computer’s ability to accurately identify objects and make meaningful decisions, although deeper networks also need more data and computing power to train. The use of a multitude of layers, as in Figure 2, gave rise to the term deep learning.
This goes even further, with techniques such as Markov models brought in to provide more accurate results.
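As a rough illustration of layering, the sketch below stacks two tiny layers: the first finds vertical and horizontal edges, and the second combines those edge maps to find the top-left corner of a bright square. Everything here is an assumption for the sake of the example: the weights and the bias of -20 are hand-picked rather than learned, the helper names are made up, and a real CNN would have many more kernels per layer.

```python
def convolve(image, kernel):
    """Weighted sum of each patch of a 2D grid (no padding, kernel not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(len(image[0]) - kw + 1)
        ]
        for r in range(len(image) - kh + 1)
    ]


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def add_maps(a, b, bias=0):
    """Add two equally sized feature maps element by element, plus a bias."""
    return [[x + y + bias for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical image: a bright square in the bottom-right corner.
image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Layer 1: two edge detectors, each producing a 3x3 feature map.
vertical = relu(convolve(image, [[-1, 1], [-1, 1]]))
horizontal = relu(convolve(image, [[-1, -1], [1, 1]]))

# Layer 2: one 2x2 kernel per layer-1 map, summed with a bias.
# It fires where a vertical edge lies just below and a horizontal
# edge lies just to the right, i.e. at the square's top-left corner.
corner = relu(add_maps(
    convolve(vertical, [[0, 0], [1, 0]]),
    convolve(horizontal, [[0, 1], [0, 0]]),
    bias=-20,
))
print(corner)  # [[0, 0], [0, 16]]
```

Neither edge map alone identifies the corner; only the second layer, working on the first layer's output, does. That is the basic idea behind stacking layers.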
Where CV is Best Applied
CV is likely to revolutionize several fields and industries. We will begin to see smarter devices and robots using imaging to perform a variety of tasks. Drones equipped with cameras could report on drought and forest cover and immediately establish optimal irrigation schemes. CV experts and doctors could start pooling imaging records for faster and more accurate diagnosis. Applied computer vision could also massively reduce the cost of goods as manufacturing processes become more streamlined.
The entertainment sector stands to make massive profits from applied CV. Imagine being able to read your audience’s reactions and quickly adjust your content, or learning who is in a living room and giving meaningful recommendations (if they allow this feature, of course). Here, only your creativity limits you.
The Computer Vision Technologies and Markets report defines 8 large application areas for CV:
- Automotive
- Sports and Entertainment
- Consumer and Mobile
- Robotics and Machine Vision
- Medical
- Security and Surveillance
- Retail
- Agriculture
The report was compiled with the assistance of the Embedded Vision Alliance, a worldwide industry partnership of technology leaders that aims to produce practical applications of CV.
A Place to Start
The supply of experts in deep learning and computer vision is low compared to the continued rise in global demand. Perhaps you want to learn more about the subject, or you already work in one of the application areas discussed above (frankly, I believe some of the best engineers are those already entrenched in particular fields like agriculture or surveillance who then add cutting-edge concepts such as CV to their toolkit).
It can be taxing to find a good online course, but luckily I found a CV course offered by Udacity. It’s quite lengthy, as the subject is broad and complex, but after scanning through it and completing the first two chapters, I highly recommend it, as it covers all you’ll need to get started.
Conclusion
Computer vision is an interesting sub-field of digital signal processing (DSP), and it has become worth considering in the research and projects that I’ll conduct in future. It promises to dramatically change our way of life (for better or worse), and some understanding of it can be used to steer businesses towards more profitable methods and practices.