Computer vision, a form of Artificial Intelligence (AI) in which computers extract information from images, has many potential applications across business and science. It is not surprising that various groups are investing in the development of these techniques. But, as Gary Marcus points out in a New York Times op-ed, most of these approaches are bottom-up: they crunch huge amounts of data on pixel color and pattern (i.e., AI as a “passive vessel”) to discern content or classify an image. At the same time, the work is confined to small groups in labs or companies that have little incentive to share their breakthroughs with the outside world. There is also a technical problem: these approaches can produce incorrect results for reasons that are hard for a human to identify, because the errors often arise from many processing steps that are difficult to trace.
“To get computers to think like humans, we need a new A.I. paradigm, one that places ‘top down’ and ‘bottom up’ knowledge on equal footing. Bottom-up knowledge is the kind of raw information we get directly from our senses, like patterns of light falling on our retina. Top-down knowledge comprises cognitive models of the world and how it works.”
Marcus calls for AI that makes greater use of top-down knowledge, that is, AI that incorporates the strengths of human intelligence into its framework. More (bottom-up) data does not necessarily lead to better decisions, especially when a decision involves complex reasoning, such as considering an image’s context or anticipating what the objects in it will do next.
Based on my own experience in the computer vision world, I wholeheartedly agree with Marcus’s position. I am not a computer scientist, but I have worked for years on plankton imaging and the automated analysis of those images, so I am generally familiar with the approaches used to analyze image data. The image processing world is currently pushing toward “deep learning,” a form of AI that appears similar to earlier approaches for recognizing plankton images: extract as much data as possible from each image, then categorize it using a model built from a training set.
Over years of working with the results of various computer classification techniques, I have developed a new and profound respect for the human brain. We truly are image-processing wizards. We can look at a 2D image and easily interpret it in 3D, and we have an incredible skill for understanding an image’s context. Both abilities are difficult to communicate to a computer, because these human instincts do not translate easily into the mathematical terms in which a computer works. How do you describe mathematically that an object can appear in many orientations relative to the camera, and that all of those orientations should be treated as the same type of object? This is not trivial to implement in a computer program, yet our brains do it with ease.
Our ability to construct “cognitive models of the world” gives us numerous mental shortcuts that are both accurate and computationally inexpensive. For example, in the study of predator-prey interactions and Batesian mimicry, there is the idea of “feature saltation,” which essentially means that a predator uses one or two visual traits to judge whether a potential prey item is palatable or threatening. Humans appear to do something very similar when recognizing objects. We assess the overall shape, which computers do quite well, but then we zero in on particular features of the image (e.g., lighting, the position of eyes, stems). Once we notice one or two relatively subtle details, we can typically make a confident and accurate identification, and often say something about what may be happening in the image. A computer, by contrast, has difficulty focusing on specific features, which is one reason deep learning algorithms can occasionally be “tricked” in counterintuitive ways. Marcus cites an example of a deep learning algorithm mistaking a pattern of yellow and black stripes for a school bus.
I hope the AI community takes some of these suggestions to heart, because this is an exciting field that could progress faster if we changed a few approaches. Although he doesn’t say so explicitly, I believe Marcus would agree that we need more research on how human (and other animal) brains construct these “cognitive models.” That research would help computer scientists incorporate top-down knowledge into AI more accurately.