"What I cannot create, I do not understand." – R. Feynman
Jonas Kubilius


What is the purpose of vision? At our recent Simplicity vs Likelihood workshop, Frank Jäkel kept pushing me on this question, so much so that I could no longer ignore it.

At the same time, I had an opportunity to push Jacob Feldman to tell me how the Gestalt principle of common fate could be implemented on a computer. (Wait for my next post on common fate.) Feldman’s answer was unsettling to me, but thanks to Lee de-Wit it did not go unnoticed. “Common fate attempts to find surfaces in the inputs,” was more or less his answer.

And now I think we got it all wrong. Edges group together and then an object is extracted? No, it is rather that the visual system is looking for surfaces in the inputs, and edges are just one of the cues that help find those surfaces, just like color, texture, or common motion. These surfaces are then used for depth assignment in the scene, which could be a dichotomous figure-ground segmentation or a more precise depth map. And that’s the organizing principle of at least mid-level vision.
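
To make the "cues point to surfaces" idea a bit more concrete, here is a toy sketch of my own (not Feldman's proposal, and not a model of the visual system). It assumes each element in the input already comes with a 2D velocity, and it treats elements that move alike as belonging to the same candidate surface. The function name `group_by_common_fate` and the tolerance `tol` are hypothetical choices made purely for illustration.

```python
from math import hypot


def group_by_common_fate(velocities, tol=0.5):
    """Label each element with the index of a candidate surface.

    Toy assumption: elements whose velocities lie within `tol` of a
    group's running mean velocity belong to the same surface.
    """
    groups = []  # each group keeps a running mean velocity and a member count
    labels = []
    for vx, vy in velocities:
        for g_id, g in enumerate(groups):
            mx, my = g["mean"]
            if hypot(vx - mx, vy - my) <= tol:
                # Update the running mean with the new member.
                n = g["count"]
                g["mean"] = ((mx * n + vx) / (n + 1), (my * n + vy) / (n + 1))
                g["count"] = n + 1
                labels.append(g_id)
                break
        else:
            # No existing group moves like this element: start a new surface.
            groups.append({"mean": (vx, vy), "count": 1})
            labels.append(len(groups) - 1)
    return labels


# Five dots: three drifting right, two drifting left.
velocities = [(1.0, 0.0), (1.1, 0.1), (-1.0, 0.0), (0.9, -0.1), (-1.1, 0.05)]
labels = group_by_common_fate(velocities)
print(labels)  # [0, 0, 1, 0, 1]
```

In this picture, motion is just one cue feeding the surface labels; color or texture similarity could in principle be plugged into the same comparison, and deciding which surface is in front would be a separate step.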

I’m not sure, though, whether depth order is established for the sake of extracting 3D object structure. My mom has always (albeit jokingly) claimed to live in a 2D world, and for a long time I shared the same confusion about the dimensionality of space. I can still remember not “seeing” the third dimension in a scene – only flat surfaces, much like on a TV. (Now that I watch much less TV, I have started noticing 3D around me.)

To my surprise, these thoughts are very much in line with the work of two members of our lab, Naoki Kogo and Jan Koenderink. I will go ahead and explore their ideas.