Creating a world representation of the environment from visual images


Javier Bautista

Computer Science Department, University of Southern California


In this paper I describe a visual system that combines cooperating bottom-up and top-down signals to construct an egocentric representation of the environment. Information travels up and down along two well-known pathways: a 'where' subsystem, in which the spatial location of the attended stimulus is processed and the short-term memory map of the environment is updated, and a 'what' subsystem, in which the intrinsic features of the attended stimulus are processed. The two pathways meet at their endpoints, linking the subsystems so that a short-term memory of what is in the environment, and where, is maintained. This high-level knowledge is in turn used to create expectations that direct attention to specific locations and features. Attentional enhancement is projected through top-down feedback connections to the lower levels, facilitating the segmentation and recognition of specific stimuli of interest. For this purpose, the visual cortex is modeled as a laminar structure (which can be replicated at a higher scale in V2) in which feature extraction, feature grouping, attentional enhancement and segmentation are performed in separate, cooperating layers. Features of different kinds (color, oriented edges, etc.) are extracted from the retinal image and used to form groupings of candidate objects. Features belonging to the same grouping excite each other, while features of different groupings inhibit one another, so groupings compete for attention. Top-down attentional enhancement, carrying high-level expectations, biases this competition toward features and locations of interest. Only the most active grouping survives the competition. Wherever this grouping lies in the topographic V1 map, a blob of activation extends over the entire area of the candidate object, allowing all of the features belonging to the newly attended stimulus, and only those features, to pass their activity up to the higher levels for further processing.
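The competition and gating steps described above can be illustrated with a minimal rate-model sketch. It is not the laminar circuit of the model itself: it assumes each grouping is collapsed into a single unit whose activity stands for the pooled activity of its features, so within-grouping excitation becomes self-excitation and between-grouping inhibition becomes lateral inhibition between units. The function names `compete` and `gate_features`, the Euler update, the clipping to [0, 1] and all parameter values (`w_excite`, `w_inhibit`, `dt`, `steps`) are hypothetical choices for illustration, and the grouping label of each map position is assumed to be already known.

```python
from typing import Dict, List

Grid = List[List[float]]


def compete(
    bottom_up: Dict[str, float],
    top_down: Dict[str, float],
    w_excite: float = 0.5,
    w_inhibit: float = 1.0,
    dt: float = 0.2,
    steps: int = 100,
) -> Dict[str, float]:
    """Winner-take-all competition between groupings (hypothetical rate model).

    Each grouping g is one unit with activity a_g in [0, 1]:
        a_g += dt * (-a_g + I_g + b_g + w_excite * a_g - w_inhibit * sum_{h != g} a_h)
    where I_g is the bottom-up drive and b_g the top-down attentional bias.
    """
    act = {g: 0.0 for g in bottom_up}
    for _ in range(steps):
        total = sum(act.values())
        new_act = {}
        for g, a in act.items():
            drive = (
                -a
                + bottom_up[g]
                + top_down.get(g, 0.0)        # top-down expectation biases the race
                + w_excite * a                # mutual excitation within the grouping
                - w_inhibit * (total - a)     # inhibition from competing groupings
            )
            # Synchronous Euler step, rectified at 0 and saturated at 1.
            new_act[g] = min(1.0, max(0.0, a + dt * drive))
        act = new_act
    return act


def gate_features(features: Grid, labels: List[List[str]], winner: str) -> Grid:
    """Pass a feature only where the winning grouping's blob covers the map."""
    return [
        [f if lab == winner else 0.0 for f, lab in zip(f_row, l_row)]
        for f_row, l_row in zip(features, labels)
    ]


# Two groupings with equal bottom-up input; attention is biased toward B.
act = compete({"A": 0.5, "B": 0.5}, {"B": 0.2})
winner = max(act, key=act.get)
print(winner, act)  # B {'A': 0.0, 'B': 1.0}
print(gate_features([[0.3, 0.8], [0.5, 0.1]], [["A", "B"], ["B", "A"]], winner))
```

With equal bottom-up input, a small top-down bias toward B is enough for B to suppress A entirely, after which `gate_features` passes only the features labelled B. Saturating activity at 1 is simply a way to keep self-excitation from growing without bound; a smooth sigmoid or shunting nonlinearity would be a more biologically motivated choice.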

The model thus shows how low-level processes can build up a high-level representation of the environment, which is in turn used to assist the lower-level processes. A solution to the binding problem is also proposed: by linking the 'where' and 'what' subsystems, different instances of the same object class can be maintained and localized simply by following the connections that link each instance's object-feature representation to its spatial position.
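The what-where binding can be sketched as a pair of cross-linked indices. In this minimal sketch, which is an assumption rather than the model's actual implementation, the 'where' map records which object class was attended at each egocentric location, and the 'what' index records every location at which a class was attended. Following either link recovers the other, so two instances of the same class stay distinct because each is bound to a different position. The class `WhatWhereMemory`, its methods, the integer grid coordinates and the `gain` value are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Location = Tuple[int, int]  # hypothetical egocentric grid coordinates


@dataclass
class WhatWhereMemory:
    """Hypothetical short-term memory binding object classes to locations."""

    # 'where' side: location -> object class attended there
    where_to_what: Dict[Location, str] = field(default_factory=dict)
    # 'what' side: object class -> every location where that class was attended
    what_to_where: Dict[str, List[Location]] = field(default_factory=dict)

    def store(self, obj_class: str, loc: Location) -> None:
        """Bind obj_class to loc, replacing whatever was stored there before."""
        previous = self.where_to_what.get(loc)
        if previous is not None:
            self.what_to_where[previous].remove(loc)
        self.where_to_what[loc] = obj_class
        self.what_to_where.setdefault(obj_class, []).append(loc)

    def locate(self, obj_class: str) -> List[Location]:
        """Follow the what -> where links: every stored instance of a class."""
        return list(self.what_to_where.get(obj_class, []))

    def identify(self, loc: Location) -> Optional[str]:
        """Follow the where -> what link at a single location."""
        return self.where_to_what.get(loc)

    def attention_bias(self, obj_class: str, gain: float = 0.2) -> Dict[Location, float]:
        """Top-down expectation: a location bias toward every known instance."""
        return {loc: gain for loc in self.locate(obj_class)}


mem = WhatWhereMemory()
mem.store("cup", (2, 3))
mem.store("cup", (7, 1))
mem.store("book", (4, 4))
print(mem.locate("cup"))     # [(2, 3), (7, 1)]
print(mem.identify((4, 4)))  # book
```

The `attention_bias` method shows how the same links could feed expectations back down: asking for a class yields a location bias at each remembered instance, which could serve as the top-down term in a competition between groupings.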