Javier Bautista
In this paper I describe a visual
system that combines cooperating bottom-up and top-down signals to
construct an egocentric representation of the environment. Information
travels up and down along two well-known pathways: a 'where' subsystem,
in which the spatial location of the attended stimulus is processed and the
short-term memory map of the environment is updated, and a 'what' subsystem,
in which the intrinsic features of the attended stimulus are processed. The
two pathways meet at their endpoints, linking the subsystems together.
In this way, a short-term memory of what is in the environment, and where, is
kept. This high-level knowledge is in turn used to create expectations
that direct attention to specific locations and features. Attentional enhancement
is projected through top-down feedback connections to the lower levels,
facilitating the segmentation and recognition of specific stimuli of interest.
For this purpose, the visual cortex is modeled as a laminar structure
(which can be replicated at a higher scale in V2) where feature extraction,
feature grouping, attentional enhancement and segmentation processes are
performed in separate, cooperating layers. Features of different kinds
(color, oriented edges, etc.) are extracted from the retinal image and used
to form groupings of candidate objects. Features belonging to the same
grouping excite each other, while features of different groupings inhibit
one another. Thus groupings compete for attention. Top-down attentional
enhancement carrying high-level expectations biases this competition towards
interesting features/locations. Only the most active grouping survives
this competition. Wherever this grouping lies in the topographic
V1 map, a blob of activation extends over the entire area of the candidate
object, allowing all of the features belonging to the newly attended
stimulus, and only those, to pass their activity up to the higher levels for further processing.
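The competition described above can be illustrated with a minimal sketch. It is not the model's actual dynamics: the update rule, the constants `excite` and `inhibit`, the multiplicative form of the top-down bias, and the names `compete` and `gate` are all assumptions chosen for illustration. Features within a grouping support each other, groupings inhibit one another, and only the winning grouping's features are passed upward.

```python
def compete(groupings, bias=None, steps=50, excite=0.1, inhibit=0.2):
    """Return the name of the grouping that wins the competition.

    groupings: dict mapping grouping name -> list of bottom-up feature activities.
    bias: dict mapping grouping name -> top-down gain (hypothetical form of
          attentional enhancement; missing entries mean no enhancement).
    """
    bias = bias or {}
    # Top-down expectations scale the bottom-up activity of favored groupings.
    act = {g: [a * (1.0 + bias.get(g, 0.0)) for a in feats]
           for g, feats in groupings.items()}
    for _ in range(steps):
        means = {g: sum(f) / len(f) for g, f in act.items()}
        total = sum(means.values())
        new_act = {}
        for g, feats in act.items():
            # Excitation from the same grouping, inhibition from all others.
            delta = excite * means[g] - inhibit * (total - means[g])
            # Activities are clipped to the range [0, 1].
            new_act[g] = [min(1.0, max(0.0, a + delta)) for a in feats]
        act = new_act
    return max(act, key=lambda g: sum(act[g]))


def gate(groupings, winner):
    """Pass only the winner's features upward; silence every other grouping."""
    return {g: list(feats) if g == winner else [0.0] * len(feats)
            for g, feats in groupings.items()}


if __name__ == "__main__":
    scene = {"cup": [0.5, 0.6], "book": [0.6, 0.7]}
    print(compete(scene))                     # bottom-up only
    print(compete(scene, bias={"cup": 0.5}))  # expectation favors the cup
    print(gate(scene, compete(scene, bias={"cup": 0.5})))
```

In this toy scene the more salient grouping wins when there is no expectation, while a sufficiently strong top-down bias lets the weaker grouping win instead, which is the sense in which attention biases the competition.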
The model thus shows how low-level processes can build up a high-level representation of the environment, which in turn is used to help the lower-level processes. A solution to the binding problem is also proposed: by linking the 'where' and 'what' subsystems, different instances of the same object class can be maintained and localized simply by following the connections that link the object's feature representation with each instance's spatial position.
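The linking of 'what' and 'where' can be sketched as a small data structure. This is only an illustration under stated assumptions: the class `WhatWhereMemory`, its method names, and the use of coordinate tuples for egocentric positions are hypothetical and are not taken from the model itself. Each object class has a single feature representation, and every remembered position holds a link to it, so several instances of one class can coexist.

```python
from dataclasses import dataclass, field


@dataclass
class WhatWhereMemory:
    # 'what' side: object class -> feature representation (one per class).
    what: dict = field(default_factory=dict)
    # 'where' side: egocentric position -> object class (the linking connection).
    where: dict = field(default_factory=dict)

    def attend(self, position, label, features):
        """Store the attended stimulus: update the spatial map and link it to its class."""
        self.what[label] = features
        self.where[position] = label

    def locate(self, label):
        """Follow the links from an object class to every position holding an instance."""
        return sorted(p for p, l in self.where.items() if l == label)

    def identify(self, position):
        """Follow the link from a position back to the features of the object there."""
        label = self.where.get(position)
        return None if label is None else self.what[label]


if __name__ == "__main__":
    memory = WhatWhereMemory()
    memory.attend((1, 0), "cup", {"color": "red", "shape": "cylinder"})
    memory.attend((-2, 3), "cup", {"color": "red", "shape": "cylinder"})
    memory.attend((0, 5), "book", {"color": "blue", "shape": "box"})
    print(memory.locate("cup"))
    print(memory.identify((0, 5)))
```

Storing the features once per class and the links once per position is one possible way to keep instances distinct without duplicating the 'what' representation.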