Unsupervised Learning of the Non-Classical Surround

Gary R. Holt
Bartlett W. Mel
BME Department, USC

The primate visual cortex is an exceedingly powerful image processor, whose circuitry is dedicated in large part to the long-range, multi-scale dynamical interactions involved in contour and surface extraction, completion, and grouping operations (Grossberg & Mingolla, 1985; Peterhans & von der Heydt, 1989; Kapadia et al., 1995). The extraordinary computational demands of this process arise because accurately parsing the contour and surface structure of natural images typically depends on multiple cues acting in combination (e.g., intensity, chromaticity, texture, and end-stopping). The rules of combination are highly nonlinear and depend on configural constraints acting over long distances in the image. In short, subtle combinations of other measurements, both at the same location and elsewhere in the image, can strongly boost or suppress the evidentiary value of any single image measurement.

The non-classical surrounds of V1 receptive fields are the first stage in the visual cortical processing stream at which these critical computations are carried out, and they likely depend on the extensive network of long-range horizontal connections among cells in the superficial layers (Gilbert et al., 1996). Consider the complexity of the geometric inferences required to complete contours and fill in surfaces properly, and the large areas over which such information must propagate. Given both, wiring the cortex to carry out these computations is necessarily a very high-dimensional, highly nonlinear learning problem. The intrinsic difficulty of this learning problem could account for the fact that contour completion thresholds continue to improve over many years throughout childhood (Kovacs et al., 1999).

In this work, we have used an unsupervised bootstrapping approach to learning the non-classical surround of V1-like neurons. A hardwired local edge detector providing direct input to each neuron is used to train the non-classical surround, i.e., to extract high-order interactions among surround elements that predict the center element. After training on a set of complex photographic scenes, we can compare the contour information available from the local (center) input only with that available from the surround input only, i.e., entirely excluding input from the center. In the surround-only case, all contour information is derived non-locally, which illustrates the considerable store of information contained in long-range interactions. Numerous implications of this kind of surround learning for visual cortical information processing will be discussed.
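The bootstrapping scheme described above can be illustrated with a small toy sketch under several assumptions. The abstract does not specify the learning rule or the form of the high-order interactions. This sketch therefore substitutes logistic regression over first-order surround units and their pairwise (second-order) products. The names `toy_patch`, `surround_features`, `train_surround`, and `compare` are hypothetical. The 5 x 5 binary patches with noisy horizontal contours are synthetic toy data, not photographic scenes. The only teaching signal is the noisy output of the hardwired center detector. Classification accuracy against the true contour stands in as a simple proxy for the information comparison.

```python
import math
import random
from itertools import combinations

SIZE = 5                                   # toy patch: SIZE x SIZE edge-detector outputs (assumption)
CENTER = (SIZE // 2, SIZE // 2)
SURROUND = [(r, c) for r in range(SIZE) for c in range(SIZE) if (r, c) != CENTER]


def toy_patch(rng, noise=0.05):
    """Hypothetical toy scene, NOT natural images. With probability 0.5 a straight
    horizontal contour crosses the patch at a random row. Each unit is a binary
    local edge-detector output, flipped with probability `noise`.
    Returns (patch, whether the true contour passes through the center)."""
    row = rng.randrange(SIZE) if rng.random() < 0.5 else None
    patch = {}
    for r in range(SIZE):
        for c in range(SIZE):
            on = 1 if r == row else 0
            if rng.random() < noise:
                on = 1 - on
            patch[(r, c)] = on
    return patch, row == CENTER[0]


def surround_features(patch):
    """Bias, first-order surround terms, and all pairwise products
    (a stand-in for 'high-order interactions'). The center unit is never read."""
    s = [patch[p] for p in SURROUND]
    return [1] + s + [a * b for a, b in combinations(s, 2)]


def sigmoid(z):
    z = max(-30.0, min(30.0, z))           # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def train_surround(n=2000, epochs=4, lr=0.05, seed=1):
    """Bootstrapping: the hardwired center detector's (noisy) output is the only
    label; logistic regression learns to predict it from the surround alone."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        patch, _ = toy_patch(rng)          # true contour is NOT used for training
        data.append((surround_features(patch), patch[CENTER]))
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = y - sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i, xi in enumerate(x):
                if xi:                     # features are 0/1, so only active ones change
                    w[i] += lr * err
    return w


def compare(w, n=2000, seed=2):
    """Accuracy at detecting the true center contour from the center unit
    alone versus from the learned surround alone."""
    rng = random.Random(seed)
    center_hits = surround_hits = 0
    for _ in range(n):
        patch, truth = toy_patch(rng)
        center_hits += (patch[CENTER] == 1) == truth
        x = surround_features(patch)
        surround_hits += (sigmoid(sum(wi * xi for wi, xi in zip(w, x))) > 0.5) == truth
    return center_hits / n, surround_hits / n


weights = train_surround()
center_acc, surround_acc = compare(weights)
print(f"center-only accuracy:   {center_acc:.3f}")
print(f"surround-only accuracy: {surround_acc:.3f}")
```

Running the script trains the surround predictor and prints both accuracies. Pairwise products are one simple way to let a linear learner capture conjunctions. Under independent noise, a single collinear flanker is only weak evidence for a contour at the center. Flankers on both sides together are much stronger evidence.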

