Managing Uncertainty in Cue Combination

Part of Advances in Neural Information Processing Systems 12 (NIPS 1999)



Zhiyong Yang, Richard Zemel


We develop a hierarchical generative model to study cue combination. The model maps a global shape parameter to local cue-specific parameters, which in turn generate an intensity image. Inferring shape from images is achieved by inverting this model. Inference produces a probability distribution at each level; using distributions rather than a single value of underlying variables at each stage preserves information about the validity of each local cue for the given image. This allows the model, unlike standard combination models, to adaptively weight each cue based on general cue reliability and specific image context. We describe the results of a cue combination psychophysics experiment we conducted that allows a direct comparison with the model. The model provides a good fit to our data and a natural account for some interesting aspects of cue combination.

Understanding cue combination is a fundamental step in developing computational models of visual perception, because many aspects of perception naturally involve multiple cues, such as binocular stereo, motion, texture, and shading. It is often formulated as a problem of inferring or estimating some relevant parameter, e.g., depth, shape, or position, by combining estimates from individual cues. An important finding of psychophysical studies of cue combination is that cues vary in the degree to which they are used in different visual environments. Weights assigned to estimates derived from a particular cue seem to reflect its estimated reliability in the current scene and viewing conditions. For example, motion and stereo are weighted approximately equally at near distances, but motion is weighted more at far distances, presumably due to distance limits on binocular disparity [3]. Experiments have also found these weightings sensitive to image manipulations; if a cue is weakened, such as by adding noise, then the uncontaminated cue is utilized more in making depth judgments [9]. A recent study [2] has shown that observers can adjust the weighting they assign to a cue based on its relative utility for a particular task. From these and other experiments, we can identify two types of information that determine relative cue weightings: (1) cue reliability: its relative utility in the context of the task and general viewing conditions; and (2) region informativeness: cue information available locally in a given image. A central question in computational models of cue combination then concerns how these forms of uncertainty can be combined. We propose a hierarchical generative model. Generative models have a rich history in cue combination, as they underlie models of Bayesian perception that have been developed in this area [10]. The novelty in the generative model proposed here lies in its hierarchical nature and use of distributions throughout, which allows for both context-dependent and image-specific uncertainty to be combined in a principled manner.

Our aims in this paper are twofold: to develop a combination model that incorporates cue reliability and region informativeness (estimated across and within images), and to use this model to account for data and provide predictions for psychophysical experiments. Another motivation for the approach here stems from our recent probabilistic framework [11], which posits that every step of processing entails the representation of an entire probability distribution, rather than just a single value of the relevant underlying variable(s). Here we use separate local probability distributions for each cue estimated directly from an image. Combination then entails transforming representations and integrating distributions across both space and cues, taking across- and within-image uncertainty into account.
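To illustrate why carrying a distribution rather than a single value preserves reliability information, the following minimal sketch fuses independent Gaussian estimates of one parameter by multiplying their densities. This is the textbook reliability-weighted linear rule, shown only as a point of reference; it is not the hierarchical model developed in this paper. The Gaussian form, the function name `combine_gaussian_cues`, and the example numbers are all assumptions made for illustration.

```python
def combine_gaussian_cues(means, variances):
    """Fuse independent Gaussian cue estimates of the same parameter.

    Multiplying Gaussian densities gives a Gaussian whose precision
    (1 / variance) is the sum of the cue precisions, so each cue is
    weighted by its share of the total precision.
    """
    if len(means) != len(variances) or not means:
        raise ValueError("need one variance per mean, and at least one cue")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    weights = [p / total for p in precisions]
    mean = sum(w * m for w, m in zip(weights, means))
    return mean, 1.0 / total, weights


# Hypothetical numbers: a sharp shading estimate and a broad texture estimate.
mean, var, weights = combine_gaussian_cues([1.0, 2.0], [0.25, 1.0])
print(mean, var, weights)
```

In this sketch the narrower (more reliable) estimate receives the larger weight, and the fused variance is smaller than either input variance. A single point estimate per cue would discard the variances that drive this weighting.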


In this paper we study the case of combining shading and texture. Standard shape-from-shading models exclude texture [1, 8], while standard shape-from-texture models exclude shading [7]. Experimental results and computational arguments have supported a strong interaction between these cues [10], but no model accounting for this interaction has yet been worked out. The shape used in our experiments is a simple surface:

$$Z = B(1 - x^2), \quad |x| \le 1, \quad |y| \le 1$$


where $Z$ is the height from the $xy$ plane. $B$ is the only shape parameter. Our image formation model is a hierarchical generative model (see Figure 1). The top layer contains the global parameter $B$. The second layer contains local shading and texture parameters $S, T = \{S_i, T_i\}$, where $i$ indexes image regions. The generation of local cues from a global parameter is intended to allow local uncertainties to be introduced separately into the cues. This models specific conditions in realistic images, such as shading uncertainty due to shadows or specularities, and texture uncertainty when prior assumptions such as isotropy are violated [4]. Here we introduce uncertainty by adding independent local noise to the underlying shape parameter; this manipulation is less realistic but easier to control.
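The top two layers described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the noise is taken to be zero-mean Gaussian (the text says only that independent local noise is added to the shape parameter), and the function names, standard deviations, and region count are hypothetical. The final step from local parameters to an intensity image is omitted because it is not specified in this section.

```python
import random


def surface_height(B, x):
    """Height of the surface Z = B * (1 - x**2) on |x| <= 1 (independent of y)."""
    if abs(x) > 1:
        raise ValueError("x must lie in [-1, 1]")
    return B * (1.0 - x * x)


def sample_local_parameters(B, n_regions, shading_sd, texture_sd, rng):
    """Draw per-region shading and texture parameters S_i, T_i around B.

    Assumption: the independent local noise is zero-mean Gaussian; the
    text only states that independent local noise is added to B.
    """
    S = [B + rng.gauss(0.0, shading_sd) for _ in range(n_regions)]
    T = [B + rng.gauss(0.0, texture_sd) for _ in range(n_regions)]
    return S, T


rng = random.Random(0)  # fixed seed so the sketch is repeatable
S, T = sample_local_parameters(B=1.5, n_regions=4,
                               shading_sd=0.1, texture_sd=0.3, rng=rng)
# Cross-section of the noise-free surface at nine evenly spaced x values.
profile = [surface_height(1.5, k / 4) for k in range(-4, 5)]
print(S, T, profile)
```

Giving shading and texture separate noise levels mirrors the stated purpose of the second layer: uncertainty can be introduced into one cue without affecting the other.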