{"title": "A Bayesian Framework for Figure-Ground Interpretation", "book": "Advances in Neural Information Processing Systems", "page_first": 631, "page_last": 639, "abstract": "Figure/ground assignment, in which the visual image is divided into nearer (figural) and farther (ground) surfaces, is an essential step in visual processing, but its underlying computational mechanisms are poorly understood. Figural assignment (often referred to as border ownership) can vary along a contour, suggesting a spatially distributed process whereby local and global cues are combined to yield local estimates of border ownership. In this paper we model figure/ground estimation in a Bayesian belief network, attempting to capture the propagation of border ownership across the image as local cues (contour curvature and T-junctions) interact with more global cues to yield a figure/ground assignment. Our network includes as a nonlocal factor skeletal (medial axis) structure, under the hypothesis that medial structure ``draws'' border ownership so that borders are owned by their interiors. We also briefly present a psychophysical experiment in which we measured local border ownership along a contour at various distances from an inducing cue (a T-junction). Both the human subjects and the network show similar patterns of performance, converging rapidly to a similar pattern of spatial variation in border ownership along contours.", "full_text": "A Bayesian Framework for Figure-Ground\n\nInterpretation\n\nVicky Froyen\n\n\u2217Center for Cognitive Science\n\nRutgers University, Piscataway, NJ 08854\nLaboratory of Experimental Psychology\n\nUniversity of Leuven (K.U. 
Leuven), Belgium\nvicky.froyen@eden.rutgers.edu\n\nJacob Feldman\n\nCenter for Cognitive Science\n\nRutgers University, Piscataway, NJ 08854\n\njacob@ruccs.rutgers.edu\n\nManish Singh\n\nCenter for Cognitive Science\n\nRutgers University, Piscataway, NJ 08854\n\nmanish@ruccs.rutgers.edu\n\nAbstract\n\nFigure/ground assignment, in which the visual image is divided into nearer (\ufb01gural)\nand farther (ground) surfaces, is an essential step in visual processing, but its\nunderlying computational mechanisms are poorly understood. Figural assignment\n(often referred to as border ownership) can vary along a contour, suggesting a\nspatially distributed process whereby local and global cues are combined to yield\nlocal estimates of border ownership. In this paper we model \ufb01gure/ground estimation\nin a Bayesian belief network, attempting to capture the propagation of border\nownership across the image as local cues (contour curvature and T-junctions) interact\nwith more global cues to yield a \ufb01gure/ground assignment. Our network\nincludes as a nonlocal factor skeletal (medial axis) structure, under the hypothesis\nthat medial structure \u201cdraws\u201d border ownership so that borders are owned by the\nskeletal hypothesis that best explains them. We also brie\ufb02y present a psychophysical\nexperiment in which we measured local border ownership along a contour at\nvarious distances from an inducing cue (a T-junction). Both the human subjects\nand the network show similar patterns of performance, converging rapidly to a\nsimilar pattern of spatial variation in border ownership along contours.\n\nFigure/ground assignment (hereafter referred to as f/g), in which the visual image is divided into nearer\n(\ufb01gural) and farther (ground) surfaces, is an essential step in visual processing. A number of factors\nare known to affect f/g assignment, including region size [9], convexity [7, 16], and symmetry\n[1, 7, 11]. 
Figural assignment (often referred to as border ownership, under the assumption that the\n\ufb01gural side \u201cowns\u201d the border) is usually studied globally, meaning that entire surfaces and their\nenclosing boundaries are assumed to receive a globally consistent \ufb01gural status. But recent\npsychophysical \ufb01ndings [8] have suggested that border ownership can vary locally along a boundary,\neven leading to a globally inconsistent \ufb01gure/ground assignment, which is broadly consistent with\nelectrophysiological evidence showing local coding for border ownership in area V2 as early as 68 msec\nafter image onset [20]. This suggests a spatially distributed and potentially competitive process of\n\ufb01gural assignment [15], in which adjacent surfaces compete to own their common boundary, with\n\ufb01gural status propagating across the image as this competition proceeds. But both the principles and\ncomputational mechanisms underlying this process are poorly understood.\n\n\u2217V.F. was supported by a Fulbright Honorary fellowship and by the Rutgers NSF IGERT program in\nPerceptual Science, NSF DGE 0549115, J.F. by NIH R01 EY15888, and M.S. by NSF CCF-0541185\n\nIn this paper we consider how border ownership might propagate over both space and time, that is,\nacross the image as well as over the progression of computation. Following Weiss et al. [18] we\nadopt a Bayesian belief network architecture, with nodes along boundaries representing estimated\nborder ownership, and connections arranged so that both neighboring nodes and nonlocal integrating\nnodes combine to in\ufb02uence local estimates of border ownership. Our model is novel in two particular\nrespects: (a) we combine both local and global in\ufb02uences on border ownership in an integrated and\nprincipled way; and (b) we include as a nonlocal factor skeletal (medial axis) in\ufb02uences on f/g\nassignment. 
Skeletal structure has not been previously considered as a factor in border ownership,\nbut its relevance follows from a model [4] in which shapes are conceived of as generated by or\n\u201cgrown\u201d from an internal skeleton, with the consequence that their boundaries are perceptually\n\u201cowned\u201d by the skeletal side.\nWe also briefly present a psychophysical experiment in which we measured local border ownership\nalong a contour, at several distances from a strong local f/g inducing cue, and at several time delays\nafter the onset of the cue. The results show measurable spatial differences in judged border\nownership, with judgments varying with distance from the inducer, but no temporal effect, with essentially\nasymptotic judgments even after very brief exposures. Both results are consistent with the behavior\nof the network, which converges quickly to an asymptotic but spatially nonuniform f/g assignment.\n\n1 The Model\n\nThe Network. For simplicity, we take an edge map as input for the model, assuming that edges\nand T-junctions have already been detected. From this edge map we then create a Bayesian belief\nnetwork consisting of four hierarchical levels. At the input level the model receives evidence E\nfrom the image, consisting of local contour curvature and T-junctions. The nodes for this level are\nplaced at equidistant locations along the contour. At the \ufb01rst level the model estimates local border\nownership. The border-ownership nodes, or B-nodes, at this level are at the same locations as the E-nodes,\nbut are connected to their nearest neighbors, and each is the parent of the E-node at its location. (As a\nsimplifying assumption, such connections are broken at T-junctions in such a way that the occluded\ncontour is disconnected from the occluder.) The highest level has skeletal nodes, S, whose positions\nare de\ufb01ned by the circumcenters of the Delaunay triangulation on all the E-nodes, creating a coarse\nmedial axis skeleton [13]. 
Because of the structure of the Delaunay triangulation, each S-node is connected to\nexactly three E-nodes, from which it receives information about the position and the local tangent\nof the contour. In the current state of the model the S-nodes are \u201cpassive\u201d, meaning their posteriors\nare computed before the model is initiated. Between the S-nodes and the B-nodes are the grouping\nnodes G. They have the same positions as the S-nodes and the same Delaunay connections, but to\nB-nodes that have the same image positions as the E-nodes. They will integrate information from\ndistant B-nodes, applying an interiority cue that is in\ufb02uenced by the local strength of skeletal axes\nas computed by the S-nodes (Fig. 1). Although this is a multiply connected network, we have found\nthat given reasonable parameters the model converges to intuitive posteriors for a variety of shapes\n(see below).\n\nUpdating. Our goal is to compute the posterior p(Bi|I), where I is the whole image. Bi is a\nbinary variable coding for the local direction of border ownership, that is, the side that owns the\nborder. In order for border ownership estimates to be in\ufb02uenced by image structure elsewhere in\nthe image, information has to propagate throughout the network. To achieve this propagation, we\nuse standard equations for node updating [14, 12]. However, whereas all other connections are\ndirected, connections at the B-node level are undirected, so that each node is a child and a parent\nnode at the same time. Considering only the B-node level, a node Bi is only separated from the\nrest of the network by its two neighbors. Hence the Markovian property applies, in that Bi only\nneeds to get iterative information from its neighbors to eventually compute p(Bi|I). 
\n\nFigure 1: Basic network structure of the model. Both skeletal nodes (S-nodes) and border-ownership nodes\n(B-nodes) get evidence from E-nodes, though of different types. S-nodes receive mere positional\ninformation, while B-nodes receive information about local curvature and the presence of T-junctions.\nBecause of the structure of the Delaunay triangulation, S-nodes and G-nodes (grouping nodes)\nalways get input from exactly three nodes, E-nodes and B-nodes respectively. The gray color depicts the\nfact that this part of the network is computed before the model is initiated and does not thereafter\ninteract with the dynamics of the model.\n\nSo considering\nthe whole network, at each iteration t, Bi receives information from its child, Ei, and\nfrom its parents, that is, its neighboring nodes (Bi+1 and Bi\u22121), as well as from all grouping nodes\nconnected to it (Gj, ..., Gm). The latter encode interiority versus exteriority, interiority meaning that\nthe B-node\u2019s estimated figural direction points towards the G-node in question, exteriority meaning\nthat it points away. Integrating all this information creates a multidimensional likelihood function:\np(Bi|Bi\u22121, Bi+1, Gj, ..., Gm). Because of its complexity we choose to approximate it (assuming\nall nodes are marginally independent of each other when conditioned on Bi) by\n\n$$p(B_i \\mid P_j, \\ldots, P_m) \\propto \\prod_{j}^{m} p(B_i \\mid P_j) \\qquad (1)$$\n\nwhere the Pj\u2019s are the parents of Bi. Given this, at each iteration, each node Bi performs the\nfollowing computation:\n\n$$\\mathrm{Bel}(B_i) \\leftarrow c\\,\\lambda(B_i)\\,\\pi(B_i)\\,\\alpha(B_i)\\,\\beta(B_i) \\qquad (2)$$\n\nwhere conceptually \u03bb stands for bottom-up information, \u03c0 for top-down information, and \u03b1 and \u03b2\nfor information received from within the same level. 
More formally,\n\n$$\\lambda(B_i) \\leftarrow p(E \\mid B_i) \\qquad (3)$$\n\n$$\\pi(B_i) \\leftarrow \\prod_{j}^{m} \\sum_{G_j} p(B_i \\mid G_j)\\,\\pi_{G_j}(B_i) \\qquad (4)$$\n\nand analogously to equation 4 for \u03b1(Bi) and \u03b2(Bi), which compute information coming from Bi\u22121\nand Bi+1 respectively. For these \u03c0Bi\u22121(Bi), \u03c0Bi+1(Bi), and \u03c0Gj(Bi):\n\n$$\\pi_{G_j}(B_i) \\leftarrow c'\\,\\pi(G) \\prod_{k \\neq i} \\lambda_{B_k}(G_j) \\qquad (5)$$\n\n$$\\pi_{B_{i-1}}(B_i) \\leftarrow c'\\,\\beta(B_{i-1})\\,\\lambda(B_{i-1})\\,\\pi(B_{i-1}) \\qquad (6)$$\n\nand \u03c0Bi+1(Bi) is analogous to \u03c0Bi\u22121(Bi), with c\u2032 and c being normalization constants. Finally for\nthe G-nodes:\n\n$$\\mathrm{Bel}(G_i) \\leftarrow c\\,\\lambda(G_i)\\,\\pi(G_i) \\qquad (7)$$\n\n$$\\lambda(G_i) \\leftarrow \\prod_{j} \\lambda_{B_j}(G_i) \\qquad (8)$$\n\n$$\\lambda_{B_j}(G_i) \\leftarrow \\sum_{B_j} \\lambda(B_j)\\,p(B_i \\mid G_j) \\Big[ \\alpha(B_j)\\,\\beta(B_j) \\prod_{k \\neq i}^{m} \\sum_{G_k} p(B_i \\mid G_k)\\,\\pi_{G_k}(B_i) \\Big] \\qquad (9)$$\n\nThe posteriors of the S-nodes are used to compute the priors \u03c0(Gi). This posterior measures how well\nthe S-node at each position explains the contour, that is, how well it accounts for the cues \ufb02owing\nfrom the E-nodes it is connected to. Each Delaunay connection between S- and E-nodes can be\nseen as a rib that sprouts from the skeleton. More speci\ufb01cally, each rib sprouts in a direction that is\nnormal (perpendicular) to the tangent of the contour at the E-node plus a random error \u03c6i chosen\nindependently for each rib from a von Mises distribution centered on zero, i.e. \u03c6i \u223c V(0, \u03baS) with\nspread parameter \u03baS [4]. The rib lengths are drawn from an exponentially decreasing density function\n$p(\\rho_i) \\propto e^{-\\lambda_S \\rho_i}$ [4]. 
We can now express how well this node \u201cexplains\u201d the three E-nodes it is\nconnected to via the probability that this S-node deserves to be a skeletal node or not,\n\n$$p(S = \\mathrm{true} \\mid E_1, E_2, E_3) \\propto \\prod_{i} p(\\rho_i)\\,p(\\phi_i) \\qquad (10)$$\n\nwith S = true denoting that this S-node deserves to be a skeletal node. From this we then compute\nthe prior \u03c0(Gi) in such a way that good (high posterior) skeletal nodes induce a high interiority bias,\nhence a stronger tendency to induce \ufb01gural status. Conversely, bad (low posterior) skeletal nodes\ncreate a prior close to indifferent (uniform) and thus have less (or no) in\ufb02uence on \ufb01gural status.\n\nLikelihood functions. Finally we need to express the likelihood functions necessary for the updating\nrules described above. The \ufb01rst two likelihood functions are part of p(Ei|Bi), one for each of\nthe local cues. The \ufb01rst one, re\ufb02ecting local curvature, gives the probability of the orientations of\nthe two vectors inherent to Ei (\u03b11 and \u03b12) given the direction of \ufb01gure (\u03b8) encoded in Bi as a von\nMises density centered on \u03b8, i.e. \u03b1i \u223c V(\u03b8, \u03baEB). The second likelihood function, re\ufb02ecting the\npresence of a T-junction, simply assumes a \ufb01xed likelihood when a T-junction is present, that is,\np(T-junction = true|Bi) = \u03b8T, where Bi places the direction of \ufb01gure in the direction of the\noccluder. This likelihood function is only in effect when a T-junction is present, replacing the curvature\ncue at that node.\nThe third likelihood function serves to keep consistency between nodes of the \ufb01rst level. This function,\np(Bi|Bi\u22121) or p(Bi|Bi+1), is used to compute \u03b1(B) and \u03b2(B) and is de\ufb01ned as a 2x2 conditional\nprobability matrix with a single free parameter, \u03b8BB (the probability that the \ufb01gural directions at the two\nB-nodes are the same). 
A fourth and \ufb01nal likelihood function p(Bi|Gj) serves to propagate information\nbetween levels one and two. This likelihood function is a 2x2 conditional probability matrix\nwith one free parameter, \u03b8BG. In this case \u03b8BG encodes the probability that the \ufb01gural direction\nof the B-node is in the direction of the exterior or interior preference of the G-node. In total this\nbrings us to six free parameters in the model: \u03baS, \u03bbS, \u03baEB, \u03b8T, \u03b8BB, and \u03b8BG.\n\n2 Basic Simulations\n\nTo evaluate the performance of the model, we \ufb01rst tested it on several basic stimulus con\ufb01gurations\nin which the desired outcome is intuitively clear: a convex shape, a concave shape, a pair of\noverlapping shapes, and a pair of non-overlapping shapes (Figs. 2, 3). The convex shape is the simplest\nin that curvature never changes sign. The concave shape includes a region with oppositely signed\ncurvature. (The shape is naturally described as predominantly positively curved with a region of\nnegative curvature, i.e. a concavity. But note that it can also be interpreted as a predominantly negatively\ncurved \u201cwindow\u201d with a region of positive curvature, although this is not the intuitive interpretation.)\nThe overlapping pair of shapes consists of two convex shapes with one partly occluding the other,\ncreating a competition between the two shapes for the ownership of the common borderline. Finally\nthe non-overlapping shapes comprise two simple convex shapes that do not touch, again setting up\na competition for ownership of the two inner boundaries (i.e. between each shape and the ground\nspace between them). Fig. 2 shows the network structures for each of these four cases.\n\nFigure 2: Network structure for the four shape categories (left to right: convex, concave,\noverlapping, non-overlapping shapes). 
Blue depicts the locations of the B-nodes (and also the E-nodes),\nthe red connections are the connections between B-nodes, the green connections are connections\nbetween B-nodes and G-nodes, and the G-nodes (and also the S-nodes) go from orange to dark red.\nThis colour code depicts low (orange) to high (dark red) probability that this is a skeletal node, and\nhence the strength of the interiority cue.\n\nRunning our model with hand-estimated parameter values yields highly intuitive posteriors (Fig. 3),\nan essential \u201csanity check\u201d to ensure that the network approximates human judgments in simple\ncases. For the convex shape the model assigns \ufb01gure to the interior just as one would expect even\nbased solely on local curvature (Fig. 3A). In the concave \ufb01gure (Fig. 3B), estimated border\nownership begins to reverse inside the deep concavity. This may seem surprising, but actually closely\nmatches empirical results obtained when local border ownership is probed psychophysically inside\na similarly deep concavity, i.e. a \u201cnegative part\u201d in which f/g seems to partly reverse [8]. For the\noverlapping shapes posteriors were also intuitive, with the occluding shape interpreted as in front\nand owning the common border (Fig. 3C). Finally, for the two non-overlapping shapes the model\ncomputed border ownership just as one would expect if each shape were run separately, with each\nshape treated as \ufb01gural along its entire boundary (Fig. 3D). That is, even though there is skeletal\nstructure in the ground region between the two shapes (see Fig. 2D), its posterior is weak compared\nto that of the skeletal structure inside the shapes, and it thus loses the competition to own the boundary\nbetween them.\nFor all these con\ufb01gurations, the model not only converged to intuitive estimates but did so rapidly\n(Fig. 
4), always in fewer cycles than would be expected by pure lateral propagation, $n_{\\mathrm{iterations}} < N_{\\mathrm{nodes}}$\n[18] (with these parameters, typically about \ufb01ve times faster).\n\nFigure 3: Posteriors after convergence for the four shape categories (left to right: convex, concave,\noverlapping, non-overlapping). Arrows indicate estimated border ownership, with direction pointing\nto the perceived \ufb01gural side, and length proportional to the magnitude of the posterior. All four\nsimulations used the same parameters.\n\nFigure 4: Convergence of the model for the basic shape categories. The vertical lines represent the\npoint of convergence for each of the three shape categories. The posterior change is calculated as\n$\\sum |p(B_i = 1 \\mid I)_t - p(B_i = 1 \\mid I)_{t-1}|$ at each iteration.\n\n3 Comparison to human data\n\nBeyond the simple cases reviewed above, we wished to submit our network to a more \ufb01ne-grained\ncomparison with human data. To this end we compared its performance to that of human subjects\nin an experiment we conducted (to be presented in more detail in a future paper). Brie\ufb02y, our\nexperiment involved \ufb01nding evidence for propagation of f/g signals across the image. Subjects were\n\ufb01rst shown a stimulus in which the f/g con\ufb01guration was globally and locally unambiguous and\nconsistent: a smaller rectangle partly occluding a larger one (Fig. 5A), meaning that the smaller\n(front) one owns the common border. Then this con\ufb01guration was perturbed by adding two bars,\nof which one induced a local f/g reversal, making it now appear locally that the larger rectangle\nowned the border (Fig. 5B). 
(The other bar in the display does not alter f/g interpretation, but was\nincluded to control for the attentional effects of introducing a bar in the image.) The inducing bar\ncreates T-junctions that serve as strong local f/g cues, in this case tending to reverse the prior global\ninterpretation of the \ufb01gure. 
We then measured subjective border ownership along the central contour\nat various distances from the inducing bar, and at different times after the onset of the bar (25ms,\n100ms and 250ms). We measured border ownership locally using a method introduced in [8] in\nwhich a local motion probe is introduced at a point on the boundary between two color regions of\ndifferent colors, and the subject is asked which color appeared to move. Because the \ufb01gural side\n\u201cowns\u201d the border, the response re\ufb02ects perceived \ufb01gural status.\nThe goal of the experiment was to actually measure the progression of the in\ufb02uence of the inducing\nT-junction as it (hypothetically) propagated along the boundary. Brie\ufb02y, we found no evidence of\ntemporal differences, meaning that f/g judgments were essentially constant over time, suggesting\nrapid convergence of local f/g assignment.\n(This is consistent with the very rapid convergence\nof our network, which would suggest a lack of measurable temporal differences except at much\nshorter time scales than we measured.) But we did \ufb01nd a progressive reduction of f/g reversal with\nincreasing distance from the inducer\u2014that is, the in\ufb02uence of the T-junction decayed with distance.\nMean responses aggregated over subjects (shortest delay only) are shown in Fig. 6.\nIn order to run our model on this stimulus (which has a much more complex structure than the simple\n\ufb01gures tested above) we had to make some adjustments. We removed the bars from the edge map,\nleaving only the T-junctions as underlying cues. This was a necessary \ufb01rst step because our model is\nnot yet able to cope with skeletons that are split up by occluders. (The larger rectangle\u2019s skeleton has\nbeen split up by the lower bar.) In this way all contours except those created by the bars were used to\ncreate the network (Fig. 7). 
\n\nFigure 5: Stimuli used in the experiment. A. Initial stimulus with locally and globally consistent and\nunambiguous f/g. B. Subsequently bars were added, of which one (the top bar in this case) created a\nlocal reversal of f/g. C. Positions at which local f/g judgments of subjects were probed.\n\nFigure 6: Results from our experiment aggregated for all 7 subjects (shortest delay only) are shown\nin red. The x-axis shows distance from the inducing bar at which f/g judgment was probed. The\ny-axis shows the proportion of trials on which subjects judged the smaller rectangle to own the\nboundary. As can be seen, the further from the T-junction, the lower the f/g reversal. The \ufb01tted\nmodel (green curve) shows a very similar pattern. The horizontal black line indicates chance performance\n(ambiguous f/g).\n\nGiven this network we ran the model using hand-picked parameters that\ngave us the best possible qualitative similarity to the human data. The parameters used never entailed\ntotal elimination of the in\ufb02uence of any likelihood function (\u03baS = 16, \u03bbS = .025, \u03baEB = .5,\n\u03b8T = .9, \u03b8BB = .9, and \u03b8BG = .6). As can be seen in Fig. 6, the border-ownership estimates at\nthe locations where we had data show compelling similarities to human judgments. Furthermore,\nalong the entire contour the model converged to intuitive border-ownership estimates (Fig. 7) very\nrapidly (within 36 iterations). The fact that our model yielded intuitive estimates for the current\nnetwork in which not all contours were completed shows another strength of our model. 
Because\nour model included grouping nodes, it did not require contours to be amodally completed [6] in\norder for information to propagate.\n\nFigure 7: (left) Node structure for the experimental stimulus. (right) The model\u2019s local border-ownership\nestimates after convergence.\n\n4 Conclusion\n\nIn this paper we proposed a model rooted in Bayesian belief networks to compute \ufb01gure/ground.\nThe model uses both local and global cues, combined in a principled way, to achieve a stable and\napparently psychologically reasonable estimate of border ownership. Local cues included local\ncurvature and T-junctions, both well-established cues to f/g. Global cues included skeletal structure,\na novel cue motivated by the idea that strongly axial shapes tend to be \ufb01gural and thus own their\nboundaries. We successfully tested this model both on simple displays, in which it gave intuitive\nresults, and on a more complex experimental stimulus, in which it gave a close match to the pattern\nof f/g propagation found in our subjects. Speci\ufb01cally, the model, like the human subjects, rapidly\nconverged to a stable local f/g interpretation.\nOur model\u2019s structure shows several interesting parallels to properties of neural coding of border\nownership in visual cortex. Some cortical cells (end-stopped cells) appear to code for local curvature\n[3] and T-junctions [5]. The B-nodes in our model could be seen as corresponding to cells that code\nfor border ownership [20]. Furthermore, some authors [2] have suggested that recurrent feedback\nloops between border ownership cells in V2 and cells in V4 (corresponding to G-nodes in our model)\nplay a role in the rapid computation of border ownership. The very rapid convergence we observed\nin our model likewise appears to be due to the connections between B-nodes and G-nodes. 
Finally,\nscale-invariant shape representations (such as, speculatively, those based on skeletons) are thought\nto be present in higher cortical regions such as IT [17], which project down to earlier areas in ways\nthat are not yet understood.\nA number of parallels to past models of f/g should be mentioned. Weiss [18] pioneered the application\nof belief networks to the f/g problem, though that network considered only a more restricted\nset of local cues and no global ones, such that information only propagated along the contour.\nFurthermore, it has not been systematically compared to human judgments. Kogo et al. [10] proposed\nan exponential decay of f/g signals as they spread throughout the image. Our model has a similar\ndecay for information going through the G-nodes, though it is also in\ufb02uenced by an angular factor\nde\ufb01ned by the position of the skeletal node. Like the model by Li Zhaoping [19], our model includes\nhorizontal propagation between B-nodes, analogous to border-ownership cells in her model. A\nneurophysiological model by Craft et al. [2] de\ufb01nes grouping cells coding for an interiority preference\nthat decays with the size of the receptive \ufb01elds of these grouping cells. Our model takes this a step\nfurther by including shape (skeletal) structure as a factor in interiority estimates, rather than simply\nsize of receptive \ufb01elds (which is similar to the rib lengths in our model).\nCurrently, our use of skeletons as shape representations is still limited to medial axis skeletons and\nsurfaces that are not split up by occluders. Our future goals include integrating skeletons in a more\nrobust way following the probabilistic account suggested by Feldman and Singh [4]. Eventually, we\nhope to fully integrate skeleton computation with f/g computation so that the more general problem\nof shape and surface estimation can be approached in a coherent and uni\ufb01ed fashion.\n\nReferences\n[1] P. 
Bahnsen. Eine Untersuchung \u00fcber Symmetrie und Asymmetrie bei visuellen Wahrnehmungen. Zeitschrift f\u00fcr Psychologie, 108:129\u2013154, 1928.\n\n[2] E. Craft, H. Sch\u00fctze, E. Niebur, and R. von der Heydt. A neural model of \ufb01gure-ground organization. Journal of Neurophysiology, 97:4310\u20134326, 2007.\n\n[3] A. Dobbins, S. W. Zucker, and M. S. Cynader. Endstopping and curvature. Vision Research, 29:1371\u20131387, 1989.\n\n[4] J. Feldman and M. Singh. Bayesian estimation of the shape skeleton. Proceedings of the National Academy of Sciences, 103:18014\u201318019, 2006.\n\n[5] B. Heider, V. Meskenaite, and E. Peterhans. Anatomy and physiology of a neural mechanism de\ufb01ning depth order and contrast polarity at illusory contours. European Journal of Neuroscience, 12:4117\u20134130, 2000.\n\n[6] G. Kanizsa. Organization in Vision. New York: Praeger, 1979.\n\n[7] G. Kanizsa and W. Gerbino. Vision and Artifact, chapter Convexity and symmetry in \ufb01gure-ground organisation, pages 25\u201332. New York: Springer, 1976.\n\n[8] S. Kim and J. Feldman. Globally inconsistent \ufb01gure/ground relations induced by a negative part. Journal of Vision, 9:1534\u20137362, 2009.\n\n[9] K. Koffka. Principles of Gestalt Psychology. Lund Humphries, London, 1935.\n\n[10] N. Kogo, C. Strecha, L. Van Gool, and J. Wagemans. Surface construction by a 2-d differentiation-integration process: a neurocomputational model for perceived border ownership, depth, and lightness in Kanizsa \ufb01gures. Psychological Review, 117:406\u2013439, 2010.\n\n[11] B. Machielsen, M. Pauwels, and J. Wagemans. The role of vertical mirror-symmetry in visual shape detection. Journal of Vision, 9:1\u201311, 2009.\n\n[12] K. Murphy, Y. Weiss, and M.I. Jordan. Loopy belief propagation for approximate inference: an empirical study. Proceedings of Uncertainty in AI, pages 467\u2013475, 1999.\n\n[13] R. L. Ogniewicz and O. K\u00fcbler. Hierarchic Voronoi skeletons. 
Pattern Recognition, 28:343\u2013359, 1995.\n\n[14] J. Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, 1988.\n\n[15] M. A. Peterson and E. Skow. Inhibitory competition between shape properties in \ufb01gure-ground perception. Journal of Experimental Psychology: Human Perception and Performance, 34:251\u2013267, 2008.\n\n[16] K. A. Stevens and A. Brookes. The concave cusp as a determiner of \ufb01gure-ground. Perception, 17:35\u201342, 1988.\n\n[17] K. Tanaka, H. Saito, Y. Fukada, and M. Moriya. Coding visual images of objects in the inferotemporal cortex of the macaque monkey. Journal of Neurophysiology, 66:170\u2013189, 1991.\n\n[18] Y. Weiss. Interpreting images by propagating Bayesian beliefs. Adv. in Neural Information Processing Systems, 9:908\u2013915, 1997.\n\n[19] L. Zhaoping. Border ownership from intracortical interactions in visual area V2. Neuron, 47(1):143\u2013153, Jul 2005.\n\n[20] H. Zhou, H. S. Friedman, and R. von der Heydt. Coding of border ownership in monkey visual cortex. The Journal of Neuroscience, 20:6594\u20136611, 2000.\n", "award": [], "sourceid": 993, "authors": [{"given_name": "Vicky", "family_name": "Froyen", "institution": null}, {"given_name": "Jacob", "family_name": "Feldman", "institution": null}, {"given_name": "Manish", "family_name": "Singh", "institution": null}]}