{"title": "Combining Dimensions and Features in Similarity-Based Representations", "book": "Advances in Neural Information Processing Systems", "page_first": 67, "page_last": 74, "abstract": "", "full_text": "Combining Dimensions and Features in\n\nSimilarity-Based Representations\n\nDaniel J. Navarro\n\nDepartment of Psychology\n\nOhio State University\nnavarro.20@osu.edu\n\nMichael D. Lee\n\nDepartment of Psychology\n\nUniversity of Adelaide\n\nmichael.lee@psychology.adelaide.edu.au\n\nAbstract\n\nThis paper develops a new representational model of similarity data\nthat combines continuous dimensions with discrete features. An al-\ngorithm capable of learning these representations is described, and\na Bayesian model selection approach for choosing the appropriate\nnumber of dimensions and features is developed. The approach is\ndemonstrated on a classic data set that considers the similarities\nbetween the numbers 0 through 9.\n\n1 Introduction\n\nA central problem for cognitive science is to understand the way people mentally\nrepresent stimuli. One widely used approach for deriving representations from data\nis to base them on measures of stimulus similarity (see Shepard 1974). Similarity\nis naturally understood as a measure of the degree to which the consequences of\none stimulus generalize to another, and may be measured using a number of experi-\nmental methodologies, including ratings scales, confusion probabilities, or grouping\nor sorting tasks. For a domain with n stimuli, similarity data take the form of an\nn \u00a3 n matrix, S = [sij], where sij is the similarity of the ith and jth stimuli. 
The goal of similarity-based representation is then to find structured and interpretable descriptions of the stimuli that capture the pattern of similarities.

Modeling the similarities between stimuli requires making assumptions about both the representational structures used to describe stimuli, and the processes used to assess similarities across these structures. The two best developed representational approaches in cognitive modeling are the 'dimensional' and 'featural' approaches (Goldstone, 1999). In the dimensional approach, stimuli are represented by continuous values along a number of dimensions, so that each stimulus corresponds to a point in a multidimensional space, and the similarity between two stimuli is measured according to the distance between their representative points. In the featural approach, stimuli are represented in terms of the presence or absence of a set of discrete (usually binary) features or properties, and the similarity between two stimuli is measured according to their common and distinctive features.

The dimensional and featural approaches have different strengths and weaknesses. Dimensional representations are constrained by the metric axioms, such as the triangle inequality, that are violated by some empirical data. Featural representations are inefficient when representing inherently continuous aspects of the variation between stimuli. It has been argued that spatial representations are most appropriate for low-level perceptual stimuli, whereas featural representations are better suited to high-level conceptual domains (e.g., Carroll, 1976; Tenenbaum, 1996; Tversky, 1977). In general, though, stimuli convey both perceptual and conceptual information.
As Carroll (1976) concludes: "Since what is going on inside the head is likely to be complex, and is equally likely to have both discrete and continuous aspects, I believe the models we pursue must also be complex, and have both discrete and continuous components" (p. 462).

This paper develops a new model of similarity that combines dimensions with features in the obvious way, allowing a stimulus to take continuous values on a number of dimensions, as well as potentially having a number of discrete features. We describe an algorithm capable of learning these representations from similarity data, and develop a Bayesian model selection approach for choosing the appropriate number of dimensions and features. Finally, we demonstrate the approach on a classic data set that considers the similarities between the numbers 0 through 9.

2 Dimensional, Featural and Combined Representations

2.1 Dimensional Representation

In a dimensional representation, the ith stimulus is represented by a point p_i = (p_i1, ..., p_iv) in a v-dimensional coordinate space. The dissimilarity between the ith and jth stimuli is then usually modeled as the distance between their points according to one of the family of Minkowskian metrics

\hat{d}_{ij} = \left( \sum_{k=1}^{v} \left| p_{ik} - p_{jk} \right|^{r} \right)^{1/r} + c,    (1)

where c is a non-negative constant. Dimensional representations can be learned using a variety of multidimensional scaling algorithms (e.g., Cox & Cox, 1994), which have placed particular emphasis on the r = 1 (City-Block) and r = 2 (Euclidean) cases because of their relationship, respectively, to so-called 'separable' and 'integral' stimulus dimensions (Garner, 1974).
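As a concrete illustration, the Minkowski metric in Eq. 1 can be sketched in a few lines of NumPy. The function name and signature below are our own illustrative choices, not notation from the paper:

```python
import numpy as np

def minkowski_dissimilarity(p_i, p_j, r=2.0, c=0.0):
    """Model dissimilarity between two stimuli as the Minkowski r-metric
    distance between their coordinate points, plus an additive constant c
    (Eq. 1). r = 1 gives the City-Block metric, r = 2 the Euclidean."""
    p_i = np.asarray(p_i, dtype=float)
    p_j = np.asarray(p_j, dtype=float)
    return np.sum(np.abs(p_i - p_j) ** r) ** (1.0 / r) + c
```

For the points (0, 0) and (3, 4), the Euclidean case gives a distance of 5 while the City-Block case gives 7, illustrating how the choice of r changes the modeled dissimilarity for the same coordinates.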
Pairs of separable dimensions are those, like shape and size, that can be attended to separately. Integral dimensions, in contrast, are those rarer cases, like hue and saturation, that are not easily separated.

2.2 Featural Representation

In a featural representation, the ith stimulus is represented by a vector of m binary variables f_i = (f_i1, ..., f_im), where f_ik = 1 if the ith stimulus possesses the kth feature, and f_ik = 0 if it does not. Each feature is also usually associated with a positive weight, w_k, denoting its importance or salience. No constraints are placed on the way features may be assigned to stimuli. Rather than requiring that features partition the stimuli, as in many clustering methods, or that features nest within one another, as in many tree-fitting methods, the flexible nature of human mental representation demands that features be allowed to overlap in arbitrary ways.

Although a number of models have been proposed for measuring the similarity between featurally represented stimuli (Navarro & Lee, 2002), the most widely used is the Contrast Model (Tversky, 1977). The Contrast Model assumes that the similarity between two stimuli increases according to the weights of the (common) features they share, decreases according to the weights of the (distinctive) features that one has but the other does not, and that these common and distinctive sources of information are themselves weighted in arriving at a final similarity value.
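This weighting scheme can be sketched directly. The function below is a minimal illustration of the Contrast Model just described, assuming the standard form in which one parameter weights the common features and two others weight each direction of distinctive features; the function and parameter names are our own:

```python
import numpy as np

def contrast_similarity(f_i, f_j, w, theta=1.0, alpha=0.5, beta=0.5):
    """Sketch of Tversky's (1977) Contrast Model: similarity grows with
    the weights of shared features and shrinks with the weights of
    features possessed by one stimulus but not the other. theta, alpha,
    and beta weight the common and the two distinctive feature sets."""
    f_i, f_j, w = (np.asarray(a, dtype=float) for a in (f_i, f_j, w))
    common = np.sum(w * f_i * f_j)            # features both possess
    in_i_only = np.sum(w * f_i * (1 - f_j))   # distinctive to stimulus i
    in_j_only = np.sum(w * f_j * (1 - f_i))   # distinctive to stimulus j
    return theta * common - alpha * in_i_only - beta * in_j_only
```

Setting alpha = beta = 0 recovers a purely common-features measure, which is the special case emphasized in what follows.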
Particular emphasis (e.g., Shepard & Arabie, 1979; Tenenbaum, 1996) has been given to the special case of the Contrast Model in which only common features are used and feature weights are additive, so that the similarity of the ith and jth stimuli is given by

\hat{s}_{ij} = \sum_{k=1}^{m} w_{k} f_{ik} f_{jk} + c.    (2)

Although learning common feature representations is a difficult combinatorial optimization problem, several successful additive clustering algorithms have been developed (e.g., Lee, 2002; Ruml, 2001; Tenenbaum, 1996).

2.3 Combined Representation

The obvious generalization of the dimensional and featural approaches is to represent stimuli in terms of continuous values along a set of dimensions and the presence or absence of a number of discrete features. If there are v dimensions and m features, the ith stimulus is defined by a point p_i, a feature vector f_i, and the feature weights w = (w_1, ..., w_m).

With this representational structure in place, we assume the similarity between the ith and jth stimuli is then simply the similarity arising from their common features (Eq. 2) minus the dissimilarity arising from their dimensional differences (Eq. 1), as follows:

\hat{s}_{ij} = \left( \sum_{k=1}^{m} w_{k} f_{ik} f_{jk} \right) - \left( \sum_{k=1}^{v} \left| p_{ik} - p_{jk} \right|^{r} \right)^{1/r} + c.

3 Model Fitting and Selection

Proposing the combined representational approach immediately presents two challenges. The first, model fitting, problem is to develop a method for learning representations that fit the similarity data well using a given number of dimensions and features.
The second, model selection, problem is to choose between alternative combined representations of the same data that use different numbers of features and dimensions.

Formally, we conceive of the representational model as specifying the number of dimensions and features and the nature of the distance metric, and being parameterized by the feature variables and weights, the coordinate locations, and the additive constant. This means a particular representation is given by R_λ(θ), where λ = (v, m, r) and θ = (p_1, ..., p_n, f_1, ..., f_n, w, c).

Following Tenenbaum (1996), we assume that the observed similarities come from independent Gaussian distributions with means \hat{s}_{ij} and common variance σ². The variance corresponds to the precision of the data which, for empirical similarity data averaged across information sources (such as individual participants), is easily estimated (Lee, 2001), and otherwise must be specified by assumption.

Under these assumptions, the likelihood of a similarity matrix given a particular representation is

p(S \mid R_λ, θ) = \prod_{i<j} \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left( -\frac{(s_{ij} - \hat{s}_{ij})^{2}}{2\sigma^{2}} \right).
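A minimal sketch of the combined similarity rule of Section 2.3 and of this Gaussian likelihood (in log form, for numerical stability) might look as follows. The function names and NumPy usage are our own, and σ is assumed known, as the text describes:

```python
import numpy as np

def combined_similarity(p_i, p_j, f_i, f_j, w, r=2.0, c=0.0):
    """Combined model: common-feature similarity (Eq. 2) minus the
    Minkowski dissimilarity between coordinate points (Eq. 1)."""
    featural = np.sum(np.asarray(w) * np.asarray(f_i) * np.asarray(f_j))
    diffs = np.abs(np.asarray(p_i, dtype=float) - np.asarray(p_j, dtype=float))
    spatial = np.sum(diffs ** r) ** (1.0 / r)
    return featural - spatial + c

def log_likelihood(S, S_hat, sigma):
    """Gaussian log-likelihood of the observed similarity matrix S given
    model predictions S_hat, assuming independent Gaussians with common
    variance sigma^2 over the i < j stimulus pairs."""
    i, j = np.triu_indices(S.shape[0], k=1)   # unique off-diagonal pairs
    resid = S[i, j] - S_hat[i, j]
    n = resid.size
    return (-0.5 * n * np.log(2.0 * np.pi * sigma ** 2)
            - np.sum(resid ** 2) / (2.0 * sigma ** 2))
```

Fitting a combined representation then amounts to searching over the discrete feature assignments and continuous coordinates, weights, and constant so as to maximize this likelihood for a given λ = (v, m, r).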