{"title": "A Bayesian Analysis of Dynamics in Free Recall", "book": "Advances in Neural Information Processing Systems", "page_first": 1714, "page_last": 1722, "abstract": "We develop a probabilistic model of human memory performance in free recall experiments. In these experiments, a subject first studies a list of words and then tries to recall them. To model these data, we draw on both previous psychological research and statistical topic models of text documents. We assume that memories are formed by assimilating the semantic meaning of studied words (represented as a distribution over topics) into a slowly changing latent context (represented in the same space). During recall, this context is reinstated and used as a cue for retrieving studied words. By conceptualizing memory retrieval as a dynamic latent variable model, we are able to use Bayesian inference to represent uncertainty and reason about the cognitive processes underlying memory. We present a particle filter algorithm for performing approximate posterior inference, and evaluate our model on the prediction of recalled words in experimental data. By specifying the model hierarchically, we are also able to capture inter-subject variability.", "full_text": "A Bayesian Analysis of Dynamics in Free Recall\n\nRichard Socher\n\nSamuel J. Gershman, Adler J. Perotte, Per B. Sederberg\n\nDepartment of Computer Science\n\nStanford University\nStanford, CA 94305\n\nrichard@socher.org\n\nDepartment of Psychology\n\nPrinceton University\nPrinceton, NJ 08540\n\n{sjgershm,aperotte,persed}@princeton.edu\n\nDavid M. Blei\n\nDepartment of Computer Science\n\nPrinceton University\nPrinceton, NJ 08540\n\nblei@cs.princeton.edu\n\nKenneth A. Norman\n\nDepartment of Psychology\n\nPrinceton University\nPrinceton, NJ 08540\n\nknorman@princeton.edu\n\nAbstract\n\nWe develop a probabilistic model of human memory performance in free recall\nexperiments. 
In these experiments, a subject \ufb01rst studies a list of words and then\ntries to recall them. To model these data, we draw on both previous psychological\nresearch and statistical topic models of text documents. We assume that memories\nare formed by assimilating the semantic meaning of studied words (represented\nas a distribution over topics) into a slowly changing latent context (represented\nin the same space). During recall, this context is reinstated and used as a cue for\nretrieving studied words. By conceptualizing memory retrieval as a dynamic latent\nvariable model, we are able to use Bayesian inference to represent uncertainty and\nreason about the cognitive processes underlying memory. We present a particle\n\ufb01lter algorithm for performing approximate posterior inference, and evaluate our\nmodel on the prediction of recalled words in experimental data. By specifying the\nmodel hierarchically, we are also able to capture inter-subject variability.\n\n1 Introduction\n\nModern computational models of verbal memory assume that the recall of items is shaped by their\nsemantic representations. The precise nature of this relationship is an open question. To address\nit, recent research has used information from diverse sources, such as behavioral data [14], brain\nimaging [13] and text corpora [8]. However, a principled framework for integrating these different\ntypes of information is lacking. To this end, we develop a model of human memory that encodes\nprobabilistic dependencies between multiple information sources and the hidden variables that couple\nthem. Our model lets us combine multiple sources of information and multiple related memory\nexperiments.\nOur model builds on the Temporal Context Model (TCM) of [10, 16]. TCM was developed to explain\nthe temporal structure of human behavior in free recall experiments, where subjects are presented\nwith lists of words (presented one at a time) and then asked to recall them in any order. 
TCM posits a\nslowly changing mental context vector whose evolution is driven by lexical input. At study, words\nare bound to context states through learning; during recall, context information is used as a cue\nto probe for stored words. TCM can account for numerous regularities in free recall data, most\nprominently the \ufb01nding that subjects tend to consecutively recall items that were studied close in\ntime to one another. (This effect is called the temporal contiguity effect.) TCM explains this effect\nby positing that recalling an item also triggers recall of the context state that was present when the\nitem was studied; subjects can use this retrieved context state to access items that were studied close\nin time to the just-recalled item. The fact that temporal contiguity effects in TCM are mediated\nindirectly (via item-context associations) rather than directly (via item-item associations) implies that\ntemporal contiguity effects should persist when subjects are prevented from forming direct item-item\nassociations; for evidence consistent with this prediction, see [9].\nImportantly, temporal structure is not the only organizing principle in free recall data: Semantic\nrelatedness between items also in\ufb02uences the probability of recalling them consecutively [11].\nMoreover, subjects often recall semantically-related items that were not presented at study. (These are\ncalled extra-list intrusions; see [15].) To capture this semantic structure, we will draw on probabilistic\ntopic models of text documents, speci\ufb01cally latent Dirichlet allocation (LDA) [3]. LDA is an\nunsupervised model of document collections that represents the meaning of documents in terms of\na small number of \u201ctopics,\u201d each of which is a distribution over words. When \ufb01t to a corpus, the\nmost probable words of these distributions tend to represent the semantic themes (like \u201csports\u201d or\n\u201cchemistry\u201d) that permeate the collection. 
LDA has been used successfully as a psychological model\nof semantic representation [7].\nWe model free recall data by combining the underlying assumptions of TCM with the latent semantic\nspace provided by LDA. Speci\ufb01cally, we reinterpret TCM as a dynamic latent variable model where\nthe mental context vector speci\ufb01es a distribution over topics. In other words, the human memory\ncomponent of our model represents the drifting mental context as a sequence of mixtures of topics, in\nthe same way that LDA represents documents. With this representation, the dynamics of the mental\ncontext are determined by two factors: the posterior probability over topics given a studied or recalled\nword (semantic inference) and the retrieval of previous contexts (episodic retrieval). These dynamics\nlet us capture both the episodic and semantic structure of human verbal memory.\nThe work described here goes beyond prior TCM modeling work in two ways: First, our approach\nallows us to infer the trajectory of the context vector over time, which (in turn) allows us to predict\nthe item-by-item sequence of word recalls; by contrast, previous work (e.g., [10, 16]) has focused\non \ufb01tting the summary statistics of the data. Second, we model inter-subject variability using a\nhierarchical model speci\ufb01cation; this approach allows us to capture both common and idiosyncratic\nfeatures of the behavioral data.\nThe rest of the paper is organized as follows. In Section 2 we describe LDA and in Section 3 we\ndescribe our model, which we refer to as LDA-TCM. In Section 4 we describe a particle \ufb01lter for\nperforming posterior inference in this model. In Section 5.1 we present simulation results showing\nhow this model reproduces fundamental behavioral effects in free recall experiments. 
In Section\n5.2 we present inference results for a dataset collected by Sederberg and Norman in which subjects\nperformed free recall of words.\n\n2 Latent Dirichlet allocation\n\nOur model builds on probabilistic topic models, speci\ufb01cally latent Dirichlet allocation. Latent\nDirichlet allocation (LDA) is a probabilistic model of document collections [3]. LDA posits a set\nof K topics, each of which is a distribution over a \ufb01xed vocabulary, and documents are represented\nas mixtures over these topics. Thus, each word is assumed to be drawn from a mixture model with\ncorpus-wide components (i.e., the topics) and document-speci\ufb01c mixture proportions. When \ufb01t to a\ncollection of documents, the topic distributions often re\ufb02ect the themes that permeate the document\ncollection.\nMore formally, assume that there are K topics \u03b2k, each of which is a distribution over words. (We\nwill call the K \u00d7 W matrix \u03b2 the word distribution matrix.) For each document, LDA assumes the\nfollowing generative process:\n\n1. Choose topic proportions \u03b8 \u223c Dir(\u03b1).\n2. For each of the N words wn:\n\n(a) Choose a topic assignment zn \u223c Mult(\u03b8).\n(b) Choose a word wn \u223c Mult(\u03b2zn ).\n\n2\n\n\fFigure 1: A graphical model of LDA-TCM.\n\nGiven a collection of documents, posterior inference in LDA essentially reverses this process to\ndecompose the corpus according to its topics and \ufb01nd the corresponding distributions over words.\nPosterior inference is intractable, but many approximation algorithms have been developed [3, 7, 17].\nIn addition to capturing the semantic content of documents, recent psychological work has shown\nthat several aspects of LDA make it attractive as a model of human semantic representation [7]. 
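The two-step generative process above is compact enough to sketch directly. In this minimal numpy sketch, the topic matrix \u03b2, and the values of K, W, N, and \u03b1, are arbitrary illustrative choices (not topics fit to any corpus):

```python
import numpy as np

rng = np.random.default_rng(0)

K, W, N = 3, 10, 8          # topics, vocabulary size, words per document (illustrative)
alpha = np.ones(K)          # symmetric Dirichlet prior on topic proportions
# beta: K x W word distribution matrix; random here purely for illustration
beta = rng.dirichlet(np.ones(W), size=K)

def generate_document(n_words):
    """Sample one document under LDA's generative process."""
    theta = rng.dirichlet(alpha)          # 1. topic proportions theta ~ Dir(alpha)
    words = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta)        # 2a. topic assignment z_n ~ Mult(theta)
        w = rng.choice(W, p=beta[z])      # 2b. word w_n ~ Mult(beta_z)
        words.append(w)
    return theta, words

theta, words = generate_document(N)
```

Posterior inference reverses this process: given only the words, it recovers plausible topic proportions and topics.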
In\nour model of memory, the topic proportions \u03b8 play the role of a \u201cmental context\u201d that guides memory\nretrieval by parameterizing a distribution over words to recall.\n\n3 Temporal context and memory\n\nWe now turn to a model of human memory that uses the latent representation of LDA to capture the\nsemantic aspects of recall experiments. Our data consist of two types of observations: a corpus of\ndocuments from which we have obtained the word distribution matrix, 1 and behavioral data from\nfree recall experiments, which are studied and recalled words from multiple subjects over multiple\nruns of the experiment. Our goal is to model the psychological process of recall in terms of a drifting\nmental context.\nThe human memory component of our model is based on the Temporal Context Model (TCM). There\nare two core principles of TCM: (1) Memory retrieval involves reinstating a representation of context\nthat was active at the time of study; and (2) context change is driven by features of the studied stimuli\n[10, 16, 14]. We capture these principles by representing the mental context drift of each subject\nwith a trajectory of latent variables \u03b8n. Our use of the same variable name (\u03b8) and dimensionality\nfor the context vector and for topics re\ufb02ects our key assertion: Context and topics reside in the same\nmeaning space.\nThe relationship between context and topics is speci\ufb01ed in the generative process of the free recall\ndata. The generative process encompasses both the study phase and the recall phase of the memory\nexperiment. During study, the model speci\ufb01es the distribution of the trajectory of internal mental\ncontexts of the subject. (These variables are important in the next phase when recalling words\nepisodically.) 
First, the initial mental context is drawn from a Gaussian:\n\n\u03b8s,0 \u223c N (0, \u03c3I),\n\n(1)\nwhere s denotes the study phase and I is a K \u00d7 K identity matrix.2 Then, for each studied word the\nmental context drifts according to\n\n\u03b8s,n \u223c N (hs,n, \u03c3I),\n\n(2)\n\nwhere\n\nhs,n = \u03b71\u03b8s,n\u22121 + (1 \u2212 \u03b71) log(\u02dcps,n).\n\n(3)\n\n1For simplicity, we \ufb01x the word distribution matrix to one \ufb01t using the method of [3]. In future work, we will\nexplore how the data from the free recall experiment could be used to constrain estimates of the word distribution\nmatrix.\n\n2More precisely, context vectors are log-transformed topic vectors (see [1, 2]). When generating words from\nthe topics, we renormalize the context vector.\n\nThis equation identi\ufb01es the two pulls on mental context drift when the subject is studying words: the\nprevious context vector \u03b8s,n\u22121 and \u02dcps,n \u221d \u03b2\u00b7,ws,n, the posterior probabilities of each topic given the\ncurrent word and the topic distribution matrix. This second term captures the idea that mental context\nis updated with the meaning of the current word (see also [2] for a related treatment of topic dynamics\nin the context of text modeling). For example, if the studied word is \u201cstocks\u201d then the mental context\nmight drift toward topics that also have words like \u201cbusiness\u201d, \u201c\ufb01nancial\u201d, and \u201cmarket\u201d with high\nprobability. (Note that this is where the topic model and memory model are coupled.) The parameter\n\u03b71 controls the rate of drift, while \u03c3 controls its noisiness.\nDuring recall, the model speci\ufb01es a distribution over drifting contexts and recalled words. For each\ntime t, the recalled word is assumed to be generated from a mixture of two components. 
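The study-phase drift of Equations 1\u20133 amounts to a simple stochastic update rule. The sketch below uses a small random topic matrix and illustrative parameter values (not values from the experiments), and computes the posterior over topics given a word under a uniform prior over topics:

```python
import numpy as np

rng = np.random.default_rng(1)

K, W = 3, 12
eta1, sigma = 0.2, 0.01                       # drift rate and noise (illustrative)
beta = rng.dirichlet(np.ones(W), size=K)      # hypothetical K x W topic matrix

def drift(theta_prev, word):
    """One step of study-phase context drift (Equations 2-3)."""
    # posterior over topics given the word, assuming a uniform prior over topics
    p_tilde = beta[:, word] / beta[:, word].sum()
    h = eta1 * theta_prev + (1.0 - eta1) * np.log(p_tilde)
    return rng.normal(h, sigma)               # theta_{s,n} ~ N(h_{s,n}, sigma I)

theta = rng.normal(0.0, sigma, size=K)        # theta_{s,0} ~ N(0, sigma I), Eq. 1
study_list = [4, 7, 2]                        # word indices of a toy study list
trajectory = [theta]
for w in study_list:
    theta = drift(theta, w)
    trajectory.append(theta)
```

Each studied word thus pulls the (log-space) context toward the topics that word is probable under, at a rate set by \u03b71.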
Effectively,\nthere are two \u201cpaths\u201d to recalling a word: a semantic path and an episodic path.\nThe semantic path recalls words by \u201cfree associating\u201d according to the LDA generative process:\nUsing the current context as a distribution over topics, it draws a topic randomly and then draws a\nword from this topic (this is akin to thinking of a word that is similar in meaning to just-recalled\nwords). Formally, the probability of recalling a word via the semantic path is expressed as the\nmarginal probability of that word induced by the current context:\n\nPs(w) = \u03c0(\u03b8r,t) \u00b7 \u03b2\u00b7,w,\n\n(4)\nwhere \u03c0 is a function that maps real-valued vectors onto the simplex (i.e., positive vectors that sum to\none) and the index r denotes the recall phase.\nThe episodic path recalls words by drawing them exclusively from the set of studied words. This path\nputs a high probability on words that were studied in a context that resembles the current context\n(this is akin to remembering words that you studied when you were thinking about things similar to\nwhat you are currently thinking about). Formally, the episodic distribution over words is expressed as\na weighted sum of delta functions (each corresponding to a word distribution that puts all its mass on\na single studied word), where the weight for a particular study word is determined by the similarity\nof the context at recall to the state of context when the word was studied:\n\nPe(w) = ut,w / \u2211i ut,i,\n\n(5)\n\nwhere\n\nut = \u2211N\nn=1 \u03b4s,ws,n exp(d(\u03c0(\u03b8r,t), \u03c0(\u03b8s,n))/\u03c4 ).\n\nHere d(\u00b7, \u00b7) is a similarity function between distributions (here we use the negative KL-divergence)\nand \u03c4 is a parameter controlling the curvature of the similarity function. We de\ufb01ne {\u03b4s,ws,n}N\nn=1 to\nbe delta functions de\ufb01ned at study words. 
Because people tend not to repeatedly recall words, we\nremove the corresponding delta function after a word is recalled.\nOur model assumes that humans use some mixture of these two paths, determined by mixing\nproportion \u03bb. Letting wr,t \u223c Mult(\u03c6t), we have\n\n\u03c6t(w) = \u03bbPs(w) + (1 \u2212 \u03bb)Pe(w).\n\n(6)\nIntuitively, \u03bb in Equation 6 controls the balance between semantic in\ufb02uences and episodic in\ufb02uences.\nWhen \u03bb approaches 1, we obtain a \u201cpure semantic\u201d model wherein words are recalled essentially by\nfree association (this is similar to the model used by [7] to model semantically-related intrusions in\nfree recall). When \u03bb approaches 0, we obtain a \u201cpure episodic\u201d model wherein words are recalled\nexclusively from the study list. An intermediate value of \u03bb is essential to simultaneously explaining\ntemporal contiguity and semantic effects in memory.\nFinally, the context drifts according to\n\n\u03b8r,t+1 \u223c N (hr,t, \u03c3I),\n\n(7)\n\nwhere\n\nhr,t = \u03b72\u03b8r,t + \u03b73 log(\u02dcpr,t) + \u03b74\u03b8s,n(wr,t).\n\n(8)\n\nThis is similar to how context drifts in the study phase, except that the context is additionally pushed\nby the context that was present when the recalled word was studied. This is obtained mathematically\nby de\ufb01ning n(wr,t) to be a mapping from a recalled word to the index of the same word at study. For\nexample, if the recalled word is \u201ccat\u201d and cat was the sixth studied word then n(wr,t) = 6.\n\nFigure 2: Simulated and empirical recall data. Data replotted from [9]. (Left) Probability of \ufb01rst\nrecall curve. (Right) Conditional response probability curve. 
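The two recall paths and their mixture (Equations 4\u20136) can be sketched as follows. This is a hypothetical illustration: the topic matrix and contexts are random, \u03c0 is taken to be a softmax, and the episodic weights use an exponentiated negative-KL similarity with curvature \u03c4, one natural reading of the weighting in Equation 5:

```python
import numpy as np

rng = np.random.default_rng(2)

K, W = 3, 12
lam, tau = 0.2, 1.7                           # mixing proportion and similarity curvature
beta = rng.dirichlet(np.ones(W), size=K)      # hypothetical K x W topic matrix

def pi(x):
    """Map a real-valued context vector onto the simplex (softmax)."""
    e = np.exp(x - x.max())
    return e / e.sum()

def neg_kl(p, q):
    """Negative KL-divergence, used as the similarity d(p, q)."""
    return -np.sum(p * np.log(p / q))

def recall_distribution(theta_recall, study_words, study_contexts):
    """phi_t(w) = lam * Ps(w) + (1 - lam) * Pe(w)."""
    ps = pi(theta_recall) @ beta              # semantic path: marginal over topics (Eq. 4)
    u = np.zeros(W)                           # episodic path: mass only on studied words
    for w, ctx in zip(study_words, study_contexts):
        u[w] += np.exp(neg_kl(pi(theta_recall), pi(ctx)) / tau)
    pe = u / u.sum()
    return lam * ps + (1.0 - lam) * pe

study_words = [1, 5, 9]
study_contexts = [rng.normal(0.0, 1.0, K) for _ in study_words]
phi = recall_distribution(rng.normal(0.0, 1.0, K), study_words, study_contexts)
```

Note that the semantic path spreads probability over the whole vocabulary, while the episodic path concentrates it on the three studied words; \u03bb interpolates between the two.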
If there\nis a false recall, i.e., the subject recalls a word that was not studied, then \u03b8s,n(wr,t) is set to the zero\nvector.\nThis generative model is depicted graphically in Figure 1, where \u2126 = {\u03b71:4, \u03c3, \u03bb, \u03c4} represents the\nset of model parameters and \u039e is the set of hyperparameters.\nTo model inter-subject variability, we extend our model hierarchically, de\ufb01ning group-level prior\ndistributions from which subject-speci\ufb01c parameters are assumed to be drawn [6]. This approach\nallows for inter-subject variability and, at the same time, it allows us to gain statistical strength from\nthe ensemble by coupling subjects in terms of higher-level hyperparameters. We choose our group\nprior over subject i\u2019s parameters to factorize as follows:\n\nP (\u03b7i 1:4, \u03c3i, \u03bbi, \u03c4 i) = P (\u03b7i 1)P (\u03b7i 2:4)P (\u03c3i)P (\u03bbi)P (\u03c4 i).\n\n(9)\n\nIn more detail, the factors take on the following functional forms: \u03b7i 1 \u223c Beta(c, d), \u03b7i 2:4 \u223c Dir(\u03c7), \u03c3i \u223c Exp(\u03bd), \u03bbi \u223c Beta(a, b), \u03c4 i \u223c Gamma(\u03b11, \u03b12). Except where mentioned otherwise,\nwe used the following hyperparameter values: a = b = c = d = 1, \u03c7 = [1, 1, 1], \u03b11 = 1, \u03b12 = 1.\nFor some model variants (described in Section 5.2) we set the parameters to a \ufb01xed value rather than\ninferring them.\nHere, we use the model to answer the following questions about behavior in free recall experiments:\n(1) Do both semantic and temporal factors in\ufb02uence recall, and if so what are their relative contri-\nbutions; (2) What are the relevant dimensions of variation across subjects? In our model, semantic\nand temporal factors exert their in\ufb02uence via the context vector, while variation across subjects is\nexpressed in the parameters drawn from the group prior. Thus, our goal in inference is to compute the\nposterior distribution over the context trajectory and subject-speci\ufb01c parameters, given a sequence\nof studied and recalled words. 
We can also use this posterior to make predictions about what words\nwill be recalled by a subject at each point during the recall phase. By comparing the predictive\nperformance of different model variants, we can examine what types of model assumptions (like the\nbalance between semantic and temporal factors) best capture human behavior.\n\nFigure 3: Factors contributing to context change during recall on a single list. (Left) Illustration of how\nthree successively recalled words in\ufb02uence context. Each column corresponds to a speci\ufb01c recalled\nword (shown in the top row). The bars in each cell correspond to individual topics (speci\ufb01cally,\nthese are the top ten inferred topics at recall; the center legend shows the top \ufb01ve words associated\nwith each topic). Arrows schematically indicate the \ufb02ow of in\ufb02uence between the components. The\ncontext vector at recall (Middle Row) is updated by the posterior over topics given the recalled word\n(Top Row) and also by retrieved study contexts (Bottom Row). (Right) Plot of the inferred context\ntrajectory at study and recall for a different list, in a 2-dimensional projection of the context space\nobtained by principal components analysis.\n\n4 Inference\n\nWe now describe an approximate inference algorithm for computing the posterior distribution. Letting\n\u03b8 = {\u03b8s,0:N , \u03b8r,1:T , \u2126}, the posterior is:\n\nP (\u03b8|W) = P (wr,1:T|\u03b8s,1:N , \u03b8r,1:T , ws,1:N )P (\u03b8r,1:T|\u03b8s,1:N )P (\u03b8s,1:N|ws,1:N , \u03b8s,0)P (\u03b8s,0)P (\u2126) / P (ws,1:N , wr,1:T ).\n\n(10)\n\nBecause computing the posterior exactly is intractable (the denominator involves a high-dimensional\nintegral that cannot be solved exactly), we approximate it with a set of C samples using the particle\n\ufb01lter algorithm [4], which can be summarized as follows. At time t > 0:\n\n1. Sample recall context \u03b8(c)r,t using (7).\n\n2. 
Compute weights v(c)t \u221d P (wr,t|\u03b8(c)r,t) using (6).\n\n3. Resample the particles according to their weights.\n\nUsing this sample-based approximation, the posterior is approximated as a sum of the delta functions\nplaced at the samples:\n\nP (\u03b8|W) \u2248 (1/C) \u2211C\nc=1 \u03b4(\u03b8 \u2212 \u03b8(c)).\n\n(11)\n\n5 Results\n\nWe evaluate our model in two ways. First, we generate data from the generative model and record\na number of common psychological measurements to assess to what extent the model reproduces\nqualitative patterns of recall behavior. Second, we perform posterior inference and evaluate the\npredictive performance of the model on a real dataset gathered by Sederberg and Norman.\n\n5.1 Simulations\n\nFor the simulations, the following parameters were used: \u03b71 = 0.2, \u03b72 = 0.55, \u03b73 = 0.05, \u03c3 =\n0.00001, \u03bb = 0.2, \u03c4 = 1.7. Note that these parameters have not been \ufb01t quantitatively to the data;\nhere we are simply trying to reproduce qualitative patterns, and the values were chosen heuristically\nwithout a systematic search through the parameter space. The results are averaged over 400 random\nstudy lists of 12 words each. In Figure 2, we compare our simulation results to data collected by [9].\nFigure 2 (left) shows the probability of \ufb01rst recall (PFR) curve, which plots the probability of each\nlist position being the \ufb01rst recalled word. This curve illustrates how words in later positions are more\nlikely to be recalled \ufb01rst, a consequence (in our model) of initializing the recall context with the last\nstudy context. Figure 2 (right) shows the lag conditional response probability (lag-CRP) curve, which\nplots the conditional probability of recalling a word given the last recalled word as a function of the\nlag (measured in terms of serial position) between the two. 
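The three-step particle filter of Section 4 can be sketched on a toy one-dimensional state-space model; the Gaussian transition and likelihood below are illustrative stand-ins for the LDA-TCM densities of Equations 6 and 7, not the model's actual distributions:

```python
import numpy as np

rng = np.random.default_rng(3)

C = 1000                                   # number of particles
sigma = 0.5                                # transition noise (illustrative)

def likelihood(obs, theta):
    """Toy Gaussian observation model standing in for P(w_{r,t} | theta)."""
    return np.exp(-0.5 * (obs - theta) ** 2)

observations = [0.2, 0.4, 0.1]             # stand-ins for recalled words
particles = rng.normal(0.0, 1.0, size=C)   # initial particle set
for obs in observations:
    particles = rng.normal(particles, sigma)       # 1. sample from the transition
    v = likelihood(obs, particles)                 # 2. weight by the observation model
    v = v / v.sum()
    particles = particles[rng.choice(C, size=C, p=v)]  # 3. resample by weight

# effective sample size from the final normalized weights (degeneracy check)
ess = 1.0 / np.sum(v ** 2)
```

The final line computes the effective sample size used in Section 5.2 to check for weight degeneracy; values near C indicate well-spread weights.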
This curve demonstrates the temporal\ncontiguity effect observed in human recall behavior: the increased probability of recalling words that\nwere studied nearby in time to the last-recalled word. As in TCM, this effect is present in our model\nbecause items studied close in time to one another have similar context vectors; as such, cuing with\ncontextual information from time t will facilitate recall of other items studied in temporal proximity\nto time t.\n\nFigure 4: (Left) Box-plot of average predictive log-probability of recalled words under different\nmodels. S: pure semantic model; E: pure episodic model. Green line indicates chance. See text for\nmore detailed descriptions of these models. (Right) Box-plot of inferred parameter values across\nsubjects.\n\n5.2 Modeling psychological data\n\nThe psychological data modeled here are from a previously unpublished dataset collected by\nSederberg and Norman. 30 participants studied 8 lists of words for a delayed free-recall task. Each\nlist was composed of 15 common nouns, chosen at random and without replacement from one of 28\ncategories, such as Musical Instruments, Sports, or Four-footed Animals. After \ufb01tting LDA to the\nTASA corpus [5], we ran the particle \ufb01lter with 1000 particles on the Sederberg and Norman dataset.\nOur main interest here is comparing our model (which we refer to as the semantic-episodic model)\nagainst various special hyperparameter settings that correspond to alternative psychological accounts\nof verbal memory. The models being compared include:\n\n1. Pure semantic: de\ufb01ned by drawing words exclusively from the semantic path, with \u03bb = 1.\nThis type of model has been used by [7] to examine semantic similarity effects in free recall.\n\n2. Pure episodic: de\ufb01ned by drawing words exclusively from the episodic path, with \u03bb = 0.\n\n3. Semantic-episodic: a = b = 1 (uniform beta prior on \u03bb). 
This corresponds to a model in\nwhich words are drawn from a mixture of the episodic and semantic paths.\n\nWe also compare against a null (chance) model in which all words in the vocabulary have an equal\nprobability of being recalled.\nAs a metric of model comparison, we calculate the model\u2019s predictive probability for the word\nrecalled at time t given words 1 to t \u2212 1, for all t:\n\n\u2211T\nt=1 \u2212 log p(wr,t|wr,1:t\u22121, ws,1:N ).\n\n(12)\n\nThis metric is proportional to the accumulative prediction error [19], a variant of cross-validation\ndesigned for time series models.\nTo assure ourselves that the particle \ufb01lter we used does not suffer from weight degeneracy, we\nalso calculated the effective sample size, as recommended by [4]: ESS = (\u2211C\nc=1 (v(c))2)\u22121.\nConventionally, it is desirable that the effective sample size is at least half the number of particles.\nThis desideratum was satis\ufb01ed for all the models we explored.\n\nBefore we present the quantitative results, it is useful to examine some examples of inferred context\nchange and how it interacts with word recall. Figure 3 shows the different factors at work in\ngenerating context change during recall on a single trial, illustrating how semantic inference and\nretrieved episodic memories combine to drive context change. The legend showing the top words in\neach topic illustrates how these topics appear to capture some of the semantic structure of the recalled\nwords. On the right of Figure 3, we show another representation of context change (from a different\ntrial), where the context trajectory is projected onto the \ufb01rst two principal components of the context\nvector. 
We can see from this \ufb01gure how recall involves reinstatement of studied contexts: Recalling a\nword pulls the inferred context vector in the direction of the (inferred) contextual state associated\nwith that word at study.\nFigure 4 (left) shows the average predictive log-probability of recalled words for the models described\nabove. Overall, the semantic-episodic model outperforms the pure episodic and pure semantic\nmodels in predictive accuracy (superiority over the closest competitor, the pure episodic model, was\ncon\ufb01rmed by a paired-sample t-test, with p < 0.002). To gain deeper insight into this pattern of\nresults, consider the behavior of the different \u201cpure\u201d models with respect to extra-list intrusions\nvs. studied list items. The pure episodic model completely fails to predict extra-list intrusions,\nbecause it restricts recall to the study list (i.e., it assigns zero predictive probability to extra-list\nitems). Conversely, the pure semantic model does a poor job of predicting recall of studied list items,\nbecause it does not scope recall to the study list. Thus, each of these models is hobbled by crucial (but\ncomplementary) shortcomings. The semantic-episodic model, by occupying an intermediate position\nbetween these two extremes, is able to capture both the semantic and temporal structure in free recall.\nOur second goal in inference was to examine individual differences in parameter \ufb01ts. Figure 4\n(right) shows box-plots of the different parameters. In some cases there is substantial variability\nacross subjects, such as for the similarity parameter \u03c4. Another pattern to notice is that the values of\nthe episodic-semantic trade-off parameter \u03bb tend to cluster close to 0 (the episodic extreme of the\nspectrum), consistent with the fact that the pure episodic and semantic-episodic models are fairly\ncomparable in predictive accuracy. 
Future work will assess the extent to which these across-subject\ndifferences in parameter \ufb01ts re\ufb02ect stable individual differences in memory functioning.\n6 Discussion\nWe have presented here LDA-TCM, a probabilistic model of memory that integrates semantic and\nepisodic in\ufb02uences on recall behavior. By formalizing this model as a probabilistic graphical model,\nwe have provided a common language for developing and comparing more sophisticated variants. Our\nsimulation and empirical results show that LDA-TCM captures key aspects of the experimental data\nand provides good accuracy at making item-by-item recall predictions. The source code for learning\nand inference and the experimental datasets are available at www.cs.princeton.edu/\u02dcblei.\nThere are a number of advantages to adopting a Bayesian approach to modeling free recall behavior.\nFirst, it is easy to integrate more sophisticated semantic models such as hierarchical Dirichlet\nprocesses [18]. Second, hierarchical model speci\ufb01cation gives us the power to capture both common\nand idiosyncratic behavioral patterns across subjects, thereby opening a window onto individual\ndifferences in memory. Finally, this approach makes it possible to integrate other sources of data, such\nas brain imaging data. In keeping with the graphical model formalism, we plan to augment LDA-TCM\nwith additional nodes representing variables measured with functional magnetic resonance imaging\n(fMRI). Existing studies have used fMRI data to decode semantic states in the brain [12] and predict\nrecall behavior at the level of semantic categories [13]. 
Incorporating fMRI data into the model will\nhave several bene\ufb01ts: The fMRI data will serve as an additional constraint on the inference process,\nthereby improving our ability to track subjects\u2019 mental states during encoding and recall; fMRI will\ngive us a new way of validating the model \u2013 we will be able to measure the model\u2019s ability to predict\nboth brain states and behavior; also, by examining the relationship between latent context states and\nfMRI data, we will gain insight into how mental context is instantiated in the brain.\n\nAcknowledgements\n\nRS acknowledges support from the Francis Robbins Upton Fellowship and the ERP Fellowship. This\nwork was done while RS was at Princeton University. PBS acknowledges support from National\nInstitutes of Health research grant MH080526.\n\nReferences\n\n[1] J. Aitchison. The statistical analysis of compositional data. Journal of the Royal Statistical\nSociety. Series B (Methodological), pages 139\u2013177, 1982.\n\n[2] D.M. Blei and J.D. Lafferty. Dynamic topic models. In Proceedings of the 23rd international\nconference on Machine learning, pages 113\u2013120. ACM New York, NY, USA, 2006.\n\n[3] D.M. Blei, A.Y. Ng, and M.I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning\nResearch, 3:993\u20131022, 2003.\n\n[4] A. Doucet and N. De Freitas. Sequential Monte Carlo Methods in Practice. Springer, 2001.\n\n[5] T.K. Landauer and S.T. Dumais. A solution to Plato\u2019s problem: The latent semantic analysis theory\nof acquisition, induction and representation of knowledge. Psychological Review, 104:211\u2013240,\n1997.\n\n[6] A. Gelman and J. Hill. Data analysis using regression and multilevel/hierarchical models.\nCambridge University Press, 2007.\n\n[7] T.L. Grif\ufb01ths, M. Steyvers, and J.B. Tenenbaum. Topics in semantic representation. Psychological\nReview, 114(2):211\u2013244, 2007.\n\n[8] M.W. Howard, B. Jing, K.M. Addis, and M.J. Kahana. 
Semantic structure and episodic memory.\n\nHandbook of Latent Semantic Analysis, pages 121\u2013142, 2007.\n\n[9] M.W. Howard and M.J. Kahana. Contextual variability and serial position effects in free recall.\n\nJournal of Experimental Psychology: Learning, Memory, and Cognition, 25(4):923, 1999.\n\n[10] M.W. Howard and M.J. Kahana. A distributed representation of temporal context. Journal of\n\nMathematical Psychology, 46:269\u2013299, 2002.\n\n[11] M.W. Howard and M.J. Kahana. When does semantic similarity help episodic retrieval? Journal\n\nof Memory and Language, 46(1):85\u201398, 2002.\n\n[12] T.M. Mitchell, S.V. Shinkareva, A. Carlson, K. Chang, V.L. Malave, R.A. Mason, and M.A.\nJust. Predicting human brain activity associated with the meanings of nouns. Science,\n320(5880):1191\u20131195, 2008.\n\n[13] S.M. Polyn, V.S. Natu, J.D. Cohen, and K.A. Norman. Category-speci\ufb01c cortical activity\n\nprecedes retrieval during memory search. Science, 310(5756):1963\u20131966, 2005.\n\n[14] S.M. Polyn, K.A. Norman, and M.J. Kahana. A context maintenance and retrieval model of\n\norganizational processes in free recall. Psychological Review, 116(1):129, 2009.\n\n[15] H.L. Roediger and K.B. McDermott. Creating false memories: Remembering words not\npresented in lists. Journal of Experimental Psychology Learning Memory and Cognition,\n21:803\u2013803, 1995.\n\n[16] P.B. Sederberg, M.W. Howard, and M.J. Kahana. A context-based theory of recency and\n\ncontiguity in free recall. Psychological Review, 115(4):893\u2013912, 2008.\n\n[17] Y. Teh, D. Newman, and M. Welling. A collapsed variational Bayesian inference algorithm for\n\nlatent Dirichlet allocation. In Neural Information Processing Systems, 2006.\n\n[18] Y.W. Teh, M.I. Jordan, M.J. Beal, and D.M. Blei. Hierarchical dirichlet processes. Journal of\n\nthe American Statistical Association, 101(476):1566\u20131581, 2006.\n\n[19] E.J. Wagenmakers, P. Gr\u00a8unwald, and M. Steyvers. 
Accumulative prediction error and the\n\nselection of time series models. Journal of Mathematical Psychology, 50(2):149\u2013166, 2006.\n\n9\n\n\f", "award": [], "sourceid": 351, "authors": [{"given_name": "Richard", "family_name": "Socher", "institution": null}, {"given_name": "Samuel", "family_name": "Gershman", "institution": null}, {"given_name": "Per", "family_name": "Sederberg", "institution": null}, {"given_name": "Kenneth", "family_name": "Norman", "institution": null}, {"given_name": "Adler", "family_name": "Perotte", "institution": null}, {"given_name": "David", "family_name": "Blei", "institution": null}]}