{"title": "Reinforcement Learning for Spoken Dialogue Systems", "book": "Advances in Neural Information Processing Systems", "page_first": 956, "page_last": 962, "abstract": null, "full_text": "Reinforcement Learning for Spoken Dialogue Systems\n\nSatinder Singh\nAT&T Labs\n\nMichael Kearns\nAT&T Labs\n\nDiane Litman\nAT&T Labs\n\nMarilyn Walker\nAT&T Labs\n\n{baveja,mkearns,diane,walker}@research.att.com\n\nAbstract\n\nRecently, a number of authors have proposed treating dialogue systems as Markov decision processes (MDPs). However, the practical application of MDP algorithms to dialogue systems faces a number of severe technical challenges. We have built a general software tool (RLDS, for Reinforcement Learning for Dialogue Systems) based on the MDP framework, and have applied it to dialogue corpora gathered from two dialogue systems built at AT&T Labs. Our experiments demonstrate that RLDS holds promise as a tool for \"browsing\" and understanding correlations in complex, temporally dependent dialogue corpora.\n\n1 Introduction\n\nSystems in which human users speak to a computer in order to achieve a goal are called spoken dialogue systems. Such systems are some of the few realized examples of open-ended, real-time, goal-oriented interaction between humans and computers, and are therefore an important and exciting testbed for AI and machine learning research. Spoken dialogue systems typically integrate many components, such as a speech recognizer, a database back-end (since often the goal of the user is to retrieve information), and a dialogue strategy. In this paper we are interested in the challenging problem of automatically inferring a good dialogue strategy from dialogue corpora.\n\nResearch in dialogue strategy has been perhaps necessarily ad hoc due to the open-ended nature of dialogue system design. 
For example, a common and critical design choice is between a system that always prompts the user to select an utterance from fixed menus (system initiative), and one that attempts to determine user intentions from unrestricted utterances (mixed initiative). Typically a system is built that explores a few alternative strategies, this system is tested, and conclusions are drawn regarding which of the tested strategies is best for that domain [4, 7, 2]. This is a time-consuming process, and it is difficult to rigorously compare and evaluate alternative systems in this fashion, much less design improved ones.\n\nRecently, a number of authors have proposed treating dialogue design in the formalism of Markov decision processes (MDPs) [1, 3, 7]. In this view, the population of users defines the stochastic environment, a dialogue system's actions are its (speech-synthesized) utterances and database queries, and the state is represented by the entire dialogue so far. The goal is to design a dialogue system that takes actions so as to maximize some measure of reward. Viewed in this manner, it becomes possible, at least in principle, to apply the framework and algorithms of reinforcement learning (RL) to find a good dialogue strategy.\n\nHowever, the practical application of RL algorithms to dialogue systems faces a number of severe technical challenges. First, representing the dialogue state by the entire dialogue so far is often neither feasible nor conceptually useful, and the so-called belief state approach is not possible, since we do not even know what features are required to represent the belief state. Second, there are many different choices for the reward function, even among systems providing very similar services to users. 
Previous work [7] has largely dealt with these issues by imposing a priori limitations on the features used to represent approximate state, and then exploring just one of the potential reward measures.\n\nIn this paper, we further develop the MDP formalism for dialogue systems, in a way that does not solve the difficulties above (indeed, there is no simple \"solution\" to them), but allows us to attenuate and quantify them by permitting the investigation of different notions of approximate state and reward. Using our expanded formalism, we give one of the first applications of RL algorithms to real data collected from multiple dialogue systems. We have built a general software tool (RLDS, for Reinforcement Learning for Dialogue Systems) based on our framework, and applied it to dialogue corpora gathered from two dialogue systems built at AT&T Labs, the TOOT system for voice retrieval of train schedule information [4] and the ELVIS system for voice retrieval of electronic mail [7].\n\nOur experiments demonstrate that RLDS holds promise not just as a tool for the end-to-end automated synthesis of complicated dialogue systems from passive corpora (a \"holy grail\" that we fall far short of here; see footnote 1), but more immediately, as a tool for \"browsing\" and understanding correlations in complex, temporally dependent dialogue corpora. Such correlations may lead to incremental but important improvements in existing systems.\n\n2 The TOOT and ELVIS Spoken Dialogue Systems\n\nThe TOOT and ELVIS systems were implemented using a general-purpose platform developed at AT&T, combining a speaker-independent hidden Markov model speech recognizer, a text-to-speech synthesizer, a telephone interface, and modules for specifying data-access functions and dialogue strategies. In TOOT, the data source is the Amtrak train schedule web site, while in ELVIS, it is the electronic mail spool of the user. 
\n\nIn a series of controlled experiments with human users, dialogue data was collected from both systems, resulting in 146 dialogues from TOOT and 227 dialogues from ELVIS. The TOOT experiments varied strategies for information presentation, confirmation (whether and how to confirm user utterances) and initiative (system vs. mixed), while the ELVIS experiments varied strategies for information presentation, for summarizing email folders, and initiative. Each resulting dialogue consists of a series of system and user utterances augmented by observations derived from the user utterances and the internal state of the system. The system's utterances (actions) give requested information, ask for clarification, provide greetings or instructions, and so on. The observations derived from the user's utterance include the speech-recognizer output, the corresponding log-likelihood score, the semantic labels assigned to the recognized utterances (such as the desired train departure and arrival cities in TOOT, or whether the user prefers to hear their email ordered by date or sender in ELVIS), indications of user barge-ins on system prompts, and many more. The observations derived from the internal state include the grammar used by the speech recognizer during the turn, and the results obtained from a query to the data source. In addition, each dialogue has an associated survey completed by the user that asks a variety of questions relating to the user's experience. See [4, 7] for details.\n\nFootnote 1: However, in recent work we have applied the methodology described here to significantly improve the performance of a new dialogue system [5].\n\n3 Spoken Dialogue Systems and MDPs\n\nGiven the preceding discussion, it is natural to formally view a dialogue as a sequence d\n\nd = (a_1, o_1, r_1), (a_2, o_2, r_2), ..., (a_t, o_t, r_t).\n\nHere a_i is the action taken by the system (typically a speech-synthesized utterance, and less frequently, a database query) to start the ith exchange (or turn, as we shall call it), o_i consists of all the observations logged by the system on this turn, as discussed in the last section, and r_i is the reward received on this turn. As an example, in TOOT a typical turn might indicate that the action a_i was a system utterance requesting the departure city, and the o_i might indicate several observations: that the recognized utterance was \"New York\", that the log-likelihood of this recognition was -2.7, that there was another unrecognized utterance as well, and so on. We will use d[i] to denote the prefix of d that ends following the ith turn, and d \u00b7 (a, o, r) to denote the one-turn extension of dialogue d by the turn (a, o, r). The scope of the actions a_i and observations o_i is determined by the implementation of the systems (e.g., if some quantity was not logged by the system, we will not have access to it in the o_i in the data). Our experimental results will use rewards derived from the user satisfaction surveys gathered for the TOOT and ELVIS data.\n\nWe may view any dialogue d as a trajectory in a well-defined true MDP M. The states (see footnote 2) of M are all possible dialogues, and the actions are all the possible actions available to the spoken dialogue system (utterances and database queries). Now from any state (dialogue) d and action a, the only possible next states (dialogues) are the one-turn extensions d \u00b7 (a, o, r). The probability of transition from d to d \u00b7 (a, o, r) is exactly the probability, over the stochastic ensemble of users, that o and r would be generated following action a in dialogue d.\n\nIt is in general impractical to work directly on M due to the unlimited size of the state (dialogue) space. 
Furthermore, M is not known in advance and would have to be estimated from dialogue corpora. We would thus like to permit a flexible notion of approximate states. We define a state estimator SE to be a mapping from any dialogue d into some space S. For example, a simple state estimator for TOOT might represent the dialogue state with boolean variables indicating whether certain pieces of information had yet been obtained from the user (departure and arrival cities, and so on), and a continuous variable tracking the average log-likelihood of the recognized utterances so far. Then SE(d) would be a vector representing these quantities for the dialogue d. Once we have chosen a state estimator SE, we can transform the dialogue d into an S-trajectory, starting from the initial empty state s_0 \u2208 S:\n\ns_0 \u2192_{a_1} SE(d[1]) \u2192_{a_2} SE(d[2]) \u2192_{a_3} ... \u2192_{a_t} SE(d[t])\n\nwhere the notation \u2192_{a_i} SE(d[i]) indicates a transition to SE(d[i]) \u2208 S following action a_i. Given a set of dialogues d_1, ..., d_n, we can construct the empirical MDP M_SE. The state space of M_SE is S, the actions are the same as in M, and the probability of transition from s to s' under action a is exactly the empirical probability of such a transition in the S-trajectories obtained from d_1, ..., d_n. Note that we can build M_SE from dialogue corpora, solve for its optimal policy, and analyze the resulting value function.\n\nThe point is that by choosing SE carefully, we hope that the empirical MDP M_SE will be a good approximation of M. 
By this we mean that M_SE renders dialogues (approximately) Markovian: the probability in M of transition from any dialogue d to any one-turn extension d \u00b7 (a, o, r) is (approximately) the probability of transition from SE(d) to SE(d \u00b7 (a, o, r)) in M_SE. We hope to find state estimators SE which render dialogues approximately Markovian, but for which the amount of data and computation required to find good policies in M_SE will be greatly reduced compared to working directly in dialogue space.\n\nWhile conceptually appealing, this approach is subject to at least three important caveats: First, the approach is theoretically justified only to the extent that the chosen state estimator renders dialogues Markovian. In practice, we hope that the approach is robust, in that \"small\" violations of the Markov property will still produce useful results. Second, while state estimators violating the Markov property may lead to meaningful insights, they cannot be directly compared. For instance, if the optimal value function derived from one state estimator is larger than the optimal value function for another state estimator, we cannot necessarily conclude that the first is better than the second. (This can be demonstrated formally.) Third, even with a Markovian state estimator SE, data that is sparse with respect to SE limits the conclusions we can draw; in a large space S, certain states may be so infrequently visited in the dialogue corpora that we can say nothing about the optimal policy or value function there.\n\nFootnote 2: These are not to be confused with the internal states of the spoken dialogue system(s) during the dialogue, which in our view merely contribute observations.\n\n4 The RLDS System\n\nWe have implemented a software tool (written in C) called RLDS that realizes the above formalism. 
RLDS users specify an input file of sample dialogues; the dialogues include the rewards received at each turn. Users also specify input files defining S and a state estimator SE. The system has command-line options that specify the discount factor to be used, and a lower bound on the number of times a state s \u2208 S must be visited in order for it to be included in the empirical MDP M_SE (to control overfitting to sparse data). Given these inputs and options, RLDS converts the dialogues into S-trajectories, as discussed above. It then uses these trajectories to compute the empirical MDP M_SE specified by the data; that is, the data is used to compute next-state distributions and average reward in the obvious way. States with too few visits are pruned from M_SE. RLDS then uses the standard value iteration algorithm to compute the optimal policy and value function [6] for M_SE, all using the chosen discount factor.\n\n5 Experimental Results\n\nThe goal of the experiments reported below is twofold: first, to confirm that our RLDS methodology and software produce intuitively sensible policies; and second, to use the value functions computed by the RLDS software to discover and understand correlations between dialogue properties and performance. We have space to present only a few of our many experiments on TOOT and ELVIS data.\n\nEach experiment reported below involves choosing a state estimator, running RLDS using either the TOOT or ELVIS data, and then analyzing the resulting policy and value function. For the TOOT experiments, the reward function was obtained from a question in the user satisfaction survey: the last turn in a dialogue receives a reward of +1 if the user indicated that they would use the system again, a reward of 0 if the user answered \"maybe\", and a reward of -1 if the user indicated that they would not use the system again. 
All turns other than the last receive reward 0 (i.e., a reward is received only at the end of a dialogue). For the ELVIS experiments, we used a summed (over several questions) user-satisfaction score to reward the last turn in each dialogue (this score ranges between 8 and 40).\n\nExperiment 1 (A Sensible Policy): In this initial \"sanity check\" experiment, we created a state estimator for TOOT whose boolean state variables track whether the system knows the value for the following five informational attributes: arrival city (denoted AC), departure city (DC), departure date (DD), departure hour (DH), and whether the hour is AM or PM (AP) (see footnote 3). Thus, if the dialogue so far includes a turn in which TOOT prompts the user for their departure city, and the speech recognizer matches the user utterance with \"New York\", the boolean state variable GotDC? would be assigned a value of 1. Note that this ignores the actual values of the attributes. In addition, there is another boolean variable called ConfirmedAll? that is set to 1 if and only if the system took action ConfirmAll (which prompts the user to explicitly verify the attribute values perceived by TOOT) and perceived a \"yes\" utterance in response. Thus, the state vector is simply the binary vector\n\n[GotAC?, GotAP?, GotDC?, GotDD?, GotDH?, ConfirmedAll?]\n\nFootnote 3: Remember that TOOT can only track its perceptions of these attributes, since errors may have occurred in speech recognition.\n\nAmong the actions (the system utterances) available to TOOT are prompts to the user to specify values for these informational attributes; we shall denote these actions with labels AskDC, AskAC, AskDD, AskDH, and AskAP. The system takes several other actions that we shall mention as they arise in our results. 
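The boolean state estimator of Experiment 1 can be illustrated with a minimal Python sketch. RLDS itself is written in C and its input format is not given in the text, so the turn layout below (a dict with hypothetical keys 'action', 'recognized', and 'heard_yes') is purely an assumption made for illustration.

```python
# Hypothetical sketch of the Experiment 1 state estimator; the turn format
# and field names below are assumptions, not the RLDS input format.
ATTRIBUTES = ('AC', 'AP', 'DC', 'DD', 'DH')  # order of the state vector


def state_estimator(dialogue_prefix):
    '''Map a dialogue prefix to [GotAC?, GotAP?, GotDC?, GotDD?, GotDH?, ConfirmedAll?].'''
    got = {name: 0 for name in ATTRIBUTES}
    confirmed_all = 0
    for turn in dialogue_prefix:
        # Attributes the recognizer believes it heard on this turn.
        for name in turn.get('recognized', ()):
            if name in got:
                got[name] = 1
        # ConfirmedAll? needs both the ConfirmAll action and a perceived yes.
        if turn.get('action') == 'ConfirmAll' and turn.get('heard_yes', False):
            confirmed_all = 1
    return [got[name] for name in ATTRIBUTES] + [confirmed_all]


dialogue = [
    {'action': 'SayGreeting'},
    {'action': 'AskDC', 'recognized': ['DC']},
    {'action': 'AskAC', 'recognized': ['AC']},
]
print(state_estimator(dialogue))  # [1, 0, 1, 0, 0, 0]
```

As in the text, the sketch ignores the actual attribute values and records only whether the system believes it has obtained each one.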
\n\nThe result of running RLDS was the following policy, where we have indicated the action to be taken from each state:\n\n[0,0,0,0,0,0]: SayGreeting\n[1,0,0,0,0,0]: AskDC\n[1,0,1,0,0,0]: AskAP\n[0,0,0,1,1,0]: AskAP\n[1,0,1,1,0,0]: AskDH\n[1,0,0,1,1,0]: AskAP\n[1,1,0,1,1,0]: AskAll\n[1,0,1,1,1,0]: AskAP\n[0,1,0,1,1,0]: AskAll\n[1,1,1,1,1,0]: ConfirmAll\n[1,1,1,1,1,1]: Close\n\nThus, RLDS finds a sensible policy, always asking the user for information which it has not already received, confirming the user's choices when it has all the necessary information, and then presenting the closest matching train schedule and closing the dialogue (action Close). Note that in some cases it chooses to ask the user for values for all the informational attributes even though it has values for some of them. It is important to emphasize that this policy was derived purely through the application of RLDS to the dialogue data, without any knowledge of the \"goal\" of the system. Furthermore, the TOOT data is such that the empirical MDP built by RLDS for this state estimator does include actions considerably less reasonable than those chosen above from many states. Examples include confirming the values of specific informational attributes such as DC (since we do not represent whether such confirmations were successful, this action would lead to infinite loops of confirmation), and requesting values for informational attributes for which we already have values (such actions appear in the empirical MDP due to speech recognition errors). The mere fact that RLDS was driven to a sensible policy that avoided these available pitfalls indicates a correlation between the chosen reward measure (whether the user would use the system again) and the intuitive system goal of obtaining a completely specified train trip. 
It is interesting to note that RLDS finds it better to confirm values for all 5 attributes when it has them, as opposed to simply closing the dialogue without confirmation.\n\nIn a similar experiment on ELVIS, RLDS again found a sensible policy that summarizes the user's inbox at the beginning of the dialogue, goes on to read the relevant e-mail messages until done, and then closes.\n\n[Figure 1 plots omitted. Panel (a): legend \"I = Number of Information Attributes\", x-axis \"Number of Attributes Confirmed\". Panel (b): legend \"D = Number of Distress Features\", x-axis \"Number of Information Attributes\".]\n\nFigure 1: a) Role of Confirmation. b) Role of Distress Features (indicators that the dialogue is in trouble). See description of Experiments 2 and 3 respectively in the text for details.\n\nExperiment 2 (Role of Confirmation): Here we explore the effect of confirming with the user the values that TOOT perceives for the informational attributes; that is, whether the trade-off between the increased confidence in the utterance and the potential annoyance to the user balances out in favor of confirmation or not (for the particular reward function we are using). To do so, we created a simple state estimator with just two state variables. The first variable counts the number of the informational attributes (DC, AC, etc.) that TOOT believes it has obtained, while the second variable counts the number of these that have been confirmed with the user. 
Figure 1(a) presents the optimal value as a function of the number of attributes confirmed. Each curve in the plot corresponds to a different setting of the first state variable. For instance, the curve labeled with \"I=3\" corresponds to the states where the system has obtained 3 informational attributes. We can make two interesting observations from this figure. First, the value function grows roughly linearly with the number of confirmed attributes. Second, and perhaps more startlingly, the value function has only a weak dependence on the first feature: the value for states when some number of attributes have been confirmed seems independent of how many attributes (the system believes) have been obtained. This is evident from the lack of separation between the plots for varying values of the state variable I. In other words, our simple (and preliminary) analysis suggests that for our reward measure, confirmed information influences the value function much more strongly than unconfirmed information. We also repeated this experiment replacing attribute confirmation with thresholded speech recognition log-likelihood scores, and obtained qualitatively similar results.\n\nExperiment 3 (Role of Distress Features): Dialogues often contain timeouts (user silence when the system expected a response), resets (the user asks for the current context of the dialogue to be abandoned and the system is reinitialized), user requests for help, and other indicators that the dialogue is potentially in trouble. Do such events correlate with low value? We created a state estimator for TOOT that, in addition to our variable I counting informational attributes, counted the number of such distress events in the dialogue. Figure 1(b) presents the optimal value as a function of the number of attributes obtained. Each curve corresponds to a different number of distress features. 
This figure confirms that the value of the dialogue is lower for states with a higher number of distress features.\n\n[Figure 2 plots omitted. Panel (a): legend \"T = Number of Turns\" with curves T<4, 4<=T<8, 8<=T<12, and 12<=T<16, x-axis \"Number of Information Attributes\". Panel (b): legend \"P = Task Progress\", x-axis \"Number of Turns divided by 4\".]\n\nFigure 2: a) Role of Dialogue Length in TOOT. b) Role of Dialogue Length in ELVIS. See description of Experiment 4 in the text for details.\n\nExperiment 4 (Role of the Dialogue Length): All other things being equal (e.g., extent of task completion), do users prefer shorter dialogues? To examine this question, we created a state estimator for TOOT that counts the number of informational attributes obtained (variable I as in Experiment 2), and a state estimator for ELVIS that measures \"task progress\" (a measure analogous to the variable I for TOOT; details omitted). In both cases, a second variable tracks the length of the dialogue.\n\nFigure 2(a) presents the results for TOOT. It plots the optimal value as a function of the number I of informational attributes; each curve corresponds to a different range of dialogue lengths. It is immediately apparent that the longer the dialogue, the lower the value, and that within the same length of dialogue it is better to have obtained more attributes (see footnote 4). Of course, the effect of obtaining more attributes is weak for the longest dialogue length; these are dialogues in which the user is struggling with the system, usually due to multiple speech recognition errors.\n\nFigure 2(b) presents the results for ELVIS from a different perspective. The dialogue length is now the x-axis, while each curve corresponds to a different value of P (task progress). 
It is immediately apparent that the value increases with task progress. More interestingly, unlike TOOT, there seems to be an \"optimal\" or appropriate dialogue length for each level of task progress, as seen in the inverse U-shaped curves.\n\nExperiment 5 (Role of Initiative): One of the important questions in dialogue theory is how to choose between system and mixed initiative strategies (cf. Section 1). Using our approach on both TOOT and ELVIS data, we were able to confirm previous results [4, 7] showing that system initiative has a higher value than mixed initiative.\n\nExperiment 6 (Role of Reward Functions): To test the robustness of our framework, we repeated Experiments 1-4 for TOOT using a new reward function based on the user's perceived task completion. We found that except for a weaker correlation between number of turns and value function, the results were basically the same across the two reward functions.\n\n6 Conclusion\n\nThis paper presents a new RL-based framework for spoken dialogue systems. Using our framework, we developed RLDS, a general-purpose software tool, and used it for empirical studies on two sets of real dialogues gathered from the TOOT and ELVIS systems. Our results showed that RLDS was able to find sensible policies, that in ELVIS there was an \"optimal\" length of dialogue, that in TOOT confirmation of attributes was highly correlated with value, that system initiative led to greater user satisfaction than mixed initiative, and that the results were robust to changes in the reward function.\n\nAcknowledgements: We give warm thanks to Esther Levin, David McAllester, Roberto Pieraccini, and Rich Sutton for their many contributions to this work.\n\nReferences\n\n[1] A. W. Biermann and P. M. Long. The composition of messages in speech-graphics interactive systems. In Proceedings of the 1996 International Symposium on Spoken Dialogue, 97-100, 1996.\n[2] A. L. Gorin, B. A. Parker, R. 
M. Sachs and J. G. Wilpon. How May I Help You. In Proceedings of International Symposium on Spoken Dialogue, 57-60, 1996.\n[3] E. Levin, R. Pieraccini and W. Eckert. Learning dialogue strategies within the Markov decision process framework. In Proc. IEEE Workshop on Automatic Speech Recognition and Understanding, 1997.\n[4] D. J. Litman and S. Pan. Empirically Evaluating an Adaptable Spoken Dialogue System. In Proceedings of the 7th International Conference on User Modeling, 1999.\n[5] S. Singh, M. Kearns, D. Litman, and M. Walker. In preparation.\n[6] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.\n[7] M. A. Walker, J. C. Fromer and S. Narayanan. Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for Email. In Proceedings of the 36th Annual Meeting of the Association of Computational Linguistics, COLING/ACL 98, 1345-1352, 1998.\n\nFootnote 4: There is no contradiction with Experiment 2 in this statement, since here we are not separating confirmed and unconfirmed attributes.\n", "award": [], "sourceid": 1775, "authors": [{"given_name": "Satinder", "family_name": "Singh", "institution": null}, {"given_name": "Michael", "family_name": "Kearns", "institution": null}, {"given_name": "Diane", "family_name": "Litman", "institution": null}, {"given_name": "Marilyn", "family_name": "Walker", "institution": null}]}