{"title": "Coarticulation in Markov Decision Processes", "book": "Advances in Neural Information Processing Systems", "page_first": 1137, "page_last": 1144, "abstract": null, "full_text": " Coarticulation in Markov Decision\n Processes\n\n\n\n Khashayar Rohanimanesh Robert Platt\n Department of Computer Science Department of Computer Science\n University of Massachusetts University of Massachusetts\n Amherst, MA 01003 Amherst, MA 01003\n khash@cs.umass.edu rplatt@cs.umass.edu\n\n\n Sridhar Mahadevan Roderic Grupen\n Department of Computer Science Department of Computer Science\n University of Massachusetts University of Massachusetts\n Amherst, MA 01003 Amherst, MA 01003\n mahadeva@cs.umass.edu grupen@cs.umass.edu\n\n\n\n Abstract\n\n We investigate an approach for simultaneously committing to mul-\n tiple activities, each modeled as a temporally extended action in\n a semi-Markov decision process (SMDP). For each activity we de-\n fine a set of admissible solutions consisting of the redundant set of\n optimal policies, and those policies that ascend the optimal state-\n value function associated with them. A plan is then generated by\n merging them in such a way that the solutions to the subordinate\n activities are realized in the set of admissible solutions satisfying\n the superior activities. We present our theoretical results and em-\n pirically evaluate our approach in a simulated domain.\n\n\n1 Introduction\n\nMany real-world planning problems involve concurrent optimization of a set of prior-\nitized subgoals of the problem by dynamically merging a set of (previously learned)\npolicies optimizing the subgoals. A familiar example of this type of problem would\nbe a driving task which may involve subgoals such as safely navigating the car, talk-\ning on the cell phone, and drinking coffee, with the first subgoal taking precedence\nover the others. 
In general this is a challenging problem, since activities often have conflicting objectives and compete for a limited amount of resources in the system. We refer to the behavior of an agent that simultaneously commits to multiple objectives as coarticulation, inspired by the coarticulation phenomenon in speech. In this paper we investigate a framework based on semi-Markov decision processes (SMDPs) for studying this problem. We assume that the agent has access to a set of learned activities modeled by a set of SMDP controllers C = {C1, C2, . . . , Cn}, each achieving a subgoal ωi from a set of subgoals Ω = {ω1, ω2, . . . , ωn}. We further assume that the agent-environment interaction is an episodic task where at the beginning of each episode a subset of the subgoals is introduced to the agent, where subgoals are ranked according to some priority ranking system. The agent is to devise a global policy by merging the policies associated with the controllers into a global policy that simultaneously commits to them according to their degree of significance. In general the optimal policies of the controllers do not offer the flexibility required for the merging process. Thus for every controller we also compute a set of admissible suboptimal policies that reflect the degree of flexibility we can afford in it. Given a controller, an admissible policy is either an optimal policy, or a policy that ascends the optimal state-value function associated with the controller (i.e., on average leads to states with higher values) and is not too far from the optimal policy. 
To illustrate this idea, consider Figure 1(a), which shows a two-dimensional state-value function.

[Figure 1: (a) actions a, b, and c are ascending on the state-value function associated with the controller C, while action d is descending; (b) actions a and c ascend the state-value functions of C1 and C2 respectively, while each descends the state-value function of the other controller. However, action b ascends the state-value function of both controllers.]

Regions with darker colors represent states with higher values. Assume that the agent is currently in the state marked s. The arrows show the direction of state transition as a result of executing different actions, namely actions a, b, c, and d. The first three actions lead the agent to states with higher values, in other words they ascend the state-value function, while action d descends it. Figure 1(b) shows how introducing admissible policies enables simultaneously solving multiple subgoals. In this figure, actions a and c are optimal in controllers C1 and C2 respectively, but each descends the state-value function of the other controller. However, if we allow actions such as action b, we are guaranteed to ascend both value functions, with a slight loss of optimality.

Most of the related work in the context of MDPs assumes that the subprocesses modeling the activities are additive utility independent [1, 2] and does not address concurrent planning with temporal activities. In contrast we focus on problems that involve temporal abstraction, where the overall utility function may be expressed as a non-linear function of sub-utility functions that have different priorities. Our approach is also similar in spirit to the redundancy utilization formalism in robotics [4, 3, 6]. Most of these ideas, however, have been investigated in continuous domains and have not been extended to discrete domains. 
In contrast we focus on discrete domains modeled as MDPs.

In this paper we formally introduce the framework of redundant controllers in terms of the set of admissible policies associated with them, and present an algorithm for merging such policies given a coarticulation task. We also present a set of theoretical results analyzing various properties of such controllers, and also the performance of the policy merging algorithm. The theoretical results are complemented by an experimental study that illustrates the trade-offs between the degree of flexibility of the controllers and the performance of the policy generated by the merging process.

2 Redundant Controllers

In this section we introduce the framework of redundant controllers and formally define the set of admissible policies in them. For modeling controllers, we use the concept of subgoal options [7]. A subgoal option can be viewed as a closed-loop controller that achieves a subgoal of some kind. Formally, a subgoal option of an MDP M = ⟨S, A, P, R⟩ is defined by a tuple C = ⟨M_C, I, β⟩. The MDP M_C = ⟨S_C, A_C, P_C, R_C⟩ is the option MDP induced by the option C, in which S_C ⊆ S, A_C ⊆ A, P_C is the transition probability function induced by P, and R_C is chosen to reflect the subgoal of the option. The policy component of such an option is a solution to the option MDP M_C associated with it. For generality, throughout this paper we refer to subgoal options simply as controllers.

For theoretical reasons, in this paper we assume that each controller optimizes a minimum cost-to-goal problem. An MDP M modeling a minimum cost-to-goal problem includes a set of goal states S_G ⊆ S. We also represent the set of non-goal states by S̄_G = S − S_G. Every action in a non-goal state incurs some negative reward, and the agent receives a reward of zero in goal states. 
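The subgoal-option structure just described can be captured in a small data structure. The sketch below is our own illustration (the class and method names and the unit −1 step cost are assumptions, not from the paper):

```python
# Minimal sketch of a minimum cost-to-goal subgoal option.
from dataclasses import dataclass

@dataclass
class SubgoalOption:
    states: set           # S_C, a subset of the MDP's states
    actions: set          # A_C, a subset of the MDP's actions
    goal_states: set      # S_G: zero reward, option terminates here

    def reward(self, s, a):
        """R_C: every action in a non-goal state incurs a negative
        reward (here -1); goal states yield zero."""
        return 0.0 if s in self.goal_states else -1.0

    def beta(self, s):
        """Termination: probability 1 in every goal state."""
        return 1.0 if s in self.goal_states else 0.0

opt = SubgoalOption(states={0, 1, 2}, actions={"left", "right"},
                    goal_states={2})
assert opt.reward(2, "left") == 0.0 and opt.reward(0, "right") == -1.0
```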
A controller C is a minimum cost-to-goal controller if M_C optimizes a minimum cost-to-goal problem. The controller also terminates with probability one in every goal state. We are now ready to formally introduce the concept of ascending policies in an MDP:

Definition 1: Given an MDP M = ⟨S, A, P, R⟩, a function L : S → ℝ, and a deterministic policy π : S → A, let Δ^π_L(s) = E_{s′ ∼ P^π(·|s)}{L(s′)} − L(s), where E_{s′ ∼ P^π(·|s)}{·} is the expectation with respect to the distribution over next states s′ given the current state s and the policy π. Then π is ascending on L if for every state s (except for the goal states, if the MDP models a minimum cost-to-goal problem) we have Δ^π_L(s) > 0.

For a policy π ascending on a function L, the function Δ^π_L : S → ℝ⁺ gives a strictly positive value that measures how much the policy ascends on L in state s. A deterministic policy π is descending on L if for some state s, Δ^π_L(s) < 0. In general we would like to study how a given policy behaves with respect to the optimal value function of a problem. Thus we choose the function L to be the optimal state-value function (i.e., V*). The above condition can then be interpreted as follows: we are interested in policies that on average lead to states with higher values, in other words policies that ascend the state-value function surface. Note that Definition 1 is closely related to the Lyapunov functions introduced in [5]. The minimum and maximum rates at which an ascending policy on average ascends V* are given by:

Definition 2: Assume that the policy π is ascending on the optimal state-value function V*. Then π ascends on V* with a factor of at least κ if for all non-goal states s ∈ S̄_G, Δ^π(s) ≥ κ > 0. We also define the guaranteed expected ascend rate of π as κ_π = min_{s ∈ S̄_G} Δ^π(s). The maximum possible achievable expected ascend rate of π is given by μ_π = max_{s ∈ S̄_G} Δ^π(s).

One problem with ascending policies is that Definition 1 ignores the immediate reward which the agent receives. 
For example it could be the case that as a result of executing an ascending policy, the agent transitions to some state with a higher value, but receives a huge negative reward. This can be counterbalanced by adding a second condition that keeps the ascending policies close to the optimal policy:

Definition 3: Given a minimum cost-to-goal problem modeled by an MDP M = ⟨S, A, P, R⟩, a deterministic policy π is ε-ascending on M if: (1) π is ascending on V*, and (2) ε is the maximum value in the interval (0, 1] such that for all s ∈ S we have Q*(s, π(s)) ≥ (1/ε) V*(s).

Here, ε measures how close the ascending policy is to the optimal policy. For any ε, the second condition assures that for all s ∈ S, Q*(s, π(s)) ∈ [(1/ε) V*(s), V*(s)] (note that because M models a minimum cost-to-goal problem, all values are negative). Naturally we often prefer policies that are ε-ascending for values of ε close to 1. In section 3 we derive a lower bound on ε such that no policy is ascending on V* for values smaller than this bound (in other words, ε cannot be arbitrarily small). Similarly, a deterministic policy π is called ε-ascending on a controller C if π is ε-ascending on M_C. Next, we introduce the framework of redundant controllers:

Definition 4: A minimum cost-to-goal controller C is an ε-redundant controller if there exist multiple deterministic policies that are either optimal or ε-ascending on C. We represent the set of such admissible policies by Π^ε_C. Also, the minimum ascend rate of C is defined as κ_C = min_{π ∈ Π^ε_C} κ_π, where κ_π is the ascend rate of a policy π ∈ Π^ε_C (see Definition 2).

We can compute the ε-redundant set of policies for a controller C as follows. Using the reward model, the state transition model, V*, and Q*, in every state s ∈ S we compute the set of actions that are ε-ascending on C, represented by A^ε_C(s) = {a ∈ A | a = π(s), π ∈ Π^ε_C}, i.e., the actions that satisfy both conditions of Definition 3.

Next, we present an algorithm for merging the policies associated with a set of prioritized redundant controllers that run in parallel. 
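Computing A^ε_C(s) requires only the model, V*, and Q*. The following sketch (the toy numbers and all names are ours, and values are negative as in a minimum cost-to-goal MDP) checks both the ascend condition of Definition 1 and the closeness condition of Definition 3:

```python
# Sketch of computing the epsilon-ascending action set A^eps_C(s).

def ascending_actions(P, V, Q, s, eps):
    """Return actions a at s with (1) a strictly positive expected
    ascend on V* (Definition 1) and (2) Q*(s, a) >= (1/eps) * V*(s)
    (Definition 3)."""
    admissible = []
    for a, dist in P[s].items():
        ascend = sum(p * V[s2] for s2, p in dist.items()) - V[s]
        if ascend > 0 and Q[s][a] >= V[s] / eps:
            admissible.append(a)
    return admissible

# Toy model: from state 0, "fast" jumps straight toward the goal,
# while "slow" only sometimes advances and pays a price in value.
P = {0: {"fast": {1: 1.0}, "slow": {1: 0.5, 0: 0.5}}}
V = {0: -2.0, 1: -1.0}
Q = {0: {"fast": -2.0, "slow": -2.4}}

print(ascending_actions(P, V, Q, 0, eps=0.9))   # -> ['fast']
```

With ε = 0.9, "slow" ascends but violates the closeness condition (−2.4 < −2.0/0.9); relaxing to ε = 0.8 admits both actions, which mirrors how smaller ε enlarges the redundant set.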
For specifying the order of priority among the controllers we use the expression Cj ◁ Ci, where the relation "◁" expresses the subject-to relation (taken from [3]). This expression should read: controller Cj is subject-to controller Ci. A priority ranking system is then specified by a set of relations {Cj ◁ Ci}. Without loss of generality we assume that the controllers are prioritized based on the following ranking system: {Cj ◁ Ci | i < j}. Algorithm MergeController summarizes the policy merging process.

Algorithm 1 Function MergeController(s, C1, C2, . . . , Cm)
 1: Input: current state s; the set of controllers Ci; the redundant sets A^εi_Ci(s) for every controller Ci.
 2: Initialize: π1(s) = A^ε1_C1(s).
 3: For i = 2, 3, . . . , m perform:
    πi(s) = {a | a ∈ A^εi_Ci(s) ∧ a ∈ π_f(i)(s)}, where f(i) = max j < i such that πj(s) ≠ ∅ (initially f(1) = 1).
 4: Return an action a ∈ π_f(m+1)(s).

In this algorithm, πi(s) represents the ordered intersection of the redundant sets A^εj_Cj (1 ≤ j ≤ i) up to the controller Ci, constrained by the order of priority. In other words, each set πi(s) contains a set of actions in state s that are all ε-ascending with respect to the superior controllers C1, C2, . . . , Ci. Due to the limited amount of redundancy in the system, it is possible that the system may not be able to commit to some of the subordinate controllers. This happens when none of the actions admissible for some controller Cj (i.e., a ∈ A^εj_Cj(s)) are ε-ascending with respect to the superior controllers. 
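Algorithm 1 can be transcribed almost directly. In this sketch (function and variable names are ours) the redundant sets are supplied as plain Python sets, ordered from the highest-priority controller down; keeping the previous intersection when a new one would be empty is exactly the f(i) bookkeeping of the algorithm:

```python
def merge_controller(s, redundant_sets):
    """Sketch of Algorithm MergeController: intersect the redundant
    action sets A^eps_Ci(s) in priority order, skipping any subordinate
    controller whose actions are incompatible with all superior ones.

    redundant_sets: list of sets of actions, index 0 = highest priority.
    Returns the final non-empty intersection; any member may be executed."""
    current = set(redundant_sets[0])        # pi_1(s) = A^eps_C1(s)
    for cand in redundant_sets[1:]:
        nxt = current & set(cand)
        if nxt:                             # commit to this controller too
            current = nxt
        # else: skip this controller (f(i) keeps the last non-empty set)
    return current

# Superior controller admits {N, NE}; the next admits {NE, E};
# the lowest-priority one admits only {S} and is skipped.
print(merge_controller("s0", [{"N", "NE"}, {"NE", "E"}, {"S"}]))
```

Here the result is {"NE"}: the first two controllers share NE, while the third controller's only admissible action S is incompatible and the algorithm skips it rather than sacrifice the superior subgoals.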
In this case the algorithm skips the controller Cj and continues the search in the redundant sets of the remaining subordinate controllers. The complexity of the above algorithm consists of the following costs: (1) the cost of computing the redundant sets A^εi_Ci for a controller, which is linear in the number of states and actions, O(|S| |A|); and (2) the cost of performing Algorithm MergeController in every state s, which is O((m − 1) |A|²), where m is the number of subgoals. In the next section, we theoretically analyze redundant controllers and the performance of the policy merging algorithm in various situations.

3 Theoretical Results

In this section we present some of our theoretical results characterizing ε-redundant controllers, in terms of bounds on the number of time steps it takes for a controller to complete its task, and the performance of the policy merging algorithm. For lack of space, we have left out the proofs and refer the readers to [8]. In section 2 we stated that there is a lower bound on ε such that there exists no ε-ascending policy for values smaller than this bound. In the first theorem we compute this lower bound:

Theorem 1 Let M = ⟨S, A, P, R⟩ be a minimum cost-to-goal MDP and let π be an ε-ascending policy defined on M. Then ε is bounded by ε > |V*_max| / |V*_min|, where V*_min = min_{s ∈ S̄_G} V*(s) and V*_max = max_{s ∈ S̄_G} V*(s).

Such a lower bound characterizes the maximum flexibility we can afford in a redundant controller and gives us insight into the range of values that we can choose for ε. In the second theorem we derive an upper bound on the expected number of steps that a minimum cost-to-goal controller takes to complete when executing an ε-ascending policy:

Theorem 2 Let C be an ε-ascending minimum cost-to-goal controller and let s denote the current state of the controller. Then any ε-ascending policy π on C will terminate the controller in some goal state with probability one. 
Furthermore, termination occurs on average in at most −V*(s)/κ_π steps, where κ_π is the guaranteed expected ascend rate of the policy π.

This result assures that the controller arrives in a goal state and will achieve its goal in a bounded number of steps. We use this result when studying the performance of running multiple redundant controllers in parallel. Next, we study how concurrent execution of two controllers using Algorithm MergeController impacts each controller (this result can be trivially extended to the case where a set of m > 2 controllers is executed concurrently):

Theorem 3 Given an MDP M = ⟨S, A, P, R⟩ and any two minimum cost-to-goal redundant controllers {C1, C2} defined over M, the policy π obtained by Algorithm MergeController based on the ranking system {C2 ◁ C1} is ε1-ascending on C1. Moreover, if for all s ∈ S, A^ε1_C1(s) ∩ A^ε2_C2(s) ≠ ∅, the policy π will be ascending on both controllers with an ascend rate of at least κ = min{κ_C1, κ_C2}.

This theorem states that merging the policies of two controllers using Algorithm MergeController generates a policy that remains ε1-ascending on the superior controller. In other words it does not negatively impact the superior controller. In the next theorem, we establish bounds on the expected number of steps that it takes for the policy obtained by Algorithm MergeController to achieve a set of prioritized subgoals Ω = {ω1, . . . , ωm} by concurrently executing the associated controllers {C1, . . . , Cm}:

Theorem 4 Assume C = {C1, C2, . . . , Cm} is a set of minimum cost-to-goal εi-redundant (i = 1, . . . , m) controllers defined over an MDP M. Let π denote the policy obtained by Algorithm MergeController based on the ranking system {Cj ◁ Ci | i < j}. Let Φ^π(s) denote the expected number of steps for the policy π to achieve all the subgoals {ω1, ω2, . . . , ωm} associated with the set of controllers, assuming that the current state of the system is s. 
Then the following expression holds:

    max_{1 ≤ i ≤ m} [ −V*_i(s) / μ_i ]  ≤  Φ^π(s)  ≤  Σ_{h ∈ H} P(h) Σ_{i=1}^{m} [ −V*_i(h(i)) / κ_i ]        (1)

where μ_i is the maximum possible achievable expected ascend rate for the controller Ci (see Definition 2), and H is the set of sequences h = ⟨s, g1, g2, . . . , gm⟩ in which gi is a goal state of controller Ci (i.e., gi ∈ S_Gi). The probability distribution P(h) = P^{C1}_{s g1} Π_{i=2}^{m} P^{Ci}_{g_{i−1} g_i} over sequences h ∈ H gives the probability of executing the set of controllers in sequence based on the order of priority, starting in state s, and observing the goal state sequence ⟨g1, . . . , gm⟩.

Based on Theorem 3, when Algorithm MergeController always finds a policy that optimizes all controllers (i.e., for all s ∈ S, ∩_{i=1}^{m} A^εi_Ci(s) ≠ ∅), the policy π will ascend on all controllers. Thus on average the total time for all controllers to terminate equals the time required by the controller that takes the most time to complete, which has the lower bound max_i −V*_i(s)/μ_i. The worst case happens when the policy generated by Algorithm MergeController cannot optimize more than one controller at a time. In this case π always optimizes the controller with the highest priority until its termination, then optimizes the controller with the second-highest priority, and continues this process to the end in a sequential manner. The right-hand side of the inequality given by Equation 1 gives an upper bound on the expected time required for all controllers to complete when they are executed sequentially. The above theorem implicitly states that when Algorithm MergeController generates a policy that on average commits to more than one subgoal, it potentially takes fewer steps to achieve all the subgoals, compared to a policy that sequentially achieves them according to their degree of significance.

4 Experiments

In this section we present our experimental results analyzing redundant controllers and the policy merging algorithm described in section 2. 
Figure 2(a) shows a 10 × 10 grid world in which an agent is to visit a set of prioritized locations marked by G1, . . . , Gm (in this example m = 4). The agent's goal is to achieve all of the subgoals by focusing on superior subgoals and coarticulating with the subordinate ones. Intuitively, when the agent is navigating to some subgoal Gi of higher priority, if some subgoal Gj of lower priority is en route to Gi, or not too far off the optimal path to Gi, the agent may choose to visit Gj.

[Figure 2: (a) A 10 × 10 grid world where an agent is to visit a set of prioritized subgoal locations; (b) The optimal policy associated with the subgoal G1; (c) The ε-ascending policy for ε = 0.95; (d) The ε-ascending policy for ε = 0.90.]

We model this problem by an MDP M = ⟨S, A, P, R⟩, where S is the set of states consisting of the 100 locations in the room, and A is the set of actions consisting of eight stochastic navigation actions (four actions in the compass directions, and four diagonal actions). Each action moves the agent in the corresponding direction with probability p and fails with probability (1 − p) (in all of the experiments we used success probability p = 0.9). Upon failure the agent is randomly placed in one of the eight neighboring locations with equal probability. If a movement would take the agent into a wall, the agent remains in the same location. The agent also receives a reward of −1 for every action executed. We assume that the agent has access to a set of controllers C1, . . . , Cm associated with the set of subgoal locations G1, . . . , Gm. A controller Ci is a minimum cost-to-goal subgoal option Ci = ⟨M_Ci, I, β⟩, where M_Ci = M, the initiation set I includes every location except for the subgoal location, and β forces the option to terminate only in the subgoal location. 
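The navigation dynamics just described can be sketched as a next-state distribution; the helper below is our own minimal reconstruction (the function and action names are assumptions, the success probability and the −1 step reward are from the text):

```python
# Minimal sketch of the 10x10 grid dynamics: an action succeeds with
# p = 0.9; on failure the agent lands uniformly in one of the eight
# neighboring cells; movement into a wall leaves the agent in place.
N = 10
MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0),
         "NE": (1, 1), "NW": (-1, 1), "SE": (1, -1), "SW": (-1, -1)}

def clamp(x, y):
    """Return the cell if it is inside the grid, else None (wall)."""
    return (x, y) if 0 <= x < N and 0 <= y < N else None

def next_state_dist(s, a, p=0.9):
    """Distribution over next cells for action a in cell s.
    The per-step reward is -1 regardless of outcome."""
    dist = {}
    def add(cell, prob):
        cell = clamp(*cell) or s          # bounce off walls: stay put
        dist[cell] = dist.get(cell, 0.0) + prob
    dx, dy = MOVES[a]
    add((s[0] + dx, s[1] + dy), p)        # intended move succeeds
    for mx, my in MOVES.values():         # failure: random neighbor
        add((s[0] + mx, s[1] + my), (1 - p) / 8)
    return dist

d = next_state_dist((0, 0), "NE")
assert abs(sum(d.values()) - 1.0) < 1e-9  # a proper distribution
```

From the corner cell (0, 0), five of the eight failure directions hit a wall, so their probability mass collapses onto staying in place, while the intended diagonal cell (1, 1) receives the success mass plus one failure share.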
Figures 2(b)-(d) show examples of admissible policies for subgoal G1: Figure 2(b) shows the optimal policy of the controller C1 (navigating the agent to the location G1). Figures 2(c) and 2(d) show the ε-redundant policies for ε = 0.95 and ε = 0.90 respectively. Note that by reducing ε we obtain a larger, though less optimal, set of admissible policies.

We use two different planning methods: (1) sequential planning, where we achieve the subgoals sequentially by executing the controllers one at a time according to the order of priority of the subgoals, and (2) concurrent planning, where we use Algorithm MergeController to merge the policies associated with the controllers. In the first set of experiments, we fix the number of subgoals. At the beginning of each episode the agent is placed in a random location, and a fixed number of subgoals (in our experiments m = 4) are randomly selected. Next, the set of admissible policies (using ε = 0.9) for every subgoal is computed. Figure 3(a) shows the performance of both planning methods, for every starting location, in terms of the number of steps for completing the overall task. The concurrent planning method consistently outperforms sequential planning in all starting locations.

[Figure 3: (a) Performance of both planning methods in terms of the average number of steps from every starting state; (b) Performance of the concurrent method for different values of ε.]

Next, for the same task, we measure how the performance of the concurrent method varies with ε when computing the set of ε-ascending policies for every subgoal. 
Figure 3(b) shows the performance of the concurrent method, and Figure 4(a) shows the average number of subgoals coarticulated by the agent, averaged over all states, for different values of ε. We varied ε from 0.6 to 1.0 in 0.05 intervals. All of these results are also averaged over 100 episodes, each consisting of 10 trials. Note that for ε = 1 the only admissible policy is the optimal policy, and thus it does not offer much flexibility with respect to the other subgoals. This can be seen in Figure 3(b), in which the policy generated by the merging algorithm for ε = 1.0 has the minimum commitment to the other subgoals. As we reduce ε, we obtain a larger set of admissible policies and thus observe an improvement in the performance. However, the more we reduce ε, the less optimal the admissible policies we obtain, and the performance degrades (here we can observe this for values below ε = 0.85). Figure 4(a) also shows that by relaxing optimality (reducing ε), the policy generated by the merging algorithm commits to more subgoals simultaneously.

In the final set of experiments, we fixed ε to 0.9 and varied the number of subgoals from m = 2 to m = 50 (all of these results are averaged over 100 episodes, each consisting of 10 trials). Figure 4(b) shows the performance of both planning methods. It can be observed that the concurrent method consistently outperforms the sequential method as the number of subgoals increases (the top curve shows the performance of the sequential method and the bottom curve shows that of the concurrent method). 
This is because when there are many subgoals, the concurrent planning method is able to visit multiple subgoals of lower priority en route to the primary subgoals, and thus it can save more time.

[Figure 4: (a) Average number of subgoals coarticulated using the concurrent planning method for different values of ε; (b) Performance of the planning methods in terms of the average number of steps from every starting state.]

5 Concluding Remarks

There are a number of questions and open issues that remain to be addressed, and many interesting directions in which this work can be extended. In many problems, the strict order of priority of subtasks may be violated: in some situations we may want to be suboptimal with respect to the superior subtasks in order to improve the overall performance. Another interesting direction is to study situations where actions are structured. We are currently investigating compact representations of the set of admissible policies that exploit the structure of actions.

Acknowledgements

This research is supported in part by a grant from the National Science Foundation #ECS-0218125.

References

[1] C. Boutilier, R. Brafman, and C. Geib. Prioritized goal decomposition of Markov decision processes: Towards a synthesis of classical and decision theoretic planning. In Martha Pollack, editor, Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, pages 1156-1163, San Francisco, 1997. Morgan Kaufmann.

[2] C. Guestrin and G. Gordon. Distributed planning in hierarchical factored MDPs. 
In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, pages 197-206, Edmonton, Canada, 2002.

[3] M. Huber. A Hybrid Architecture for Adaptive Robot Control. PhD thesis, University of Massachusetts, Amherst, 2000.

[4] Y. Nakamura. Advanced Robotics: Redundancy and Optimization. Addison-Wesley, 1991.

[5] T. J. Perkins and A. G. Barto. Lyapunov-constrained action sets for reinforcement learning. In Proceedings of the Eighteenth International Conference on Machine Learning, pages 409-416. Morgan Kaufmann, San Francisco, CA, 2001.

[6] R. Platt, A. Fagg, and R. Grupen. Nullspace composition of control laws for grasping. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2002.

[7] D. Precup. Temporal Abstraction in Reinforcement Learning. PhD thesis, Department of Computer Science, University of Massachusetts, Amherst, 2000.

[8] K. Rohanimanesh, R. Platt, S. Mahadevan, and R. Grupen. A framework for coarticulation in Markov decision processes. Technical Report 04-33 (www.cs.umass.edu/~khash/coarticulation04.pdf), Department of Computer Science, University of Massachusetts, Amherst, Massachusetts, 2004.
", "award": [], "sourceid": 2597, "authors": [{"given_name": "Khashayar", "family_name": "Rohanimanesh", "institution": null}, {"given_name": "Robert", "family_name": "Platt", "institution": null}, {"given_name": "Sridhar", "family_name": "Mahadevan", "institution": null}, {"given_name": "Roderic", "family_name": "Grupen", "institution": null}]}