{"title": "Generalizing GANs: A Turing Perspective", "book": "Advances in Neural Information Processing Systems", "page_first": 6316, "page_last": 6326, "abstract": "Recently, a new class of machine learning algorithms has emerged, where models and discriminators are generated in a competitive setting. The most prominent example is Generative Adversarial Networks (GANs). In this paper we examine how these algorithms relate to the Turing test, and derive what - from a Turing perspective - can be considered their defining features. Based on these features, we outline directions for generalizing GANs - resulting in the family of algorithms referred to as Turing Learning. One such direction is to allow the discriminators to interact with the processes from which the data samples are obtained, making them \"interrogators\", as in the Turing test. We validate this idea using two case studies. In the first case study, a computer infers the behavior of an agent while controlling its environment. In the second case study, a robot infers its own sensor configuration while controlling its movements. The results confirm that by allowing discriminators to interrogate, the accuracy of models is improved.", "full_text": "Generalizing GANs: A Turing Perspective\n\nRoderich Gro\u00df and Yue Gu\n\nDepartment of Automatic Control and Systems Engineering\n\nThe University of Shef\ufb01eld\n\n{r.gross,ygu16}@sheffield.ac.uk\n\nWei Li\n\nDepartment of Electronics\n\nThe University of York\nwei.li@york.ac.uk\n\nWyss Institute for Biologically Inspired Engineering\n\nMelvin Gauci\n\nHarvard University\n\nmgauci@g.harvard.edu\n\nAbstract\n\nRecently, a new class of machine learning algorithms has emerged, where models\nand discriminators are generated in a competitive setting. The most prominent\nexample is Generative Adversarial Networks (GANs). 
In this paper we examine\nhow these algorithms relate to the Turing test, and derive what\u2014from a Turing\nperspective\u2014can be considered their de\ufb01ning features. Based on these features,\nwe outline directions for generalizing GANs\u2014resulting in the family of algorithms\nreferred to as Turing Learning. One such direction is to allow the discriminators\nto interact with the processes from which the data samples are obtained, making\nthem \u201cinterrogators\u201d, as in the Turing test. We validate this idea using two case\nstudies. In the \ufb01rst case study, a computer infers the behavior of an agent while\ncontrolling its environment. In the second case study, a robot infers its own sensor\ncon\ufb01guration while controlling its movements. The results con\ufb01rm that by allowing\ndiscriminators to interrogate, the accuracy of models is improved.\n\n1\n\nIntroduction\n\nGenerative Adversarial Networks [1] (GANs) are a framework for inferring generative models from\ntraining data. They place two neural networks\u2014a model and a discriminator\u2014in a competitive\nsetting. The discriminator\u2019s objective is to correctly label samples from either the model or the\ntraining data. The model\u2019s objective is to deceive the discriminator, in other words, to produce\nsamples that are categorized as training data by the discriminator. The networks are trained using a\ngradient-based optimization algorithm. Since their inception in 2014, GANs have been applied in a\nrange of contexts [2, 3], but most prominently for the generation of photo-realistic images [1, 4].\nIn this paper we analyze the striking similarities between GANs and the Turing test [5]. The Turing\ntest probes a machine\u2019s ability to display behavior that, to an interrogator, is indistinguishable from\nthat of a human. Developing machines that pass the Turing test could be considered as a canonical\nproblem in computer science [6]. 
More generally, the problem is that of imitating (and hence\ninferring) the structure and/or behavior of any system, such as an organism, a device, a computer\nprogram, or a process.\nThe idea to infer models in a competitive setting (model versus discriminator) was first proposed\nin [7]. The paper considered the problem of inferring the behavior of an agent in a simple environment.\nThe behavior was deterministic, simplifying the identification task. In a subsequent work [8], the\nmethod, named Turing Learning, was used to infer the behavioral rules of a swarm of memoryless\nrobots. The robots\u2019 movements were tracked using an external camera system, providing the training\ndata. Additional robots executed the rules defined by the models.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nFigure 1: Illustration of the Turing test setup introduced in [5]. Player C (the interrogator) poses\nquestions to and receives labelled answers from players A and B. Player C does not know which\nlabel (blue square or red disk) corresponds to which player. Player C has to determine this after\nquestioning.\n\nThe contributions of this paper are\n\n\u2022 to examine the defining features of GANs (and variants), assuming a Turing perspective;\n\u2022 to outline directions for generalizing GANs, in particular, to encourage alternative implementations\nand novel applications, for example, ones involving physical systems;\n\u2022 to show, using two case studies, that more accurate models can be obtained if the discriminators\nare allowed to interact with the processes from which data samples are obtained (as\nthe interrogators in the Turing test).1\n\n2 A Turing Perspective\nIn 1950, Turing proposed an imitation game [5] consisting of three players A, B and C. Figure 1\nshows a schematic of this game. Player C, also referred to as the interrogator, is unable to see the other\nplayers. 
However, the interrogator can pose questions to and receive answers from them. Answers\nfrom the same player are consistently labelled (without revealing the player\u2019s identity, A or B). At the end of\nthe game, the interrogator has to guess which label belongs to which player. There are two variants\nof the game, and we focus on the one where player A is a machine, while player B is human (the\ninterrogator is always human). This variant, depicted in Figure 1, is commonly referred to as the\nTuring test [9, 10]. To pass the test, the machine would have to produce answers that the interrogator\nbelieves to originate from a human. If a machine passed this test, it would be considered intelligent.\nFor GANs (and variants), player C, the interrogator, is no longer human, but rather a computer\nprogram that learns to discriminate between information originating from players A and B. Player A\nis a computer program that learns to trick the interrogator. Player B could be any system one wishes\nto imitate, including humans.\n\n1 In contrast to [7], we consider substantially more complex case studies, where the discriminators are required\nto genuinely interact with the systems, as a pre-determined sequence of interventions would be unlikely to reveal\nall the observable behavioral features.\n\n2.1 Defining Features of GANs\n\nAssuming a Turing perspective, we consider the following as the defining features of GANs (and\nvariants):\n\n\u2022 a training agent, T, providing genuine data samples (the training data);\n\u2022 a model agent, M, providing counterfeit data samples;\n\u2022 a discriminator agent, D, labelling data samples as either genuine or counterfeit;\n\u2022 a process by which D observes or interacts with M and T;\n\u2022 D and M are being optimized:\n\n\u2013 D is rewarded for labelling data samples of T as genuine;\n\u2013 D is rewarded for labelling data samples of M as counterfeit;\n\u2013 M is rewarded for misleading D (to label its data samples as 
genuine).\n\nIt should be noted that in the Turing test there is a bi-directional exchange of information between\nplayer C and either player A or B. In GANs, however, during any particular \u201cgame\u201d, data flows only\nin one direction: The discriminator agent receives data samples, but is unable to influence the agent\nat the origin during the sampling process. In the case studies presented in this paper, this limitation is\novercome, and it is shown that this can lead to improved model accuracy. This, of course, does not\nimply that active discriminators are beneficial for every problem domain.\n\n2.2 Implementation Options of (Generalized) GANs\n\nGANs and their generalizations, that is, algorithms that possess the aforementioned defining features,\nare instances of Turing Learning [8]. The Turing Learning formulation removes restrictions of the\noriginal GAN formulation that are unnecessary from a Turing perspective, for example, the need for\nmodels and discriminators to be represented as neural networks, or the need for optimizing these\nnetworks using gradient descent. As a result of this, the Turing Learning formulation is very general,\nand applicable to a wide range of problems (e.g., using models with discrete, continuous or mixed\nrepresentations).\nIn the following, we present the aspects of implementations that are not considered as defining\nfeatures, but rather as implementation options. They allow Turing Learning to be tailored, for\nexample, by using the most suitable model representation and optimization algorithm for the given\nproblem domain. Moreover, users can choose implementation options they are familiar with, making\nthe overall framework2 more accessible.\n\n\u2022 Training data. The training data could take any form. It could be artificial (e.g., audio,\nvisual, textual data in a computer), or physical (e.g., a geological sample, engine, painting\nor human being).\n\n\u2022 Model representation. 
The model could take any form. In GANs [1], it takes the form of a\nneural network that generates data when provided with a random input. Other representations\ninclude vectors, graphs, and computer programs. In any case, the representation should\nbe expressive enough, allowing a model to produce data with the same distribution as the\ntraining data. The associated process could involve physical objects (e.g., robots [8]). If the\ntraining data originates from physical objects, but the model data originates from simulation,\nspecial attention is needed to avoid the so called reality gap [11]. Any difference caused not\nby the model but rather the process to collect the data (e.g., tracking equipment) may be\ndetected by the discriminators, which could render model inference impossible.\n\n\u2022 Discriminator representation. The discriminator could take any form. Its representation\nshould be expressive enough to distinguish between genuine and counterfeit data samples.\nThese samples could be arti\ufb01cial or physical. For example, a discriminator could be\nnetworked to an experimental platform, observing and manipulating some physical objects\nor organisms.\n\n\u2022 Optimization algorithms. The optimization algorithms could take any form as long as they\nare compatible with the solution representations. They could use a single candidate solution\nor a population of candidate solutions [8, 12]. In the context of GANs, gradient-based\noptimization algorithms are widely applied [13]. These algorithms however require the\nobjective function to be differentiable and (ideally) unimodal. A wide range of metaheuristic\nalgorithms [14] could be explored for domains with more complex objective functions. 
For\nexample, if the model was represented using a computer program, genetic programming\nalgorithms could be used.\n\n\u2022 Coupling mechanism between the model and discriminator optimizers. The optimization\nprocesses for the model and discriminator solutions are dependent on each other. Hence\nthey may require careful synchronization [1]. Moreover, if using multiple models and/or\nmultiple discriminators, choices have to be made for which pairs of solutions to evaluate.\nElaborate evaluation schemes may take into account the performance of the opponents in\nother evaluations (e.g., using niching techniques). Synchronization challenges include those\nreported for coevolutionary systems.3 In particular, due to the so-called Red Queen Effect,\nthe absolute quality of solutions in a population may increase while the quality of solutions\nrelative to the other population may decrease, or vice versa [18]. Cycling [20] refers to\nthe phenomenon that some solutions that have been lost may get rediscovered in later\ngenerations. A method for overcoming the problem is to retain promising solutions in an\narchive, the \u201chall of fame\u201d [21]. Disengagement can occur when one population (e.g., the\ndiscriminators) outperforms the other population, making it hard to reveal differences among\nthe solutions. Methods for addressing disengagement include \u201cresource sharing\u201d [22] and\n\u201creducing virulence\u201d [20].\n\u2022 Termination criterion. Identifying a suitable criterion for terminating the optimization\nprocess can be challenging, as the performance is defined in relative rather than absolute\nterms. For example, a model whose data is judged genuine by every discriminator in a\npopulation may still not be useful (the discriminators may have performed poorly). In\nprinciple, however, any criterion can be applied (e.g., convergence data, fixed time limit,\netc.).\n\n2 For an algorithmic description of Turing Learning, see [8].\n\nFigure 2: In Case Study 1, we consider a non-embodied agent that is subjected to a stimulus, S,\nwhich can be either low (L) or high (H). The task is to infer how the agent responds to the stimulus.\nThe discriminator controls the stimulus while observing the behavior of the agent (expressed as v),\nwhich is governed by the depicted probabilistic finite-state machine. Label S&p denotes that if the stimulus\nis S \u2208 {L, H}, the corresponding transition occurs with probability p. We assume that the structure\nof the state machine is known, and that the parameters (p1, p2, v2, v3, . . . , vn) are to be inferred.\n\n3 Case Study 1: Inferring Stochastic Behavioral Processes Through Interaction\n\n3.1 Problem Formulation\n\nThis case study is inspired by ethology, the study of animal behavior. Animals are sophisticated\nagents, whose actions depend on both their internal state and the stimuli present in their environment.\nAdditionally, their behavior can have a stochastic component. In the following, we show how Turing\nLearning can infer the behavior of a simple agent that captures the aforementioned properties.\nThe agent\u2019s behavior is governed by the probabilistic finite-state machine (PFSM)4 shown in Figure 2.\nIt has n states, and it is assumed that each state leads to some observable behavioral feature, v \u2208 R,\nhereafter referred to as the agent\u2019s velocity. The agent responds to a stimulus that can take two levels,\nlow (L) or high (H). The agent starts in state 1. 
If the stimulus is L, it remains in state 1 with certainty.\nIf the stimulus is H, it transitions to state 2 with probability p1, and remains in state 1 otherwise. In\nother words, under a constant stimulus H, it transitions to state 2 after 1/p1 steps on average. In state k = 2, 3, . . . , n \u2212 1, the\nbehavior is as follows. If the stimulus is identical to that which brings the agent into state k from\nstate k \u2212 1, the state reverts to k \u2212 1 with probability p2 and remains at k otherwise. If the stimulus\nis different to that which brings the agent into state k from state k \u2212 1, the state progresses to k + 1\nwith probability p1 and remains at k otherwise. In state n, the only difference is that if the stimulus is\ndifferent to that which brought about state n, the agent remains in state n with certainty (as there is\nno next state to progress to).\nBy choosing p1 close to 0 and p2 = 1, we force the need for interaction if the higher states are to be\nobserved for a meaningful amount of time. This is because once a transition to a higher state happens,\nthe interrogator must immediately toggle the stimulus to prevent the agent from regressing back to\nthe lower state.\n\n3 Coevolutionary algorithms have been studied in a range of contexts [15, 16, 17], including system identification\n[18, 19], though these works differ from GANs and Turing Learning in that no discriminators evolve, but\nrather pre-defined metrics gauge how similar the model and training data are. For some system identification\nproblems, the use of such pre-defined metrics can result in poor model accuracy, as shown in [8].\n\n4 PFSMs generalize the concept of Markov chains [23, 24].\n\n3.2 Turing Learning Implementation\n\nWe implement Turing Learning for this problem as follows:\n\n\u2022 Training data. To obtain the training data, the discriminator interacts with the PFSM shown\nin Figure 2. The number of states is set to four (n = 4). The parameters used to generate\nthe (genuine) data samples are given by:\n\nq = (p_1^*, p_2^*, v_2^*, v_3^*, v_4^*) = (0.1, 1.0, 0.2, 0.4, 0.6).    (1)\n\n\u2022 Model representation. It is assumed that the structure of the PFSM is known, while the\nparameters, q, are to be inferred. All parameters can vary in R. To interpret p1 and p2 as\nprobabilities, they are mapped to the closest point in [0, 1], if outside this interval. The\nmodel data is derived analogously to that of the training data.\n\u2022 Discriminator representation. The discriminator is implemented as an Elman neural network\n[25] with 1 input neuron, 5 hidden neurons, and 2 output neurons. At each time step t,\nthe observable feature (the agent\u2019s velocity v) is fed into the input neuron.5 After updating\nthe neural network, the output from one of the output neurons is used to determine the\nstimulus at time step t + 1, L or H. At the end of a trial (100 time steps), the output from\nthe other output neuron is used to determine whether the discriminator believes the agent\nunder investigation to be the training agent (T) or model agent (M).\n\u2022 Optimization algorithms. We use a standard (\u00b5 + \u03bb) evolution strategy with self-adapting\nmutation strengths [26] for both the model and the discriminator populations. We use\n\u00b5 = \u03bb = 50 in both cases. The populations are initialized at random. The parameter values\nof the optimization algorithm are set as described in [26].\n\u2022 Coupling mechanism between the model and discriminator optimizers. The coupling comes\nfrom the evaluation process, which in turn affects the population selection. Each of the\n100 candidate discriminators (\u00b5 + \u03bb) is evaluated once with each of the 100 models, as well as an\nadditional 100 times with the training agent. It receives a point every time it correctly labels\nthe data as either genuine or counterfeit. 
At the same time, each model receives a point for\neach time a discriminator mistakenly judges its data as genuine.\n\n\u2022 Termination criterion. The optimization process is stopped after 1000 generations.\n\n3.3 Results\n\nTo validate the advantages of the interactive approach, we use three setups for the Turing Learning\nalgorithm. In the default setup, hereafter \u201cInteractive\u201d setup, the discriminator controls the environmental\nstimulus while observing the agent. In the other two setups, the discriminator observes the\nagent in a passive manner; that is, its output is not used to update the stimulus. Instead, the stimulus\nis uniformly randomly chosen at the beginning of the trial, and it is toggled with probability 0.1 at\nany time step (the stimulus is hence expected to change on average every 10 time steps). In setup\n\u201cPassive 1\u201d, the discriminator has the same input as in the \u201cInteractive\u201d setup (the observable feature,\nv). In setup \u201cPassive 2\u201d, the discriminator has one additional input, the current stimulus (S). All\nother aspects of the passive setups are identical to the \u201cInteractive\u201d setup.\n\n5 To emulate a noisy tracking process, the actual speed value is multiplied with a number chosen with a\nuniform distribution in the range (0.95, 1.05).\n\nFigure 3: Results from Case Study 1. (a) Model parameters Turing Learning inferred about a\nsimulated agent. In the \u201cInteractive\u201d setup, the discriminator observes the agent while controlling\na stimulus that the agent responds to. In the two passive setups, the discriminator observes the\nagent and/or stimulus, while the latter is randomly generated (for details, see text). The models are\nthose with the highest evaluation value in the final generation (20 runs per setup). The dashed lines\nindicate the optimal parameter value (which is to be identified). 
(b) Example showing how one of the\ndiscriminators interacted with the agent during a trial. For the stimulus (blue), L and H are shown as\n0 and 1, respectively.\n\nFor each setup, we performed 20 runs of the Turing Learning algorithm. Figure 3(a) shows the\ndistribution of the inferred models that achieved the highest evaluation value in the 1000th generation.\nThe \u201cInteractive\u201d setup is the only one that inferred all parameters with good accuracy.\nFigure 3(b) shows a typical example of how a discriminator interacts with the agent. The discriminator\ninitially sets the environmental stimulus to alternating values (i.e., toggling between H and L). Once\nthe agent advances from state 1 to state 2, the discriminator instantly changes the stimulus to L and\nholds it constant. Once the agent advances to higher states, the stimulus is switched again, and so\nforth. This strategy allows the discriminator to observe the agent\u2019s velocity in each state.\n\n4 Case Study 2: A Robot Inferring Its Own Sensor Configuration\n\n4.1 Problem Formulation\n\nThe reality gap is a well-known problem in robotics: Often, behaviors that work well in simulation\ndo not translate effectively into real-world implementations [11]. This is because simulations are\ngenerally unable to capture the full range of features of the real world, and therefore make simplifying\nassumptions. Yet, simulations can be important, even on-board a physical robot, as they facilitate\nplanning and optimization.\nThis case study investigates how a robot can use Turing Learning to improve the accuracy of a\nsimulation model of itself, through a process of self-discovery, similar to [27]. In a practical scenario,\nthe inference could take place on-board a physical platform. For convenience, we use an existing\nsimulation platform [28], which has been extensively verified and shown to be able to cross the reality\ngap [29]. 
The robot, an e-puck [30], is represented as a cylinder of diameter 7.4 cm, height 4.7 cm\nand mass 152 g. It has two symmetrically aligned wheels. Their ground contact velocity (vleft and\nvright) can be set within [-12.8, 12.8] (cm/s). During the motion, random noise is applied to each\nwheel velocity, by multiplying it with a number chosen with a uniform distribution in the range (0.95,\n1.05).\n\nFigure 4: In Case Study 2, we consider a miniature mobile robot, the e-puck, that perceives its environment\nvia eight infrared (IR) proximity sensors. The robot is unaware of the spatial configuration\nof these sensors, and has to infer it. The discriminator controls the movements of the robot, while\nobserving the reading values of the sensors. (a) The sensor configuration to be inferred is the one\nof the physical e-puck robot. It comprises 16 parameters, representing the orientations (\u03b8) and\ndisplacements (d) of the 8 proximity sensors. (b) The robot is placed at random into an environment\nwith nine movable obstacles.\n\nThe robot has eight infrared proximity sensors distributed around its cylindrical body, see Figure 4(a).\nThe sensors provide noisy reading values (s1, s2, . . . , s8). We assume that the robot does not know\nwhere the sensors are located (neither their orientations, nor their displacements from the center).\nSituations like this are common in robotics, where uncertainties are introduced when sensors get\nmounted manually or when the sensor configuration may change during operation (e.g., at the time of\ncollision with an object, or when the robot itself reconfigures the sensors). The sensor configuration\ncan be described as follows:\n\nq = (\u03b8_1, \u03b8_2, ..., \u03b8_8, d_1, d_2, ..., d_8),    (2)\n\nwhere di \u2208 (0, R] defines the distance of sensor i from the robot\u2019s center (R is the robot\u2019s radius),\nand \u03b8i \u2208 [\u2212\u03c0, \u03c0] defines the bearing of sensor i relative to the robot\u2019s front.\nThe robot operates in a bounded square environment with sides 50 cm, shown in Figure 4(b). The\nenvironment also contains nine movable, cylindrical obstacles, arranged in a grid. The distance\nbetween the obstacles is just wide enough for an e-puck to pass through.\n\n4.2 Turing Learning Implementation\n\nWe implement Turing Learning for this problem as follows:\n\n\u2022 Training data. The training data comes from the eight proximity sensors of a \u201creal\u201d e-puck\nrobot, that is, using sensor configuration q as defined by the robot (see Figure 4(a)).\nThe discriminator controls the movements of the robot within the environment shown in\nFigure 4(b), while observing the readings of its sensors.\n\u2022 Model representation. It is assumed that the sensor configuration, q, is to be inferred. In\nother words, a total of 16 parameters have to be estimated.\n\u2022 Discriminator representation. As in Case Study 1, the discriminator is implemented as an\nElman neural network with 5 hidden neurons. The network has 8 inputs that receive values\nfrom the robot\u2019s proximity sensors (s1, s2, . . . , s8). In addition to the classification output,\nthe discriminator has two control outputs, which are used to set the robot\u2019s wheel velocities\n(vleft and vright). In each trial, the robot starts from a random position and random orientation\nwithin the environment.6 The evaluation lasts for 10 seconds. 
As the robot\u2019s sensors and\nactuators are updated 10 times per second, this results in 100 time steps.\n\n\u2022 The remaining aspects are implemented exactly as in Case Study 1.\n\n6 As the robot knows neither its relative position to the obstacles, nor its sensor configuration, the scenario\ncan be considered as a chicken-and-egg problem.\n\nFigure 5: Results from Case Study 2. Model parameters Turing Learning inferred about the sensor\nconfiguration of the e-puck robot: (a) sensor orientations, (b) sensor displacements. In the \u201cInteractive\u201d\nsetup, the discriminator observes the sensor reading values while controlling the movements\nof the robot. In the two passive setups, the discriminator observes the sensor reading values and/or\nmovements while the latter are randomly generated (for details, see text). The models are those with\nthe highest evaluation value in the final generation (20 runs per setup). The dashed lines indicate the\noptimal parameter value (which is to be identified).\n\n4.3 Results\n\nTo validate the advantages of the interactive approach, we again use three setups. In the \u201cInteractive\u201d\nsetup the discriminator controls the movements of the robot while observing its sensor readings. In\nthe other two setups, the discriminator observes the robot\u2019s sensor readings in a passive manner; that\nis, its output is not used to update the movements of the robot. Rather, the pair of wheel velocities\nis uniformly randomly chosen at the beginning of the trial, and is chosen anew with probability 0.1 at any time\nstep (the movement pattern is hence expected to change on average every 10 time steps). In setup\n\u201cPassive 1\u201d, the discriminator has the same inputs as in the \u201cInteractive\u201d setup (the reading values of\nthe robot\u2019s sensors, s1, s2, . . . , s8). 
In setup \u201cPassive 2\u201d, the discriminator has two additional inputs,\nindicating the velocities of the left and right wheels (vleft and vright). All other aspects of the passive\nsetups are identical to the \u201cInteractive\u201d setup.\nFor each setup, we performed 20 runs of the Turing Learning algorithm. Figure 5 shows the\ndistribution of the inferred models that achieved the highest evaluation value in the 1000th generation.\nThe \u201cInteractive\u201d setup is the only one that inferred the orientations of the proximity sensors with\ngood accuracy. The displacement parameters were inferred with all setups, though none of them was\nable to provide accurate estimates.\nFigure 6 shows a typical example of how a discriminator controls the robot. At the beginning, the\nrobot rotates clockwise, registering an obstacle with sensors s7, s6, . . . , s2 (in that order). The robot\nthen moves forward, and registers the obstacle with sensors s1 and/or s8, while pushing it. This\ncon\ufb01rms that s1 and s8 are indeed forward-facing. Once the robot has no longer any obstacle in its\nfront, it repeats the process. To validate if the sensor-to-motor coupling was of any signi\ufb01cance for\nthe discrimination task, we recorded the movements of a robot controlled by the best discriminator\nof each of the 20 runs. The robot used either the genuine sensor con\ufb01guration (50 trials) or the best\nmodel con\ufb01guration of the corresponding run (50 trials). In these 2000 \u201cclosed-loop\u201d experiments,\nthe discriminator made correct judgments in 69.45% of the cases. 
We then repeated the 2000 trials,\nnow ignoring the discriminator\u2019s control outputs, but rather using the movements recorded earlier.\nIn these 2000 \u201copen-loop\u201d experiments, the discriminator made correct judgments in 58.60% of the\ncases, a significant drop, though still better than guessing (50%).\n\nFigure 6: Example showing how one of the discriminators in Case Study 2 controlled the robot\u2019s\nmovements during the trial. The discriminator takes as input the robot\u2019s eight sensor reading values\n(shown at the top), and controls the velocities of the wheels (shown at the bottom). The discriminator\nhas to decide whether the sensor configuration of the robot corresponds to the one of the physical\ne-puck robot. For details, see text.\n\n5 Conclusion\n\nIn this paper we analyzed how Generative Adversarial Networks (GANs) relate to the Turing test.\nWe identified the defining features of GANs when assuming a Turing perspective. Other features,\nincluding choice of model representation, discriminator representation, and optimization algorithm,\nwere viewed as implementation options of a generalized version of GANs, also referred to as Turing\nLearning.\nIt was noted that the discriminator in GANs does not directly influence the sampling process, but\nrather is provided with a (static) data sample from either the generative model or training data set.\nThis is in stark contrast to the Turing test, where the discriminator (the interrogator) plays an active\nrole; it poses questions to the players, to reveal the information most relevant to the discrimination\ntask. Such interactions are by no means always useful. 
For the purpose of generating photo-realistic\nimages, for example, they may not be needed.7 For the two case studies presented here, however,\ninteractions were shown to improve the accuracy of models.\nThe first case study showed how one can infer the behavior of an agent while controlling a stimulus\npresent in its environment. It could serve as a template for studies of animal/human behavior,\nespecially where some behavioral traits are revealed only through meaningful interactions. The\ninference task was not simple, as the agent\u2019s actions depended on a hidden stochastic process. The\nlatter was influenced by the stimulus, which was set to either low or high by the discriminator (100\ntimes). It was not known in advance which of the 2^100 possible stimulus sequences are useful. The discriminator thus\nneeded to dynamically construct a suitable sequence, taking the observation data into account.\nThe second case study focused on a different class of problems: active self-discovery. It showed that\na robot can infer its own sensor configuration through controlled movements. This case study could\nserve as a template for modelling physical devices. 
The inference task was not simple, as the robot\nstarted from a random position in the environment, and its motors and sensors were affected by noise.\nThe discriminator thus needed to dynamically construct a control sequence that let the robot approach\nan obstacle and perform movements for testing its sensor configuration.\nFuture work could attempt to build models of more complex behaviors, including those of humans.\n\nAcknowledgments\n\nThe authors thank Nathan Lepora for stimulating discussions.\n\n7 Though if the discriminator could request additional images by the same model or training agent, problems\nlike mode collapse might be prevented.\n\nReferences\n[1] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and\nY. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems\n27, pages 2672\u20132680. Curran Associates, Inc., 2014.\n\n[2] A. Dosovitskiy, J. Tobias-Springenberg, and T. Brox. Learning to generate chairs with convolutional\nneural networks. In Proceedings of the 2015 IEEE Conference on Computer Vision and\nPattern Recognition (CVPR), pages 1538\u20131546. IEEE, 2015.\n\n[3] K. Schawinski, C. Zhang, H. Zhang, L. Fowler, and G. K. Santhanam. Generative adversarial\nnetworks recover features in astrophysical images of galaxies beyond the deconvolution limit.\nMonthly Notices of the Royal Astronomical Society: Letters, 467(1):L110, 2017.\n\n[4] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional\ngenerative adversarial networks. CoRR, abs/1511.06434, 2015.\n\n[5] A. M. Turing. Computing machinery and intelligence. Mind, 59(236):433\u2013460, 1950.\n\n[6] R. M. French. The Turing test: The first 50 years. Trends in Cognitive Sciences, 4(3):115\u2013122,\n2000.\n\n[7] W. Li, M. 
Gauci, and R. Gro\u00df. A coevolutionary approach to learn animal behavior through controlled\ninteraction. In Proceedings of the 15th Annual Conference on Genetic and Evolutionary\nComputation (GECCO 2013), pages 223\u2013230. ACM, 2013.\n\n[8] W. Li, M. Gauci, and R. Gro\u00df. Turing Learning: A metric-free approach to inferring behavior\nand its application to swarms. Swarm Intelligence, 10(3):211\u2013243, 2016.\n\n[9] S. Harnad. Minds, machines and Turing: The indistinguishability of indistinguishables. Journal\nof Logic, Language and Information, 9(4):425\u2013445, 2000.\n\n[10] A. Pinar Saygin, I. Cicekli, and V. Akman. Turing test: 50 years later. Minds and Machines,\n10(4):463\u2013518, 2000.\n\n[11] N. Jacobi, P. Husbands, and I. Harvey. Noise and the reality gap: The use of simulation in\nevolutionary robotics. In Proceedings of the 3rd European Conference on Advances in Artificial\nLife, pages 704\u2013720. Springer-Verlag, 1995.\n\n[12] D. J. Im, H. Ma, C. Kim, and G. W. Taylor. Generative adversarial parallelization. CoRR,\nabs/1612.04021, 2016.\n\n[13] I. J. Goodfellow. NIPS 2016 tutorial: Generative adversarial networks. CoRR, abs/1701.00160,\n2017.\n\n[14] F. Glover and K. S\u00f6rensen. Metaheuristics. Scholarpedia, 10(4):6532, 2015.\n\n[15] W. D. Hillis. Co-evolving parasites improve simulated evolution as an optimization procedure.\nPhysica D: Nonlinear Phenomena, 42(1):228\u2013234, 1990.\n\n[16] G. F. Miller and D. Cliff. Protean behavior in dynamic games: Arguments for the\nco-evolution of pursuit-evasion tactics. In Proceedings of the 3rd International Conference on\nSimulation of Adaptive Behavior: From Animals to Animats 3 (SAB 1994), pages 411\u2013420. MIT\nPress, 1994.\n\n[17] S. Nolfi and D. Floreano. Coevolving predator and prey robots: Do \u201carms races\u201d arise in\nartificial evolution? Artificial Life, 4(4):311\u2013335, 1998.\n\n[18] J. C. Bongard and H. Lipson. 
Nonlinear system identification using coevolution of models and\ntests. IEEE Transactions on Evolutionary Computation, 9(4):361\u2013384, 2005.\n\n[19] J. C. Bongard and H. Lipson. Active coevolutionary learning of deterministic finite automata.\nThe Journal of Machine Learning Research, 6:1651\u20131678, 2005.\n\n[20] J. Cartlidge and S. Bullock. Combating coevolutionary disengagement by reducing parasite\nvirulence. Evolutionary Computation, 12(2):193\u2013222, 2004.\n\n[21] C. Rosin and R. Belew. New methods for competitive coevolution. Evolutionary Computation,\n5(10):1\u201329, 1997.\n\n[22] H. Juille and J. B. Pollack. Coevolving the \u201cideal\u201d trainer: Application to the discovery\nof cellular automata rules. In Genetic Programming 1998: Proceedings of the Third Annual\nConference, pages 519\u2013527. Morgan Kaufmann, 1998.\n\n[23] E. Vidal, F. Thollard, C. de la Higuera, F. Casacuberta, and R. C. Carrasco. Probabilistic\nfinite-state machines \u2013 Part I. IEEE Transactions on Pattern Analysis and Machine Intelligence,\n27(7):1013\u20131025, 2005.\n\n[24] E. Vidal, F. Thollard, C. de la Higuera, F. Casacuberta, and R. C. Carrasco. Probabilistic\nfinite-state machines \u2013 Part II. IEEE Transactions on Pattern Analysis and Machine Intelligence,\n27(7):1026\u20131039, 2005.\n\n[25] J. L. Elman. Finding structure in time. Cognitive Science, 14(2):179\u2013211, 1990.\n\n[26] H. G. Beyer and H. P. Schwefel. Evolution strategies \u2013 A comprehensive introduction. Natural\nComputing, 1(1):3\u201352, 2002.\n\n[27] J. Bongard, V. Zykov, and H. Lipson. Resilient machines through continuous self-modeling.\nScience, 314(5802):1118\u20131121, 2006.\n\n[28] S. Magnenat, M. Waibel, and A. Beyeler. Enki: The fast 2D robot simulator, 2011.\nhttps://github.com/enki-community/enki.\n\n[29] M. Gauci, J. Chen, W. Li, T. J. Dodd, and R. Gro\u00df. Self-organized aggregation without\ncomputation. 
The International Journal of Robotics Research, 33(8):1145\u20131161, 2014.\n\n[30] F. Mondada, M. Bonani, X. Raemy, J. Pugh, C. Cianci, A. Klaptocz, et al. The e-puck, a robot\ndesigned for education in engineering. In Proceedings of the 9th Conference on Autonomous\nRobot Systems and Competitions, pages 59\u201365. IPCB, 2009.", "award": [], "sourceid": 3170, "authors": [{"given_name": "Roderich", "family_name": "Gross", "institution": "The University of Sheffield"}, {"given_name": "Yue", "family_name": "Gu", "institution": "The University of Sheffield"}, {"given_name": "Wei", "family_name": "Li", "institution": "University of York"}, {"given_name": "Melvin", "family_name": "Gauci", "institution": "Harvard University"}]}