{"title": "Bayesian optimization explains human active search", "book": "Advances in Neural Information Processing Systems", "page_first": 55, "page_last": 63, "abstract": "Many real-world problems have complicated objective functions. To optimize such functions, humans utilize sophisticated sequential decision-making strategies. Many optimization algorithms have also been developed for this same purpose, but how do they compare to humans in terms of both performance and behavior? We try to unravel the general underlying algorithm people may be using while searching for the maximum of an invisible 1D function. Subjects click on a blank screen and are shown the ordinate of the function at each clicked abscissa location. Their task is to find the function\u2019s maximum in as few clicks as possible. Subjects win if they get close enough to the maximum location. Analysis over 23 non-maths undergraduates, optimizing 25 functions from different families, shows that humans outperform 24 well-known optimization algorithms. Bayesian Optimization based on Gaussian Processes, which exploit all the x values tried and all the f(x) values obtained so far to pick the next x, predicts human performance and searched locations better. In 6 follow-up controlled experiments over 76 subjects, covering interpolation, extrapolation, and optimization tasks, we further confirm that Gaussian Processes provide a general and unified theoretical account to explain passive and active function learning and search in humans.", "full_text": "Bayesian optimization explains human active search\n\nAli Borji\n\nDepartment of Computer Science\n\nUSC, Los Angeles, 90089\n\nborji@usc.edu\n\nDepartments of Neuroscience and Computer Science\n\nLaurent Itti\n\nUSC, Los Angeles, 90089\n\nitti@usc.edu\n\nAbstract\n\nMany real-world problems have complicated objective functions. To optimize\nsuch functions, humans utilize sophisticated sequential decision-making strate-\ngies. 
Many optimization algorithms have also been developed for this same pur-\npose, but how do they compare to humans in terms of both performance and be-\nhavior? We try to unravel the general underlying algorithm people may be using\nwhile searching for the maximum of an invisible 1D function. Subjects click on\na blank screen and are shown the ordinate of the function at each clicked abscissa\nlocation. Their task is to \ufb01nd the function\u2019s maximum in as few clicks as possible.\nSubjects win if they get close enough to the maximum location. Analysis over\n23 non-maths undergraduates, optimizing 25 functions from different families,\nshows that humans outperform 24 well-known optimization algorithms. Bayesian\nOptimization based on Gaussian Processes, which exploits all the x values tried\nand all the f (x) values obtained so far to pick the next x, predicts human per-\nformance and searched locations better. In 6 follow-up controlled experiments\nover 76 subjects, covering interpolation, extrapolation, and optimization tasks, we\nfurther con\ufb01rm that Gaussian Processes provide a general and uni\ufb01ed theoretical\naccount to explain passive and active function learning and search in humans.\n\nIntroduction\n\n1\nTo \ufb01nd the best solution to a complex real-life search problem, e.g., discovering the best drug to treat\na disease, one often has few chances for experimenting, as each trial is lengthy and costly. Thus,\na decision maker, be it human or machine, should employ an intelligent strategy to minimize the\nnumber of trials. This problem has been addressed in several \ufb01elds under different names, including\nactive learning [1], Bayesian optimization [2, 3], optimal search [4, 5, 6], optimal experimental\ndesign [7, 8], hyper-parameter optimization, and others. 
Optimal decision-making algorithms show significant promise in many applications, including human-machine interaction, intelligent tutoring systems, recommendation systems, sensor placement, robotics control, and many more.\nHere, inspired by the optimization literature, we design and conduct a series of experiments to understand human search and active learning behavior. We compare and contrast humans with standard optimization algorithms, to learn how well humans perform 1D function optimization and to discover which algorithm best approaches or explains human search strategies. This contrast hints toward developing even more sophisticated algorithms and offers important theoretical and practical implications for our understanding of human learning and cognition.\nWe aim to decipher how humans choose the next x to be queried when attempting to locate the maximum of an unknown 1D function. We focus on the following questions: Do humans perform local search (for instance by randomly choosing a location and following the gradient of the function, e.g., gradient descent), or do they try to capture the overall structure of the underlying function (e.g., polynomial, linear, exponential, smoothness), or some combination of both? Do the sets of sample x locations queried by humans resemble those of some algorithms more than others? Do humans follow a Bayesian approach, and if so, which selection criterion might they employ? How do humans balance between exploration and exploitation during optimization? Can Gaussian processes [9] offer a unifying theory of human function learning and active search?\n\nFigure 1: Left: a sample search trial. The unknown function (blue curve) was only displayed at the end of training trials. 
During search for the function's maximum, a red dot at (x, f(x)) was drawn for each x selected by participants. Right: A sample function and the pdf of human clicks.\n\n2 Experiments and Results\nWe seek to study human search behavior directly on 1D function optimization, for the first time systematically and explicitly. We are motivated by two main reasons: 1) optimization has been intensively studied, and today a large variety of optimization algorithms and theoretical analyses exist; 2) 1D search allows us to focus on basic search mechanisms utilized by humans, eliminating real-world confounds such as context, salient distractors, semantic information, etc. A total of 99 undergraduate students with basic calculus knowledge from our university participated in 7 experiments. They were from the following majors: Neurosciences, Biology, Psychology, Kinesiology, Business, English, Economics, and Political Sciences (i.e., not Maths or Engineering). Subjects had normal or corrected-to-normal vision and were compensated by course credits. They were seated behind a 42\" computer monitor at a distance of 130 cm (subtending a field of view of 43° × 25°). The experimental methods were approved by our university's Institutional Review Board (IRB).\n2.1 Experiment 1: Optimization 1\nParticipants were 23 students (6 m, 17 f) aged 18 to 22 (19.52 ± 1.27 yr). Stimuli were a variety of 25 1D functions with different characteristics (linear/non-linear, differentiable/non-differentiable, etc.), including: Polynomial, Exponential, Gaussian, Dirac, Sinc, etc. The goal was to cover many cases and to investigate the generalization power of algorithms and humans. To generate a polynomial stimulus of degree m (m > 2), we randomly generated m + 1 pairs of (x, y) points and fitted a polynomial to them using least squares regression. Coefficients were saved for later use. 
Other functions were defined with pre-specified formulas and parameters (e.g., Schwefel, Psi). We generated two sets of stimuli, one for training and the other for testing. The x range was fixed to [-300 300] and the y range varied depending on the function. Fig. 1 shows a sample search trial during training, as well as the smoothed distribution of clicked locations for first clicks, and progressively for up to 15 clicks. In the majority of cases, the distribution of clicks starts with a strong leftward bias for the first clicks, then progressively focuses around the true function maximum as subjects make more clicks and approach the maximum. Subjects clicked less on smooth regions and more on spiky regions (near local maxima). This indicates that they sometimes followed the local gradient direction of the function.\nProcedure. Subjects were informed about the goal of the experiment. They were asked to "find the maximum value (highest point) of a function in as few clicks as possible". During the experiment, each subject went through 30 test trials (in random order). Starting from a blank screen, subjects could click on any abscissa x location, and we would show them the corresponding f(x) ordinate. Previously clicked points remained on the screen until the end of the trial. Subjects were instructed to terminate the trial when they thought they had reached the maximum location within a margin of error (xTolh = 6) shown at the bottom of the screen (small blue line in Fig. 1). This design was intentional, both to obtain information about the human satisficing process and to make the comparison fair with algorithms (e.g., as opposed to automatically terminating a trial if humans happened to click near the maximum). We designed the following procedure to balance speed vs. accuracy. For each trial, a subject gained A points for "HIT", lost A points for "MISS", and lost 1 point for each click. 
Subjects' scores were kept on record, to compete against other subjects. The subject with the highest score was rewarded with a prize. We used A = 10 (for 13 subjects) and A = 20 (for 10 subjects); since we did not observe a significant difference across the two conditions, here we collapsed all the data. We highlighted to subjects that they should decide carefully where to click next, to minimize the number of clicks before hitting the maximum location. They were not allowed to click outside the function area. Before the experiment, we had a training session in which subjects completed 5 trials on a different set of functions than those used in the real experiment. The purpose was for subjects to understand the task and familiarize themselves with the general complexity and shapes of functions (i.e., developing a prior). We revealed the entire function at the end of each training trial only (not after test trials). The maximum number of clicks was set to 40. To prevent subjects from using the vertical extent of the screen to guess the maximum location, we randomly elevated or lowered the function plot. We also recorded the time spent on each trial.\n\nTable 1: Baseline algorithms (Type: L = local). We set maxItr to 500 when the only parameter is xTolm.\nFminSearch [10] (L): xTolm = 0.005:0.005:0.1\nFminBnd (L): xTolm = 5e-7:1e-6:1e-5\nFminUnc (L): xTolm = 0.01:0.05:1\nminFunc-# (L): xTolm = 0.01:0.05:1\nGD (L): xTolm = 0.001:0.001:0.005; α = 0.1:0.1:0.5; tol = 1e-6\n\nFigure 2: Difficult and easy stimuli.\n\nHuman Results. 
On average, over all 25 functions and 23 subjects, subjects attempted 12.8 ± 0.4 tries to reach the target. The average hit rate (i.e., whether subjects found the maximum) over all trials was 0.74 ± 0.04. Across subjects, the standard deviations of the number of tries and of hit rates were 3.8 and 0.74. These relatively low values suggest inter-subject consistency in our task. Each trial lasted about 22 ± 4 seconds. Figure 2 shows example hard and easy stimuli (fn are function numbers; see Supplement). The Dirac function had the most clicks (16.5), lowest hit rate (0.26), and longest time (32.4 ± 18.8 s). The three other most difficult functions in terms of function calls, listed as (function number, number of clicks), were: {(f2, 15.8), (f8, 15.2), (f12, 15.1)}. The easiest ones were: {(f16, 9.3), (f20, 9.9), (f17, 10)}. For hit rate, the hardest functions were: {(f24, 0.35), (f15, 0.45), (f7, 0.56)}, and the easiest ones: {(f20, 1), (f17, 1), (f22, 0.95)}. Subjects were faster on Gaussian (f17) and Exponential (f20) functions (16.2 and 16.9 seconds).\nModel-based Results. We compared human data to 24 well-established optimization algorithms. These baseline methods employ various search strategies (local, global, gradient-based, etc.) and often have several parameters (Table 1). Here, we emphasize one to two of the most important parameters for each algorithm. The following algorithms are considered: local search (e.g., Nelder-Mead simplex/FminSearch [10]); multi-start local search; population-based methods (e.g., Genetic Algorithms (GA) [12, 13], Particle Swarm Optimization (PSO) [11]); DIvided RECTangles (DIRECT) [15]; and Bayesian Optimization (BO) techniques [2, 3]. BO constructs a probabilistic model for f(·) using all previous observations and then exploits this model to make decisions about where along X to evaluate the function next. 
This results in a procedure that can find the maximum of difficult non-convex, non-differentiable functions with relatively few evaluations, at the cost of performing more computation to determine the next point to try. BO methods are based on Gaussian processes (GP) [9] and several selection criteria. Here we use a GP with a zero-mean prior and an RBF covariance kernel. Two parameters of the kernel function, kernel width and signal variance, are learned from our training functions. We consider 5 types of selection criteria for BO: Maximum Mean (MM) [16], Maximum Variance (MV) [17], Maximum Probability of Improving (MPI) [18, 19], Maximum Expected Improvement (MEI) [20, 21], and Upper Confidence Bounds (UCB) [22]. Further, we consider two BO methods by Osborne et al. [23], with and without gradient information (see Supplement). To measure to what degree human search behavior deviates from a random process, we devise a Random search algorithm which chooses the next point uniformly at random from [-300 300] without replacement. We also run the Gradient Descent (GD) algorithm and its descendants, denoted as minFunc-# in Table 1, where # refers to different methods (conjugate gradient (cg), quasi-Newton (qnewton), etc.).\nTo evaluate which algorithm better explains human 1D search behavior, we propose two measures: 1) an algorithm should have about the same performance, in terms of hit rate and function calls, as humans (1st-level analysis), and 2) it should have similar search statistics as humans, for example in terms of searched locations or search order (2nd-level analysis). For a fair human-algorithm comparison, we simulate for algorithms the same conditions as in our behavioral experiment when counting a trial as a hit or a miss (e.g., using the same xTolh). 
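To make the GP-based selection step concrete, the following is a minimal sketch (not the authors' implementation) of one BO iteration with a zero-mean GP, an RBF kernel, and the UCB criterion (GP mean plus a multiple of GP std). The kernel width, signal variance, noise term, and κ value are illustrative placeholders, not the parameters fitted to the training functions.

```python
# Minimal sketch of one Bayesian-optimization step: fit a zero-mean GP with
# an RBF kernel to the (x, f(x)) pairs seen so far, then pick the next x by
# maximizing the UCB score mu + kappa*sigma over the search range [-300, 300].
import numpy as np

def rbf(a, b, width=60.0, sig2=1.0):
    # k(x, x') = sig2 * exp(-(x - x')^2 / (2 * width^2)); width/sig2 illustrative.
    d = a[:, None] - b[None, :]
    return sig2 * np.exp(-d**2 / (2 * width**2))

def gp_posterior(x_obs, y_obs, x_grid, noise=1e-6):
    # Standard GP regression equations with a zero-mean prior.
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf(x_grid, x_obs)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ y_obs
    var = np.diag(rbf(x_grid, x_grid) - Ks @ Kinv @ Ks.T)
    return mu, np.sqrt(np.maximum(var, 0.0))

def next_query_ucb(x_obs, y_obs, kappa=2.0):
    # Evaluate UCB on a dense grid and return the maximizing abscissa.
    x_grid = np.linspace(-300.0, 300.0, 601)
    mu, sd = gp_posterior(np.asarray(x_obs, float), np.asarray(y_obs, float), x_grid)
    return x_grid[np.argmax(mu + kappa * sd)]
```

Swapping the acquisition line for the GP mean alone (MM), the GP std alone (MV), or an improvement-based score gives the other selection criteria considered above.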
It is worth noting that in our behavioral experiment we did our best not to provide information to humans that we cannot provide to algorithms.\n\nTable 1 (continued): multi-start and global (G) baseline algorithms.\nmult-FminSearch (Glob.): xTolm = 0.005, starts = 1:10\nmult-FminBnd (G): xTolm = 5e-7, starts = 1:10\nmult-FminUnc (G): xTolm = 0.01, starts = 1:10\nPSO [11] (G): pop = 1:10; gen = 1:20\nGA [12, 13] (G): pop and gen = 5:10:100; generation gap = 0.01\nSA [14] (G): stopTemp = 0.01:0.05:1\nDIRECT [15] (G): maxItr = 5:5:70\nRandom (G): maxItr = 5:5:150\nGP [2, 3] (G): β = 0.1:0.1:1; maxItr = 5:5:35\n\nIn the 1st-level analysis, we tuned algorithms for their best accuracy by performing a grid search over their parameters to sample the hit-rate vs. function-calls plane. Table 1 shows the two stopping conditions that are considered: 1) we either run an algorithm until a tolerance on x is met (i.e., |xi-1 - xi| < xTolm), or 2) we allow it to run up to a variable (maximum) number of function calls (maxItr). For each parameter setting (e.g., a specified population size and number of generations in GA), since each run of an algorithm may result in a different answer, we run it 200 times to reach a reliable estimate of its performance. To generate a starting point for algorithms, we randomly sampled from the distribution of human first clicks (over all subjects and functions, p(x1); see Fig. 1). As in the behavioral experiment, after termination of an algorithm, a hit is declared when: ∃ xi ∈ B : |xi - argmaxx f(x)| ≤ xTolh, where the set B includes the history of searched locations in a search trial. Fig. 3 shows the search accuracy of the optimization algorithms. As shown, humans are better than all algorithms tested, if hit rate and function calls are weighed equally (i.e., best is to approach the bottom-right corner of Fig. 3). 
That is, undergraduates from non-maths majors managed to beat the state of the art in numerical optimization. BO algorithms with GP-UCB and GP-MEI criteria are closer to human performance (as are the GP-Osborne methods). The DIRECT method did very well and found the maximum with ≥ 30 function calls. It can achieve a better-than-human hit rate, with a number of function calls that is smaller than BO algorithms, though still higher than humans (it was not able to reach human performance with an equal number of function calls). As expected, some algorithms reach human accuracy but with a much higher number of function calls (e.g., GA, mult-start-#), sometimes by up to 3 orders of magnitude.\nWe chose the following promising algorithms for the 2nd-level analysis: DIRECT, GP-Osborne-G, GP-Osborne, GP-MPI, GP-MUI, GP-MEI, GP-UCB, GP-UCB-Opt, GP-MV, PSO, and Random. GP-UCB-Opt is basically the same as GP-UCB with its exploration/exploitation parameter (κ in µx + κσx; GP mean + GP std) learned from training data for each function. These algorithms were chosen because their performance curve in the first analysis intersected a window where accuracy is at least half that of humans and function calls are at most twice those of humans (black rectangle in Fig. 3). We first find those parameters that led these algorithms to their closest performance to humans. 
We then run them again and this time save their searched locations for further analysis.\nWe design 4 evaluation scores to quantify similarities between algorithms and humans on each function: 1) mean sequence distance between an algorithm's searched locations and human searched locations, in each trial for the first 5 clicks, 2) mean shortest distance between an algorithm's searched locations and all human clicks (i.e., point matching), 3) agreement between probability distributions of searched locations by all humans and an algorithm, and 4) agreement between pdfs of normalized step sizes (to [0 1] on each trial). Let pm(t) and ph(t) be pdfs of the search statistic t by an algorithm and humans, respectively. The agreement between pm and ph is defined as pm(argmaxt ph(t)) (i.e., the value of the algorithm's pdf at the location of the maximum of the human pdf).\n\nFigure 3: Human vs. algorithm 1D search accuracy.\n\nMedian scores (over all 25 functions) are depicted in Fig. 4. The distance score is lower for Bayesian models compared to DIRECT, Random, PSO, and GP-Osborne algorithms (Fig. 4.a). Point matching distances are lower for GP-MPI and GP-UCB (Fig. 4.b). These two algorithms also show higher agreement with humans in terms of searched locations (Fig. 4.c). The clearest pattern happens over step size agreement, with BO methods (except GP-MV) being closest to humans (Fig. 4.d). GP-MPI and GP-UCB show higher resemblance to human search behavior over all scores. Further, we measure the regret of algorithms and humans, defined as fmax(·) - f* where f* is the best value found so far, for up to 15 function calls, averaged over all trials. As shown in Fig. 4.e, BO models approach the maximum of f(·) as fast as humans. 
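Two of the measures above, the pdf agreement score pm(argmaxt ph(t)) and the regret fmax - f*, can be sketched as follows (the input histograms and value sequences are illustrative):

```python
# Sketch of two comparison measures: the "agreement" score p_m(argmax_t p_h(t))
# between an algorithm's and the humans' pdfs, and the regret f_max - f*
# after each function call.
import numpy as np

def agreement(p_model, p_human):
    # Value of the algorithm's (normalized) pdf at the mode of the human pdf.
    p_model = np.asarray(p_model, float) / np.sum(p_model)
    p_human = np.asarray(p_human, float) / np.sum(p_human)
    return p_model[np.argmax(p_human)]

def regret_curve(f_values, f_max):
    # f* is the best value found so far; regret after each call is f_max - f*.
    best_so_far = np.maximum.accumulate(np.asarray(f_values, float))
    return f_max - best_so_far
```

For example, `regret_curve([1, 3, 2, 5], 5)` decreases monotonically and reaches zero once the maximum has been found.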
Hence, although imperfect, BO algorithms overall are the most similar to humans, out of all algorithms tested.\nThree reasons prompt us to consider BO methods as promising candidates for modeling human basic search: 1) BO methods perform efficient search in a way that resembles human behavior in terms of accuracy and search statistics (results of Exp. 1); 2) BO methods exploit GPs, which offer a principled and elegant approach for adding structure to Bayesian models (in contrast to purely data-driven Bayesian models); furthermore, the sequential nature of BO, updating the GP posterior after each function call, seems a natural strategy humans might be employing; and 3) GP models explain function learning in humans over simple functional relationships (linear and quadratic) [24]. Function learning and search mechanisms are linked in the sense that, to conduct efficient search, one needs to know the search landscape and to progressively update one's knowledge about it.\n\nFigure 4: Results of our second-level analysis. The lower the distance and the higher the agreement, the better (red arrows). Boxes represent the median (red line) and 25th, 75th percentiles. Panel (e) shows average regret of algorithms and humans (f* is normalized to fmax for each trial separately).\n\nWe thus designed 6 additional controlled experiments to further explore GP as a unified computational principle guiding human function learning and active search. 
In particular, we investigate the\nbasic idea that humans might be following a GP, at least in the continuous domain, and change some\nof its properties to cope with different tasks. For example, humans may use GP to choose the next\npoint to dynamically balance exploration vs. exploitation (e.g., in search task), or to estimate the\nfunction value of a point (e.g., in function interpolation). In experiments 2 to 5, subjects performed\ninterpolation and extrapolation tasks, as well as active versions of these tasks by choosing points to\nhelp them learn about the functions. In experiments 6 and 7, we then return to the optimization task,\nfor a detailed model-based analysis of human search behavior over functions from the same family.\nNote that many real-world problems can be translated into our synthetic tasks here.\nWe used polynomial functions of degree 2, 3, and 5 as our stimuli (denoted as Deg2, Deg3, and\nDeg5, respectively). Two different sets of functions were generated for training and testing, shown in\nFig. 5. For each function type, subjects completed 10 training trials followed by 30 testing trials. As\nin Exp. 1, function plots were disclosed to subjects only during training. To keep subjects engaged,\nin addition to the competition for a prize, we showed them the magnitude of error during both\ntraining and testing sessions. In experiments 2 to 6, we \ufb01tted a GP to different types of functions\nusing the same set of (x, y) points shown to subjects during training (Fig. 6). A grid search was\nconducted to learn GP parameters from the training functions to predict subjects\u2019 test data.\n2.2 Experiments 2 & 3: Interpolation and Active Interpolation\nParticipants. Twenty subjects (7m, 13f) aged 18 to 22 participated (mean: 19.75 \u00b1 1.06 yr).\nIn the interpolation task, on each function, subjects were shown 4 points x \u2208\nProcedure.\n{\u2212300, a, b, 300} along with their f (x) values. 
Points a and b were generated randomly once in advance and were then tied to each function. Subjects were asked to guess the function value at the center (x = 0) as accurately as possible. In the active interpolation task, the same 4 points as in interpolation were shown to subjects. Subjects were first asked to choose a 5th point in [-300 300] to see its y = f(x) value. They were then asked to guess the function value at a randomly-chosen 6th x location as accurately as possible. Subjects were instructed to pick the fifth point that would be most informative for estimating the function value at the follow-up random x (see Fig. 6).\nResults. Fig. 7.a shows the mean distance of human clicks from the GP mean at x = 0 over test trials (averaged over absolute pairwise distances between clicks and the GP) in the interpolation task. Human errors rise as functions become more complicated. Distances of the GP and of the actual function from humans are the same over Deg2 and Deg3 functions (no significant difference in medians, Wilcoxon signed-rank test, p > 0.05). Interestingly, on Deg5 functions, the GP is closer to human clicks than the actual function (signed-rank test, p = 0.053), implying that the GP captures clicks well in this case. The GP fit the human data even better than the actual function, thereby lending support to our hypothesis that the GP may be a reasonable approximation to human function estimation.\nCould it be that subjects locally fit a line to the two middle points to guess f(0)? To evaluate this hypothesis, we measured the distance, at x = 0, from human clicks to a line passing through (a, f(a)) and (b, f(b)). By construction of our stimuli, a line model explains human data well on Deg3 and Deg5, but fails dramatically on Deg2 functions, which deflect around the center. 
GP is significantly better than the line model on Deg2 (p < 0.0005), while being as good on Deg3 and Deg5 (p = 0.78).\n\nFigure 5: Train and test polynomial stimuli used in experiments 2 through 6.\n\nFigure 6: Illustration of experiments. In extrapolation, polynomials (degrees 1, 2 & 3) fail to explain our data.\n\nAnother possibility could be that humans choose a point randomly on the y axis, thus discarding the shape of the function. To control for this, we devised two random selection strategies. The first one uniformly chooses y values between 0 and 100. The second one, known as shuffled random (SRand), takes samples from the distribution of y values selected by other subjects over all functions. The purpose is to account for possible systematic biases in human selections. 
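The GP prediction underlying this interpolation analysis (a posterior mean and uncertainty at a query point, given the displayed points) can be sketched as below; the kernel hyperparameters and the example points a, b are illustrative placeholders, not the values fitted to the training functions:

```python
# Sketch of GP interpolation: posterior mean and std at a query point
# given the four displayed (x, y) points. Hyperparameters are illustrative.
import numpy as np

def gp_predict(x_obs, y_obs, x_star, width=80.0, sig2=400.0, noise=1e-6):
    k = lambda a, b: sig2 * np.exp(-(a[:, None] - b[None, :])**2 / (2 * width**2))
    K = k(x_obs, x_obs) + noise * np.eye(len(x_obs))
    W = k(x_star, x_obs) @ np.linalg.inv(K)
    mu = W @ y_obs                                   # posterior mean
    var = np.diag(k(x_star, x_star) - W @ k(x_star, x_obs).T)
    return mu, np.sqrt(np.maximum(var, 0.0))         # posterior std

# Four shown points at x in {-300, a, b, 300}; guess f(0) as the GP mean there.
x_shown = np.array([-300.0, -90.0, 120.0, 300.0])
y_shown = np.array([20.0, 55.0, 70.0, 30.0])
mu0, sd0 = gp_predict(x_shown, y_shown, np.array([0.0]))
```

The returned std at x = 0 is the model's uncertainty, which the analysis above compares to the spread of human guesses.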
We then calculate the average of the pairwise distances between human clicks and 200 draws from each random model. Both random models fail to predict human answers on all types of functions (significantly worse than GP, signed-rank test ps < 0.05). One advantage of the GP over other models is that it provides a level of uncertainty at every x. Fig. 7.a (inset) demonstrates similar uncertainty patterns for humans and the GP, showing that both uncertainties (at x = 0) rise as functions become more complicated.\nInterpolation results suggest that humans try to capture the shape of functions. If this is correct, we expect that humans will tend to click on high-uncertainty regions (according to the GP std) in the active interpolation task (see Fig. 6 for an example). Fig. 7.b shows the average of the GP standard deviation at the locations of human selections. Humans did not always choose x locations with the highest uncertainty (shown in red in Fig. 7.b). One reason for this might be that several regions had about the same std. Another possibility is that subjects had a slight preference to click toward the center. However, the GP std at human-selected locations was significantly higher than the GP std at random and SRand points, over all types of functions (signed-rank test, ps < 1e-4; non-significant on Deg2 vs. SRand, p = 0.18). This result suggests that since humans did not know in advance where a follow-up query might happen, they chose high-uncertainty locations according to the GP, as clicking at those locations would most shrink the overall uncertainty about the function.\n2.3 Experiments 4 & 5: Extrapolation and Active Extrapolation\nParticipants. 16 new subjects (7m, 9f) completed experiments 4 and 5 (Age: 19.62 ± 1.45 yr).\nProcedure. Three points x ∈ {-300, c, 100} and their y values were shown to subjects. Point c was random, specific to each function. 
In the extrapolation task, subjects were asked to guess the function value at x = 200 as accurately as possible (Fig. 6). In the active extrapolation task, subjects were asked to choose the 4th point in [-300 100] that would be most informative for estimating f(200).\nResults. A similar analysis as in the interpolation task is conducted. As seen in Fig. 7.c, in alignment with the interpolation results, humans are good at Deg2 and Deg3 but fail on Deg5, and so does the GP model. Here again, with Deg5, the GP and humans are closer to each other than to the actual function, further suggesting that their behaviors and errors are similar. There is no significant difference between the GP and the actual functions over all three function types (signed-rank test; p > 0.25). Interestingly, a line model fitted to points c and 100 performs significantly worse (p < 1e-5 vs. GP) over all function types (Fig. 6). Both random strategies also performed significantly worse than the GP on this task (signed-rank test; ps < 1e-6). SRand performs better than uniform random, indicating the existence of systematic biases in human clicks. Subjects learned that f(200) did not occur at extreme lows or highs (the same argument holds for f(0) in interpolation). As in the interpolation task, GP and human standard deviations rise as functions become more complex (Fig. 7.c; inset).\n\nFigure 7: a) Mean distance of human clicks from models. Error bars show the standard error of the mean (s.e.m.) over test trials. The inset shows the standard deviation of humans and the GP model at x = 0. b) Mean GP std at human vs. random clicks in active interpolation. c and d correspond to a and b, for the extrapolation task.\n\nFigure 8: Results of the optimization task 2. a and b) Distance of human and random clicks from the locations of max Q (i.e., max GP mean and max GP std). c) Actual function and GP mean values at human and random clicks. d) Normalized GP standard deviation at human vs. random clicks. Error bars show s.e.m. over test trials.\n\nActive extrapolation (Fig. 
7.d), similar to active interpolation, shows that humans tended to choose locations with significantly higher uncertainty than uniform and SRand points, for all function types (ps < 0.005). Some subjects in this task tended to click toward the right (close to 100), perhaps to obtain a better idea of the curvature between 100 and 200. This may be why the ratio of human std to max std is lower in active extrapolation than in active interpolation (0.75 vs. 0.82), suggesting that humans may have used an even more sophisticated strategy on this task.
2.4 Experiment 6: Optimization 2
Participants were another 21 subjects (4m, 17f) in the age range of 18 to 22 (mean: 20 ± 1.18 yr).
Procedure. Subjects were shown function values at x ∈ {−300, −200, 200, 300} and were asked to find the x location where they think the function's maximum is. They were allowed to make two equally important clicks and were shown the function value after each one. For quadratic functions, we used only 13 concave-down functions, each with a unique maximum.
Results. We perform two analyses, shown in Fig. 8. In the first one, we measure the mean distance of human clicks (1st and 2nd clicks) from the locations of the maximum GP mean and maximum GP standard deviation (Fig. 8.a). We updated the GP after the first click.
We hypothesized that the human first click would be at a location of high GP variance (to reduce uncertainty about the function), while the second click would be close to the location of highest GP mean (the estimated function maximum). However, results showed that human 1st clicks were close to the max GP mean and not very close to the max GP std. Human 2nd clicks were even closer (signed-rank test, p < 0.001) to the max GP mean and farther away from the max GP std (p < 0.001). These two observations together suggest that humans might have been following a Gaussian process with a selection criterion heavily biased towards finding the maximum, as opposed to shrinking the most uncertain region. Repeating this analysis for random clicks (uniform and SRand) shows quite the opposite trend (Fig. 8.b). Random locations are farther from the maximum of the GP mean (compared to human clicks) while being closer to the maximum of the GP std (compared to human clicks). This crossed pattern between human and random clicks (w.r.t. GP mean and GP std) reveals a systematic search strategy used by humans. Distances of human clicks from the max GP mean and max GP std rise as functions become more complicated. In the second analysis (Fig. 8.c), we measure actual function and GP values at human and random clicks. Humans had significantly higher function values at their 2nd clicks (p < 1e−4; the same was true for GP values, p < 0.05). Values at random points are significantly lower than at human clicks. Humans were less accurate as functions became more complex, as indicated by lower function values. Finally, Fig. 8.d shows that humans chose points with significantly less std (normalized to the entire function) in their 2nd clicks compared to random and first clicks. Human 1st clicks have higher std than uniform random clicks.
2.5 Experiment 7: Optimization 3
Participants were 19 new subjects (6m, 13f) in the age range of 19 to 25 (mean: 20.26 ± 1.64 yr).
Stimuli. Functions were sampled from a Gaussian process with predetermined parameters to ensure functions come from the same family and resemble each other (as opposed to Exp. 1; see Fig. 6).
Procedure was the same as in Exp. 1.
There were 10 training trials and 20 test trials.

Figure 9: Exploration vs. exploitation balance in the Optimization 3 task.

Results. Subjects had an average accuracy of 0.76 ± 0.11 (0.5 ± 0.18 on train) over all subjects and functions, and made an average of 8.86 ± 1.12 clicks (7.15 ± 1.2 on train) before ending a search trial. To investigate subjects' sequential strategy, we progressively updated a GP using a subject's clicks on each trial and used this GP to evaluate the same subject's next click; in other words, we assessed to what degree a subject follows a GP. Results are shown in Fig. 9. The regrets of the GP model and of humans decline with more clicks, implying that humans chose clicks that were informative for optimization (figure inset). Humans converge to the maximum location slightly faster than a GP fitted to their data, and much faster than random.
Fig. 9.a shows that subjects get closer to the location of the maximum GP mean and farther away from the max GP std (over 15 clicks). Fig. 9.b shows the normalized mean and standard deviation of human clicks (from the GP model), averaged over all trials. At about 6.4 clicks, subjects are at 58% of the function maximum while they have reduced the variance by 42%.
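The progressive-GP analysis described above can be sketched in a few lines: fit a GP to a subject's first k clicks, then measure how far click k+1 falls from the argmax of the posterior mean (exploitation target) and from the argmax of the posterior std (exploration target). The NumPy sketch below is illustrative, not the paper's exact model; the squared-exponential kernel, its length scale of 50, and the noise level are assumptions.

```python
import numpy as np

def rbf(a, b, length_scale=50.0):
    """Squared-exponential kernel between two 1-D point sets."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_grid, noise=1e-4):
    """Posterior mean and std of a zero-mean GP evaluated on x_grid."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_grid)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, Ks)
    mu = Ks.T @ alpha
    var = 1.0 - np.sum(v * v, axis=0)  # prior variance is 1 on the diagonal
    return mu, np.sqrt(np.maximum(var, 0.0))

def next_click_distances(xs, ys, grid):
    """For each click k >= 2: refit the GP to clicks 0..k-1 and record how far
    click k lies from the argmax of the GP mean and of the GP std."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    dist_mean, dist_std = [], []
    for k in range(2, len(xs)):
        mu, sd = gp_posterior(xs[:k], ys[:k] - ys[:k].mean(), grid)
        dist_mean.append(abs(xs[k] - grid[np.argmax(mu)]))
        dist_std.append(abs(xs[k] - grid[np.argmax(sd)]))
    return dist_mean, dist_std
```

A subject who exploits the GP mean yields small `dist_mean` values late in a trial, while an explorer yields small `dist_std` values, which is the contrast plotted in Fig. 9.a.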
Interestingly, we observe that humans tended to click on higher-uncertainty regions (according to the GP) in their first 6 clicks (averaged over all subjects and functions), and then gradually relied more on the GP mean (i.e., balancing exploration vs. exploitation). Results of the optimization tasks suggest that human clicks during search for the maximum of a 1D function can be predicted by a Gaussian process model.
3 Discussion and Conclusion
Our contributions are twofold. First, we found a striking capability of humans in 1D function optimization. In spite of the relative naivety of our subjects (not maths or engineering majors), the high human efficiency in our search task suggests that even more efficient optimization algorithms should be possible. Additional pilot investigations not shown here suggest that humans may perform even better in optimization when provided with first and second derivatives. Pursuing this direction may lead to the design of efficient selection criteria for BO methods (for example, new ways to augment BO with gradient information). However, it remains to be addressed how our findings scale up to higher dimensions and to benchmark optimization problems. Second, we showed that Gaussian processes provide a reasonable (though not perfect) unifying theoretical account of human function learning, active learning, and search (a GP plus a selection strategy). Results of experiments 2 to 5 lead to an interesting conclusion: in interpolation and extrapolation tasks, subjects try to minimize the error between their estimate and the actual function, while in active tasks they change their objective to exploring uncertain regions. In the optimization task, subjects progressively sample the function, update their belief, and use this belief to find the location of the maximum
(i.e., exploring new parts of the search space and exploiting parts that look promising).
Our findings support previous work by Griffiths et al. [24] (see also [25, 26, 27]). Yet, while they showed that Gaussian processes can predict human errors and difficulty in function learning, here we focused on explaining human active behavior with GPs, thus extending the explanatory power of GPs one step further. One study offers promising evidence that our results may extend to a larger class of natural tasks: Najemnik and Geisler [6, 28, 29] proposed a Bayesian ideal-observer model to explain human eye-movement strategies during visual search for a small Gabor patch hidden in noise. Their model computes posterior probabilities and integrates information across fixations optimally. This process can be formulated as BO with an exploitative search mechanism (i.e., GP-MM). Castro et al. [30] studied human active learning on the well-understood problem of finding the threshold in a binary search problem. They showed that humans perform better when they can actively select samples, and that their performance is nearly optimal (below the theoretical upper bound). However, they did not address how humans choose the next best sample. One aspect of our study that we did not elaborate on much here is the satisficing mechanism humans used in our task to decide when to end a trial. Further modeling of our data may be helpful for developing new stopping criteria for active learning methods.
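The "GP plus a selection strategy" account can be made concrete with a single acquisition step. The sketch below uses a GP-UCB-style criterion (mean + κ·std) over a 1D grid; the squared-exponential kernel, its length scale, and the κ values are illustrative assumptions, not the paper's fitted parameters.

```python
import numpy as np

def rbf(a, b, length_scale=50.0):
    """Squared-exponential kernel between two 1-D point sets."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_ucb_next(x_obs, y_obs, grid, kappa, noise=1e-4):
    """One GP-UCB acquisition step: fit a GP to the (x, f(x)) pairs tried so
    far and return the grid point maximizing mean + kappa * std.
    Large kappa explores uncertain regions; kappa = 0 purely exploits."""
    x = np.asarray(x_obs, float)
    y = np.asarray(y_obs, float)
    K = rbf(x, x) + noise * np.eye(len(x))
    Ks = rbf(x, grid)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ (Kinv @ (y - y.mean())) + y.mean()  # center, then restore mean
    var = 1.0 - np.sum(Ks * (Kinv @ Ks), axis=0)
    sd = np.sqrt(np.maximum(var, 0.0))
    return float(grid[np.argmax(mu + kappa * sd)])
```

With the Experiment 6 setup (values shown at x ∈ {−300, −200, 200, 300}), a large κ sends the next click toward the unsampled middle of the interval, while κ = 0 clicks near the highest observed value, mirroring the exploration-then-exploitation pattern seen in subjects.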
Related efforts have studied strategies that humans may use to quickly find an object (i.e., search, active vision) [31, 32, 33, 34, 35], optimal foraging [36], and optimal search theories [4, 5], which we believe could all now be revisited with GP as an underlying mechanism.
Supported by NSF (CCF-1317433, CMMI-1235539) and ARO (W911NF-11-1-0046, W911NF-12-1-0433).

References
[1] B. Settles, Active Learning. Morgan & Claypool, 2012.
[2] D. Jones, "A taxonomy of global optimization methods based on response surfaces," Journal of Global Optimization, vol. 21, pp. 345–383, 2001.
[3] E. Brochu, M. Cora, and N. de Freitas, "A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning," Tech. Rep., 2009.
[4] L. D. Stone, The Theory of Optimal Search. 1975.
[5] B. O. Koopman, "The theory of search. I. Kinematic bases," Operations Research, vol. 4, 1956.
[6] J. Najemnik and W. S. Geisler, "Optimal eye movement strategies in visual search," Nature, 2005.
[7] W. B. Powell and I. O. Ryzhov, Optimal Learning. J. Wiley and Sons, 2012.
[8] V. V. Fedorov, Theory of Optimal Experiments. Academic Press, 1972.
[9] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. MIT Press, 2006.
[10] J. A. Nelder and R. Mead, "A simplex method for function minimization," Computer Journal, 1965.
[11] J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proc. IEEE ICNN, 1995.
[12] J. H. Holland, Adaptation in Natural and Artificial Systems. 1975.
[13] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. 1989.
[14] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science, 1983.
[15] D. Finkel, DIRECT Optimization Algorithm User Guide. 2003.
[16] A. Moore, J. Schneider, J. Boyan, and M. S. Lee, "Q2: Memory-based active learning for optimizing noisy continuous functions," in ICML, pp. 386–394, 1998.
[17] D. Lewis and W. Gale, "A sequential algorithm for training text classifiers," in Proc. ACM SIGIR Conference on Research and Development in Information Retrieval, 1994.
[18] H. J. Kushner, "A new method of locating the maximum of an arbitrary multipeak curve in the presence of noise," J. Basic Engineering, vol. 86, pp. 97–106, 1964.
[19] J. Elder, "Global Rd optimization when probes are expensive: the GROPE algorithm," in IEEE International Conference on Systems, Man and Cybernetics, 1992.
[20] J. Mockus, V. Tiesis, and A. Zilinskas, "The application of Bayesian methods for seeking the extremum," in Towards Global Optimization (L. Dixon and E. Szego, eds.), 1978.
[21] M. Locatelli, "Bayesian algorithms for one-dimensional global optimization," Journal of Global Optimization, vol. 10, pp. 57–76, 1997.
[22] D. D. Cox and S. John, "A statistical method for global optimization," in Proc. IEEE Conference on Systems, Man and Cybernetics, 1992.
[23] M. Osborne, R. Garnett, and S. Roberts, "Gaussian processes for global optimization," in LION3, 2009.
[24] T. L. Griffiths, C. Lucas, J. J. Williams, and M. L. Kalish, "Modeling human function learning with Gaussian processes," in NIPS, 2009.
[25] B. R. Gibson, X. Zhu, T. T. Rogers, C. Kalish, and J. Harrison, "Humans learn using manifolds, reluctantly," in NIPS, 2010.
[26] J. D. Carroll, "Functional learning: The learning of continuous functional mappings relating stimulus and response continua," Educational Testing Service, Princeton, NJ, 1963.
[27] K. Koh and D. E. Meyer, "Function learning: Induction of continuous stimulus-response relations," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 17, pp. 811–836, 1991.
[28] J. Najemnik and W. S. Geisler, "Eye movement statistics in humans are consistent with an optimal search strategy," Journal of Vision, vol. 8, no. 3, pp. 1–14, 2008.
[29] W. S. Geisler and R. L. Diehl, "A Bayesian approach to the evolution of perceptual and cognitive systems," Cogn. Sci., vol. 27, pp. 379–402, 2003.
[30] R. Castro, C. Kalish, R. Nowak, R. Qian, T. Rogers, and X. Zhu, "Human active learning," in NIPS, 2008.
[31] J. Palmer, P. Verghese, and M. Pavel, "The psychophysics of visual search," Vision Research, vol. 40, 2000.
[32] J. M. Wolfe, "Visual search," in Attention (H. Pashler, ed.), pp. 13–74, Psychology Press, 1998.
[33] K. Nakayama and P. Martini, "Situating visual search," Vision Research, vol. 51, pp. 1526–1537, 2011.
[34] M. P. Eckstein, "Visual search: A retrospective," Journal of Vision, vol. 11, no. 5, 2011.
[35] A. Borji and L. Itti, "State-of-the-art in modeling visual attention," IEEE PAMI, 2012.
[36] A. C. Kamil, J. R. Krebs, and H. R. Pulliam (eds.), Foraging Behavior. New York: Plenum, 1987.