{"title": "Forward Modeling for Partial Observation Strategy Games - A StarCraft Defogger", "book": "Advances in Neural Information Processing Systems", "page_first": 10738, "page_last": 10748, "abstract": "We formulate the problem of defogging as state estimation and future state prediction from previous, partial observations in the context of real-time strategy games. We propose to employ encoder-decoder neural networks for this task, and introduce proxy tasks and baselines for evaluation to assess their ability of capturing basic game rules and high-level dynamics. By combining convolutional neural networks and recurrent networks, we exploit spatial and sequential correlations and train well-performing models on a large dataset of human games of StarCraft: Brood War. Finally, we demonstrate the relevance of our models to downstream tasks by applying them for enemy unit prediction in a state-of-the-art, rule-based StarCraft bot. We observe improvements in win rates against several strong community bots.", "full_text": "Forward Modeling for Partial Observation\n\nStrategy Games - A StarCraft Defogger\n\nGabriel Synnaeve\u2217\n\nFacebook, NYC\n\ngab@fb.com\n\nZeming Lin\u2217\nFacebook, NYC\nzlin@fb.com\n\nJonas Gehring\nFacebook, Paris\n\nDan Gant\n\nFacebook, NYC\n\njgehring@fb.com\n\ndanielgant@fb.com\n\nVegard Mella\nFacebook, Paris\n\nVasil Khalidov\nFacebook, Paris\n\nvegardmella@fb.com\n\nvkhalidov@fb.com\n\nNicolas Carion\nFacebook, Paris\nalcinos@fb.com\n\nNicolas Usunier\nFacebook, Paris\nusunier@fb.com\n\nAbstract\n\nWe formulate the problem of defogging as state estimation and future state predic-\ntion from previous, partial observations in the context of real-time strategy games.\nWe propose to employ encoder-decoder neural networks for this task, and introduce\nproxy tasks and baselines for evaluation to assess their ability of capturing basic\ngame rules and high-level dynamics. 
By combining convolutional neural networks and recurrent networks, we exploit spatial and sequential correlations and train well-performing models on a large dataset of human games of StarCraft\u00ae: Brood War\u00ae\u2020. Finally, we demonstrate the relevance of our models to downstream tasks by applying them for enemy unit prediction in a state-of-the-art, rule-based StarCraft bot. We observe improvements in win rates against several strong community bots.\n\n1 Introduction\n\nA current challenge in AI is to design policies to act in complex and partially observable environments. Many real-world scenarios involve a large number of agents that interact in different ways, and only a few of these interactions are observable at a given point in time. Yet, long-term planning is possible because high-level behavioral patterns emerge from the agents acting to achieve one of a limited set of long-term goals, under the constraints of the dynamics of the environment. In contexts where observational data is cheap but exploratory interaction costly, a fundamental question is whether we can learn reasonable priors \u2013 of these purposeful behaviors and the environment\u2019s dynamics \u2013 from observations of the natural \ufb02ow of the interactions alone.\n\nWe address this question by considering the problems of state estimation and future state prediction in partially observable real-time strategy (RTS) games, taking StarCraft: Brood War as a running example. RTS games are multi-player games in which each player must gather resources, build an economy and recruit an army to eventually win against the opponent. 
Each player controls their units\nindividually, and has access to a bird\u2019s-eye view of the environment where only the vicinity of the\nplayer\u2019s units is revealed.\n\nThough still arti\ufb01cial environments, RTS games offer many of the properties of real-world scenarios\nat scales that are extremely challenging for the current methods. A typical state in StarCraft can be\nrepresented by a 512 \u00d7 512 2D map of \u201cwalk tiles\u201d, which contains static terrain and buildings, as\n\n\u2217These authors contributed equally\n\u2020StarCraft is a trademark or registered trademark of Blizzard Entertainment, Inc., in the U.S. and/or other\ncountries. Nothing in this paper should be construed as approval, endorsement, or sponsorship by Blizzard\nEntertainment, Inc.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n\fwell as up to 200 units per player. These units can move or attack anywhere in the map, and players\ncontrol them individually; there are about 45 unit types, each of which has speci\ufb01c features that de\ufb01ne\nthe consequences of actions. Top-level human players perform about 350 actions per minute [1].\nThe high number of units interacting together makes the low-level dynamics of the game seemingly\nchaotic and hard to predict. However, when people play purposefully, at lower resolution in space\nand time, the \ufb02ow of the game is intuitive for humans. We formulate the tasks of state estimation and\nfuture state prediction as predicting unobserved and future values of relevant high-level features of\nthe game state, using a dataset of real game replays and assuming full information at training time.\nThese high-level features are created from the raw game state by aggregating low-level information at\ndifferent resolutions in space and time. 
In that context, state estimation and future state prediction are\nclosely related problems, because hidden parts of the state can only be predicted by estimating how\ntheir content might have changed since the last time they were observed. Thus both state estimation\nand future state prediction require learning the natural \ufb02ow of games.\n\nWe present encoder-decoder architectures with recurrent units in the latent space, evaluate these\narchitectures against reasonable rule-based baselines, and show they perform signi\ufb01cantly better than\nthe baselines and are able to perform non-trivial prediction of tactical movements. In order to assess\nthe relevance of the predictions on downstream tasks, we inform the strategic and tactical planning\nmodules of a state-of-the-art full game bot with our models, resulting in signi\ufb01cant increases in\nterms of win rate against multiple strong open-source bots. We release the code necessary for the\nreproduction of the project at https://github.com/facebookresearch/starcraft_defogger.\n\n2 Related Work\n\nEmploying unsupervised learning to predict future data from a corpus of historical observations\nis sometimes referred to as \u201cpredictive learning\u201d. In this context, one of the most fundamental\napplications for which deep learning has proven to work well is language modeling, i.e. predicting\nthe next word given a sequence of previous words [2, 3, 4]. This has inspired early work on predicting\nfuture frames in videos, originally on raw pixels [5, 6] and recently also in the space of semantic\nsegmentations [7]. 
Similar approaches have been used to explicitly learn environment dynamics solely based on observations: in [8], models are trained to predict future images in generated videos of falling block towers with the aim of discovering fundamental laws of gravity and motion.\n\nCombining the most successful neural network paradigms for image data (CNNs) and sequence modeling (LSTMs) to exploit spatio-temporal relations has led to fruitful applications of the above ideas to a variety of tasks. [9] propose an LSTM with convolutional instead of linear interactions and demonstrate good performance on predicting precipitation based on radar echo images; [10] use a similar approach to forecast passenger demand for on-demand ride services; [11] estimate future traf\ufb01c in Beijing based on image representations of road networks.\n\nStarCraft: Brood War has long been a popular test bed and challenging benchmark for AI algorithms [12]. The StarCraft domain features partial observability; hence, a large part of the current game state is unknown and has to be estimated. This has been previously attempted with a focus on low-level dynamics, e.g. by modeling the movement of individual enemy units with particle \ufb01lters [13]. [14] try to anticipate the timing, army composition and location of upcoming opponent attacks with a Bayesian model that explicitly deals with uncertainty due to the fog of war. [15] do not deal with issues caused by partial information but demonstrate usage of a combat model (which is conditioned on both state and action) learned from replay data that can be used in Monte Carlo Tree Search.\n\nIn the above works on StarCraft, features for machine learning techniques were hand-crafted with extensive domain knowledge. Deep learning methods, aiming to learn directly from raw input data, have only very recently been applied to this domain in the context of reinforcement learning in limited scenarios, e.g. [16, 17]. 
To our best knowledge, the only deep learning approach utilizing replays of\nhuman games to improve actual full-game play is [18], which use a feed-forward model to predict\nthe next unit that the player should produce. They apply their model as a production manager in a\nStarCraft bot and achieve a win rate of 68% against the game\u2019s built-in AI.\n\n2\n\n\f3 Task Description\n\nFormally, we consider state estimation and future state prediction in StarCraft to be the problem of\ninferring the full game state, yt+s for s \u2265 0, given a sequence of past and current partial observations,\no0, . . . , ot. In this work, we restrict full state to be the locations and types of all units on the game\nmap, ignoring attributes such as health. A partial observation ot contains all of a player\u2019s units as\nwell as enemy or neutral units (e.g. resource units) in their vicinity, subject to sight range per unit\ntype. Note that during normal play, we do not have access to yt. We can however utilize past games\nfor training as replaying them in StarCraft provides both ot and yt.\n\nWe note that humans generally make forward predictions on a high level at a long time-scale \u2013 humans\nwill not make \ufb01ne predictions but instead predict a general composition of units over the whole game\nmap. Thus, we propose to model a low-resolution game state by considering accumulated counts of\nunits by type, downsampled onto a coarse spatial grid. We ignore dynamic unit attributes like health,\nenergy or weapon cool-down. By downsampling spatially, we are unable to account for minute\nchanges in e.g. unit movement but can capture high-level game dynamics more easily. This enables\nour models to predict relatively far into the future. 
Good players are able to estimate these dynamics very accurately, which enables them to anticipate speci\ufb01c army movements or economic development of their opponents, allowing them to respond by changing their strategy.\n\nStarCraft: Brood War is played on rectangular maps with sizes of up to 8192 \u00d7 8192 pixels. For most practical purposes it is suf\ufb01cient to consider \u201cwalk tiles\u201d instead, which consist of 8 \u00d7 8 pixels each. In our setup, we accumulate units over r \u00d7 r walk tiles, with a stride of g \u00d7 g; Figure 4 shows a grid for r = 32 and g = 32 on top of a screenshot of StarCraft.\n\nFor a map of size H \u00d7 W walk tiles, the observation ot and output yt are thus an Hr,g \u00d7 Wr,g \u00d7 Cu tensor, with Hr,g = \u2308(H \u2212 r)/g\u2309, Wr,g = \u2308(W \u2212 r)/g\u2309, and number of channels Cu corresponding to the number of possible unit types. We use disjoint channels for allied and enemy units. Each element in ot and yt thus represents the absolute number of units of a speci\ufb01c type and player at a speci\ufb01c grid location, where ot only contains the part of the state observed by the current player. Additional static information \u03c4 includes (a) terrain features, an H \u00d7 W \u00d7 CT tensor that includes elements such as walkability, buildability and ground height, and (b) the faction, fme and fop, that each player picks. Thus, each input is xt = (ot, \u03c4, fme, fop).\n\nAdditionally, we pick a temporal resolution of at least s = 5 seconds between consecutive states, again aiming to model high-level dynamics rather than small but often irrelevant changes. At the beginning of a game, players often follow a \ufb01xed opening and do not encounter enemy units. We therefore do not consider estimating states in the \ufb01rst 3 minutes of the game. To achieve data uniformity and computational ef\ufb01ciency, we also ignore states beyond 11 minutes. 
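As an illustration, the spatial accumulation just described (per-type unit counts over r \u00d7 r walk-tile windows with stride g, yielding an Hr,g \u00d7 Wr,g \u00d7 Cu tensor) can be sketched as below. The `(x, y, unit_type)` list format is a simplification for illustration, not the actual replay format.

```python
import math
import numpy as np

def featurize(units, H, W, r=32, g=32, n_types=2):
    """Accumulate per-type unit counts over r x r walk-tile windows
    with stride g. `units` is a simplified list of (x, y, unit_type)
    walk-tile coordinates (an assumption for this sketch)."""
    Hrg = math.ceil((H - r) / g)
    Wrg = math.ceil((W - r) / g)
    out = np.zeros((Hrg, Wrg, n_types), dtype=np.int64)
    for x, y, c in units:
        # window (i, j) covers rows [i*g, i*g + r); find all windows
        # containing (x, y) -- there can be several when r > g
        for i in range(max(0, (y - r) // g + 1), min(Hrg, y // g + 1)):
            for j in range(max(0, (x - r) // g + 1), min(Wrg, x // g + 1)):
                out[i, j, c] += 1
    return out
```

With r = g = 32 the windows are disjoint and each unit falls into exactly one grid cell.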
In online and professional settings, StarCraft is usually played at 24 frames per second with most games lasting between 10 and 20 minutes [19].\n\n4 Encoder-Decoder Models\n\nThe broad class of architectures we consider is composed of convolutional models that can be segmented into two parts: an encoder and a decoder, depicted in Figure 1.\n\nIn all models, we preprocess static information with small networks. To process the static terrain information \u03c4 , a convolutional network EM of kernel size r and stride g is used to downsample the H \u00d7 W \u00d7 CT tensor into an Hr,g \u00d7 Wr,g \u00d7 FT embedding. The faction of both players is represented by a learned embedding of size FF . Finally, this is replicated temporally and concatenated with the input to generate a T \u00d7 Hr,g \u00d7 Wr,g \u00d7 (FT + FF + Cu) tensor as input to the encoder.\n\nThe encoder then embeds it into an FE-sized embedding and passes it into a recurrent network with LSTM cells. The recurrent cells allow our models to capture information from previous frames, which is necessary in a partially observable environment such as StarCraft, because events that we observed minutes ago are relevant to predict the hidden parts of the current state. The input to the decoder D is then formed by taking the FE-sized embedding, replicating it along the spatial dimensions of ot, and concatenating it along the feature dimension of ot.\n\nThe decoder then uses D to produce an embedding with the same spatial dimensions as yt. This embedding is used to produce two types of predictions. The \ufb01rst one, Pc in Figure 1, is a global head that takes as input a spatial sum-pooling and is a linear layer with sigmoid outputs for each unit type that predict the existence or absence of at least one unit of the corresponding type. This corresponds to g_op_b described in section 5.1.1. 
The second type of prediction Pr predicts the number of units of each type across the spatial grid at the same resolution as ot. This corresponds to the other tasks described in section 5.1.1. The Pc heads are trained with a binary cross entropy loss, while the Pr heads are trained with a Huber loss.\n\n[Figure 1 diagram]\n\nFigure 1: Simpli\ufb01ed architecture of the model. Rectangles denote 1 \u00d7 1 convolutions or MLPs, trapezes denote convolutional neural networks, circles and loops denote recurrent neural networks. \u2295 denotes spatial pooling. The dashed arrows represent the additional connections in the architecture with the convolutional LSTM encoder.\n\nWe describe the two types of encoders we examine, a ConvNet (C) encoder and a Convolutional-LSTMs (CL) encoder.\n\nSince the maps all have a maximum size, we introduce a simple ConvNet encoder with enough downsampling to always obtain a 1 \u00d7 1 \u00d7 h encoding at the end. In our experiments we have to handle input sizes of up to 16 \u00d7 16. 
Thus, we obtain a 1 \u00d7 1-sized output by applying four convolutional layers with a stride of two each.\n\nOur CL encoder architecture is based on what we call a spatially-replicated LSTM: given that, for a sequence of length T , a convolution outputs a T \u00d7 H \u00d7 W \u00d7 C sized tensor X, a spatially replicated LSTM takes as input each of the X(:, i, j, :) cells, and writes its output to the same spatial location. That is, the weights of the LSTM at each spatial location are shared, but the hidden states are not. Thus, at any given layer in the network, we will have HW LSTMs with shared weights but unshared hidden states.\n\nThe CL encoder is made out of a few blocks, where each block is a convolution network, a downsampling layer, and a spatially-replicated LSTM. This encoder is similar to the model proposed in [20], with a kernel size of 1 and additional downsampling and upsampling layers. The last output is followed by a global sum-pooling to generate the 1 \u00d7 1 \u00d7 FE sized embedding required.\n\nWe also introduce skip connections from the encoder to the decoder in the CL model, where each intermediate block\u2019s outputs are upsampled and concatenated with ot as well, so the decoder can take advantage of the localized memory provided. With k blocks, we use a stride of 2 in each block, so the output of the k-th block must be upsampled by a factor of 2^k. Thus, only in the CL model do we also concatenate the intermediate LSTM cell outputs to the input to the decoder D. These skip-connections from the intermediate blocks are a way to propagate spatio-temporal memory to the decoder at different spatial scales, taking advantage of the speci\ufb01c structure of this problem.\n\nWe use the same number of convolution layers in both C and CL.\n\n5 Experiments\n\nWe hypothesize that our models will be able to make predictions of the global build tree and local unit numbers better than strong existing baselines. 
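The weight-sharing pattern of the spatially-replicated LSTM from section 4 can be sketched as follows. For brevity this sketch uses a plain tanh recurrent update in place of an LSTM cell, but the structure is the same: one set of weights applied at every grid cell, and one hidden state per cell.

```python
import numpy as np

def spatially_replicated_rnn(x, Wx, Wh):
    """x: (T, H, W, C) input; Wx: (C, F) and Wh: (F, F) weights shared
    across all H*W grid cells; each cell keeps its own hidden state."""
    T, H, W, C = x.shape
    F = Wx.shape[1]
    h = np.zeros((H, W, F))
    outs = []
    for t in range(T):
        # same Wx, Wh applied at every (i, j); per-cell hidden state h[i, j]
        h = np.tanh(x[t] @ Wx + h @ Wh)
        outs.append(h)
    return np.stack(outs)  # (T, H, W, F): HW recurrent cells, shared weights
```

In effect this is a 1 \u00d7 1 convolution over the grid whose output at each location also depends on that location's own recurrent history.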
We evaluate our models on a human games dataset,\nand compare them to heuristic baselines that are currently employed by competitive rule-based bots.\nWe also test whether we are able to use the defogger directly in a state-of-the-art rule-based StarCraft\nbot, and we evaluate the impact of the best models within full games.\n\n4\n\n\f5.1 Experiments On Human Replays\n\nWe de\ufb01ne four tasks as proxies for measuring the usefulness of forward modeling in RTS games,\nand use those to assess the performance of our models. We then explain baselines and describe the\nhyper-parameters. Our models are trained and evaluated on the STARDATA corpus, which consists\nof 65,000 high quality human games of StarCraft: Brood War [19]. We use the train, valid, and test\nsplits given by the authors, which comprise 59060, 3289 and 3297 games, respectively. Our models\nare implemented in PyTorch [21], and data preprocessing is done with TorchCraft [22].\n\n5.1.1 Evaluation Proxy Tasks\n\nIn full games, the prediction of opponents\u2019 strategy and tactical choices may improve your chances at\nvictory. We de\ufb01ne tactics as where to send your units, and strategy as what kind of units to produce.\n\nIn strategy prediction, presence or absence of certain buildings is key to determining what types of\nunits the opponent will produce, and which units the player needs to produce to counter the opponent.\nThus, we can measure the prediction accuracy of all opponent buildings in a future frame.\n\nWe use two proxy tasks to measure the strength of defogging and forward modeling in tactics. One\ntask is the model correctly predicting the existence or absence of all enemy units on the game map,\ncorrelated to being able to model the game dynamics. Another is the prediction of only units hidden\nby the fog of war, correlated to whether our models can accurately make predictions under partially\nobservable states. 
Finally, the task of correctly predicting the location and number of enemy units is\nmeasured most accurately by the regression Huber loss, which we directly train for.\n\nThis results in four proxy tasks that correlate with how well the forward model can be used:\n\n\u2022 g_op_b (global opponent buildings) Existence of each opponent building type on any tile.\n\n\u2022 hid_u (hidden units) Existence of units that we do not see at time t + s, at each spatial\n\nlocation (i, j), necessarily belonging to your opponent.\n\n\u2022 op_u (opponent units) Existence of all opponent units at each spatial location (i, j).\n\n\u2022 Huber loss between the real and predicted unit counts in each tile, averaged over every\n\ndimension (temporal, spatial and unit types).\n\nFor the \ufb01rst three tasks, we track the F1 score, and for the last the Huber loss. The scores for\ng_op_b are averaged over (T, Cu), other tasks (hid_u, op_u, Huber loss) are measured by averaging\nover all (T, Hr,g, Wr,g, Cu); and then averaged over all games. When predicting existence/absence\n(g_op_b, hid_u, op_u) from a head (be it a regression or classi\ufb01cation head), we use a threshold\nthat we cross-validate per model, per head, on the F1 score.\n\n5.1.2 Baselines\n\nTo validate the strength of our models, we compare their performance to relevant baselines that are\nsimilar to what rule-based bots use traditionally in StarCraft: Brood War competitions. Preliminary\nexperiments with a kNN baseline showed that the rules we will present now worked better. These\nbaselines rely exclusively on what was previously seen and game rules, to infer hidden units not in\nthe current observation ot.\n\nWe rely on four different baselines to measure success:\n\n\u2022 Previous Seen (PS): takes the last seen position for each currently hidden unit, which is what\nmost rule based bots do in real games. 
When a location is revealed and no units are at the\nspot, the count is again reset to 0.\n\n\u2022 Perfect memory (PM): remembers everything, and units are never removed, maximizing\nrecall. That is, with any t1 < t2, if a unit appears in ot1 , then it is predicted to appear in ot2 .\n\n\u2022 Perfect memory + rules (PM+R): designed to maximize g_op_b, by using perfect memory\nand game rules to infer the existence of unit types that are prerequisite for unit types that\nhave ever been seen.\n\n\u2022 Input: predicts by copying the input frame, here as a sanity check.\n\n5\n\n\fIn order to beat these baselines, our models have to learn to correlate occurrences of units and\nbuildings, and remember what was seen before. We hope our models will also be able to model\nhigh-level game dynamics and make long term predictions to generate even better forward predictions.\n\n5.1.3 Hyperparameters\n\nWe train and compare multiple models by varying the encoder type as well as spatial and temporal\nresolutions. For each combination, we perform a grid search over multiple hyper-parameters and\npick the best model according to our metrics on the proxy tasks as measured on the validation\nset. We explored the following properties: kernel width of convolutions and striding (3,5); model\ndepth; non-linearities (ReLU, GLU); residual connections; skip connections in the encoder LSTM;\noptimizers (Adam, SGD); learning rates.\n\nDuring hyperparameter tuning, we found Adam to be more stable than SGD over a large range of\nhyperparameters. We found that models with convolutional LSTMs encoders worked more robustly\nover a larger range of hyperparameters. Varying model sizes did not amount to signi\ufb01cant gains,\nso we picked the smaller sizes for computational ef\ufb01ciency. 
Please check the appendix for a more detailed description of the hyperparameters searched over.\n\n5.1.4 Results\n\nWe report baseline and model scores according to the metrics described above, on 64 and 32 walktile effective grids (g) due to striding, with predictions at 0, 5, 15, and 30 seconds in the future (s), in Table 1.\n\nTo obtain the existence thresholds from a regression output, we sweep the validation set for threshold values on a logarithmic scale from 0.001 to 1.5. A unit is assumed to be present in a cell if the corresponding model output is greater than the existence threshold. Lower threshold values performed better, indicating that our model is sure of grid locations with zero units. Similarly, we \ufb01ne-tune the existence threshold for the opponent\u2019s buildings. The value that maximizes the F1 score is slightly above 0.5. We report the results on the test set with the best thresholds on the validation set.\n\nWe note that for g_op_b prediction, the baselines already do very well; it is hard to beat the best baseline, PM+R. Most of our models have higher recall than the baseline, indicating that they predict many more unexpected buildings, at the expense of mispredicting existing buildings. On all tasks, our models do as well as or better than the baselines.\n\nOur models make the most gains above baseline on unit prediction (columns op_u and hid_u). Since units often move very erratically due to the dynamics of path\ufb01nding and rapid decision making, this is dif\ufb01cult for a baseline that only uses the previous frame. In order to predict units well, the model must have a good understanding of the dynamics of the game as well as the possible strategies taken by players in the game. For our baselines, the coarser the grid size (g = 64, \ufb01rst row), the easier it is to predict unit movement, since small movements won\u2019t change the featurization. 
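The threshold sweep described above can be sketched as follows; this is a minimal version, whereas the actual sweep is run per model and per head on the validation set.

```python
import numpy as np

def best_threshold(scores, targets, n=50):
    """Sweep log-spaced existence thresholds from 0.001 to 1.5 and
    return the one maximizing F1. `scores` are model outputs,
    `targets` are 0/1 ground-truth existence labels."""
    best_t, best_f1 = None, -1.0
    for t in np.logspace(np.log10(0.001), np.log10(1.5), n):
        pred = scores > t            # unit assumed present above threshold
        tp = np.sum(pred & (targets == 1))
        prec = tp / max(pred.sum(), 1)
        rec = tp / max((targets == 1).sum(), 1)
        f1 = 2 * prec * rec / max(prec + rec, 1e-8)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```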
The results con\ufb01rm that our models are able to predict the movement of enemy units, which none of the baselines are able to do. Our models consistently outperform the baselines on both tasks by a signi\ufb01cant margin.\n\nIn Table 1, the Huber loss gives a good approximation of how useful the model will be when a state-of-the-art bot takes advantage of its predictions. During such control, we wish to minimize the number of mispredictions of opponent units. We average this loss across the spatial and temporal dimensions, and then across games, so the number is not easily interpretable. These losses are only comparable in the same (g, s) scenario, and we do much better than the baseline on all three counts.\n\nTo give an intuition of the prediction performance of our models, we visualized predicted unit types, locations and counts against the actual ones for hidden enemy units in Figure 2. We can see how well the model learns the game dynamics \u2013 in (a), the model gets almost nothing as input at the current timestep, yet still manages to predict a good distribution over the enemy units from what is seen in the previous frames. (b) shows that on longer horizons the prediction is less precise, but still contains valuable information for planning tactical manoeuvres.\n\n5.1.5 Evaluation in a Full-Game Bot\n\nAfter observing good performance on metrics representing potential downstream tasks, we test these trained models in a StarCraft: Brood War full-game setting. 
We run a forward model alongside our modular, rule-based, state-of-the-art StarCraft bot. We apply minimal changes, allowing the existing rules \u2013 which were tuned to win without any vision enhancements \u2013 to make use of the predictions.\n\nTask (g : s) | op_u F1 (B / C / CL) | hid_u F1 (B / C / CL) | g_op_b F1 (B / C / CL) | Huber \u00b710\u22124 (B / C / CL)\n64 : 15 | 0.53 / 0.53 / 0.62 | 0.47 / 0.51 / 0.56 | 0.88 / 0.89 / 0.92 | 28.97 / 14.94 / 10.40\n32 : 30 | 0.33 / 0.48 / 0.47 | 0.26 / 0.44 / 0.44 | 0.88 / 0.92 / 0.90 | 1.173 / 0.503 / 0.503\n32 : 15 | 0.34 / 0.48 / 0.51 | 0.26 / 0.45 / 0.48 | 0.88 / 0.91 / 0.94 | 1.134 / 0.488 / 0.430\n32 : 5 | 0.35 / 0.44 / 0.52 | 0.27 / 0.38 / 0.47 | 0.89 / 0.90 / 0.95 | 1.079 / 0.431 / 0.424\n32 : 0 | 0.35 / 0.44 / 0.50 | 0.27 / 0.38 / 0.45 | 0.89 / 0.90 / 0.89 | 1.079 / 0.429 / 0.465\n\nTable 1: Scores of our proposed models (C for ConvNet, CL for Convolutional LSTMs) and of the best baseline (B) for each task, in F1. The Huber loss is only comparable across the same stride g.\n\n[Figure 2: panels (a) prediction at 5s and (b) prediction at 30s] Enemy unit counts of the speci\ufb01ed type per map cell, where darker dots correspond to higher counts. The top row of each plot shows the model input, where green indicates an input to our model. Grey areas designate the areas hidden by fog of war but are not input to our model. Middle row shows predicted unit distributions after 5 and 30 seconds. Bottom row shows real unit distributions.\n\nEnhanced vision used for | Tactics | Build | Both\nNormal Vision | 61 (baseline)\nFull Vision | 57 | 66 | 72\nDefog t + 0s | 57 | 62 | 59\nDefog t + 5s | 61 | 66 | 61\nDefog t + 30s | 50 | 59 | 49\n\nTable 2: Average win rates with strategies that integrate predictions from defogger models but are otherwise unmodi\ufb01ed.\n\nEnhanced vision used for | Tactics | Build | Both\nNormal Vision | 59 (baseline)\nFull Vision | 61 | 64 | 70\nDefog t + 0s | 61 | 63 | 55\nDefog t + 5s | 61 | 63 | 55\nDefog t + 30s | 52 | 51 | 43\n\nTable 3: Average win rates from a subset of the games in the previous table. These games feature just one of our bot\u2019s strategies (focused on building up economy \ufb01rst) against 4 Terran opponents.\n\nWe played multiple games with our bot against a battery of competitive opponents (see Table 5 in Appendix). We then compare results across \ufb01ve different sources of vision:\n\n\u2022 Normal: Default vision settings.\n\u2022 Full Vision: Complete knowledge of the game state.\n\u2022 Defog + 0s: Defogging the current game state.\n\u2022 Defog + 5s: Defogging 5 seconds into the future.\n\u2022 Defog + 30s: Defogging 30 seconds into the future.\n\nWe take the best defogger models in op_u from the validation set. For each of the Full Vision and Defog settings, we run three trials. In each trial we allow one or two of the bot\u2019s modules to consider the enhanced information:\n\n\u2022 Tactics: Positioning armies and deciding when to \ufb01ght.\n\u2022 Build Actions: Choosing which units and structures to build.\n\u2022 Both: Both Tactics and Build Actions.\n\nIn Tables 2 and 3, we investigate the effects of enhanced vision on our StarCraft bot\u2019s gameplay. For these experiments, our changes were minimal: we did not change the existing bot\u2019s rules, and our bot uses its existing strategies, which were previously tuned for win rates with normal vision, which essentially uses the Previous Seen baseline. Our only changes consisted of substituting the predicted unit counts (for Build Actions) and putting the predicted unseen units at the center of the hidden tiles where they are predicted (for Tactics). 
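As a purely hypothetical illustration (the actual integration is written against the bot's own internal interfaces, which are not shown here), snapping predicted counts to hidden-tile centers for Tactics might look like:

```python
import numpy as np

def inject_predictions(pred_counts, tile_size=32, threshold=0.5):
    """Hypothetical sketch, not the bot's real API: turn predicted
    per-tile enemy unit counts (H x W x C array) into virtual unit
    sightings snapped to tile centers, as described above for Tactics."""
    virtual_units = []
    H, W, C = pred_counts.shape
    for i in range(H):
        for j in range(W):
            for c in range(C):
                if pred_counts[i, j, c] > threshold:
                    n = max(int(round(pred_counts[i, j, c])), 1)
                    # place the n predicted units of type c at the tile center
                    x = j * tile_size + tile_size // 2
                    y = i * tile_size + tile_size // 2
                    virtual_units.extend([(x, y, c)] * n)
    return virtual_units
```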
In our control experiment with Full Vision, we do exactly the same, but instead of using counts predicted by defogging, we input the real counts, effectively cheating by giving our bot full vision but snapping to the center of the hidden tiles to emulate the defogger setting. We run games with our bot against all opponents listed in Table 5, playing a total of 1820 games for each setting in Table 2.\n\nBecause our bot is Zerg, and most Zerg vs Zerg matchups depend much more on execution than opponent modeling, we do not try any Zerg bots. On average, over all match-ups and strategies, using the defogger model boosts the win rate of our rule-based bot to 66% from 61% (baseline), as Table 2 demonstrates. Overall, the defogger seems to hurt the existing Tactics module more than help it, but improves the performance of Build Actions.\n\nWe note that any variance in defogger output over time produces different kinds of errors in Build Actions and Tactics. Variance in inputs to Build Actions is likely to smooth out over time as the correct units are added in future time steps. Underestimations in Tactics cause the army to engage; subsequent overestimations cause it to retreat, leading to unproductive losses through indecision.\n\nWe broke those results down for a single match-up (Zerg vs. Terran), using a single strategy, in Table 3; the trends are the same, with encouraging use of the defogger predictions for Build Actions alone, and poor results when combining this with Tactics. \n\n[Figure 3: (a) vs. IronBot, (b) vs. McRave] Plot showing the enemy army \u201csupply\u201d (\u2248 unit counts) during the game: green is the ground truth, blue is the prediction of the defogger, and red is the known count our bot would normally have access to with Normal Vision, equivalent to the PS baseline.\n\n
This suggests we could get better results by using a different defogger model for Tactics, or by tuning the rules for the enhanced-information conditions. Finally, in Figure 3, we observe that the defogger model predicts the number of units in the game much more precisely than the heuristic equivalent to the PS baseline.

6 Conclusion and Future Work

We proposed models for state estimation and future state prediction in real-time strategy games, with the goals of inferring hidden parts of the state and learning higher-level strategic patterns and basic rules of the game from human games. We demonstrated via off-line tests that encoder-decoder architectures with temporal memory perform better than rule-based baselines at predicting both the current state and the future state. Moreover, we provide an analysis of the advantages and pitfalls of informing the tactical and strategic modules of a rule-based bot with a forward model trained solely on human data.

Forward models such as the defogger lack a model of how the agent acts in the environment. We believe the promising results presented in this paper open the way towards learning models of the evolution of the game conditioned on the players' strategies, to perform model-based reinforcement learning or model predictive control.