{"title": "The Neurothermostat: Predictive Optimal Control of Residential Heating Systems", "book": "Advances in Neural Information Processing Systems", "page_first": 953, "page_last": 959, "abstract": null, "full_text": "The Neurothermostat: Predictive Optimal Control of Residential Heating Systems \n\nMichael C. Mozer, Lucky Vidmar, Robert H. Dodier \n\nDepartment of Computer Science \n\nDepartment of Civil, Environmental, and Architectural Engineering \n\nUniversity of Colorado, Boulder, CO 80309-0430 \n\nAbstract \n\nThe Neurothermostat is an adaptive controller that regulates indoor air temperature in a residence by switching a furnace on or off. The task is framed as an optimal control problem in which both comfort and energy costs are considered as part of the control objective. Because the consequences of control decisions are delayed in time, the Neurothermostat must anticipate heating demands with predictive models of occupancy patterns and the thermal response of the house and furnace. Occupancy pattern prediction is achieved by a hybrid neural net / look-up table. The Neurothermostat searches, at each discrete time step, for a decision sequence that minimizes the expected cost over a fixed planning horizon. The first decision in this sequence is taken, and this process repeats. Simulations of the Neurothermostat were conducted using artificial occupancy data in which regularity was systematically varied, as well as occupancy data from an actual residence. The Neurothermostat is compared against three conventional policies, and achieves reliably lower costs. This result is robust to the relative weighting of comfort and energy costs and the degree of variability in the occupancy patterns. 
\n\nFor over a quarter century, the home automation industry has promised to revolutionize our lifestyle with the so-called Smart House, in which appliances, lighting, stereo, video, and security systems are integrated under computer control. However, home automation has yet to make significant inroads, at least in part because software must be tailored to the home occupants. \n\nInstead of expecting the occupants to program their homes or to hire someone to do so, one would ideally like the home to essentially program itself by observing the lifestyle of the occupants. This is the goal of the Neural Network House (Mozer et al., 1995), an actual residence that has been outfitted with over 75 sensors (including temperature, light, sound, and motion) and actuators to control air heating, water heating, lighting, and ventilation. In this paper, we describe one research project within the house, the Neurothermostat, that learns to regulate the indoor air temperature automatically by observing and detecting patterns in the occupants' schedules and comfort preferences. We focus on the problem of air heating with a whole-house furnace, but the same approach can be taken with alternative or multiple heating devices, and to the problems of cooling and ventilation. \n\n1 TEMPERATURE REGULATION AS AN OPTIMAL CONTROL PROBLEM \n\nTraditionally, the control objective of air temperature regulation has been to minimize energy consumption while maintaining temperature within an acceptable comfort margin during certain times of the day and days of the week. This is sensible in commercial settings, where occupancy patterns follow simple rules and where energy considerations dominate individual preferences. In a residence, however, the desires and schedules of occupants need to be weighted equally with energy considerations. 
Consequently, we frame the task of air temperature regulation as a problem of maximizing occupant comfort and minimizing energy costs. \n\nThese two objectives clearly conflict, but they can be integrated into a unified framework via an optimal control approach in which the goal is to heat the house according to a policy that minimizes the cost \n\n$$J = \\lim_{\\kappa \\to \\infty} \\frac{1}{\\kappa} \\sum_{t = t_0 + 1}^{t_0 + \\kappa} \\left[ e(u_t) + m(x_t) \\right],$$ \n\nwhere time, t, is quantized into nonoverlapping intervals during which we assume all environmental variables remain constant, t_0 is the interval ending at the current time, u_t is the control decision for interval t (e.g., turn the furnace on), e(u) is the energy cost associated with decision u, x_t is the environmental state during interval t, which includes the indoor temperature and the occupancy status of the home, and m(x) is the misery of the occupant given state x. To add misery and energy costs, a common currency is required. Energy costs are readily expressed in dollars. We also determine misery in dollars, as we describe later. \n\nWhile we have been unable to locate any earlier work that combined energy and comfort costs in an optimal control framework, optimal control has been used in a variety of building energy system control applications (e.g., Henze & Dodier, 1996; Khalid & Omatu, 1995). \n\n2 THE NEUROTHERMOSTAT \n\nFigure 1 shows the system architecture of the Neurothermostat and its interaction with the environment. The heart of the Neurothermostat is a controller that, at time intervals of \u03b4 minutes, can switch the house furnace on or off. Because the consequences of control decisions are delayed in time, the controller must be predictive to anticipate heating demands. The three boxes in the figure depict components that predict or model various aspects of the environment. We explain their purpose via a formal description of the controller operation. 
\n\nThe controller considers sequences of \u03ba decisions, denoted u, and searches for the sequence that minimizes the expected total cost, $\\bar{J}_u$, over the planning horizon of \u03ba\u03b4 minutes: \n\n$$\\bar{J}_u = \\frac{1}{\\kappa} \\sum_{t = t_0 + 1}^{t_0 + \\kappa} E_{x_t \\mid u} \\left[ e(u_t) + m(x_t) \\right],$$ \n\nwhere the expectation is computed over future states of the environment conditional on the decision sequence u. The energy cost in an interval depends only on the control decision during that interval. The misery cost depends on two components of the state: the house occupancy status, o(t) (0 for empty, 1 for occupied), and the indoor air temperature, h_u(t): \n\n$$\\bar{m}_u(o(t), h_u(t)) = p[o(t) = 1] \\, m(1, h_u(t)) + p[o(t) = 0] \\, m(0, h_u(t)).$$ \n\nFigure 1: The Neurothermostat and its interaction with the environment \n\nBecause the true quantities e, h_u, m, and p are unknown, they must be estimated. The house thermal model of Figure 1 provides e and h_u, the occupant comfort cost model provides m, and the home occupancy predictor provides p. \n\nWe follow a tradition of using neural networks for prediction in the context of building energy system control (e.g., Curtiss, Kreider, & Brandemuehl, 1993; Ferrano & Wong, 1990; Miller & Seem, 1991), although in our initial experiments we require a network only for the occupancy prediction. \n\n2.1 PREDICTIVE OPTIMAL CONTROLLER \n\nWe propose a closed-loop controller that combines prediction with fixed-horizon planning, of the sort proposed by Clarke, Mohtadi, and Tuffs (1987). At each time step, the controller considers all possible decision sequences over the planning horizon and selects the sequence that minimizes the expected total cost, $\\bar{J}_u$. The first decision in this sequence is then performed. After \u03b4 minutes, the planning and execution process is repeated. 
This approach assumes that beyond the planning horizon, all costs are independent of the first decision in the sequence. \n\nWhile dynamic programming is an efficient search algorithm, it requires a discrete state space. Wishing to avoid quantizing the continuous variable of indoor temperature, and the errors that might be introduced, we performed exhaustive search through the possible decision sequences, which was tractable due to relatively short horizons and two additional domain constraints. First, because the house occupancy status can reasonably be assumed to be independent of the indoor temperature, p need not be recalculated for every possible decision sequence. Second, the current occupancy status depends on the recent occupancy history. Consequently, one needs to predict occupancy patterns over the planning horizon, o \u2208 {0, 1}^\u03ba, to compute p. However, because most occupancy sequences are highly improbable, we find that considering only the most likely sequences (those containing at most two occupancy state transitions) produces the same decisions as doing the search over the entire distribution, reducing the cost from O(2^\u03ba) to O(\u03ba^2). \n\n2.2 OCCUPANCY PREDICTOR \n\nThe basic task of the occupancy predictor is to estimate the probability that the occupant will be home \u03b4 minutes in the future. The occupancy predictor can be run iteratively to estimate the probability of an occupancy pattern. \n\nIf occupants follow a deterministic daily schedule, a look up table indexed by time of day and current occupancy state should capture occupancy patterns. We thus use a look up table to encode whatever structure possible, and a neural network to encode residual structure. The look up table divides time into fixed \u03b4 minute bins. 
The neural network received the following inputs: current time of day; day of the week; average proportion of time the home was occupied in the 10, 20, and 30 minutes from the present time of day on the previous three days and on the same day of the week during the past four weeks; and the proportion of time the home was occupied during the past 60, 180, and 360 minutes. The network, a standard three-layer architecture, was trained by back propagation. The number of hidden units was chosen by cross validation. \n\n2.3 THERMAL MODEL OF HOUSE AND FURNACE \n\nA thermal model of the house and furnace predicts future indoor temperature(s) as a function of the outdoor temperature and the furnace operation. While one could perform system identification using neural networks, a simple parameterized resistance-capacitance (RC) model provides a reasonable first-order approximation. The RC model assumes that: the inside of the house is at a uniform temperature, and likewise the outside; a homogeneous flat wall separates the inside and outside, and this wall has a thermal resistance R and thermal capacitance C; the entire wall mass is at the inside temperature; and the heat input to the inside is Q when the furnace is running or zero otherwise. Assuming that the outdoor temperature, denoted g, is constant, the indoor temperature at time t, h_u(t), is: \n\n$$h_u(t) = h_u(t - 1) \\exp(-60\\delta / RC) + (R Q u(t) + g)(1 - \\exp(-60\\delta / RC)),$$ \n\nwhere h_u(t_0) is the actual indoor temperature at the current time. R and C were determined by architectural properties of the Neural Network House to be 1.33 Kelvins/kilowatt and 16 megajoules/Kelvin, respectively. The House furnace is rated at 133,000 Btu/hour and has 92.5% fuel use efficiency, resulting in an output of Q = 36.1 kilowatts. With natural gas at $.485 per CCF, the cost of operating the furnace, e, is $.7135 per hour. 
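
As a rough illustration of the RC recurrence in Section 2.3, the following Python sketch steps the model forward using the parameter values quoted in the text. The function and constant names are hypothetical, and charging energy in proportion to the interval length (cost per hour times delta / 60) is an assumption made here, not something the paper states.

```python
import math

# Parameter values quoted in Section 2.3; the names are illustrative only.
R_K_PER_KW = 1.33        # thermal resistance, Kelvins per kilowatt
C_MJ_PER_K = 16.0        # thermal capacitance, megajoules per Kelvin
Q_KW = 36.1              # furnace heat output, kilowatts
COST_PER_HOUR = 0.7135   # dollars per hour of furnace operation


def rc_seconds():
    # (K/kW) * (MJ/K) = 1e-3 K/W * 1e6 J/K, so the product is scaled by 1e3
    # to obtain the time constant RC in seconds.
    return R_K_PER_KW * C_MJ_PER_K * 1e3


def step_temperature(h_prev, u, g, delta_min):
    # One interval of the RC update. u is 1 (furnace on) or 0 (furnace off),
    # g is the outdoor temperature in deg C, delta_min is the interval length
    # in minutes (60 * delta_min converts it to seconds).
    decay = math.exp(-60.0 * delta_min / rc_seconds())
    return h_prev * decay + (R_K_PER_KW * Q_KW * u + g) * (1.0 - decay)


def energy_cost(u, delta_min):
    # Assumption: cost accrues linearly while the furnace is on.
    return COST_PER_HOUR * u * delta_min / 60.0


def simulate(h0, decisions, g=0.0, delta_min=10.0):
    # Apply a sequence of on/off decisions and return the indoor
    # temperature at the end of each interval.
    temps = []
    h = h0
    for u in decisions:
        h = step_temperature(h, u, g, delta_min)
        temps.append(h)
    return temps
```

Under these values RC is roughly 21,280 seconds (about 5.9 hours). With the furnace held on and g = 0, repeated steps approach the steady state R Q, about 48 deg C; with the furnace off, the indoor temperature decays toward g.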
\n\n2.4 OCCUPANT COMFORT COST MODEL \n\nIn the Neural Network House, the occupant expresses discomfort by adjusting a setpoint temperature on a control panel. However, for simplicity, we assume in this work the setpoint is a constant, \u03bb. When the home is occupied, the misery cost is a monotonic function of the deviation of the actual indoor temperature from the setpoint. When the home is empty, the misery cost is zero regardless of the temperature. \n\nThere is a rich literature directed at measuring thermal comfort in a given environment (i.e., dry-bulb temperature, relative humidity, air velocity, clothing insulation, etc.) for the average building occupant (e.g., Fanger, 1972; Gagge, Stolwijk, & Nishi, 1971). Although the measurements indicate the fraction of the population which is uncomfortable in a particular environment, one might also interpret them as a measure of an individual's level of discomfort. As a function of dry-bulb temperature, this curve is roughly parabolic. We approximate it with a measure of misery in a \u03b4-minute period as follows: \n\n$$m(o, h) = o \\, \\alpha \\, \\frac{\\delta}{24 \\times 60} \\, \\frac{\\max(0, |\\lambda - h| - \\epsilon)^2}{25}.$$ \n\nThe first term, o, is a binary variable indicating the home occupancy state. The second term, \u03b1, is a conversion factor from arbitrary \"misery\" units to dollars. The third term scales the misery cost from a full day to the basic update interval. The fourth term produces the parabolic relative misery function, scaled so that a temperature difference of 5\u00b0 C produces one unit of misery, with a deadband region from \u03bb - \u03b5 to \u03bb + \u03b5. \n\nWe have chosen the conversion factor \u03b1 using an economic perspective. Consider the lost productivity that results from trying to work in a home that is 5\u00b0 C colder than desired for a 24 hour period. Denote this loss \u03c1, measured in hours. 
The cost in dollars of this loss is then \u03b1 = \u03b3\u03c1, where \u03b3 is the individual's hourly salary. With this approach, \u03b1 can be set in a natural, intuitive manner. \n\n3 SIMULATION METHODOLOGY \n\nIn all experiments we report below, \u03b4 = 10 minutes, \u03ba = 12 steps (120 minute planning horizon), \u03bb = 22.5\u00b0 C, \u03b5 = 1, and \u03b3 = 28 dollars per hour. The productivity loss, \u03c1, was varied from 1 to 3 hours. \n\nWe report here results from the Neurothermostat operating in a simulated environment, rather than in the actual Neural Network House. The simulated environment incorporates the house/furnace thermal model described earlier and occupants whose preferences follow the comfort cost model. The outdoor temperature is assumed to remain a constant 0\u00b0 C. Thus, the Neurothermostat has an accurate model of its environment, except for the occupancy patterns, which it must predict based on training data. This allows us to evaluate the performance of the Neurothermostat and the occupancy model as occupancy patterns are varied, uncontaminated by the effect of inaccuracy in the other internal models. \n\nWe have evaluated the Neurothermostat with both real and artificial occupancy data. The real data was collected from the Neural Network House with a single resident over an eight month period, using a simple algorithm based on motion detector output and the opening and closing of outside doors. The artificial data was generated by a simulation of a single occupant. The occupant would go to work each day, later on the weekends, would sometimes come home for lunch, sometimes go out on weekend nights, and sometimes go out of town for several days. To examine performance of the Neurothermostat as a function of the variability in the occupant's schedule, the simulation model included a parameter, the variability index. 
An index of 0 means that the schedule is entirely deterministic; an index of 1 means that the schedule is very noisy, but still contains statistical regularities. The index determined factors such as the likelihood and duration of out-of-town trips and the variability in departure and return times. \n\n3.1 ALTERNATIVE HEATING POLICIES \n\nIn addition to the Neurothermostat, we examined three nonadaptive control policies. These policies produce a setpoint at each time step, and the furnace is switched on if the temperature drops below the setpoint and off if the temperature rises above the setpoint. (We need not be concerned about damage to the furnace by cycling because control decisions are made ten minutes apart.) The constant-temperature policy produces a fixed setpoint of 22.5\u00b0 C. The occupancy-triggered policy produces a setpoint of 18\u00b0 C when the house is empty, 22.5\u00b0 C when the house is occupied. The setback-thermostat policy lowers the setpoint from 22.5\u00b0 C to 18\u00b0 C half an hour before the mean work departure time for that day of the week, and raises it back to 22.5\u00b0 C half an hour before the mean work return time for that day of the week. The setback temperature for the occupancy-triggered and setback-thermostat policies was determined empirically to minimize the total cost. \n\n4 RESULTS \n\n4.1 OCCUPANCY PREDICTOR \n\nPerformance of three different predictors was evaluated using artificial data across a range of values for the variability index. For each condition, we generated eight training/test sets of artificial data, each training and test set consisting of 150 days of data. Table 1 shows the normalized mean squared error (MSE) on the test set, averaged over the eight replications. The normalization involved dividing the MSE for each replication by the error obtained by predicting that the future occupancy state is the same as the present state. The main result here is that the combination of neural network and look up table performs better than either component in isolation (ANOVA: F(1, 7) = 1121, p < .001 for combination vs. table; F(1, 7) = 64, p < .001 for combination vs. network), indicating that the two components are capturing different structure in the data. \n\nTable 1: Normalized MSE on Test Set for Occupancy Prediction (Artificial Data) \n\n                            variability index \n                            0      .25    .50    .75    1 \nlookup table                .49    .81    .94    .92    .94 \nneural net                  .02    .63    .83    .86    .91 \nlookup table + neural net   .02    .60    .78    .77    .74 \n\nFigure 2: Mean cost per day incurred by four control policies on (artificial) test data as a function of the data's variability index for \u03c1 = 1 (comfort lightly weighted, left panel) and \u03c1 = 3 (comfort heavily weighted, right panel). \n\n4.2 CONTROLLER WITH ARTIFICIAL OCCUPANCY DATA \n\nHaving trained eight occupancy predictors with different (artificial data) training sets, we computed misery and energy costs for the Neurothermostat on the corresponding test sets. Figure 2 shows the mean total cost per day as a function of variability index, control policy, and relative comfort cost. The robust result is that the Neurothermostat outperforms the three nonadaptive control policies for all levels of the variability index and for a wide range of values of \u03c1. 
\n\nOther patterns in the data are noteworthy. Costs for the Neurothermostat tend to rise with the variability index, as one would expect because the occupant's schedule becomes less predictable. The constant-temperature policy is worst if occupant comfort is weighted lightly, and begins to approach the Neurothermostat in performance as comfort costs are increased. If comfort costs overwhelm energy costs, then the constant-temperature policy and the Neurothermostat converge. \n\n4.3 CONTROLLER WITH REAL OCCUPANCY DATA \n\nEight months of real occupancy data collected in the Neural Network House beginning in September 1994 were also used to generate occupancy models and test controllers. Three training/test splits were formed by training on five consecutive months and testing on the next month. Table 2 shows the mean daily cost for the four controllers. The Neurothermostat significantly outperforms the three nonadaptive controllers, as it did with the artificial data. \n\nTable 2: Mean Daily Cost Based on Real Occupancy Data \n\n                       productivity loss \n                       \u03c1 = 1    \u03c1 = 3 \nNeurothermostat        $6.77    $7.05 \nconstant temperature   $7.85    $7.85 \noccupancy triggered    $7.49    $8.66 \nsetback thermostat     $8.12    $9.74 \n\n5 DISCUSSION \n\nThe simulation studies reported here strongly suggest that adaptive control of residential heating and cooling systems is worthy of further investigation. One is tempted to trumpet the conclusion that adaptive control lowers heating costs, but before doing so, one must be clear that the cost being lowered is a combination of comfort and energy costs. If one is merely interested in lowering energy costs, then simply shut off the furnace. 
A central contribution of this work is thus the framing of the task of air temperature regulation as an optimal control problem in which both comfort and energy costs are considered as part of the control objective. \n\nA common reaction to this research project is, \"My life is far too irregular to be predicted. I don't return home from work at the same time every day.\" An important finding of this work is that even a highly nondeterministic schedule contains sufficient statistical regularity to be exploited by a predictive controller. We found this for both artificial data with a high variability index and real occupancy data. \n\nA final contribution of our work is to show that for periodic data such as occupancy patterns that follow a weekly schedule, the combination of a look up table to encode the periodic structure and a neural network to encode the residual structure can outperform either method in isolation. \n\nAcknowledgements \n\nSupport for this research has come from Lifestyle Technologies, NSF award IRI-9058450, and a CRCW grant-in-aid from the University of Colorado. This project owes its existence to the dedication of many students, particularly Marc Anderson, Josh Anderson, Paul Kooros, and Charles Myers. Our thanks to Reid Hastie and Gary McClelland for their suggestions on assessing occupant misery. \n\nReferences \n\nClarke, D. W., Mohtadi, C., & Tuffs, P. S. (1987). Generalized predictive control - Part I. The basic algorithm. Automatica, 23, 137-148. \n\nCurtiss, P., Kreider, J. F., & Brandemuehl, M. J. (1993). Local and global control of commercial building HVAC systems using artificial neural networks. Proceedings of the American Control Conference, 9, 3029-3044. \n\nFanger, P. O. (1972). Thermal comfort. New York: McGraw-Hill. \n\nFerrano, F. J., & Wong, K. V. (1990). Prediction of thermal storage loads using a neural network. ASHRAE Transactions, 96, 723-726. \n\nGagge, A. P., Stolwijk, J. A. 
J., & Nishi, Y. (1971). An effective temperature scale based on a simple model of human physiological regulatory response. ASHRAE Transactions, 77, 247-262. \n\nHenze, G. P., & Dodier, R. H. (1996). Development of a predictive optimal controller for thermal energy storage systems. Submitted for publication. \n\nKhalid, M., & Omatu, S. (1995). Temperature regulation with neural networks and alternative control schemes. IEEE Transactions on Neural Networks, 6, 572-582. \n\nMiller, R. C., & Seem, J. E. (1991). Comparison of artificial neural networks with traditional methods of predicting return time from night or weekend setback. ASHRAE Transactions, 97, 500-508. \n\nMozer, M. C., Dodier, R. H., Anderson, M., Vidmar, L., Cruickshank III, R. F., & Miller, D. (1995). The neural network house: An overview. In L. Niklasson & M. Boden (Eds.), Current trends in connectionism (pp. 371-380). Hillsdale, NJ: Erlbaum. ", "award": [], "sourceid": 1299, "authors": [{"given_name": "Michael", "family_name": "Mozer", "institution": null}, {"given_name": "Lucky", "family_name": "Vidmar", "institution": null}, {"given_name": "Robert", "family_name": "Dodier", "institution": null}]}