{"title": "Multistage Campaigning in Social Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 4718, "page_last": 4726, "abstract": "We consider control problems for multi-stage campaigning over social networks. The dynamic programming framework is employed to balance the high present reward and large penalty on low future outcome in the presence of extensive uncertainties. In particular, we establish theoretical foundations of optimal campaigning over social networks where the user activities are modeled as a multivariate Hawkes process, and we derive a time dependent linear relation between the intensity of exogenous events and several commonly used objective functions of campaigning. We further develop a convex dynamic programming framework for determining the optimal intervention policy that prescribes the required level of external drive at each stage for the desired campaigning result. Experiments on both synthetic data and the real-world MemeTracker dataset show that our algorithm can steer the user activities for optimal campaigning much more accurately than baselines.", "full_text": "Multistage Campaigning in Social Networks

Mehrdad Farajtabar*, Xiaojing Ye⋄, Sahar Harati†, Le Song*, Hongyuan Zha*
Georgia Institute of Technology*, Georgia State University⋄, Emory University†
mehrdad@gatech.edu, xye@gsu.edu, sahar.harati@emory.edu, {lsong,zha}@cc.gatech.edu

Abstract

We consider the problem of how to optimize multi-stage campaigning over social networks. The dynamic programming framework is employed to balance the high present reward and large penalty on low future outcome in the presence of extensive uncertainties.
In particular, we establish theoretical foundations of optimal campaigning over social networks where the user activities are modeled as a multivariate Hawkes process, and we derive a time dependent linear relation between the intensity of exogenous events and several commonly used objective functions of campaigning. We further develop a convex dynamic programming framework for determining the optimal intervention policy that prescribes the required level of external drive at each stage for the desired campaigning result. Experiments on both synthetic data and the real-world MemeTracker dataset show that our algorithm can steer the user activities for optimal campaigning much more accurately than baselines.

1 Introduction

Obama was the first US president in history who successfully leveraged online social media in presidential campaigning, which has been popularized and become a ubiquitous approach to electoral politics (such as in the on-going 2016 US presidential election), in contrast to the decreasing relevance of traditional media such as TV and newspapers [1, 2]. The power of campaigning via social media in modern politics is a consequence of online social networking being an important part of people's regular daily social lives. It has become quite common for individuals to use social network sites to share their ideas and comment on other people's opinions. In recent years, large organizations, such as governments, public media, and business corporations, have also started to announce news, spread ideas, and/or post advertisements in order to steer public opinion through social media platforms. There has been extensive interest among these entities in influencing the public's view and manipulating trends by incentivizing influential users to endorse their ideas/merits/opinions at certain monetary expenses or credits.
To obtain the most cost-effective trend manipulation, one needs to design an optimal campaigning strategy or policy such that quantities of interest, such as influence of opinions, exposure of a campaign, or adoption of new products, can be maximized or steered towards a target amount given realistic budget constraints.

The key factor differentiating social networks from traditional media is peer influence. In fact, events in an online social network can be categorized roughly into two types: endogenous events, where users simply respond to the actions of their neighbors within the network, and exogenous events, where users take actions due to drives external to the network. It is then natural to raise the following fundamental questions regarding optimal campaigning over social networks: can we model and exploit such event data to steer the online community to a desired exposure level? More specifically, can we drive the overall exposure to a campaign to a certain level (e.g., at least twice per week per user) by incentivizing a small number of users to take more initiative? What about maximizing the overall exposure for a target group of people?

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

More importantly, these exposure shaping tasks are more effective when the interventions are implemented in multiple stages. Due to the inherent uncertainty in social behavior, the outcome of each intervention may not be fully predictable, but it can be anticipated to some extent before the next intervention happens. A key aspect of such situations is that interventions cannot be viewed in isolation, since one must balance the desire for high present reward with the penalty of low future outcome. In this paper, the dynamic programming framework [3] is employed to tackle the aforementioned issues.
In particular, we first establish the fundamental theory of optimal campaigning over social networks where the user activities are modeled as a multivariate Hawkes process (MHP) [4, 5], since the MHP can capture both endogenous and exogenous event intensities. We also derive a time dependent linear relation between the intensity of exogenous events and the overall exposure to the campaign. Exploiting this connection, we develop a convex dynamic programming framework for determining the optimal intervention policy that prescribes the required level of external drive at each stage in order for the campaign to reach a desired exposure profile. We propose several objective functions that are commonly considered as campaigning criteria in social networks. Experiments on both synthetic data and a real world network of news websites in the MemeTracker dataset show that our algorithms can shape the exposure of campaigns much more accurately than baselines.

2 Basics and Background

An $n$-dimensional temporal point process is a random process whose realization consists of a list of discrete events in time and their associated dimensions, $\{(t_k, d_k)\}$ with $t_k \in \mathbb{R}^+$ and $d_k \in \{1, \dots, n\}$. Many different types of data produced in online social networks can be represented as temporal point processes, such as likes and tweets. A temporal point process can be equivalently represented as a counting process, $N(t) = (N^1(t), \dots, N^n(t))^\top$, associated to the $n$ users in the social network. Here, $N^i(t)$ records the number of events user $i$ performs before time $t$ for $1 \le i \le n$. Let the history $\mathcal{H}^i(t)$ be the list of times of events $\{t_1, t_2, \dots, t_k\}$ of the $i$-th user up to time $t$. Then, the number of observed events in a small time window $[t, t+dt)$ of length $dt$ is $dN^i(t) = \sum_{t_k \in \mathcal{H}^i(t)} \delta(t - t_k)\, dt$, and hence $N^i(t) = \int_0^t dN^i(s)$, where $\delta(t)$ is a Dirac delta function.
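The counting-process view above is easy to make concrete in code. The following is a purely illustrative sketch (the helper name and the toy event list are our own, not from the paper): it evaluates $N(t) = (N^1(t), \dots, N^n(t))^\top$ from a realization $\{(t_k, d_k)\}$, using 0-indexed dimensions.

```python
import numpy as np

def counting_process(events, n, t):
    """Evaluate N(t) = (N^1(t), ..., N^n(t)) from a list of (t_k, d_k) events.

    `events` is a list of (time, dimension) pairs with dimensions in 0..n-1;
    illustrative helper only, not code from the paper.
    """
    N = np.zeros(n, dtype=int)
    for t_k, d_k in events:
        if t_k < t:  # N^i(t) counts events of user i strictly before time t
            N[d_k] += 1
    return N

# Toy realization for n = 2 users.
events = [(0.5, 0), (1.2, 1), (1.9, 0), (2.4, 0)]
print(counting_process(events, n=2, t=2.0))  # -> [2 1]
```

The exposure counts used later in the paper are obtained from exactly such vectors (via $E(t) = B\,N(t)$).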
The point process representation of temporal data is fundamentally different from the discrete time representation typically used in social network analysis. It directly models the time interval between events as random variables, avoids the need to pick a time window to aggregate events, and allows temporal events to be modeled in a fine grained fashion. Moreover, it has remarkably rich theoretical support [6].

An important way to characterize temporal point processes is via the conditional intensity function, a stochastic model for the time of the next event given all the times of previous events. Formally, the conditional intensity function $\lambda^i(t)$ (intensity, for short) of user $i$ is the conditional probability of observing an event in a small window $[t, t+dt)$ given the history $\mathcal{H}(t) = \{\mathcal{H}^1(t), \dots, \mathcal{H}^n(t)\}$:
$$\lambda^i(t)\,dt := \mathbb{P}\{\text{user } i \text{ performs event in } [t, t+dt) \mid \mathcal{H}(t)\} = \mathbb{E}[dN^i(t) \mid \mathcal{H}(t)], \quad (1)$$
where one typically assumes that only one event can happen in a small window of size $dt$. The functional form of the intensity $\lambda^i(t)$ is often designed to capture the phenomena of interest. The Hawkes process [7] is a class of self and mutually exciting point process models,
$$\lambda^i(t) = \mu^i(t) + \sum_{k:\, t_k < t} \phi_{i d_k}(t, t_k) = \mu^i(t) + \sum_{j=1}^n \int_0^t \phi_{ij}(t, s)\, dN^j(s), \quad (2)$$
where the intensity is history dependent. $\phi_{ij}(t,s)$ is the impact function capturing the temporal influence of an event by user $j$ at time $s$ on the future events of user $i$ at time $t \ge s$. Here, the first term $\mu^i(t)$ is the exogenous event intensity, modeling drive outside the network and independent of the history, and the second term $\sum_{k:\, t_k < t} \phi_{i d_k}(t, t_k)$ is the endogenous event intensity, modeling interactions within the network [8]. Defining $\Phi(t,s) = [\phi_{ij}(t,s)]_{i,j=1,\dots,n}$, $\lambda(t) = (\lambda^1(t), \dots, \lambda^n(t))^\top$, and $\mu(t) = (\mu^1(t), \dots, \mu^n(t))^\top$, we can compactly rewrite Eq. (2) in matrix form:
$$\lambda(t) = \mu(t) + \int_0^t \Phi(t,s)\, dN(s). \quad (3)$$
In practice it is standard to employ a shift-invariant impact function, i.e., $\Phi(t,s) = \Phi(t-s)$. Then, using the convolution notation $f(t) * g(t) = \int_0^t f(t-s)\, g(s)\, ds$, we have
$$\lambda(t) = \mu(t) + \Phi(t) * dN(t). \quad (4)$$

3 From Intensity to Average Activity

In this section we develop a closed form relation between the expected total intensity $\mathbb{E}[\lambda(t)]$ and the intensity $\mu(t)$ of exogenous events. This relation establishes the basis of our campaigning framework. First, define the mean function as $M(t) := \mathbb{E}[N(t)] = \mathbb{E}_{\mathcal{H}(t)}[\mathbb{E}(N(t) \mid \mathcal{H}(t))]$. Note that $M(t)$ is history independent, and it gives the average number of events up to time $t$ for each of the dimensions. Similarly, the rate function $\eta(t)$ is given by $\eta(t)\,dt := dM(t)$. On the other hand,
$$dM(t) = d\mathbb{E}[N(t)] = \mathbb{E}_{\mathcal{H}(t)}[\mathbb{E}(dN(t) \mid \mathcal{H}(t))] = \mathbb{E}_{\mathcal{H}(t)}[\lambda(t)]\,dt = \mathbb{E}[\lambda(t)]\,dt. \quad (5)$$
Therefore $\eta(t) = \mathbb{E}[\lambda(t)]$, which serves as a measure of activity in the network. In what follows we will find an analytical form for the average activity. Proofs are presented in Appendix C.

Lemma 1.
Suppose $\Psi: [0,T] \to \mathbb{R}^{n \times n}$ is a non-increasing matrix function. Then, for every fixed constant intensity $\mu(t) = c \in \mathbb{R}^n_+$, $\eta_c(t) := \Psi(t)\, c$ solves the semi-infinite integral equation
$$\eta(t) = c + \int_0^t \Phi(t-s)\, \eta(s)\, ds, \quad \forall t \in [0,T], \quad (6)$$
if and only if $\Psi(t)$ satisfies
$$\Psi(t) = I + \int_0^t \Phi(t-s)\, \Psi(s)\, ds, \quad \forall t \in [0,T]. \quad (7)$$
In particular, if $\Phi(t) = A e^{-\omega t} \mathbf{1}_{\ge 0}(t) = [a_{ij} e^{-\omega t} \mathbf{1}_{\ge 0}(t)]_{ij}$, where $0 \le \omega \notin \mathrm{Spectrum}(A)$, then
$$\Psi(t) = e^{(A - \omega I)t} + \omega (A - \omega I)^{-1} \big( e^{(A - \omega I)t} - I \big) \quad (8)$$
for $t \in [0,T]$, where $\mathbf{1}_{\ge 0}(t)$ is an indicator function for $t \ge 0$.

Let $\mu: [0,T] \to \mathbb{R}^n_+$ be a right-continuous piecewise constant function
$$\mu(t) = \sum_{m=1}^M c_m \mathbf{1}_{[\tau_{m-1}, \tau_m)}(t), \quad (9)$$
where $0 = \tau_0 < \tau_1 < \cdots < \tau_M = T$ is a finite partition of the time interval $[0,T]$ and the function $\mathbf{1}_{[\tau_{m-1}, \tau_m)}(t)$ indicates $\tau_{m-1} \le t < \tau_m$. The next theorem shows that if $\Psi(t)$ satisfies (7), then one can calculate $\eta(t)$ for a piecewise constant intensity $\mu: [0,T]$ of form (9).

Theorem 2. Let $\Psi(t)$ satisfy (7) and let $\mu(t)$ be a right-continuous piecewise constant intensity function of form (9). Then the rate function $\eta(t)$ is given by
$$\eta(t) = \sum_{k=0}^{m} \Psi(t - \tau_k)(c_k - c_{k-1}), \quad (10)$$
for all $t \in (\tau_{m-1}, \tau_m]$ and $m = 1, \dots, M$, where $c_{-1} := 0$ by convention.

Using the above lemma, for the first time, we derive the average intensity for a general exogenous intensity. Appendix E includes a few experiments investigating these results empirically.

Theorem 3.
If $\Psi \in C^1([0,T])$ and satisfies (7), and the exogenous intensity $\mu$ is bounded and piecewise absolutely continuous on $[0,T]$, where $\mu(t^+) = \mu(t)$ at all discontinuous points $t$, then $\mu$ is differentiable almost everywhere, and the semi-infinite integral equation
$$\eta(t) = \mu(t) + \int_0^t \Phi(t-s)\, \eta(s)\, ds, \quad \forall t \in [0,T], \quad (11)$$
yields a rate function $\eta: [0,T] \to \mathbb{R}^n_+$ given by
$$\eta(t) = \int_0^t \Psi(t-s)\, d\mu(s). \quad (12)$$

Corollary 4. Suppose $\Psi$ and $\mu$ satisfy the same conditions as in Thm. 3, and define $\psi = \Psi'$. Then the rate function is $\eta(t) = (\psi * \mu)(t)$. In particular, if $\Phi(t) = A e^{-\omega t} \mathbf{1}_{\ge 0}(t) = [a_{ij} e^{-\omega t} \mathbf{1}_{\ge 0}(t)]_{ij}$, then the rate function is $\eta(t) = \mu(t) + A \int_0^t e^{(A - \omega I)(t-s)} \mu(s)\, ds$.

4 Multi-stage Closed-loop Control Problem

Given the analytical relation between the exogenous intensity and the expected overall intensity (rate function), one can solve a single one-stage campaigning problem to find the optimal constant intervention intensity [8]. Alternatively, the time window can be partitioned into multiple stages and one can impose different levels of intervention in these stages. This yields an open-loop optimization of the cost function, where one selects all the intervention actions at the initial time 0. More effectively, we tackle the campaigning problem in a dynamic and adaptive manner, where we can postpone deciding an intervention by observing the process until the next stage begins. This is called the closed-loop optimization of the objective function. In this section, we establish the foundation to formulate the problem as a multi-stage closed-loop optimal control problem.
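Before setting up the control problem, note that the closed-form $\Psi(t)$ of (8) is cheap to evaluate numerically, which gives a quick sanity check of Lemma 1: for a constant exogenous intensity $c$, the rate is $\eta_c(t) = \Psi(t)c$, and since $\Psi(0) = I$ we must get $\eta_c(0) = c$. A minimal sketch with an assumed toy influence matrix `A` and decay `omega` (not parameters from the paper):

```python
import numpy as np
from scipy.linalg import expm

def Psi(t, A, omega):
    """Closed-form Psi(t) from (8) for the kernel Phi(t) = A e^{-omega t}.

    Requires omega not in Spectrum(A) so that (A - omega I) is invertible.
    """
    n = A.shape[0]
    B = A - omega * np.eye(n)
    E = expm(B * t)
    # Psi(t) = e^{Bt} + omega B^{-1} (e^{Bt} - I)
    return E + omega * np.linalg.solve(B, E - np.eye(n))

# Toy mutual-excitation matrix and constant exogenous intensity (assumed values).
A = np.array([[0.0, 0.3], [0.2, 0.0]])
omega = 1.0
c = np.array([0.5, 1.0])

eta0 = Psi(0.0, A, omega) @ c  # Psi(0) = I, so eta_c(0) = c
print(np.round(eta0, 6))       # -> [0.5 1. ]
```

For this stable toy example ($\|A\| < \omega$), $\Psi(t)$ also converges as $t \to \infty$ to $\omega(\omega I - A)^{-1}$, the steady-state amplification of constant exogenous drive.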
We assume that $n$ users are generating events according to a multi-dimensional Hawkes process with exogenous intensity $\mu(t) \in \mathbb{R}^n$ and impact function $\Phi(t,s) \in \mathbb{R}^{n \times n}$.

Event exposure. Event exposure is the quantity of major interest in campaigning. The exposure process is mathematically represented as a counting process, $E(t) = (E^1(t), \dots, E^n(t))^\top$. Here, $E^i(t)$ records the number of times user $i$ is exposed (she or one of her neighbors performs an activity) to the campaign by time $t$. Let $B$ be the adjacency matrix of the user network, i.e., $b_{ij} = 1$ if user $i$ follows user $j$ or, equivalently, user $j$ influences user $i$. We assume $b_{ii} = 1$ for all $i$. Then the exposure process is given by $E(t) = B\, N(t)$.

Stages and interventions. Let $[0,T]$ be the time horizon and $0 = \tau_0 < \tau_1 < \dots < \tau_{M-1} < \tau_M = T$ be a partition into $M$ stages. In order to steer the activities of the network towards a desired level (criteria given below) at these stages, we impose a constant intervention $u_m \in \mathbb{R}^n$ on the existing exogenous intensity $\mu$ during time $[\tau_m, \tau_{m+1})$ for each stage $m = 0, 1, \dots, M-1$. The activity intensity at the $m$-th stage is $\lambda_m(t) = \mu + u_m + \int_0^t \Phi(t,s)\, dN(s)$ for $\tau_m \le t < \tau_{m+1}$, where $N(t)$ tracks the counting process of activities since $t = 0$. Note that the intervention itself exhibits a stochastic nature: adding $u^i_m$ to $\mu^i$ is equivalent to incentivizing user $i$ to increase her activity rate, but it is still uncertain when she will perform an activity, which appropriately mimics the randomness in real-world campaigning.

States and state evolution. Note that the Hawkes process is non-Markov, and one needs complete knowledge of the history to characterize the entire process.
However, the conditional intensity $\lambda(t)$ only depends on the state of the process at time $t$ when the standard exponential kernel $\Phi(t,s) = A e^{-\omega(t-s)} \mathbf{1}_{\ge 0}(t-s)$ is employed. In this case, the activity rate at stage $m$ is
$$\lambda_m(t) = \mu + u_m + \underbrace{\int_0^{\tau_m} A e^{-\omega(t-s)}\, dN(s)}_{\text{from previous stages}} + \underbrace{\int_{\tau_m}^{t} A e^{-\omega(t-s)}\, dN(s)}_{\text{current stage}}. \quad (13)$$
Define $x_m := \lambda_{m-1}(\tau_m) - u_{m-1} - \mu$ (and $x_0 = 0$ by convention); then the intensity due to events of all previous $m$ stages can be written as $\int_0^{\tau_m} A e^{-\omega(t-s)}\, dN(s) = x_m e^{-\omega(t - \tau_m)}$. In other words, $x_m$ is sufficient to encode the information of activity in the past $m$ stages that is relevant to the future. This is in sharp contrast to the general case, where the state space grows with the number of events.

Objective function. For a sequence of controls $u(t) = \sum_{m=0}^{M-1} u_m \mathbf{1}_{[\tau_m, \tau_{m+1})}(t)$, the activity counting process $N(t)$ is generated by the intensity $\lambda(t) = \mu + u(t) + \int_0^t A e^{-\omega(t-s)}\, dN(s)$. For each stage $m$ from $0$ to $M-1$, $x_m$ encodes the effects from the previous $m$ stages as above and $u_m$ is the control imposed at the current stage. Let $E^i_m(t; x_m, u_m) := \big( B \int_{\tau_m}^{t} dN(s) \big)_i$ be the number of times user $i$ is exposed to the campaign by time $t \in [\tau_m, \tau_{m+1})$ in stage $m$; then the goal is to steer the expected total number of exposures $\bar{E}^i_m(x_m, u_m) := \mathbb{E}[E^i_m(\tau_{m+1}; x_m, u_m)]$ to a desired level. In what follows, we introduce several instances of the objective function $g(x_m, u_m)$, in terms of $\{\bar{E}^i_m(x_m, u_m)\}_{i=1}^n$ at each stage $m$, that characterize different exposure shaping tasks.
Then the overall control problem is to find $u(t)$ that optimizes the total objective $\sum_{m=0}^{M-1} g_m(x_m, u_m)$.

• Capped Exposure Maximization (CEM): In real networks, there is a cap on the exposure each user can tolerate due to the limited attention of a user. Suppose we know the upper bound $\beta^i_m$ on user $i$'s exposure tolerance, over which the extra exposure is not counted towards the objective. Then we can form the following capped exposure maximization:
$$g_m(x_m, u_m) = \frac{1}{n} \sum_{i=1}^n \min\{\bar{E}^i_m(x_m, u_m),\ \beta^i_m\}. \quad (14)$$

Algorithm 1: Closed-loop Multi-stage Dynamic Programming
  Input: Intervention constraints: $c_0, \dots, c_{M-1}$, $C_0, \dots, C_{M-1}$, $\alpha_0, \dots, \alpha_{M-1}$
  Input: Objective-specific constraints: $\beta_0, \dots, \beta_{M-1}$ for CEM and $\gamma_0, \dots, \gamma_{M-1}$ for LES
  Input: Time: $T$; Hawkes parameters: $A$, $\omega$
  Output: Optimal interventions $u_0, \dots, u_{M-1}$; optimal cost: Cost
  Set $x_0 \leftarrow 0$ and Cost $\leftarrow 0$
  for $l \leftarrow 0 : M-1$ do
    $(v_l, \dots, v_{M-1}) = \mathrm{open\_loop}(x_l)$ (Problems (24), (25), (26) for CEM, MEM, LES respectively)
    Set $u_l \leftarrow v_l$ and drop $v_{l+1}, \dots, v_{M-1}$
    Update the next state $x_{l+1} \leftarrow f_l(x_l, u_l)$ and Cost $\leftarrow$ Cost $+\ g_l(x_l, u_l)$

• Minimum Exposure Maximization (MEM): Suppose our goal is instead to maintain the exposure of the campaign on each user above a certain minimum level at each stage, or, alternatively, to make the user with the minimum exposure as exposed as possible. We can then consider the following cost function:
$$g_m(x_m, u_m) = \min_i \bar{E}^i_m(x_m, u_m). \quad (15)$$

• Least-squares Exposure Shaping (LES): Sometimes we want to achieve pre-specified target exposure levels, $\gamma_m \in \mathbb{R}^n$, for the users. For example, we may like to divide users into groups and desire a different level of exposure in each group. To this end, we can perform the least-squares campaigning task with the following cost function, where $D$ encodes potentially additional constraints (e.g., group partitions):
$$g_m(x_m, u_m) = -\frac{1}{n} \| D \bar{E}_m(x_m, u_m) - \gamma_m \|^2. \quad (16)$$

Policy and actions. By observing the counting process in previous stages (summarized in a sequence of $x_m$) and taking the future uncertainty into account, the control problem is to design a policy $\pi = \{\pi_m : \mathbb{R}^n \to \mathbb{R}^n : m = 0, \dots, M-1\}$ such that the controls $u_m = \pi_m(x_m)$ maximize the total objective $\sum_{m=0}^{M-1} g_m(x_m, u_m)$. In addition, we may have constraints on the amount of control, for example, a budget constraint on the sum of all interventions to users at each stage, or a cap on the amount of intensity a user can handle. The feasible set, or action space, over which we find the best intervention is represented as $\mathcal{U}_m := \{u_m \in \mathbb{R}^n \mid c_m^\top u_m \le C_m,\ 0 \preceq u_m \preceq \alpha_m\}$. Here, $c_m \in \mathbb{R}^n_+$ contains the price of each person per unit increase of exogenous intensity, and $C_m \in \mathbb{R}_+$ is the total budget at stage $m$. Also, $\alpha_m \in \mathbb{R}^n_+$ is the cap on the amount of activities of the users. To summarize, the following problem is formulated to find the optimal control policy $\pi$:
$$\underset{\pi}{\text{maximize}} \ \sum_{m=0}^{M-1} g_m(x_m, \pi_m(x_m)), \quad \text{subject to } \pi_m(x_m) \in \mathcal{U}_m, \ \text{for } m = 0, \dots, M-1. \quad (17)$$

5 Closed-loop Dynamic Programming Solution

We have formulated the control problem as an optimization in (17). However, when the control policy $\pi_m$ is to be implemented, only $x_m$ is observed and there are still uncertainties in the future $\{x_{m+1}, \dots, x_{M-1}\}$.
For instance, when $\pi_m$ is implemented according to $x_m$ starting from time $\tau_m$, the intensity $x_{m+1} := f(x_m, \pi_m(x_m))$ at time $\tau_{m+1}$ depends on $x_m$ and the control $\pi_m(x_m)$, but is also random due to the stochasticity of the process during time $[\tau_m, \tau_{m+1})$. Therefore, the design of $\pi$ needs to take future uncertainties into consideration.

Suppose we have arrived at the last stage at time $\tau_{M-1}$ with observation $x_{M-1}$; then the optimal policy $\pi_{M-1}$ satisfies $g_{M-1}(x_{M-1}, \pi_{M-1}(x_{M-1})) = \max_{u \in \mathcal{U}_{M-1}} g_{M-1}(x_{M-1}, u) =: J_{M-1}(x_{M-1})$. We then repeat this procedure for $m$ from $M-1$ down to $0$ to find the sequence of controls via dynamic programming, such that the control $\pi_m(x_m) \in \mathcal{U}_m$ yields the optimal objective value
$$J_m(x_m) = \max_{u_m \in \mathcal{U}_m} \mathbb{E}[g_m(x_m, u_m) + J_{m+1}(f(x_m, u_m))]. \quad (18)$$

Approximate Dynamic Programming. Solving (18) to find $J_m(x_m)$ analytically is intractable. Therefore, we adopt an approximate dynamic programming scheme. In fact, approximate control is as essential a part of dynamic programming as the optimization itself, which is usually intractable due to the curse of dimensionality except in a few special cases [3]. Here we adopt a suboptimal control scheme, certainty equivalent control (CEC), which applies at each stage the control that would be optimal if the uncertain quantities were fixed at some typical values, such as the average behavior. It results in an optimal control sequence, the first component of which is used at the current stage, while the remaining components are discarded. The procedure is repeated for the remaining stages. Algorithm 1 summarizes the dynamic programming steps. The algorithm has two parts: (i) certainty equivalence, in which the random behavior is replaced by its average; and (ii) the open-loop optimization. Let's assume we are at the beginning of stage $l$ of Alg. 1 with state vector $x_l$ at $\tau_l$.

Certainty equivalence. We use the machinery developed in Sec. 3 to compute the average exposure at any stage $m = l, l+1, \dots, M-1$:
$$\bar{E}_m(x_m, u_m) = B\, \mathbb{E}[N(\tau_{m+1}) - N(\tau_m)] = B\, \mathbb{E}\Big[ \int_{\tau_m}^{\tau_{m+1}} dN(s) \Big] = B \int_{\tau_m}^{\tau_{m+1}} \eta_m(s)\, ds, \quad (19)$$
where $\eta_m(t) = \mathbb{E}[\lambda_m(t)]$ and $\lambda_m(t) = \mu + u_m + x_l e^{-\omega(t-\tau_l)} + \int_{\tau_l}^{t} A e^{-\omega(t-s)}\, dN(s)$ for $t \in [\tau_m, \tau_{m+1})$. Now, we use the superposition property of point processes [4] to decompose the process as $N(t) = N^c(t) + N^v(t)$, corresponding to $\lambda_m(t) = \lambda^c_m(t) + \lambda^v_m(t)$, where the first part $\lambda^c_m(t) = \mu + u_m + \int_{\tau_l}^{t} A e^{-\omega(t-s)}\, dN^c(s)$ consists of events caused by the exogenous intensity at the current stage $m$, and the second part $\lambda^v_m(t) = x_l e^{-\omega(t-\tau_l)} + \int_{\tau_l}^{t} A e^{-\omega(t-s)}\, dN^v(s)$ is due to activities in previous stages. According to Thm. 2 we have
$$\eta^c_m(t) := \mathbb{E}[\lambda^c_m(t)] = \Psi(t - \tau_l)\mu + \Psi(t - \tau_l)\, u_l + \sum_{k=l+1}^{m} \Psi(t - \tau_k)(u_k - u_{k-1}), \quad (20)$$
and according to Thm. 3 we have
$$\eta^v_m(t) := \mathbb{E}[\lambda^v_m(t)] = \int_{\tau_l}^{t} \Psi(t-s)\, d\big( x_l e^{-\omega(s - \tau_l)} \mathbf{1}_{[\tau_l, \infty)}(s) \big). \quad (21)$$
From now on, for simplicity, we assume the stages form an equal partition of $[0,T]$ into $M$ segments, each of length $\Delta_M$. Combining Eq. (19) and $\eta_m(t) = \eta^c_m(t) + \eta^v_m(t)$ yields
$$\bar{E}_m(x_m, u_m) = \Gamma((m-l+1)\Delta_M)\, u_l + \Gamma((m-l)\Delta_M)(u_{l+1} - u_l) + \dots + \Gamma(\Delta_M)(u_m - u_{m-1}) + \Gamma((m-l+1)\Delta_M)\, \mu + \Upsilon((m-l+1)\Delta_M)\, x_l, \quad (22)$$
where $\Gamma(t)$ and $\Upsilon(t)$ are matrices independent of the $u_m$'s and are defined in Appendix D.
Note the linear relation in (22) between the average exposure $\bar{E}_m(x_m, u_m)$ and the intervention values $u_l, \dots, u_m$.

Open-loop optimization. Having found the average exposure at stages $m = l, \dots, M-1$, we formulate an open-loop optimization to find the optimal $u_l, u_{l+1}, \dots, u_{M-1}$. Defining $\hat{u}_l = (u_l; \dots; u_{M-1})$ and $\hat{E}_l = (\bar{E}_l(x_l, u_l); \dots; \bar{E}_{M-1}(x_{M-1}, u_{M-1}))$, we can write
$$X_l \hat{u}_l + Y_l \mu + W_l x_l = \hat{E}_l, \quad \text{where} \quad Z_l \hat{u}_l \le z_l, \quad (23)$$
and $X_l$, $Y_l$, $W_l$, $Z_l$, and $z_l$ are independent of $\hat{u}_l$, $\mu$, and $x_l$, as defined in Appendix D. Defining the expanded forms of the constraint variables as $\hat{c}_l = (c_l; \dots; c_{M-1})$, $\hat{C}_l = (C_l; \dots; C_{M-1})$, and $\hat{\alpha}_l = (\alpha_l; \dots; \alpha_{M-1})$, we provide the optimization form of the above exposure shaping tasks.

For CEM, consider $\hat{\beta}_l = (\beta_l; \dots; \beta_{M-1})$. Then the problem
$$\underset{\hat{h}, \hat{u}_l}{\text{maximize}} \ \frac{1}{n} \mathbf{1}^\top \hat{h} \quad \text{subject to} \quad X_l \hat{u}_l + Y_l \mu + W_l x_l \ge \hat{h}, \quad \hat{\beta}_l \ge \hat{h}, \quad Z_l \hat{u}_l \le z_l, \quad (24)$$
solves CEM, where $\hat{h}$ is an auxiliary vector of size $n(M-l)$.

For MEM, consider the auxiliary $h$ as a vector of size $M-l$ and $\hat{h}$ a vector of size $n(M-l)$, with $\hat{h} = (h(1); \dots; h(1); h(2); \dots; h(2); \dots; h(M-l); \dots; h(M-l))$, where each $h(k)$ is repeated $n$ times.
Then MEM is equivalent to
$$\underset{\hat{h}, \hat{u}_l}{\text{maximize}} \ \mathbf{1}^\top \hat{h} \quad \text{subject to} \quad X_l \hat{u}_l + Y_l \mu + W_l x_l \ge \hat{h}, \quad \hat{\beta}_l \ge \hat{h}, \quad Z_l \hat{u}_l \le z_l. \quad (25)$$

[Figure 1: The objective on simulated events and a synthetic network; n = 300, M = 6, T = 40. Panels: a) Capped maximization (sum of exposure), b) Minimum maximization (minimum exposure), c) Least-squares shaping (averaged distance), comparing CLL and OPL, RND, and task-specific baselines (PRK/WEI, WFL/PRP, GRD/REL).]

For LES, let $\hat{\gamma}_l = (\gamma_l; \dots; \gamma_{M-1})$ and $\hat{D}_l = \mathrm{diag}(D, \dots, D)$; then
$$\underset{\hat{u}_l}{\text{minimize}} \ \frac{1}{n} \| \hat{D}_l (X_l \hat{u}_l + Y_l \mu + W_l x_l) - \hat{\gamma}_l \|^2 \quad \text{subject to} \quad Z_l \hat{u}_l \le z_l. \quad (26)$$
All three tasks involve a convex (and linear) objective function with linear constraints, which impose a convex feasible set. Therefore, one can use the rich and well-developed literature on convex optimization and linear programming to find the optimal intervention.

6 Experiments

We evaluate our campaigning framework using both simulated and real world data and show that our approach significantly outperforms several baselines¹.

Campaigning results on synthetic networks. In this section, we experiment with a synthetic network of 300 nodes. Details of the experimental setup and parameter settings are found in Appendix F. We focus on three tasks: capped exposure maximization, minimax exposure shaping, and least squares exposure shaping. To compare the methods, we simulate the network with the prescribed intervention intensity and compute the objective function based on the events that happened during the simulation.
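Simulation-based evaluation of this kind requires sampling the Hawkes process under a prescribed intensity. A minimal illustrative sketch via Ogata's thinning for the exponential kernel is below (our own simplified version, not the authors' implementation; the constant `mu` stands in for $\mu + u_m$ within one stage, and the toy parameter values are assumed):

```python
import numpy as np

def simulate_hawkes(mu, A, omega, T, seed=0):
    """Ogata's thinning for an MHP with kernel Phi(t) = A e^{-omega t}.

    Illustrative sketch: `mu` is a constant exogenous intensity vector.
    Returns a list of (time, user) events on [0, T].
    """
    rng = np.random.default_rng(seed)
    n = len(mu)
    t, x = 0.0, np.zeros(n)   # x_i = sum_k a_{i,d_k} e^{-omega(t - t_k)}
    events = []
    while True:
        lam_bar = (mu + x).sum()  # valid bound: x only decays until the next event
        t_cand = t + rng.exponential(1.0 / lam_bar)
        if t_cand > T:
            break
        x = x * np.exp(-omega * (t_cand - t))  # decay state to candidate time
        t = t_cand
        lam = mu + x
        if rng.uniform() * lam_bar <= lam.sum():  # accept/reject (thinning)
            d = rng.choice(n, p=lam / lam.sum())  # attribute event to a user
            events.append((t, d))
            x = x + A[:, d]  # event by user d excites all users via column d
    return events

# Toy run: 2 users, mild mutual excitation (assumed values).
mu = np.array([0.5, 0.8])
A = np.array([[0.0, 0.3], [0.2, 0.0]])
events = simulate_hawkes(mu, A, omega=1.0, T=40.0)
```

From such an event list, the per-stage counting and exposure quantities (and hence the objectives above) can be computed directly.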
The mean and standard deviation of the objective function over 10 runs are reported. Fig. 1 summarizes the performance of the proposed algorithm (CLL) and 4 other baselines on different campaigning tasks. For CEM, our approach consistently outperforms the others by at least 10. This means it exposes each user to the campaign at least 10 times more than the rest, consuming the same budget and within the same constraints. The extra 20 units of exposure over OPL, i.e., the value of information, shows how much we gain by incorporating a dynamic closed-loop solution as opposed to an open-loop one-time optimization over all stages. For MEM, the proposed method outperforms the others by a smaller margin; however, the 0.1 exposure difference with the second best method is not trifling. This is expected, as lifting the minimum exposure is a difficult task [8]. For LES, the results demonstrate the superiority of CLL by a large margin. The $10^3$ difference with the second best algorithm, aggregated over 6 stages, roughly translates to a $\sqrt{10^3/6} \approx 13$ difference in the number of exposures per user. Given the heterogeneity of the network activity and target shape, this is a significant improvement over the baselines. Appendix F includes further results on varying the number of nodes, the number of stages, and the duration of each stage.

Campaigning results on real world networks. We also evaluate the proposed framework on real world data. To this end, we utilize the MemeTracker dataset [9], which contains the information flows captured by hyperlinks between different sites, with timestamps, over 9 months. This data has been previously used to validate Hawkes process models of social activity [5, 10]. For the real data, we utilize two evaluation procedures. First, similar to the synthetic case, we simulate the network, but now on a network based on the parameters learned from real data.
However, the more interesting evaluation scheme would entail carrying out real interventions in a social media platform. Since this is very challenging to do, in this evaluation scheme we instead use held-out data to mimic such a procedure. Second, we form 10 pairs of clusters/cascades by selecting all 2-combinations of the 5 largest clusters in the MemeTracker data. Each is a cascade of events around a common subject. For each of these 10 pairs, the methods are faced with the question of predicting which cascade will better attain the objective. They should be able to answer this by measuring how similar their prescription is to the real exogenous intensity. The key point here is that the real events that happened are used to evaluate the objective function of the methods. The results are then reported as the average prediction accuracy over all stages, over 10 runs of random constraint and parameter initialization, on the 10 pairs of cascades. The details of the experimental setup are further explained in Appendix F.

Fig. 2, left column, illustrates the performance with respect to increasing the number of users in the network. The performance drops slightly with the network size.
This means that prediction becomes more difficult as more random variables are involved.¹

¹Code is available at http://www.cc.gatech.edu/~mfarajta/

[Figure 2: Real-world dataset results (n = 300, M = 6, T = 40). Rows correspond to the Capped Maximization (CEM), Minimum Maximization (MEM), and Least-squares Shaping (LES) tasks. Columns show prediction accuracy vs. network size, prediction accuracy vs. number of intervention points, and the achieved objective (sum of exposure, minimum exposure, and averaged distance, respectively) for CLL, OPL, RND, and the task-specific baselines PRK/WEI, WFL/PRP, and GRD/REL.]

The middle panel shows the performance with respect to increasing the number of intervention points.
Here, a slight increase in performance is apparent: as the number of intervention points grows, the algorithm has more control over the outcome and can better reach the objective.
The top row of Fig. 2 summarizes the results for CEM. The left panel demonstrates the predictive performance of the algorithms; CLL consistently outperforms the rest, with 65-70% accuracy in predicting the optimal cascade. The right panel shows the objective function simulated 10 times with the learned parameters for a network of n = 300 users and 6 intervention points. The extra 2.5 exposures per user over the second best method, with the same budget and constraints, would be a significant advertising achievement. Among the competitors, OPL and RND perform well. If there were no cap on the resulting exposure, all methods would perform comparably because the sum of exposures is linear; the successful method is the one that manages to maximize exposure subject to the cap. The failure of PRK and WEI indicates that structural properties alone are not enough to capture influence. Compared to these two, RND performs better on average, but exhibits a larger variance, as expected.
The middle row of Fig. 2 summarizes the results for MEM and shows that CLL consistently outperforms the others; OPL and RND remain the significant baselines. The failure of WFL and PRP shows that the network structure plays a significant role in the activity and exposure processes.
The bottom row of Fig. 2 demonstrates the results for LES. CLL is still the best method; OPL remains strong, but RND does not perform well. The objective function is the sum of squared gaps between the target and current exposure, which explains why GRD achieves comparable success: it starts with the largest exposure gap and greedily allocates the budget.
Conclusion.
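The three objectives compared throughout can be written down concretely. The following forms are our illustrative reading of the text (a per-user exposure vector, a cap, and a target profile), not the paper's exact formulation:

```python
import numpy as np

def cem(exposure, cap):
    """Capped exposure maximization: total exposure, counting each
    user's exposure only up to the cap (larger is better)."""
    return float(np.minimum(exposure, cap).sum())

def mem(exposure):
    """Minimum exposure maximization: the exposure of the least
    exposed user, i.e., the quantity being lifted (larger is better)."""
    return float(exposure.min())

def les_gap(exposure, target):
    """Least-squares shaping: sum of squared gaps between the target
    and current exposure (smaller is better)."""
    return float(np.sum((target - exposure) ** 2))
```

Under this reading, the cap in CEM is what separates the methods: without it, any allocation of the same budget would yield the same linear sum of exposures.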
In this paper, we introduced the optimal multistage campaigning problem, which generalizes the activity shaping and influence maximization problems and allows for more elaborate goal functions. Our model of social activity is based on a multivariate Hawkes process, and, for the first time, we derive a linear connection between a time-varying exogenous intensity and the overall network exposure of the campaign.
Acknowledgement. This work is supported in part by NSF/NIH BIGDATA R01 GM108341, NSF IIS-1639792, NSF DMS-1620345, and NSF DMS-1620342.

References

[1] D. M. West. Air Wars: Television Advertising and Social Media in Election Campaigns, 1952-2012. Sage, 2013.
[2] M. Vergeer, L. Hermans, and S. Sams. Online social networks and micro-blogging in political campaigning: the exploration of a new campaign tool and a new campaign style. Party Politics, 2013.
[3] D. P. Bertsekas. Dynamic Programming and Optimal Control, volume 1.
[4] D. J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes. Springer Science & Business Media, 2007.
[5] K. Zhou, H. Zha, and L. Song. Learning social infectivity in sparse low-rank networks using multi-dimensional Hawkes processes. In AISTATS, 2013.
[6] O. Aalen, O. Borgan, and H. Gjessing. Survival and Event History Analysis: A Process Point of View. Springer, 2008.
[7] A. G. Hawkes. Spectra of some self-exciting and mutually exciting point processes. Biometrika, 1971.
[8] M. Farajtabar, N. Du, M. Gomez-Rodriguez, I. Valera, L. Song, and H. Zha. Shaping social activity by incentivizing users. In NIPS, 2014.
[9] J. Leskovec, L. Backstrom, and J. Kleinberg. Meme-tracking and the dynamics of the news cycle. In SIGKDD, 2009.
[10] S. H. Yang and H. Zha. Mixture of mutually exciting processes for viral diffusion. In ICML, 2013.
[11] D. Kempe, J. Kleinberg, and E. Tardos.
Maximizing the spread of influence through a social network. In SIGKDD, 2003.
[12] F. B. Hanson. Applied Stochastic Processes and Control for Jump-Diffusions: Modeling, Analysis, and Computation, volume 13. SIAM, 2007.
[13] A. De, I. Valera, N. Ganguly, S. Bhattacharya, and M. Gomez-Rodriguez. Modeling opinion dynamics in diffusion networks. arXiv:1506.05474, 2015.
[14] Y. Wang, E. Theodorou, A. Verma, and L. Song. Steering opinion dynamics in information diffusion networks. arXiv:1603.09021, 2016.
[15] D. Bloembergen, B. Ranjbar Sahraei, H. Bou-Ammar, K. Tuyls, and G. Weiss. Influencing social networks: An optimal control study. In ECAI, 2014.
[16] K. Kandhway and J. Kuri. Campaigning in heterogeneous social networks: Optimal control of SI information epidemics. 2015.
[17] P.-Y. Chen, S.-M. Cheng, and K.-C. Chen. Optimal control of epidemic information dissemination over networks. IEEE Transactions on Cybernetics, 2014.
[18] W. Lian, R. Henao, V. Rao, J. Lucas, and L. Carin. A multitask point process predictive model. In ICML, 2015.
[19] A. P. Parikh, A. Gunawardana, and C. Meek. Conjoint modeling of temporal dependencies in event streams. In UAI, 2012.
[20] P. O. Perry and P. J. Wolfe. Point process modeling for directed interaction networks. Journal of the Royal Statistical Society, 2013.
[21] S. W. Linderman and R. P. Adams. Discovering latent network structure in point process data. In ICML, 2014.
[22] C. Blundell, J. Beck, and K. A. Heller. Modelling reciprocating relationships with Hawkes processes. In NIPS, 2012.
[23] T. Iwata, A. Shah, and Z. Ghahramani. Discovering latent influence in online social activities via shared cascade Poisson processes. In SIGKDD, 2013.
[24] O. Hijab. Introduction to Calculus and Classical Analysis. Springer, 2007.
[25] G. B. Folland. Real Analysis: Modern Techniques and Their Applications. John Wiley & Sons, 2013.
[26] R. Bracewell. The Fourier Transform and Its Applications.
New York, 1965.
[27] A. H. Al-Mohy and N. J. Higham. Computing the action of the matrix exponential, with an application to exponential integrators. SIAM Journal on Scientific Computing, 2011.