{"title": "Shaping Social Activity by Incentivizing Users", "book": "Advances in Neural Information Processing Systems", "page_first": 2474, "page_last": 2482, "abstract": "Events in an online social network can be categorized roughly into endogenous events, where users just respond to the actions of their neighbors within the network, or exogenous events, where users take actions due to drives external to the network. How much external drive should be provided to each user, such that the network activity can be steered towards a target state? In this paper, we model social events using multivariate Hawkes processes, which can capture both endogenous and exogenous event intensities, and derive a time dependent linear relation between the intensity of exogenous events and the overall network activity. Exploiting this connection, we develop a convex optimization framework for determining the required level of external drive in order for the network to reach a desired activity level. We experimented with event data gathered from Twitter, and show that our method can steer the activity of the network more accurately than alternatives.", "full_text": "Shaping Social Activity by Incentivizing Users\n\nMehrdad Farajtabar\u2217\nIsabel Valera\u2021\n\nNan Du\u2217\nHongyuan Zha\u2217\n\nManuel Gomez-Rodriguez\u2020\n\nLe Song\u2217\n\nGeorgia Institute of Technology\u2217\n\nMPI for Software Systems\u2020\n\nUniv. Carlos III in Madrid\u2021\n\n{mehrdad,dunan}@gatech.edu\n{zha,lsong}@cc.gatech.edu\n\nmanuelgr@mpi-sws.org\nivalera@tsc.uc3m.es\n\nAbstract\n\nEvents in an online social network can be categorized roughly into endogenous\nevents, where users just respond to the actions of their neighbors within the net-\nwork, or exogenous events, where users take actions due to drives external to the\nnetwork. How much external drive should be provided to each user, such that the\nnetwork activity can be steered towards a target state? 
In this paper, we model\nsocial events using multivariate Hawkes processes, which can capture both\nendogenous and exogenous event intensities, and derive a time dependent linear\nrelation between the intensity of exogenous events and the overall network activity.\nExploiting this connection, we develop a convex optimization framework for\ndetermining the required level of external drive in order for the network to reach a\ndesired activity level. We experimented with event data gathered from Twitter,\nand show that our method can steer the activity of the network more accurately\nthan alternatives.\n\n1 Introduction\nOnline social platforms routinely track and record a large volume of event data, which may\ncorrespond to the usage of a service (e.g., URL shortening service, bit.ly). These events can be categorized\nroughly into endogenous events, where users just respond to the actions of their neighbors within\nthe network, or exogenous events, where users take actions due to drives external to the network.\nFor instance, a user\u2019s tweets may contain links provided by bit.ly, either due to his forwarding of a\nlink from his friends, or due to his own initiative to use the service to create a new link.\nCan we model and exploit these data to steer the online community to a desired activity level?\nSpecifically, can we drive the overall usage of a service to a certain level (e.g., at least twice per\nday per user) by incentivizing a small number of users to take more initiatives? What if the goal is\nto make the usage level of a service more homogeneous across users? What about maximizing the\noverall service usage for a target group of users? 
Furthermore, these activity shaping problems need\nto be addressed by taking into account budget constraints, since incentives are usually provided in\nthe form of monetary or credit rewards.\nActivity shaping problems are significantly more challenging than traditional influence\nmaximization problems, which aim to identify a set of users who, when convinced to adopt a product, will\ninfluence others in the network and trigger a large cascade of adoptions [1, 2]. First, in influence\nmaximization, the state of each user is often assumed to be binary, either adopting a product or\nnot [1, 3, 4, 5]. However, such an assumption does not capture the recurrent nature of product usage,\nwhere the frequency of the usage matters. Second, while influence maximization methods identify\na set of users to incentivize, they do not typically provide a quantitative prescription on how\nmuch incentive should be provided to each user. Third, activity shaping concerns a larger variety of\ntarget states, such as minimum activity and homogeneity of activity, not just activity maximization.\nIn this paper, we will address the activity shaping problems using multivariate Hawkes processes [6],\nwhich can model both endogenous and exogenous recurrent social events, and were shown to be a\ngood fit for such data in a number of recent works (e.g., [7, 8, 9, 10, 11, 12]). More importantly,\nwe will go beyond model fitting, and derive a novel predictive formula for the overall network\nactivity given the intensity of exogenous events in individual users, using a connection between\nHawkes processes and branching processes [13, 14, 15, 16]. Based on this relation, we propose a convex\noptimization framework to address a diverse range of activity shaping problems given budget\nconstraints. 
Compared to previous methods for influence maximization, our framework can provide\nmore fine-grained control of network activity, not only steering the network to a desired steady-state\nactivity level but also doing so in a time-sensitive fashion. For example, our framework allows us to\nanswer complex time-sensitive queries, such as: which users should be incentivized, and by how\nmuch, to steer a set of users to use a product twice per week after one month?\nIn addition to the novel framework, we also develop an efficient gradient-based optimization\nalgorithm, where the matrix exponential needed for gradient computation is approximated using the\ntruncated Taylor series expansion [17]. This algorithm allows us to validate our framework in a\nvariety of activity shaping tasks and scale up to networks with tens of thousands of nodes. We also\nconducted experiments on a network of 60,000 Twitter users and more than 7,500,000 uses of\npopular URL shortening services. Using held-out data, we show that our algorithm can shape the network\nbehavior much more accurately than alternatives.\n2 Modeling Endogenous-Exogenous Recurrent Social Events\nWe model the events generated by $m$ users in a social network as an $m$-dimensional counting process\n$N(t) = (N_1(t), N_2(t), \\ldots, N_m(t))^\\top$, where $N_i(t)$ records the total number of events generated by\nuser $i$ up to time $t$. Furthermore, we represent each event as a tuple $(u_i, t_i)$, where $u_i$ is the user\nidentity and $t_i$ is the event timing. Let the history of the process up to time $t$ be $\\mathcal{H}_t := \\{(u_i, t_i) \\mid t_i \\leq t\\}$,\nand $\\mathcal{H}_{t^-}$ be the history until just before time $t$. Then the increment of the process, $dN(t)$, in an\ninfinitesimal window $[t, t + dt]$ is parametrized by the intensity $\\lambda(t) = (\\lambda_1(t), \\ldots, \\lambda_m(t))^\\top \\geq 0$, i.e.,\n\n$$\\mathbb{E}[dN(t) \\mid \\mathcal{H}_{t^-}] = \\lambda(t)\\, dt. \\tag{1}$$\n\nIntuitively, the larger the intensity $\\lambda(t)$, the greater the likelihood of observing an event in the time\nwindow $[t, t + dt]$. For instance, a Poisson process in $[0, \\infty)$ can be viewed as a special counting\nprocess with a constant intensity function $\\lambda$, independent of time and history. To model the presence\nof both endogenous and exogenous events, we will decompose the intensity into two terms\n\n$$\\underbrace{\\lambda(t)}_{\\text{overall event intensity}} = \\underbrace{\\lambda^{(0)}(t)}_{\\text{exogenous event intensity}} + \\underbrace{\\lambda^{*}(t)}_{\\text{endogenous event intensity}}, \\tag{2}$$\n\nwhere the exogenous event intensity models drive outside the network, and the endogenous event\nintensity models interactions within the network. We assume that hosts of social platforms can\npotentially drive up or down the exogenous event intensity by providing incentives to users, while\nendogenous events are generated due to users\u2019 own interests or under the influence of network peers,\nand the hosts do not interfere with them directly. The key questions in the activity shaping context\nare how to model the endogenous event intensity in a way that is realistic for recurrent social interactions,\nand how to link the exogenous event intensity to the endogenous event intensity. We assume that the\nexogenous event intensity is independent of the history and time, i.e., $\\lambda^{(0)}(t) = \\lambda^{(0)}$.\n2.1 Multivariate Hawkes Process\nRecurrent endogenous events often exhibit the characteristics of self-excitation, where a user tends\nto repeat what he has been doing recently, and mutual-excitation, where a user simply follows what\nhis neighbors are doing due to peer pressure. These social phenomena have been likened to\nthe occurrence of earthquakes [18] and the spread of epidemics [19], and can be well captured by\nmultivariate Hawkes processes [6], as shown in a number of recent works (e.g., [7, 8, 9, 10, 11, 12]).\nMore specifically, a multivariate Hawkes process is a counting process with a particular form\nof intensity. We assume that the strength of influence between users is parameterized by a sparse\nnonnegative influence matrix $A = (a_{uu'})_{u,u' \\in [m]}$, where $a_{uu'} > 0$ means user $u'$ directly excites\nuser $u$. We also allow $A$ to have nonnegative diagonals to model self-excitation of a user. Then, the\nintensity of the $u$-th dimension is\n\n$$\\lambda^{*}_u(t) = \\sum_{i: t_i < t} a_{u u_i}\\, g(t - t_i) = \\sum_{u' \\in [m]} a_{uu'} \\int_0^t g(t - s)\\, dN_{u'}(s), \\tag{3}$$\n\nwhere $g(s)$ is a nonnegative kernel function such that $g(s) = 0$ for $s \\leq 0$ and $\\int_0^\\infty g(s)\\, ds < \\infty$;\nthe second equality is obtained by grouping events according to users and using the fact that\n$\\int_0^t g(t - s)\\, dN_{u'}(s) = \\sum_{u_i = u',\\, t_i < t} g(t - t_i)$.\n\nFigure 1: (a) An example social network. (b) Branching structure of events. In Panel (a), each directed edge indicates that the target node follows, and can be influenced\nby, the source node. The activity in this network is modeled using Hawkes processes, which result in the\nbranching structure of events shown in Panel (b). Each exogenous event is the root node of a branch\n(e.g., top left most red circle at $t_1$), and it occurs due to a user\u2019s own initiative; and each event can\ntrigger one or more endogenous events (blue square at $t_2$). The new endogenous events can create\nthe next generation of endogenous events (green triangles at $t_3$), and so forth. The social network\nwill constrain the branching structure of events, since an event produced by a user (e.g., user 1) can\nonly trigger endogenous events in the same user or one or more of her followers (e.g., user 2 or 3).\n\n
Intuitively, $\\lambda^{*}_u(t)$ models the propagation of peer\ninfluence over the network: each event $(u_i, t_i)$ that occurs in the neighborhood of a user will boost her\nintensity by a certain amount, which itself decays over time. Thus, the more frequently events\noccur in the user\u2019s neighborhood, the more likely she will be persuaded to generate a new event.\nFor simplicity, we will focus on an exponential kernel, $g(t - t_i) = \\exp(-\\omega(t - t_i))$, in the remainder\nof the paper. However, multivariate Hawkes processes and the branching-process connection explained in\nthe next section are independent of the kernel choice and can be extended to other kernels such as\npower-law, Rayleigh or any other long-tailed distribution over the nonnegative real domain. Furthermore, we\ncan rewrite equation (3) in vectorial format\n\n$$\\lambda^{*}(t) = \\int_0^t G(t - s)\\, dN(s), \\tag{4}$$\n\nby defining an $m \\times m$ time-varying matrix $G(t) = (a_{uu'}\\, g(t))_{u,u' \\in [m]}$. Note that, for multivariate\nHawkes processes, the intensity, $\\lambda(t)$, itself is a random quantity, which depends on the history $\\mathcal{H}_t$.\nWe denote the expectation of the intensity with respect to history as\n\n$$\\mu(t) := \\mathbb{E}_{\\mathcal{H}_{t^-}}[\\lambda(t)]. \\tag{5}$$\n\n2.2 Connection to Branching Processes\nA branching process is a Markov process that models a population in which each individual in\ngeneration $k$ produces some random number of individuals in generation $k + 1$, according to some\ndistribution [20]. In this section, we will conceptually assign both exogenous events and endogenous\nevents in the multivariate Hawkes process to levels (or generations), and associate these events with\na branching structure which records the information on which event triggers which other events (see\nFigure 1 for an example). Note that this genealogy of events should be interpreted in probabilistic\nterms and may not be observed in actual data. 
Such a connection was discussed in Hawkes\u2019\noriginal paper on one-dimensional Hawkes processes [21], and it has recently been revisited in the\ncontext of multivariate Hawkes processes by [11]. The branching structure will play a crucial role in\nderiving a novel link between the intensity of the exogenous events and the overall network activity.\nMore specifically, we assign all exogenous events to the zeroth generation, and record the number\nof such events as $N^{(0)}(t)$. These exogenous events will trigger the first generation of endogenous\nevents, whose number will be recorded as $N^{(1)}(t)$. Next, this first generation of endogenous events\nwill further trigger a second generation of endogenous events $N^{(2)}(t)$, and so on. Then the total\nnumber of events in the network is the sum of the numbers of events from all generations\n\n$$N(t) = N^{(0)}(t) + N^{(1)}(t) + N^{(2)}(t) + \\ldots \\tag{6}$$\n\nFurthermore, denote all events in generation $k - 1$ as $\\mathcal{H}^{(k-1)}_t$. Then, independently for each event\n$(u_i, t_i) \\in \\mathcal{H}^{(k-1)}_t$ in generation $k - 1$, it triggers a Poisson process in its neighbor $u$\nwith intensity $a_{u u_i}\\, g(t - t_i)$. Due to the superposition theorem of independent Poisson processes [22],\nthe intensity, $\\lambda^{(k)}_u(t)$, of events at node $u$ and generation $k$ is simply the sum of conditional intensities\nof the Poisson processes triggered by all its neighbors, i.e.,\n\n$$\\lambda^{(k)}_u(t) = \\sum_{(u_i, t_i) \\in \\mathcal{H}^{(k-1)}_t} a_{u u_i}\\, g(t - t_i) = \\sum_{u' \\in [m]} a_{uu'} \\int_0^t g(t - s)\\, dN^{(k-1)}_{u'}(s).$$\n\nConcatenating the intensities for all $u \\in [m]$ and using the\ntime-varying matrix $G(t)$ from (4), we have\n\n$$\\lambda^{(k)}(t) = \\int_0^t G(t - s)\\, dN^{(k-1)}(s), \\tag{7}$$\n\nwhere $\\lambda^{(k)}(t) = (\\lambda^{(k)}_1(t), \\ldots, \\lambda^{(k)}_m(t))^\\top$ is the intensity for the counting process $N^{(k)}(t)$ at the $k$-th\ngeneration. Again, due to the superposition of independent Poisson processes, we can decompose the\nintensity of $N(t)$ into a sum of conditional intensities from different generations\n\n$$\\lambda(t) = \\lambda^{(0)}(t) + \\lambda^{(1)}(t) + \\lambda^{(2)}(t) + \\ldots \\tag{8}$$\n\nNext, based on the above decomposition, we will develop a closed-form relation between the\nexpected intensity $\\mu(t) = \\mathbb{E}_{\\mathcal{H}_{t^-}}[\\lambda(t)]$ and the intensity, $\\lambda^{(0)}(t)$, of the exogenous events. This\nrelation will form the basis of our activity shaping framework.\n3 Linking Exogenous Event Intensity to Overall Network Activity\nOur strategy is to first link the expected intensity $\\mu^{(k)}(t) := \\mathbb{E}_{\\mathcal{H}_{t^-}}[\\lambda^{(k)}(t)]$ of events at the $k$-th\ngeneration with $\\lambda^{(0)}(t)$, and then derive a closed form for the infinite series sum\n\n$$\\mu(t) = \\mu^{(0)}(t) + \\mu^{(1)}(t) + \\mu^{(2)}(t) + \\ldots \\tag{9}$$\n\nDefine a series of auto-convolution matrices, one for each generation, with $G^{(\\star 0)}(t) = I$ and\n\n$$G^{(\\star k)}(t) = \\int_0^t G(t - s)\\, G^{(\\star k-1)}(s)\\, ds = G(t) \\star G^{(\\star k-1)}(t). \\tag{10}$$\n\nThen the expected intensity of events at the $k$-th generation is related to the exogenous intensity $\\lambda^{(0)}$ by\n\nLemma 1 $\\mu^{(k)}(t) = G^{(\\star k)}(t)\\, \\lambda^{(0)}$.\nNext, by summing together all auto-convolution matrices,\n\n$$\\Psi(t) := I + G^{(\\star 1)}(t) + G^{(\\star 2)}(t) + \\ldots$$\n\nwe obtain a linear relation between the expected intensity of the network and the intensity of the\nexogenous events, i.e., $\\mu(t) = \\Psi(t) \\lambda^{(0)}$. The entries in the matrix $\\Psi(t)$ roughly encode the\n\u201cinfluence\u201d between pairs of users. More precisely, the entry $\\Psi_{uv}(t)$ is the expected intensity of events\nat node $u$ due to a unit level of exogenous intensity at node $v$. 
We can also derive several other\nuseful quantities from $\\Psi(t)$. For example, $\\Psi_{\\bullet v}(t) := \\sum_u \\Psi_{uv}(t)$ can be thought of as the overall\ninfluence user $v$ has on all users. Surprisingly, for the exponential kernel, the infinite sum of matrices\nresults in a closed form using matrix exponentials. First, let $\\hat{\\cdot}$ denote the Laplace transform of a\nfunction; we then have the following intermediate result on the Laplace transform of $G^{(\\star k)}(t)$.\n\nLemma 2 $\\hat{G}^{(\\star k)}(z) = \\int_0^\\infty G^{(\\star k)}(t)\\, e^{-zt}\\, dt = \\frac{1}{z} \\cdot \\frac{A^k}{(z + \\omega)^k}$.\n\nWith Lemma 2, we are in a position to prove our main theorem below:\n\nTheorem 3 $\\mu(t) = \\Psi(t) \\lambda^{(0)} = \\left[ e^{(A - \\omega I)t} + \\omega (A - \\omega I)^{-1} \\left( e^{(A - \\omega I)t} - I \\right) \\right] \\lambda^{(0)}$.\n\nTheorem 3 provides us a linear relation between the exogenous event intensity and the expected overall\nintensity at any point in time, not just the stationary intensity. The significance of this result is that\nit allows us later to design a diverse range of convex programs to determine the intensity of the\nexogenous events in order to achieve a target intensity.\nIn fact, we can recover the previous results in the stationary case as a special case of our general\nresult. More specifically, a multivariate Hawkes process is stationary if the spectral radius of\n\n$$\\Gamma := \\int_0^\\infty G(t)\\, dt = \\left( \\int_0^\\infty g(t)\\, dt \\right) \\left( a_{uu'} \\right)_{u,u' \\in [m]} = \\frac{A}{\\omega} \\tag{11}$$\n\nis strictly smaller than 1 [6]. In this case, the expected intensity is $\\mu = (I - \\Gamma)^{-1} \\lambda^{(0)}$, independent\nof time. We can obtain this relation from Theorem 3 if we let $t \\to \\infty$.\nCorollary 4 $\\mu = (I - \\Gamma)^{-1} \\lambda^{(0)} = \\lim_{t \\to \\infty} \\Psi(t)\\, \\lambda^{(0)}$.\nRefer to Appendix A for all the proofs.\n\n4 Convex Activity Shaping Framework\nGiven the linear relation between exogenous event intensity and expected overall event intensity, we\nnow propose a convex optimization framework for a variety of activity shaping tasks. In all tasks\ndiscussed below, we will optimize the exogenous event intensity $\\lambda^{(0)}$ such that the expected overall\nevent intensity $\\mu(t)$ is maximized with respect to some concave utility $U(\\cdot)$ in $\\mu(t)$, i.e.,\n\n$$\\begin{aligned} \\underset{\\mu(t),\\, \\lambda^{(0)}}{\\text{maximize}} \\quad & U(\\mu(t)) \\\\\\\\ \\text{subject to} \\quad & \\mu(t) = \\Psi(t) \\lambda^{(0)}, \\quad c^\\top \\lambda^{(0)} \\leq C, \\quad \\lambda^{(0)} \\geq 0, \\end{aligned} \\tag{12}$$\n\nwhere $c = (c_1, \\ldots, c_m)^\\top \\geq 0$ is the cost per unit event for each user and $C$ is the total budget.\nAdditional regularization can also be added to $\\lambda^{(0)}$ either to restrict the number of incentivized\nusers (with the $\\ell_0$ norm $\\|\\lambda^{(0)}\\|_0$), or to promote a sparse solution (with the $\\ell_1$ norm $\\|\\lambda^{(0)}\\|_1$), or to obtain a\nsmooth solution (with $\\ell_2$ regularization $\\|\\lambda^{(0)}\\|_2$). We next discuss several instances of the general\nframework which achieve different goals (their constraints remain the same and are hence omitted).\nCapped Activity Maximization. In real networks, there is an upper bound (or a cap) on the activity\neach user can generate due to the limited attention of a user. For example, a Twitter user typically posts\na limited number of shortened URLs or retweets a limited number of tweets [23]. Suppose we know\nthe upper bound, $\\alpha_u$, on a user\u2019s activity, i.e., how much activity each user is willing to generate.\nThen we can perform the following capped activity maximization task\n\n$$\\underset{\\mu(t),\\, \\lambda^{(0)}}{\\text{maximize}} \\quad \\sum_{u \\in [m]} \\min\\{\\mu_u(t), \\alpha_u\\} \\tag{13}$$\n\nMinimax Activity Shaping. Suppose our goal is instead to maintain the activity of each user in the\nnetwork above a certain minimum level, or, alternatively, to make the user with the minimum activity\nas active as possible. Then, we can perform the following minimax activity shaping task\n\n$$\\underset{\\mu(t),\\, \\lambda^{(0)}}{\\text{maximize}} \\quad \\min_u \\mu_u(t) \\tag{14}$$\n\nLeast-Squares Activity Shaping. Sometimes we want to achieve pre-specified target activity\nlevels, $v$, for users. For example, we may like to divide users into groups and desire a different level\nof activity in each group. Inspired by these examples, we can perform the following least-squares\nactivity shaping task\n\n$$\\underset{\\mu(t),\\, \\lambda^{(0)}}{\\text{maximize}} \\quad -\\|B \\mu(t) - v\\|_2^2 \\tag{15}$$\n\nwhere $B$ encodes potentially additional constraints (e.g., group partitions). Besides the Euclidean\ndistance, the family of Bregman divergences can be used to measure the difference between $B\\mu(t)$\nand $v$ here. That is, given a function $f(\\cdot) : \\mathbb{R}^m \\mapsto \\mathbb{R}$ convex in its argument, we can use\n$D(B\\mu(t) \\| v) := f(B\\mu(t)) - f(v) - \\langle \\nabla f(v), B\\mu(t) - v \\rangle$ in place of the squared Euclidean distance.\nActivity Homogenization. Many other concave utility functions can be used. For example, we may\nwant to steer users\u2019 activities to a more homogeneous profile. If we measure homogeneity of activity\nwith Shannon entropy, then we can perform the following activity homogenization task\n\n$$\\underset{\\mu(t),\\, \\lambda^{(0)}}{\\text{maximize}} \\quad -\\sum_{u \\in [m]} \\mu_u(t) \\ln \\mu_u(t) \\tag{16}$$\n\n5 Scalable Algorithm\nAll the activity shaping problems defined above require an efficient evaluation of the instantaneous\naverage intensity $\\mu(t)$ at time $t$, which entails computing matrix exponentials to obtain $\\Psi(t)$. In\nsmall or medium networks, we can rely on well-known numerical methods to compute matrix\nexponentials [24]. 
However, in large networks, the explicit computation of $\\Psi(t)$ becomes intractable.\nFortunately, we can exploit the following key property of our convex activity shaping framework:\nthe instantaneous average intensity only depends on $\\Psi(t)$ through matrix-vector product operations.\nIn particular, we start by using Theorem 3 to rewrite the multiplication of $\\Psi(t)$ and a vector $v$\nas $\\Psi(t) v = e^{(A - \\omega I)t} v + \\omega (A - \\omega I)^{-1} \\left( e^{(A - \\omega I)t} v - v \\right)$. We then get a tractable solution by\nfirst computing $e^{(A - \\omega I)t} v$ efficiently, subtracting $v$ from it, and solving a sparse linear system of\nequations, $(A - \\omega I) x = e^{(A - \\omega I)t} v - v$, efficiently. The steps are illustrated in Algorithm 1.\n\nAlgorithm 1: Average Instantaneous Intensity\ninput : $A$, $\\omega$, $t$, $v$\noutput: $\\Psi(t) v$\n$v_1 = e^{(A - \\omega I)t} v$;\n$v_2 = v_1 - v$;\n$v_3 = (A - \\omega I)^{-1} v_2$;\nreturn $v_1 + \\omega v_3$;\n\nAlgorithm 2: PGD for Activity Shaping\nInitialize $\\lambda^{(0)}$;\nrepeat\n  1- Project $\\lambda^{(0)}$ into $\\lambda^{(0)} \\geq 0$, $c^\\top \\lambda^{(0)} \\leq C$;\n  2- Evaluate the gradient $g(\\lambda^{(0)})$ at $\\lambda^{(0)}$;\n  3- Update $\\lambda^{(0)}$ using the gradient $g(\\lambda^{(0)})$;\nuntil convergence;\n\nNext, we elaborate on two very efficient algorithms for computing the product of a matrix exponential\nwith a vector and for solving a sparse linear system of equations.\nFor the computation of the product of a matrix exponential with a vector, we rely on the iterative\nalgorithm by Al-Mohy et al. [17], which combines a scaling and squaring method with a truncated\nTaylor series approximation to the matrix exponential. For solving the sparse linear system of\nequations, we use the well-known GMRES method [25], which is an Arnoldi process for constructing\nan $\\ell_2$-orthogonal basis of Krylov subspaces. 
The method solves the linear system by iteratively\nminimizing the norm of the residual vector over a Krylov subspace.\nPerhaps surprisingly, we will now show that it is possible to compute the gradient of the\nobjective functions of all our activity shaping problems using the algorithm developed above for\ncomputing the average instantaneous intensity. We only need to define the vector $v$ appropriately\nfor each problem, as follows: (i) Activity maximization: $g(\\lambda^{(0)}) = \\Psi(t)^\\top v$, where $v$ is\ndefined such that $v_j = 1$ if $\\alpha_j > \\mu_j$, and $v_j = 0$ otherwise.\n(ii) Minimax activity shaping:\n$g(\\lambda^{(0)}) = \\Psi(t)^\\top e$, where $e$ is defined such that $e_j = 1$ if $\\mu_j = \\mu_{\\min}$, and $e_j = 0$ otherwise. (iii)\nLeast-squares activity shaping: $g(\\lambda^{(0)}) = 2 \\Psi(t)^\\top B^\\top \\left( B \\Psi(t) \\lambda^{(0)} - v \\right)$. (iv) Activity\nhomogenization: $g(\\lambda^{(0)}) = \\Psi(t)^\\top \\ln(\\Psi(t) \\lambda^{(0)}) + \\Psi(t)^\\top \\mathbf{1}$, where $\\ln(\\cdot)$ on a vector is the element-wise\nnatural logarithm. Since the activity maximization and the minimax activity shaping tasks require\nonly one evaluation of $\\Psi(t)$ times a vector, Algorithm 1 can be used directly. However, computing\nthe gradient for least-squares activity shaping and activity homogenization is slightly more involved\nand requires care with the order in which we perform the operations (refer to Appendix B\nfor details). Equipped with an efficient way to compute gradients, we solve the corresponding\nconvex optimization problem for each activity shaping problem by applying projected gradient\ndescent (PGD) [26] with the appropriate gradient (see footnote 1). Algorithm 2 summarizes the key steps.\n6 Experimental Evaluation\nWe evaluate our framework using both simulated and real-world held-out data, and show that our\napproach significantly outperforms several baselines. The appendix contains additional experiments.\nDataset description and network inference. 
We use data gathered from Twitter as reported in [27],\nwhich comprises all public tweets posted by 60,000 users during an 8-month period, from January\n2009 to September 2009. For every user, we record the times she uses any of six popular URL\nshortening services (refer to Appendix C for details). We evaluate the performance of our framework on\na subset of 2,241 active users, linked by 4,901 edges, which we call the 2K dataset, and we evaluate its\nscalability on the overall 60,000 users, linked by $\\sim$200,000 edges, which we call the 60K dataset. The\n2K dataset accounts for 691,020 uses of URL shortening services, while the 60K dataset accounts for $\\sim$7.5\nmillion uses. Finally, we treat each service as independent cascades of events.\nIn the experiments, we estimated the nonnegative influence matrix $A$ and the exogenous intensity\n$\\lambda^{(0)}$ using maximum log-likelihood, as in previous work [8, 9, 12]. We used a temporal resolution\nof one minute and selected the bandwidth $\\omega = 0.1$ by cross validation. Loosely speaking, $\\omega = 0.1$\ncorresponds to losing 70% of the initial influence after 10 minutes, which may be explained by the\nrapid rate at which each user\u2019s news feed gets updated.\nEvaluation schemes. We focus on three tasks: capped activity maximization, minimax activity\nshaping, and least-squares activity shaping. We set the total budget to $C = 0.5$, which corresponds\nto supporting a total extra activity equal to 0.5 actions per unit time, and assume all users entail the\nsame cost. In the capped activity maximization, we set the upper limit of each user\u2019s intensity, $\\alpha$,\nby adding a nonnegative random vector to their inferred initial intensity. 
In the least-squares activity\nshaping, we set $B = I$ and aim to create three user groups: less-active, moderate, and super-active.\nWe use three different evaluation schemes, with an increasing resemblance to a real-world scenario:\nTheoretical objective: We compute the expected overall (theoretical) intensity by applying\nTheorem 3 to the optimal exogenous event intensities for each of the three activity shaping tasks, as well\nas the learned $A$ and $\\omega$. We then compute and report the value of the objective functions.\n\nFootnote 1: For nondifferentiable objectives, subgradient algorithms can be used instead.\n\n[Figure 2 plots omitted. Columns: (a) Theoretical objective, (b) Simulated objective, (c) Held-out data. In (a) and (b) the x-axis is the logarithm of time and the y-axis is the sum of users\u2019 activity (row 1; CAM, XMU, WEI, DEG, PRK), the minimum activity (row 2; MMASH, UNI, MINMU, LP, GRD) or the Euclidean distance (row 3; PROP, LSGRD, LSASH); in (c) the y-axis is the rank correlation.]\n\nFigure 2: Row 1: Capped activity maximization. Row 2: Minimax activity shaping. Row 3:\nLeast-squares activity shaping. * means statistically significant at the 0.01 level with a paired t-test between\nour method and the second best.\n\nSimulated objective: We simulate 50 cascades with Ogata\u2019s thinning algorithm [28], using the\noptimal exogenous event intensities for each of the three activity shaping tasks, and the learned $A$ and $\\omega$.\nWe then estimate empirically the overall event intensity based on the simulated cascades, by\ncomputing a running average over non-overlapping time windows, and report the value of the objective\nfunctions based on this estimated overall intensity. Appendix D provides a comparison between the\nsimulated and the theoretical objective.\nHeld-out data: The most interesting evaluation scheme would entail carrying out real interventions\nin a social platform. However, since this is very challenging to do, in this evaluation scheme\nwe instead use held-out data to simulate such a process, proceeding as follows. We first partition the 8-month\ndata into 50 five-day-long contiguous intervals. Then, we use one interval for training and the\nremaining 49 intervals for testing. Suppose interval 1 is used for training; the procedure is then as follows:\n1. We estimate $A_1$, $\\omega_1$ and $\\lambda^{(0)}_1$ using the events from interval 1. Then, we fix $A_1$ and $\\omega_1$,\nand estimate $\\lambda^{(0)}_i$ for all other intervals, $i = 2, \\ldots, 50$.\n2. Given $A_1$ and $\\omega_1$, we find the optimal exogenous event intensities, $\\lambda^{(0)}_{\\text{opt}}$, for each of the\nthree activity shaping tasks, by solving the associated convex program. We then sort the\nestimated $\\lambda^{(0)}_i$ ($i = 2, \\ldots, 50$) according to their similarity to $\\lambda^{(0)}_{\\text{opt}}$, using the Euclidean\ndistance $\\|\\lambda^{(0)}_{\\text{opt}} - \\lambda^{(0)}_i\\|_2$.\n3. We estimate the overall event intensity for each of the 49 test intervals ($i = 2, \\ldots, 50$), as in the\n\u201csimulated objective\u201d evaluation scheme, and sort these intervals according to the value of\ntheir corresponding objective function.\n4. Last, we compute and report the rank correlation score between the two orderings obtained\nin steps 2 and 3 (see footnote 2). The larger the rank correlation, the better the method.\n\nWe repeat this procedure 50 times, choosing each different interval for training once, and compute\nand report the average rank correlations. More details can be found in the appendix.\n\nFootnote 2: rank correlation = number of pairs with consistent ordering / total number of pairs.\n\nCapped activity maximization (CAM). We compare to a number of alternatives. XMU: heuristic\nbased on $\\mu(t)$ without optimization; DEG and WEI: heuristics based on the degree of the user;\nPRANK: heuristic based on PageRank (refer to Appendix C for further details). The first row of\nFigure 2 summarizes the results for the three different evaluation schemes. We find that our method\n(CAM) consistently outperforms the alternatives. For the theoretical objective, CAM is 11% better\nthan the second best, DEG. The difference in overall users\u2019 intensity from DEG is about 0.8 which,\nroughly speaking, leads to an increase of about $0.8 \\times 60 \\times 24 \\times 30 = 34{,}560$ in the overall\nnumber of events in a month. In terms of simulated objective and held-out data, the results are\nsimilar and provide empirical evidence that, compared to other heuristics, degree is an appropriate\nsurrogate for influence, while, based on the poor performance of XMU, it seems that high activity\ndoes not necessarily entail being influential. 
To elaborate on the interpretability of the real-world experiment on held-out data, consider for example the difference in rank correlation between CAM and DEG, which is almost 0.1. Roughly speaking, this means that incentivizing users based on our approach agrees with the ordering of real activity patterns in 0.1 \u00d7 (50 \u00d7 49)/2 = 122.5 more pairs of realizations.\n\nMinimax activity shaping (MMASH). We compare to a number of alternatives. UNI: heuristic based on equal allocation; MINMU: heuristic based on \u00b5(t) without optimization; LP: linear programming based heuristic; GRD: a greedy approach to leverage the activity (see Appendix C for more details). The second row of Figure 2 summarizes the results for the three different evaluation schemes. We find that our method (MMASH) consistently outperforms the alternatives. For the theoretical objective, it is about 2\u00d7 better than the second best, LP. Importantly, the difference between MMASH and LP is not trifling and the least active user carries out 2 \u00d7 10^\u22124 \u00d7 60 \u00d7 24 \u00d7 30 = 4.3 more actions on average over a month. As one may have expected, GRD and LP are the best among the heuristics. The poor performance of MINMU, which is directly related to the objective of MMASH, may be because it assigns the budget to a low-activity user regardless of their influence. In contrast, our method benefits most from the budget by cleverly distributing it to the users whose actions trigger many other users\u2019 actions (like those with low activity). In terms of simulated objective and held-out data, the algorithms\u2019 performance becomes more similar.\n\nLeast-squares activity shaping (LSASH). We compare to two alternatives. PROP: assigning the budget proportionally to the desired activity; LSGRD: greedily allocating budget according to the difference between current and desired activity (refer to Appendix C for more details). 
The third row of Figure 2 summarizes the results for the three different evaluation schemes. We find that our method (LSASH) consistently outperforms the alternatives. Perhaps surprisingly, PROP, despite its simplicity, seems to perform slightly better than LSGRD. This may be due to the way it allocates the budget to users, e.g., it does not aim to strictly fulfill users\u2019 target activity but benefits more users by assigning budget proportionally. Refer to Appendix E for additional experiments.\n\nSparsity and Activity Shaping. In some applications there is a limitation on the number of users we can incentivize. In our proposed framework, we can handle this requirement by including a sparsity constraint in the optimization problem. In order to maintain the convexity of the optimization problem, we consider an l1 regularization term, where a regularization parameter \u03b3 provides the trade-off between sparsity and the activity shaping goal. Refer to Appendix F for more details and experimental results for different values of \u03b3.\n\nScalability. The most computationally demanding part of the proposed algorithm is the evaluation of matrix exponentials, which we scale up by utilizing techniques from matrix algebra, such as GMRES and Al-Mohy methods. As a result, we are able to run our methods in a reasonable amount of time on the 60K dataset, specifically, in comparison with a naive implementation of matrix exponential evaluations. Refer to Appendix G for detailed experimental results on scalability.\n\nAppendix H discusses the limitations of our framework and future work.\n\nAcknowledgement. This project was supported in part by NSF IIS1116886, NSF/NIH BIGDATA 1R01GM108341, NSF CAREER IIS1350983 and Raytheon Faculty Fellowship to Le Song. 
Isabel Valera acknowledges the support of Plan Regional-Programas I+D of Comunidad de Madrid (AGES-CM S2010/BMD-2422), Ministerio de Ciencia e Innovaci\u00f3n of Spain (project DEIPRO TEC2009-14504-C02-00 and program Consolider-Ingenio 2010 CSD2008-00010 COMONSENS).\n\nReferences\n\n[1] David Kempe, Jon Kleinberg, and \u00c9va Tardos. Maximizing the spread of influence through a social network. In KDD, pages 137\u2013146. ACM, 2003.\n\n[2] Matthew Richardson and Pedro Domingos. Mining knowledge-sharing sites for viral marketing. In KDD, pages 61\u201370. ACM, 2002.\n\n[3] Wei Chen, Yajun Wang, and Siyu Yang. Efficient influence maximization in social networks. In KDD, pages 199\u2013208. ACM, 2009.\n\n[4] Manuel G. Rodriguez and Bernhard Sch\u00f6lkopf. Influence maximization in continuous time diffusion networks. In ICML, 2012.\n\n[5] Nan Du, Le Song, Manuel Gomez Rodriguez, and Hongyuan Zha. Scalable influence estimation in continuous-time diffusion networks. In NIPS 26, 2013.\n\n[6] Thomas J. Liniger. Multivariate Hawkes Processes. PhD thesis, Swiss Federal Institute of Technology Zurich, 2009.\n\n[7] Charles Blundell, Jeff Beck, and Katherine A Heller. Modelling reciprocating relationships with Hawkes processes. In NIPS, 2012.\n\n[8] Ke Zhou, Hongyuan Zha, and Le Song. Learning social infectivity in sparse low-rank networks using multi-dimensional Hawkes processes. In AISTATS, 2013.\n\n[9] Ke Zhou, Hongyuan Zha, and Le Song. Learning triggering kernels for multi-dimensional Hawkes processes. In ICML, 2013.\n\n[10] Tomoharu Iwata, Amar Shah, and Zoubin Ghahramani. Discovering latent influence in online social activities via shared cascade Poisson processes. In KDD, pages 266\u2013274. ACM, 2013.\n\n[11] Scott W Linderman and Ryan P Adams. Discovering latent network structure in point process data. 
arXiv preprint arXiv:1402.0914, 2014.\n\n[12] Isabel Valera, Manuel Gomez-Rodriguez, and Krishna Gummadi. Modeling adoption of competing products and conventions in social media. arXiv preprint arXiv:1406.0516, 2014.\n\n[13] Ian Dobson, Benjamin A Carreras, and David E Newman. A branching process approximation to cascading load-dependent system failure. In System Sciences, 2004. Proceedings of the 37th Annual Hawaii International Conference on, pages 10\u2013pp. IEEE, 2004.\n\n[14] Jakob Gulddahl Rasmussen. Bayesian inference for Hawkes processes. Methodology and Computing in Applied Probability, 15(3):623\u2013642, 2013.\n\n[15] Alejandro Veen and Frederic P Schoenberg. Estimation of space\u2013time branching process models in seismology using an EM\u2013type algorithm. JASA, 103(482):614\u2013624, 2008.\n\n[16] Jiancang Zhuang, Yosihiko Ogata, and David Vere-Jones. Stochastic declustering of space-time earthquake occurrences. JASA, 97(458):369\u2013380, 2002.\n\n[17] Awad H Al-Mohy and Nicholas J Higham. Computing the action of the matrix exponential, with an application to exponential integrators. SIAM Journal on Scientific Computing, 33(2):488\u2013511, 2011.\n\n[18] David Marsan and Olivier Lengline. Extending earthquakes\u2019 reach through cascading. Science, 319(5866):1076\u20131079, 2008.\n\n[19] Shuang-Hong Yang and Hongyuan Zha. Mixture of mutually exciting processes for viral diffusion. In ICML, pages 1\u20139, 2013.\n\n[20] Theodore E Harris. The theory of branching processes. Courier Dover Publications, 2002.\n\n[21] Alan G Hawkes. Spectra of some self-exciting and mutually exciting point processes. Biometrika, 58(1):83\u201390, 1971.\n\n[22] John Frank Charles Kingman. Poisson processes, volume 3. Oxford University Press, 1992.\n\n[23] Manuel Gomez-Rodriguez, Krishna Gummadi, and Bernhard Schoelkopf. Quantifying Information Overload in Social Media and its Impact on Social Contagions. 
In ICWSM, 2014.\n\n[24] Gene H Golub and Charles F Van Loan. Matrix computations, volume 3. JHU Press, 2012.\n\n[25] Youcef Saad and Martin H Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM Journal on Scientific and Statistical Computing, 7(3):856\u2013869, 1986.\n\n[26] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, England, 2004.\n\n[27] Meeyoung Cha, Hamed Haddadi, Fabricio Benevenuto, and P Krishna Gummadi. Measuring User Influence in Twitter: The Million Follower Fallacy. In ICWSM, 2010.\n\n[28] Yosihiko Ogata. On Lewis\u2019 simulation method for point processes. Information Theory, IEEE Transactions on, 27(1):23\u201331, 1981.\n", "award": [], "sourceid": 1302, "authors": [{"given_name": "Mehrdad", "family_name": "Farajtabar", "institution": "Georgia Institute of Technolog"}, {"given_name": "Nan", "family_name": "Du", "institution": "Georgia Tech"}, {"given_name": "Manuel", "family_name": "Gomez Rodriguez", "institution": "MPI for Software Systems"}, {"given_name": "Isabel", "family_name": "Valera", "institution": "UC3M"}, {"given_name": "Hongyuan", "family_name": "Zha", "institution": "Georgia Tech"}, {"given_name": "Le", "family_name": "Song", "institution": "Georgia Tech"}]}