{"title": "Learning and Forecasting Opinion Dynamics in Social Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 397, "page_last": 405, "abstract": "Social media and social networking sites have become a global pinboard for exposition and discussion of news, topics, and ideas, where social media users often update their opinions about a particular topic by learning from the opinions shared by their friends. In this context, can we learn a data-driven model of opinion dynamics that is able to accurately forecast users' opinions? In this paper, we introduce SLANT, a probabilistic modeling framework of opinion dynamics, which represents users' opinions over time by means of marked jump diffusion stochastic differential equations, and allows for efficient model simulation and parameter estimation from historical fine grained event data. We then leverage our framework to derive a set of efficient predictive formulas for opinion forecasting and identify conditions under which opinions converge to a steady state. Experiments on data gathered from Twitter show that our model provides a good fit to the data and our formulas achieve more accurate forecasting than alternatives.", "full_text": "Learning and Forecasting Opinion Dynamics in\n\nSocial Networks\n\nAbir De\u2217\n\nIsabel Valera\u2020\n\nNiloy Ganguly\u2217\n\nSourangshu Bhattacharya\u2217\n\nManuel Gomez-Rodriguez\u2020\n\nIIT Kharagpur\u2217\n\nMPI for Software Systems\u2020\n\n{abir.de,niloy,sourangshu}@cse.iitkgp.ernet.in\n\n{ivalera,manuelgr}@mpi-sws.org\n\nAbstract\n\nSocial media and social networking sites have become a global pinboard for ex-\nposition and discussion of news, topics, and ideas, where social media users often\nupdate their opinions about a particular topic by learning from the opinions shared\nby their friends. In this context, can we learn a data-driven model of opinion dy-\nnamics that is able to accurately forecast users\u2019 opinions? 
In this paper, we intro-\nduce SLANT, a probabilistic modeling framework of opinion dynamics, which\nrepresents users\u2019 opinions over time by means of marked jump diffusion stochas-\ntic differential equations, and allows for ef\ufb01cient model simulation and parameter\nestimation from historical \ufb01ne grained event data. We then leverage our frame-\nwork to derive a set of ef\ufb01cient predictive formulas for opinion forecasting and\nidentify conditions under which opinions converge to a steady state. Experiments\non data gathered from Twitter show that our model provides a good \ufb01t to the data\nand our formulas achieve more accurate forecasting than alternatives.\n\nIntroduction\n\n1\nSocial media and social networking sites are increasingly used by people to express their opinions,\ngive their \u201chot takes\u201d, on the latest breaking news, political issues, sports events, and new products.\nAs a consequence, there has been an increasing interest on leveraging social media and social net-\nworking sites to sense and forecast opinions, as well as understand opinion dynamics. For example,\npolitical parties routinely use social media to sense people\u2019s opinion about their political discourse1;\nquantitative investment \ufb01rms measure investor sentiment and trade using social media [18]; and,\ncorporations leverage brand sentiment, estimated from users\u2019 posts, likes and shares in social media\nand social networking sites, to design their marketing campaigns2. 
In this context, multiple methods\nfor sensing opinions, typically based on sentiment analysis [21], have been proposed in recent years.\nHowever, methods for accurately forecasting opinions are still scarce [7, 8, 19], despite the extensive\nliterature on theoretical models of opinion dynamics [6, 9].\nIn this paper, we develop a novel modeling framework of opinion dynamics in social media and so-\ncial networking sites, SLANT3, which allows for accurate forecasting of individual users\u2019 opinions.\nThe proposed framework is based on two simple intuitive ideas: i) users\u2019 opinions are hidden until\nthey decide to share it with their friends (or neighbors); and, ii) users may update their opinions\nabout a particular topic by learning from the opinions shared by their friends. While the latter is one\nof the main underlying premises used by many well-known theoretical models of opinion dynam-\nics [6, 9, 22], the former has been ignored by models of opinion dynamics, despite its relevance on\nclosely related processes such as information diffusion [12].\n\n1http://www.nytimes.com/2012/10/08/technology/campaigns-use-social-media-to-lure-younger-voters.html\n2http://www.nytimes.com/2012/07/31/technology/facebook-twitter-and-foursquare-as-corporate-focus-groups.html\n3Slant is a particular point of view from which something is seen or presented.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\n\fMore in detail, our proposed model represents users\u2019 latent opinions as continuous-time stochastic\nprocesses driven by a set of marked jump stochastic differential equations (SDEs) [14]. Such con-\nstruction allows each user\u2019s latent opinion to be modulated over time by the opinions asynchronously\nexpressed by her neighbors as sentiment messages. Here, every time a user expresses an opinion by\nposting a sentiment message, she reveals a noisy estimate of her current latent opinion. 
Then, we\nexploit a key property of our model, the Markov property, to develop:\n\nI. An ef\ufb01cient estimation procedure to \ufb01nd the parameters that maximize the likelihood of a\n\nset of (millions of) sentiment messages via convex programming.\n\nII. A scalable simulation procedure to sample millions of sentiment messages from the pro-\n\nposed model in a matter of minutes.\n\nIII. A set of novel predictive formulas for ef\ufb01cient and accurate opinion forecasting, which\ncan also be used to identify conditions under which opinions converge to a steady state of\nconsensus or polarization.\n\nFinally, we experiment on both synthetic and real data gathered from Twitter and show that our\nmodel provides a good \ufb01t to the data and our predictive formulas achieve more accurate opinion\nforecasting than several alternatives [7, 8, 9, 15, 26].\nRelated work. There is an extensive line of work on theoretical models of opinion dynamics and\nopinion formation [3, 6, 9, 15, 17, 26]. However, previous models typically share the following\nlimitations: (i) they do not distinguish between latent opinion and sentiment (or expressed opin-\nion), which is a noisy observation of the opinion (e.g., thumbs up/down, text sentiment); (ii) they\nconsider users\u2019 opinions to be updated synchronously in discrete time, however, opinions may be\nupdated asynchronously following complex temporal patterns [12]; (iii) the model parameters are\ndif\ufb01cult to learn from real \ufb01ne-grained data and instead are set arbitrarily, as a consequence, they\nprovide inaccurate \ufb01ne-grained predictions; and, (iv) they focus on analyzing only the steady state\nof the users\u2019 opinions, neglecting the transient behavior of real opinion dynamics which allows for\nopinion forecasting methods. More recently, there have been some efforts on designing models that\novercome some of the above limitations and provide more accurate predictions [7, 8]. 
However,\nthey do not distinguish between opinion and sentiment and still consider opinions to be updated\nsynchronously in discrete time. Our modeling framework addresses the above limitations and, by\ndoing so, achieves more accurate opinion forecasting than alternatives.\n2 Proposed model\nIn this section, we \ufb01rst formulate our model of opinion dynamics, starting from the data it is designed\nfor, and then introduce ef\ufb01cient methods for model parameter estimation and model simulation.\nOpinions data. Given a directed social network G = (V,E), we record each message as e :=\n(u, m, t), where the triplet means that the user u \u2208 V posted a message with sentiment m at time\nt. Given a collection of messages {e1 = (u1, m1, t1), . . . , en = (un, mn, tn)}, the history Hu(t)\ngathers all messages posted by user u up to but not including time t, i.e.,\n(1)\nHu(t) = {ei = (ui, mi, ti)|ui = u and ti < t},\n\nand H(t) := \u222au\u2208VHu(t) denotes the entire history of messages up to but not including time t.\nGenerative process. We represent users\u2019 latent opinions as a multidimensional stochastic process\nx\u2217(t), in which the u-th entry, x\u2217u(t) \u2208 R, represents the opinion of user u at time t and the sign \u2217\nmeans that it may depend on the history H(t). Then, every time a user u posts a message at time t,\nwe draw its sentiment m from a sentiment distribution p(m|x\u2217u(t)). Here, we can also think of the\nsentiment m of each message as samples from a noisy stochastic process mu(t) \u223c p(mu(t)|x\u2217u(t)).\nFurther, we represent the message times by a set of counting processes. In particular, we denote\nthe set of counting processes as a vector N (t), in which the u-th entry, Nu(t) \u2208 {0} \u222a Z+, counts\nthe number of sentiment messages user u posted up to but not including time t. 
Then, we can\ncharacterize the message rate of the users using their corresponding conditional intensities as\n\nE[dN (t)|H(t)] = \u03bb\u2217(t) dt, (2)\n\nwhere dN (t) := ( dNu(t) )u\u2208V denotes the number of messages per user in the window [t, t + dt)\nand \u03bb\u2217(t) := ( \u03bb\u2217u(t) )u\u2208V denotes the associated user intensities, which may depend on the history\nH(t). We denote the set of users that u follows by N (u). Next, we specify the intensity functions\n\u03bb\u2217(t), the dynamics of the users\u2019 opinions x\u2217(t), and the sentiment distribution p(m|x\u2217u(t)).\n\nIntensity for messages. There is a wide variety of message intensity functions one can choose from\nto model the users\u2019 intensity \u03bb\u2217(t) [1]. In this work, we consider two of the most popular functional\nforms used in the growing literature on social activity modeling using point processes [10, 24, 5]:\nI. Poisson process. The intensity is assumed to be independent of the history H(t) and\nconstant, i.e., \u03bb\u2217u(t) = \u00b5u.\nII. Multivariate Hawkes processes. The intensity captures a mutual excitation phenomenon\nbetween message events and depends on the whole history of message events\n\u222av\u2208{u\u222aN (u)}Hv(t) before t:\n\n\u03bb\u2217u(t) = \u00b5u + \u2211v\u2208u\u222aN (u) bvu \u2211ei\u2208Hv(t) \u03ba(t \u2212 ti) = \u00b5u + \u2211v\u2208u\u222aN (u) bvu (\u03ba(t) \u22c6 dNv(t)), (3)\n\nwhere the first term, \u00b5u \u2265 0, models the publication of messages by user u on her own\ninitiative, and the second term, with bvu \u2265 0, models the publication of additional messages\nby user u due to the influence that previous messages posted by the users she follows have\non her intensity. 
Here, \u03ba(t) = exp(\u2212\u03bdt) is an exponential triggering kernel modeling the decay\nof influence of the past events over time and \u22c6 denotes the convolution operation.\n\nIn both cases, the couple (N (t), \u03bb\u2217(t)) is a Markov process, i.e., future states of the process (conditional\non past and present states) depend only upon the present state, and we can express the users\u2019\nintensity more compactly using the following jump stochastic differential equation (SDE):\n\nd\u03bb\u2217(t) = \u03bd(\u00b5 \u2212 \u03bb\u2217(t))dt + B dN (t),\n\nwhere the initial condition is \u03bb\u2217(0) = \u00b5. The Markov property will become important later.\nStochastic process for opinion. The opinion x\u2217u(t) of a user u at time t adopts the following form:\n\nx\u2217u(t) = \u03b1u + \u2211v\u2208N (u) avu \u2211ei\u2208Hv(t) mi g(t \u2212 ti) = \u03b1u + \u2211v\u2208N (u) avu (g(t) \u22c6 (mv(t)dNv(t))), (4)\n\nwhere the first term, \u03b1u \u2208 R, models the original opinion a user u starts with, and the second term,\nwith avu \u2208 R, models updates in user u\u2019s opinion due to the influence that previous messages with\nopinions mi posted by the users that u follows have on her opinion. Here, g(t) = exp(\u2212\u03c9t) (where\n\u03c9 \u2265 0) denotes an exponential triggering kernel, which models the decay of influence over time.\nThe greater the value of \u03c9, the greater the user\u2019s tendency to retain her own opinion \u03b1u. Under this\nform, the resulting opinion dynamics are Markovian and can be compactly represented by a set of\ncoupled marked jump stochastic differential equations (proven in Appendix A):\n\nProposition 1 The tuple (x\u2217(t), \u03bb\u2217(t), N (t)) is a Markov process, whose dynamics are defined by\nthe following marked jump stochastic differential equations (SDEs):\n\ndx\u2217(t) = \u03c9(\u03b1 \u2212 x\u2217(t))dt + A(m(t) \u2299 dN (t)) (5)\nd\u03bb\u2217(t) = \u03bd(\u00b5 \u2212 \u03bb\u2217(t))dt + B dN (t) (6)\n\nwhere the initial conditions are \u03bb\u2217(0) = \u00b5 and x\u2217(0) = \u03b1, the marks are the sentiment messages\nm(t) = ( mu(t) )u\u2208V, with mu(t) \u223c p(m|x\u2217u(t)), and the sign \u2299 denotes pointwise product.\n\nThe above mentioned Markov property will be the key to the design of efficient model parameter\nestimation and model simulation algorithms.\nSentiment distribution. The particular choice of sentiment distribution p(m|x\u2217u(t)) depends on the\nrecorded marks. For example, one may consider:\nI. Gaussian distribution. The sentiment is assumed to be a real random variable m \u2208 R, i.e.,\np(m|xu(t)) = N (xu(t), \u03c3u). This fits well scenarios in which sentiment is extracted from\ntext using sentiment analysis [13].\nII. Logistic. The sentiment is assumed to be a binary random variable m \u2208 {\u22121, 1}, i.e.,\np(m|xu(t)) = 1/(1 + exp(\u2212m \u00b7 xu(t))). This fits well scenarios in which sentiment is\nmeasured by means of up votes, down votes or likes.\nOur model estimation method can be easily adapted to any log-concave sentiment distribution. 
However, in the remainder of the paper, we consider the Gaussian distribution since, in our experiments,\nsentiment is extracted from text using sentiment analysis.\n\n2.1 Model parameter estimation\nGiven a collection of messages H(T ) = {(ui, mi, ti)} recorded during a time period [0, T ) in\na social network G = (V,E), we can find the optimal parameters \u03b1, \u00b5, A and B by solving a\nmaximum likelihood estimation (MLE) problem4. To do so, it is easy to show that the log-likelihood\nof the messages is given by\n\nL(\u03b1, \u00b5, A, B) = \u2211ei\u2208H(T ) log p(mi|x\u2217ui(ti)) + \u2211ei\u2208H(T ) log \u03bb\u2217ui(ti) \u2212 \u2211u\u2208V \u222b_0^T \u03bb\u2217u(\u03c4 ) d\u03c4, (7)\n\nwhere the first term accounts for the message sentiments and the last two terms account for the message times.\nThen, we can find the optimal parameters (\u03b1, \u00b5, A, B) using MLE as\n\nmaximize over \u03b1, \u00b5 \u2265 0, A, B \u2265 0: L(\u03b1, \u00b5, A, B). (8)\n\nNote that, as long as the sentiment distributions are log-concave, the MLE problem above is concave\nand thus can be solved efficiently. Moreover, the problem decomposes into 2|V| independent\nsubproblems, two per user u, since the first term in Eq. 7 only depends on (\u03b1, A) whereas the last\ntwo terms only depend on (\u00b5, B), and thus can be readily parallelized. Then, we find (\u00b5\u2217, B\u2217) using\nspectral projected gradient descent [4], which works well in practice and achieves \u03b5 accuracy in\nO(log(1/\u03b5)) iterations, and find (\u03b1\u2217, A\u2217) analytically, since, for Gaussian sentiment distributions,\nthe problem reduces to a least-squares problem. Fortunately, in each subproblem, we can use the\nMarkov property from Proposition 1 to precompute the sums and integrals in (8) in linear time, i.e.,\nO(|Hu(T )| + | \u222av\u2208N (u) Hv(T )|). 
Appendix H summarizes the overall estimation algorithm.\n2.2 Model simulation\nWe leverage the efficient sampling algorithm for multivariate Hawkes introduced by Farajtabar et\nal. [11] to design a scalable algorithm to sample opinions from our model. The two key ideas that\nallow us to adapt the procedure by Farajtabar et al. to our model of opinion dynamics, while keeping\nits efficiency, are as follows: (i) the opinion dynamics, defined by Eqs. 5 and 6, are Markovian and\nthus we can update individual intensities and opinions in O(1) \u2013 let ti and ti+1 be two consecutive\nevents, then, we can compute \u03bb\u2217(ti+1) as (\u03bb\u2217(ti) \u2212 \u00b5) exp(\u2212\u03bd(ti+1 \u2212 ti)) + \u00b5 and x\u2217(ti+1) as\n(x\u2217(ti) \u2212 \u03b1) exp(\u2212\u03c9(ti+1 \u2212 ti)) + \u03b1, respectively; and, (ii) social networks are typically sparse\nand thus both A and B are also sparse, then, whenever a node expresses its opinion, only a small\nnumber of opinions and intensity functions in its local neighborhood will change. As a consequence,\nwe can reuse the majority of samples from the intensity functions and sentiment distributions for the\nnext new sample. Appendix I summarizes the overall simulation algorithm.\n\n3 Opinion forecasting\nOur goal here is developing efficient methods that leverage our model to forecast a user u\u2019s\nIn the context of our proba-\nopinion xu(t) at time t given the history H(t0) up to time t0 0, where \u03b3, c > 0,\nthen,\n\nlim_{t\u2192\u221e} E_{Ht\\Ht0}[x\u2217(t)|Ht0 ] = (I \u2212 A\u039b2/\u03c9)^{\u22121} \u03b1, (12)\n\nwhere \u039b2 := diag[(I \u2212 B/\u03bd)^{\u22121} \u00b5].\n\nThe above results indicate that the conditional average opinions are nonlinearly related to the parameter\nmatrices A and B. 
This suggests that the effect of the temporal influence on the opinion\nevolution, by means of the parameter matrix B of the multivariate Hawkes process, is nontrivial.\nWe illustrate this result empirically in Figure 1.\n\n[Figure 1 panels for each network: network, P: average opinion, H: average opinion, P: temporal evolution, H: temporal evolution]\n\nFigure 1: Opinion dynamics on two 50-node networks G1 (top) and G2 (bottom) for Poisson (P)\nand Hawkes (H) message intensities. The first column visualizes the two networks and opinion of\neach node at t = 0 (positive/negative opinions in red/blue). The second column shows the temporal\nevolution of the theoretical and empirical average opinion for Poisson intensities. The third column\nshows the temporal evolution of the empirical average opinion for Hawkes intensities, where we\ncompute the average separately for positive (+) and negative (\u2212) opinions in the steady state. The\nfourth and fifth columns show the polarity of average opinion per user over time.\n\n3.2 Simulation based forecasting\nGiven the efficient simulation procedure described in Section 2.2, we can readily derive a general\nsimulation based formula for opinion forecasting:\n\nE_{Ht\\Ht0}[x\u2217(t)|Ht0] \u2248 \u02c6x\u2217(t) = (1/n) \u2211_{l=1}^{n} x\u2217l (t), (13)\n\nwhere n is the number of times that we simulate the opinion dynamics and x\u2217l (t) gathers the users\u2019\nopinions at time t for the l-th simulation. 
Moreover, we have the following theoretical guarantee\n(proven in Appendix G):\nTheorem 6 Simulate the opinion dynamics up to time t > t0 the following number of times:\n\nn \u2265 (1/(3\u03b5^2)) (6\u03c3^2_max + 4xmax\u03b5) log(2/\u03b4), (14)\n\nwhere \u03c3^2_max = maxu\u2208G \u03c3^2_{Ht\\Ht0}(x\u2217u(t)|Ht0) is the maximum variance of the users\u2019 opinions, which\nwe analyze in Appendix G, and xmax \u2265 |xu(t)|,\u2200u \u2208 G is an upper bound on the users\u2019 (absolute)\nopinions. Then, for each user u \u2208 G, the error between her true and estimated average opinion\nsatisfies that |\u02c6x\u2217u(t) \u2212 E_{Ht\\Ht0}[x\u2217u(t)|Ht0 ]| \u2264 \u03b5 with probability at least 1 \u2212 \u03b4.\n4 Experiments\n4.1 Experiments on synthetic data\nWe first provide empirical evidence that our model is able to produce different types of opinion\ndynamics, which may or may not converge to a steady state of consensus or polarization. Then, we\nshow that our model estimation and simulation algorithms as well as our predictive formulas scale\nto networks with millions of users and events. Appendix J contains an evaluation of the accuracy of\nour model parameter estimation method.\nDifferent types of opinion dynamics. We first simulate our model on two different small networks\nusing Poisson intensities, i.e., \u03bb\u2217u(t) = \u00b5u, \u00b5u \u223c U (0, 1) \u2200u, and then simulate our model on the\nsame networks while using Hawkes intensities with bvu \u223c U (0, 1) on 5% of the nodes, chosen at\nrandom, and the original Poisson intensities on the remaining nodes. 
Figure 1 summarizes the results,\nwhich show that (i) our model is able to produce opinion dynamics that converge to consensus\n(second column) and polarization (third column); (ii) the opinion forecasting formulas described in\nSection 3 closely match a simulation-based estimation (second column); and, (iii) the evolution of\nthe average opinion and whether opinions converge to a steady state of consensus or polarization\ndepend on the functional form of message intensity5.\n\n(a) Estimation vs # nodes (b) Simulation vs # nodes (c) Forecast vs # nodes (d) Forecast vs T\nFigure 2: Panels (a) and (b) show running time of our estimation and simulation procedures against\nnumber of nodes, where the average number of events per node is 10. Panels (c) and (d) show the\nrunning time needed to compute our analytical formulas against number of nodes and time horizon\nT = t \u2212 t0, where the number of nodes is 10^3. In Panel (c), T = 6 hours. For all panels, the average\ndegree per node is 30. The experiments are carried out on a single machine with 24 cores and 64 GB\nof main memory.\n\nScalability. Figure 2 shows that our model estimation and simulation algorithms, described in Sections\n2.1 and 2.2, and our analytical predictive formulas, described in Section 3.1, scale to networks\nwith millions of users and events. 
For example, our algorithm takes 20 minutes to estimate the\nmodel parameters from 10 million events generated by one million nodes using a single machine\nwith 24 cores and 64 GB RAM.\n4.2 Experiments on real data\nWe use real data gathered from Twitter to show that our model can forecast users\u2019 opinions more\naccurately than six state of the art methods [7, 8, 9, 15, 19, 26] (see Appendix L).\nExperimental Setup. We experimented with \ufb01ve Twitter datasets about current real-world events\n(Politics, Movie, Fight, Bollywood and US), in which, for each recorded message i, we compute its\nsentiment value mi using a popular sentiment analysis toolbox, specially designed for Twitter [13].\nHere, the sentiment takes values m \u2208 (\u22121, 1) and we consider the sentiment polarity to be simply\nsign(m). Appendix K contains further details and statistics about these datasets.\nOpinion forecasting. We \ufb01rst evaluate the performance of our model at predicting sentiment (ex-\npressed opinion) at a message level. To do so, for each dataset, we \ufb01rst estimate the parameters of\nour model, SLANT, using messages from a training set containing the (chronologically) \ufb01rst 90%\nof the messages. Here, we set the decay parameters of the exponential triggering kernels \u03ba(t) and\ng(t) by cross-validation. Then, we evaluate the predictive performance of our opinion forecasting\nformulas using the last 10% of the messages6. More speci\ufb01cally, we predict the sentiment value m\nfor each message posted by user u in the test set given the history up to T hours before the time\nof the message as \u02c6m = EHt\\Ht\u2212T [x\u2217u(t)|Ht\u2212T ]. We compare the performance of our model with\nthe asynchronous linear model (AsLM) [8], DeGroot\u2019s model [9], the voter model [26], the biased\nvoter model [7], the \ufb02ocking model [15], and the sentiment prediction method based on collabora-\ntive \ufb01ltering by Kim et al. 
[19], in terms of: (i) the mean squared error between the true (m) and the\nestimated ( \u02c6m) sentiment value for all messages in the held-out set, i.e., E[(m \u2212 \u02c6m)2], and (ii) the\nfailure rate, de\ufb01ned as the probability that the true and the estimated polarity do not coincide, i.e.,\nP(sign(m) (cid:54)= sign( \u02c6m)). For the baselines algorithms, which work in discrete time, we simulate NT\nrounds in (t \u2212 T, t), where NT is the number of posts in time T . Figure 3 summarizes the results,\nwhich show that: (i) our opinion forecasting formulas consistently outperform others both in terms\nof MSE (often by an order of magnitude) and failure rate;7 (ii) its forecasting performance degrades\ngracefully with respect to T , in contrast, competing methods often fail catastrophically; and, (iii) it\nachieves an additional mileage by using Hawkes processes instead of Poisson processes. To some\nextent, we believe SLANT\u2019s superior performance is due to its ability to leverage historical data to\nlearn its model parameters and then simulate realistic temporal patterns.\nFinally, we look at the forecasting results at a network level and show that our forecasting formulas\ncan also predict the evolution of opinions macroscopically (in terms of the average opinion across\nusers). 
Figure 4 summarizes the results for two real-world datasets, which show that the forecasted\nopinions become less accurate as the time T becomes larger, since the average is computed on longer\ntime periods. As expected, our model is more accurate when the message intensities are modeled\nusing multivariate Hawkes processes. We found qualitatively similar results for the remaining datasets.\n\n5For these particular networks, Poisson intensities lead to consensus while Hawkes intensities lead to polarization, however, we did find\nother examples in which Poisson intensities lead to polarization and Hawkes intensities lead to consensus.\n6Here, we do not distinguish between analytical and sampling based forecasting since, in practice, they closely match each other.\n7The failure rate is very close to zero for those datasets in which most users post messages with the same polarity.\n\n(a) Politics (b) Movie (c) Fight (d) Bollywood (e) US\nFigure 3: Sentiment prediction performance using a 10% held-out set for each real-world dataset.\nPerformance is measured in terms of mean squared error (MSE) on the sentiment value, E[(m \u2212\n\u02c6m)2], and failure rate on the sentiment polarity, P(sign(m) \u2260 sign( \u02c6m)). For each message in the\nheld-out set, we predict the sentiment value m given the history up to T hours before the time of\nthe message, for different values of T . Nowcasting corresponds to T = 0 and forecasting to T > 0.\nThe sentiment value m \u2208 (\u22121, 1) and the sentiment polarity sign (m) \u2208 {\u22121, 1}.\n\n(a) Tw: Movie (Hawkes) (b) Tw: Movie (Poisson) (c) Tw: US (Hawkes) (d) Tw: US (Poisson)\nFigure 4: Macroscopic sentiment prediction given by our model for two real-world datasets. The\npanels show the observed sentiment \u00afm(t) (in blue, running average), inferred opinion \u00afx(t) on the\ntraining set (in red), and forecasted opinion E_{Ht\\Ht\u2212T}[xu(t)|Ht\u2212T ] for T = 1, 3, and 5 hours on\nthe test set (in black, green and gray, respectively), where the symbol \u00af denotes average across users.\n\n5 Conclusions\nWe proposed a modeling framework of opinion dynamics, whose key innovation is modeling users\u2019\nlatent opinions as continuous-time stochastic processes driven by a set of marked jump stochastic\ndifferential equations (SDEs) [14]. Such construction allows each user\u2019s latent opinion to be modulated\nover time by the opinions asynchronously expressed by her neighbors as sentiment messages.\nWe then exploited a key property of our model, the Markov property, to design efficient parameter\nestimation and simulation algorithms, which scale to networks with millions of nodes. Moreover, we\nderived a set of novel predictive formulas for efficient and accurate opinion forecasting and identified\nconditions under which opinions converge to a steady state of consensus or polarization. Finally, we\nexperimented with real data gathered from Twitter and showed that our framework achieves more\naccurate opinion forecasting than state-of-the-art methods.\nOur model opens up many interesting avenues for future work. For example, in Eq. 4, our model\nassumes a linear dependence between users\u2019 opinions; however, in some scenarios, this may be a\ncoarse approximation. A natural follow-up to improve the opinion forecasting accuracy would be\nconsidering nonlinear dependences between opinions. 
It would be interesting to augment our model\nto jointly consider correlations between different topics. One could leverage our modeling framework\nto design opinion shaping algorithms based on stochastic optimal control [14, 25]. Finally, one\nof the key modeling ideas is realizing that users\u2019 expressed opinions (be it in the form of thumbs\nup/down or text sentiment) can be viewed as noisy discrete samples of the users\u2019 latent opinion localized\nin time. It would be very interesting to generalize this idea to any type of event data and\nderive sampling theorems and conditions under which an underlying general continuous signal of\ninterest (be it user\u2019s opinion or expertise) can be recovered from event data with provable guarantees.\nAcknowledgement: Abir De is partially supported by Google India under the Google India PhD Fellowship\nAward, and Isabel Valera is supported by a Humboldt post-doctoral fellowship.\n\nReferences\n[1] O. Aalen, \u00d8. Borgan, and H. Gjessing. Survival and event history analysis: a process point of view. Springer Verlag, 2008.\n\n[2] A. H. Al-Mohy and N. J. Higham. Computing the action of the matrix exponential, with an application to exponential integrators. SIAM Journal on Scientific Computing, 33(2):488\u2013511, 2011.\n\n[3] R. Axelrod. 
The dissemination of culture: a model with local convergence and global polarization. Journal of conflict resolution, 41(2):203\u2013226, 1997.\n\n[4] E. G. Birgin, J. M. Mart\u00ednez, and M. Raydan. Nonmonotone spectral projected gradient methods on convex sets. SIAM Journal on Optimization, 10(4), 2000.\n\n[5] C. Blundell, J. Beck, and K. A. Heller. Modelling reciprocating relationships with Hawkes processes. In Advances in Neural Information Processing Systems, pages 2600\u20132608, 2012.\n\n[6] P. Clifford and A. Sudbury. A model for spatial conflict. Biometrika, 60(3):581\u2013588, 1973.\n\n[7] A. Das, S. Gollapudi, and K. Munagala. Modeling opinion dynamics in social networks. In WSDM, 2014.\n\n[8] A. De, S. Bhattacharya, P. Bhattacharya, N. Ganguly, and S. Chakrabarti. Learning a linear influence model from transient opinion dynamics. In CIKM, 2014.\n\n[9] M. H. DeGroot. Reaching a consensus. Journal of the American Statistical Association, 69(345), 1974.\n\n[10] M. Farajtabar, N. Du, M. Gomez-Rodriguez, I. Valera, L. Song, and H. Zha. Shaping social activity by incentivizing users. In NIPS, 2014.\n\n[11] M. Farajtabar, Y. Wang, M. Gomez-Rodriguez, S. Li, H. Zha, and L. Song. Coevolve: A joint point process model for information diffusion and network co-evolution. In NIPS, 2015.\n\n[12] M. Gomez-Rodriguez, D. Balduzzi, and B. Sch\u00f6lkopf. Uncovering the Temporal Dynamics of Diffusion Networks. In ICML, 2011.\n\n[13] A. Hannak, E. Anderson, L. F. Barrett, S. Lehmann, A. Mislove, and M. Riedewald. Tweetin\u2019 in the rain: Exploring societal-scale effects of weather on mood. In ICWSM, 2012.\n\n[14] F. B. Hanson. Applied Stochastic Processes and Control for Jump-Diffusions. SIAM, 2007.\n\n[15] R. Hegselmann and U. Krause. Opinion dynamics and bounded confidence models, analysis, and simulation. Journal of Artificial Societies and Social Simulation, 5(3), 2002.\n\n[16] D. Hinrichsen, A. Ilchmann, and A. 
Pritchard. Robustness of stability of time-varying linear systems. Journal of Differential Equations, 82(2):219\u2013250, 1989.\n\n[17] P. Holme and M. E. Newman. Nonequilibrium phase transition in the coevolution of networks and opinions. Physical Review E, 74(5):056108, 2006.\n\n[18] T. Karppi and K. Crawford. Social media, financial algorithms and the hack crash. TC&S, 2015.\n\n[19] J. Kim, J.-B. Yoo, H. Lim, H. Qiu, Z. Kozareva, and A. Galstyan. Sentiment prediction using collaborative filtering. In ICWSM, 2013.\n\n[20] J. Leskovec, D. Chakrabarti, J. M. Kleinberg, C. Faloutsos, and Z. Ghahramani. Kronecker graphs: An approach to modeling networks. JMLR, 2010.\n\n[21] B. Pang and L. Lee. Opinion mining and sentiment analysis. F&T in information retrieval, 2(1-2), 2008.\n\n[22] B. H. Raven. The bases of power: Origins and recent developments. Journal of social issues, 49(4), 1993.\n\n[23] Y. Saad and M. H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM Journal on scientific and statistical computing, 7(3):856\u2013869, 1986.\n\n[24] I. Valera and M. Gomez-Rodriguez. Modeling adoption and usage of competing products. In Proceedings of the 2015 IEEE International Conference on Data Mining, 2015.\n\n[25] Y. Wang, E. Theodorou, A. Verma, and L. Song. Steering opinion dynamics in information diffusion networks. arXiv preprint arXiv:1603.09021, 2016.\n\n[26] M. E. Yildiz, R. Pagliari, A. Ozdaglar, and A. Scaglione. Voting models in random networks. 
In Information Theory and Applications Workshop, pages 1\u20137, 2010.", "award": [], "sourceid": 241, "authors": [{"given_name": "Abir", "family_name": "De", "institution": "IIT Kharagpur"}, {"given_name": "Isabel", "family_name": "Valera", "institution": "Max Planck Institute for Software Systems (MPI-SWS)"}, {"given_name": "Niloy", "family_name": "Ganguly", "institution": "IIT Kharagpur"}, {"given_name": "Sourangshu", "family_name": "Bhattacharya", "institution": "IIT Kharagpur"}, {"given_name": "Manuel", "family_name": "Gomez Rodriguez", "institution": "MPI-SWS"}]}