{"title": "Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback", "book": "Advances in Neural Information Processing Systems", "page_first": 3022, "page_last": 3032, "abstract": "We study the online influence maximization problem in social networks under the independent cascade model. Specifically, we aim to learn the set of \"best influencers\" in a social network online while repeatedly interacting with it. We address the challenges of (i) combinatorial action space, since the number of feasible influencer sets grows exponentially with the maximum number of influencers, and (ii) limited feedback, since only the influenced portion of the network is observed. Under a stochastic semi-bandit feedback, we propose and analyze IMLinUCB, a computationally efficient UCB-based algorithm. Our bounds on the cumulative regret are polynomial in all quantities of interest, achieve near-optimal dependence on the number of interactions and reflect the topology of the network and the activation probabilities of its edges, thereby giving insights on the problem complexity. To the best of our knowledge, these are the first such results. Our experiments show that in several representative graph topologies, the regret of IMLinUCB scales as suggested by our upper bounds. IMLinUCB permits linear generalization and thus is both statistically and computationally suitable for large-scale problems. 
Our experiments also show that IMLinUCB with linear generalization can lead to low regret in real-world online influence maximization.", "full_text": "Online In\ufb02uence Maximization under Independent\n\nCascade Model with Semi-Bandit Feedback\n\nZheng Wen\n\nAdobe Research\nzwen@adobe.com\n\nBranislav Kveton\nAdobe Research\n\nkveton@adobe.com\n\nMichal Valko\n\nSequeL team, INRIA Lille - Nord Europe\n\nmichal.valko@inria.fr\n\nSharan Vaswani\n\nUniversity of British Columbia\n\nsharanv@cs.ubc.ca\n\nAbstract\n\nWe study the online in\ufb02uence maximization problem in social networks under\nthe independent cascade model. Speci\ufb01cally, we aim to learn the set of \u201cbest\nin\ufb02uencers\u201d in a social network online while repeatedly interacting with it. We ad-\ndress the challenges of (i) combinatorial action space, since the number of feasible\nin\ufb02uencer sets grows exponentially with the maximum number of in\ufb02uencers, and\n(ii) limited feedback, since only the in\ufb02uenced portion of the network is observed.\nUnder a stochastic semi-bandit feedback, we propose and analyze IMLinUCB, a\ncomputationally ef\ufb01cient UCB-based algorithm. Our bounds on the cumulative\nregret are polynomial in all quantities of interest, achieve near-optimal dependence\non the number of interactions and re\ufb02ect the topology of the network and the acti-\nvation probabilities of its edges, thereby giving insights on the problem complexity.\nTo the best of our knowledge, these are the \ufb01rst such results. Our experiments show\nthat in several representative graph topologies, the regret of IMLinUCB scales as\nsuggested by our upper bounds. IMLinUCB permits linear generalization and thus\nis both statistically and computationally suitable for large-scale problems. 
Our\nexperiments also show that IMLinUCB with linear generalization can lead to low\nregret in real-world online in\ufb02uence maximization.\n\n1\n\nIntroduction\n\nSocial networks are increasingly important as media for spreading information, ideas, and in\ufb02u-\nence. Computational advertising studies models of information propagation or diffusion in such\nnetworks [16, 6, 10]. Viral marketing aims to use this information propagation to spread awareness\nabout a speci\ufb01c product. More precisely, agents (marketers) aim to select a \ufb01xed number of in\ufb02u-\nencers (called seeds or source nodes) and provide them with free products or discounts. They expect\nthat these users will in\ufb02uence their neighbours and, transitively, other users in the social network to\nadopt the product. This will thus result in information propagating across the network as more users\nadopt or become aware of the product. The marketer has a budget on the number of free products and\nmust choose seeds in order to maximize the in\ufb02uence spread, which is the expected number of users\nthat become aware of the product. This problem is referred to as in\ufb02uence maximization (IM) [16].\nFor IM, the social network is modeled as a directed graph with the nodes representing users, and\nthe edges representing relations (e.g., friendships on Facebook, following on Twitter) between them.\nEach directed edge (i, j) is associated with an activation probability w(i, j) that models the strength\nof in\ufb02uence that user i has on user j. We say a node j is a downstream neighbor of node i if\nthere is a directed edge (i, j) from i to j. The IM problem has been studied under a number of\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\n\fdiffusion models [16, 13, 23]. The best known and studied are the models in [16], and in particular\nthe independent cascade (IC) model. 
In this work, we assume that the diffusion follows the IC model\nand describe it next.\nAfter the agent chooses a set of source nodes S, the independent cascade model de\ufb01nes a diffusion\n(in\ufb02uence) process: At the beginning, all nodes in S are activated (in\ufb02uenced); subsequently, every\nactivated node i can activate its downstream neighbor j with probability w(i, j) once, independently\nof the history of the process. This process runs until no activations are possible. In the IM problem, the\ngoal of the agent is to maximize the expected number of the in\ufb02uenced nodes subject to a cardinality\nconstraint on S. Finding the best set S is an NP-hard problem, but under common diffusion models\nincluding IC, it can be ef\ufb01ciently approximated to within a factor of 1 \u2212 1/e [16].\nIn many social networks, however, the activation probabilities are unknown. One possibility is to\nlearn these from past propagation data [25, 14, 24]. However in practice, such data are hard to\nobtain and the large number of parameters makes this learning challenging. This motivates the\nlearning framework of IM bandits [31, 28, 29], where the agent needs to learn to choose a good set\nof source nodes while repeatedly interacting with the network. Depending on the feedback to the\nagent, the IM bandits can have (1) full-bandit feedback, where only the number of in\ufb02uenced nodes is\nobserved; (2) node semi-bandit feedback, where the identity of in\ufb02uenced nodes is observed; or (3)\nedge semi-bandit feedback, where the identity of in\ufb02uenced edges (edges going out from in\ufb02uenced\nnodes) is observed. In this paper, we give results for the edge semi-bandit feedback model, where we\nobserve for each in\ufb02uenced node, the downstream neighbors that this node in\ufb02uences. Such feedback\nis feasible to obtain in most online social networks. These networks track activities of users, for\ninstance, when a user retweets a tweet of another user. 
They can thus trace the propagation (of the\ntweet) through the network, thereby obtaining edge semi-bandit feedback.\nThe IM bandits problem combines two main challenges. First, the number of actions (possible\nsets) S grows exponentially with the cardinality constraint on S. Second, the agent can only observe\nthe in\ufb02uenced portion of the network as feedback. Although IM bandits have been studied in the\npast [21, 8, 31, 5, 29] (see Section 6 for an overview and comparison), there are a number of open\nchallenges [28]. One challenge is to identify reasonable complexity metrics that depend on both\nthe topology and activation probabilities of the network and characterize the information-theoretic\ncomplexity of the IM bandits problem. Another challenge is to develop learning algorithms such that\n(i) their performance scales gracefully with these metrics and (ii) are computationally ef\ufb01cient and\ncan be applied to large social networks with millions of users.\nIn this paper, we address these two challenges under the IC model with access to edge semi-bandit\nfeedback. We refer to our model as an independent cascade semi-bandit (ICSB). We make four\nmain contributions. First, we propose IMLinUCB, a UCB-like algorithm for ICSBs that permits linear\ngeneralization and is suitable for large-scale problems. Second, we de\ufb01ne a new complexity metric,\nreferred to as maximum observed relevance for ICSB, which depends on the topology of the network\nand is a non-decreasing function of activation probabilities. The maximum observed relevance C\u2217\ncan also be upper bounded based on the network topology or the size of the network in the worst case.\nHowever, in real-world social networks, due to the relatively low activation probabilities [14], C\u2217\nattains much smaller values as compared to the worst case upper bounds. Third, we bound the\ncumulative regret of IMLinUCB. 
Our regret bounds are polynomial in all quantities of interest and have near-optimal dependence on the number of interactions. They reflect the structure and activation probabilities of the network through C* and do not depend on inherently large quantities, such as the reciprocal of the minimum probability of being influenced (unlike [8]) or the cardinality of the action set. Finally, we evaluate IMLinUCB on several problems. Our empirical results on simple representative topologies show that the regret of IMLinUCB scales as suggested by our topology-dependent regret bounds. We also show that IMLinUCB with linear generalization can lead to low regret in real-world online influence maximization problems.

2 Influence Maximization under the Independent Cascade Model

In this section, we define notation and give the formal problem statement for the IM problem under the IC model. Consider a directed graph G = (V, E) with a set V = {1, 2, . . . , L} of L = |V| nodes, a set E = {1, 2, . . . , |E|} of directed edges, and an arbitrary binary weight function w : E → {0, 1}. We say that a node v2 ∈ V is reachable from a node v1 ∈ V under w if there is a directed path p = (e1, e2, . . . , el) from v1 to v2 in G satisfying w(ei) = 1 for all i = 1, 2, . . . , l, where ei is the i-th edge in p. For a given source node set S ⊆ V and w, we say that node v ∈ V is influenced if v is reachable from at least one source node in S under w; and we denote the number of influenced nodes in G by f(S, w). By definition, the nodes in S are always influenced.
The influence maximization (IM) problem is characterized by a triple (G, K, w̄), where G is a given directed graph, K ≤ L is the cardinality of source nodes, and w̄ : E → [0, 1] is a probability weight function mapping each edge e ∈ E to a real number w̄(e) ∈ [0, 1].
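Concretely, a realization of the diffusion and the resulting value f(S, w) can be computed by a breadth-first pass that flips one independent coin per out-edge of each newly influenced node. A minimal sketch (the toy graph, probabilities, and function name below are ours, for illustration only):

```python
import random

def simulate_ic(edges, w, seeds, rng=None):
    """One IC cascade on directed `edges` with activation probabilities `w`
    (a dict keyed by edge). Every newly influenced node gets a single
    independent Bernoulli attempt on each of its out-edges. Returns the
    influenced node set, whose size is f(S, w)."""
    rng = rng or random.Random()
    out = {}
    for (i, j) in edges:
        out.setdefault(i, []).append(j)
    influenced = set(seeds)
    frontier = list(seeds)
    while frontier:
        new = []
        for i in frontier:
            for j in out.get(i, ()):
                # each influenced node tries each out-edge exactly once
                if j not in influenced and rng.random() < w[(i, j)]:
                    influenced.add(j)
                    new.append(j)
        frontier = new
    return influenced
```

Averaging `len(simulate_ic(...))` over many independent runs gives a Monte Carlo estimate of the expected spread.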
The agent needs to choose a set of K source nodes S ⊆ V based on (G, K, w̄). Then a random binary weight function w, which encodes the diffusion process under the IC model, is obtained by independently sampling a Bernoulli random variable w(e) ∼ Bern(w̄(e)) for each edge e ∈ E. The agent's objective is to maximize the expected number of the influenced nodes: max_{S: |S|=K} f(S, w̄), where f(S, w̄) := E_w[f(S, w)] is the expected number of influenced nodes when the source node set is S and w is sampled according to w̄.
It is well known that the (offline) IM problem is NP-hard [16], but it can be approximately solved by approximation/randomized algorithms [6] under the IC model. In this paper, we refer to such algorithms as oracles to distinguish them from the machine learning algorithms discussed in the following sections. Let S^opt be the optimal solution of this problem, and let S* = ORACLE(G, K, w̄) be the (possibly random) solution of an oracle ORACLE. For any α, γ ∈ [0, 1], we say that ORACLE is an (α, γ)-approximation oracle for a given (G, K) if, for any w̄, f(S*, w̄) ≥ γ f(S^opt, w̄) with probability at least α. Notice that this further implies that E[f(S*, w̄)] ≥ αγ f(S^opt, w̄). We say an oracle is exact if α = γ = 1.

3 Influence Maximization Semi-Bandit

In this section, we first describe the IM semi-bandit problem. Next, we state the linear generalization assumption and describe IMLinUCB, our UCB-based semi-bandit algorithm.

3.1 Protocol

The independent cascade semi-bandit (ICSB) problem is also characterized by a triple (G, K, w̄), but w̄ is unknown to the agent. The agent interacts with the independent cascade semi-bandit for n rounds. At each round t = 1, 2, . . . , n, the agent first chooses a source node set St ⊆ V with cardinality K based on its prior information and past observations.
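As an aside on the oracle component: for monotone submodular spread functions, such as the expected spread under IC, the classic greedy algorithm of [16] yields the 1 − 1/e approximation mentioned in Section 2 (up to spread-estimation error). A sketch, where `spread` stands in for any estimator of the expected spread, e.g., averaged IC simulations; the function names are ours:

```python
def greedy_oracle(nodes, K, spread):
    """Greedy seed selection: K rounds of adding the node with the largest
    marginal gain in spread. For monotone submodular spread functions this
    is the classic (1 - 1/e)-approximation oracle."""
    S = frozenset()
    for _ in range(K):
        base = spread(S)
        best = max((v for v in nodes if v not in S),
                   key=lambda v: spread(S | {v}) - base)
        S = S | {best}
    return set(S)
```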
Influence then diffuses from the nodes in St according to the IC model. Similarly to the previous section, this can be interpreted as the environment generating a binary weight function wt by independently sampling wt(e) ∼ Bern(w̄(e)) for each e ∈ E. At round t, the agent receives the reward f(St, wt), which is equal to the number of nodes influenced at that round. The agent also receives edge semi-bandit feedback from the diffusion process. Specifically, for any edge e = (u1, u2) ∈ E, the agent observes the realization of wt(e) if and only if the start node u1 of the directed edge e is influenced in the realization wt. The agent's objective is to maximize the expected cumulative reward over the n rounds.

3.2 Linear generalization

Since the number of edges in real-world social networks tends to be in the millions or even billions, we need to exploit some generalization model across activation probabilities to develop efficient and deployable learning algorithms. In particular, we assume that there exists a linear-generalization model for the probability weight function w̄. That is, each edge e ∈ E is associated with a known feature vector x_e ∈ ℝ^d (here d is the dimension of the feature vector) and there is an unknown coefficient vector θ* ∈ ℝ^d such that, for all e ∈ E, w̄(e) is "well approximated" by x_e^T θ*. Formally, we assume that ρ := max_{e∈E} |w̄(e) − x_e^T θ*| is small. In Section 5.2, we see that such a linear generalization leads to efficient learning in real-world networks.
Note that all vectors in this paper are column vectors.

Footnote 1: As is standard in graph theory, a directed path is a sequence of directed edges connecting a sequence of distinct nodes, under the restriction that all edges are directed in the same direction.
Footnote 2: Notice that the definitions of f(S, w) and f(S, w̄) are consistent in the sense that if w̄ ∈ {0, 1}^{|E|}, then f(S, w) = f(S, w̄) with probability 1.

Algorithm 1 IMLinUCB: Influence Maximization Linear UCB
  Input: graph G, source node set cardinality K, oracle ORACLE, feature vectors x_e, and algorithm parameters σ, c > 0
  Initialization: B_0 ← 0 ∈ ℝ^d, M_0 ← I ∈ ℝ^{d×d}
  for t = 1, 2, . . . , n do
    1. set θ_{t−1} ← σ^{−2} M_{t−1}^{−1} B_{t−1} and the UCBs as U_t(e) ← Proj_{[0,1]}( x_e^T θ_{t−1} + c √( x_e^T M_{t−1}^{−1} x_e ) ) for all e ∈ E
    2. choose S_t ∈ ORACLE(G, K, U_t), and observe the edge-level semi-bandit feedback
    3. update statistics:
       (a) initialize M_t ← M_{t−1} and B_t ← B_{t−1}
       (b) for all observed edges e ∈ E, update M_t ← M_t + σ^{−2} x_e x_e^T and B_t ← B_t + x_e w_t(e)

Similarly to existing approaches for linear bandits [1, 9], we exploit the linear generalization to develop a learning algorithm for ICSB. Without loss of generality, we assume that ‖x_e‖_2 ≤ 1 for all e ∈ E. Moreover, we use X ∈ ℝ^{|E|×d} to denote the feature matrix, i.e., the row of X associated with edge e is x_e^T. Note that if a learning agent does not know how to construct good features, it can always choose the naive feature matrix X = I ∈ ℝ^{|E|×|E|} and have no generalization model across edges.
We refer to the special case X = I ∈ ℝ^{|E|×|E|} as the tabular case.

3.3 IMLinUCB algorithm

In this section, we propose Influence Maximization Linear UCB (IMLinUCB), detailed in Algorithm 1. Notice that IMLinUCB represents its past observations as a positive-definite matrix (the Gram matrix) M_t ∈ ℝ^{d×d} and a vector B_t ∈ ℝ^d. Specifically, let X_t be the matrix whose rows are the feature vectors of all observed edges in t rounds and Y_t be a binary column vector encoding the realizations of all observed edges in t rounds. Then M_t = I + σ^{−2} X_t^T X_t and B_t = X_t^T Y_t.
At each round t, IMLinUCB operates in three steps: First, it computes an upper confidence bound U_t(e) for each edge e ∈ E. Note that Proj_{[0,1]}(·) projects a real number onto the interval [0, 1] to ensure that U_t ∈ [0, 1]^{|E|}. Second, it chooses a set of source nodes based on the given ORACLE and U_t, which is also a probability weight function. Finally, it receives the edge semi-bandit feedback and uses it to update M_t and B_t. It is worth emphasizing that IMLinUCB is computationally efficient as long as ORACLE is computationally efficient. Specifically, at each round t, the computational complexities of both Steps 1 and 3 of IMLinUCB are O(|E| d^2).
It is worth pointing out that in the tabular case, IMLinUCB reduces to CUCB [7], in the sense that the confidence radii in IMLinUCB are the same as those in CUCB, up to logarithmic factors. That is, CUCB can be viewed as a special case of IMLinUCB with X = I.

3.4 Performance metrics

Recall that the agent's objective is to maximize the expected cumulative reward, which is equivalent to minimizing the expected cumulative regret. The cumulative regret is the loss in reward (accumulated over rounds) due to the lack of knowledge of the activation probabilities.
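For concreteness, one round of the IMLinUCB computation described in Section 3.3 can be sketched as follows (numpy; `oracle` and `get_feedback` are stand-ins for ORACLE and the edge semi-bandit feedback, and for clarity we invert M_t directly rather than maintaining its inverse incrementally):

```python
import numpy as np

def imlinucb_round(M, B, X, sigma, c, oracle, get_feedback):
    """One round of IMLinUCB. M is the d x d Gram matrix, B the d-vector,
    X the |E| x d feature matrix. `oracle` maps the vector of edge UCBs to
    a seed set; `get_feedback` returns the observed (edge_index, realization)
    pairs for that seed set."""
    Minv = np.linalg.inv(M)
    theta = sigma ** -2 * Minv @ B
    # Step 1: per-edge UCBs, projected onto [0, 1]
    conf = np.sqrt(np.einsum('ed,df,ef->e', X, Minv, X))  # diag(X Minv X^T)
    U = np.clip(X @ theta + c * conf, 0.0, 1.0)
    # Step 2: act through the offline oracle
    S = oracle(U)
    # Step 3: rank-one statistics updates on the observed edges
    for e, w_e in get_feedback(S):
        M = M + sigma ** -2 * np.outer(X[e], X[e])
        B = B + X[e] * w_e
    return M, B, S, U
```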
Observe that in each round t, IMLinUCB needs to use an approximation/randomized algorithm ORACLE for solving the offline IM problem. Naturally, this can lead to O(n) cumulative regret, since at each round there is a non-diminishing regret due to the approximation/randomized nature of ORACLE. To analyze the performance of IMLinUCB in such cases, we define a more appropriate performance metric, the scaled cumulative regret, as R^η(n) = Σ_{t=1}^{n} E[R^η_t], where n is the number of rounds, η > 0 is the scale, and R^η_t = f(S^opt, w_t) − (1/η) f(S_t, w_t) is the η-scaled realized regret at round t. When η = 1, R^η(n) reduces to the standard expected cumulative regret R(n).

Footnote 3: Notice that in a practical implementation, we store M_t^{−1} instead of M_t. Moreover, the update M_t ← M_t + σ^{−2} x_e x_e^T is equivalent to M_t^{−1} ← M_t^{−1} − (M_t^{−1} x_e x_e^T M_t^{−1}) / (x_e^T M_t^{−1} x_e + σ^2).

Figure 1: a. Bar graph on 8 nodes. b. Star graph on 4 nodes. c. Ray graph on 10 nodes. d. Grid graph on 9 nodes. Each undirected edge denotes two directed edges in opposite directions.

4 Analysis

In this section, we give a regret bound for IMLinUCB for the case when w̄(e) = x_e^T θ* for all e ∈ E, i.e., the linear generalization is perfect. Our main contribution is a regret bound that scales with a new complexity metric, the maximum observed relevance, which depends on both the topology of G and the probability weight function w̄, and is defined in Section 4.1. We highlight this because most known results for this problem are worst-case, and some of them do not depend on the probability weight function at all.

4.1 Maximum observed relevance

We start by defining some terminology.
For a given directed graph G = (V, E) and source node set S ⊆ V, we say an edge e ∈ E is relevant to a node v ∈ V \ S under S if there exists a path p from a source node s ∈ S to v such that (1) e ∈ p and (2) p does not contain another source node other than s. Notice that for a given S, whether or not a node v ∈ V \ S is influenced depends only on the binary weights w on its relevant edges. For any edge e ∈ E, we define N_{S,e} as the number of nodes in V \ S it is relevant to, and we define P_{S,e} as the conditional probability that e is observed given S,

  N_{S,e} := Σ_{v ∈ V\S} 1{e is relevant to v under S}  and  P_{S,e} := P(e is observed | S).   (1)

Notice that N_{S,e} depends only on the topology of G, while P_{S,e} depends on both the topology of G and the probability weight w̄. The maximum observed relevance C* is defined as the maximum (over S) 2-norm of the N_{S,e}'s weighted by the P_{S,e}'s,

  C* := max_{S: |S|=K} √( Σ_{e∈E} N_{S,e}^2 P_{S,e} ).   (2)

As is detailed in the proof of Lemma 1 in Appendix A, C* arises in the step where the Cauchy–Schwarz inequality is applied. Note that C* also depends on both the topology of G and the probability weight w̄. However, C* can be bounded from above based only on the topology of G or the size of the problem, i.e., L = |V| and |E|. Specifically, by defining C_G := max_{S: |S|=K} √( Σ_{e∈E} N_{S,e}^2 ), we have

  C* ≤ C_G = max_{S: |S|=K} √( Σ_{e∈E} N_{S,e}^2 ) ≤ (L − K)√|E| = O(L√|E|) = O(L^2),   (3)

where C_G is the maximum/worst-case (over w̄) value of C* for the directed graph G, and the maximum is attained by setting w̄(e) = 1 for all e ∈ E. Since C_G is worst-case, it might be very far away from C* if the activation probabilities are small.
Indeed, this is what we expect in typical real-world situations. Notice also that if max_{e∈E} w̄(e) → 0, then P_{S,e} → 0 for all e ∉ E(S) and P_{S,e} = 1 for all e ∈ E(S), where E(S) is the set of edges whose start node is in S; hence we have C* → C0_G := max_{S: |S|=K} √( Σ_{e∈E(S)} N_{S,e}^2 ). In particular, if K is small, C0_G is much less than C_G in many topologies. For example, in a complete graph with K = 1, C_G = Θ(L^2) while C0_G = Θ(L^{3/2}). Finally, it is worth pointing out that there exist situations (G, w̄) such that C* = Θ(L^2). One such example is when G is a complete graph with L nodes and w̄(e) = L/(L + 1) for all edges e in this graph.
To give more intuition, in the rest of this subsection, we illustrate how C_G, the worst-case C*, varies with the four graph topologies in Figure 1: bar, star, ray, and grid, as well as two other topologies: general tree and complete graph. We fix the node set V = {1, 2, . . . , L} for all graphs. The bar graph (Figure 1a) is a graph where nodes i and i + 1 are connected when i is odd. The star graph (Figure 1b) is a graph where node 1 is central and all remaining nodes i ∈ V \ {1} are connected to it. The distance between any two of these nodes is 2. The ray graph (Figure 1c) is a star graph with k = ⌈√(L − 1)⌉ arms, where node 1 is central and each arm contains either ⌈(L − 1)/k⌉ or ⌊(L − 1)/k⌋ nodes connected in a line. The distance between any two nodes in this graph is O(√L). The grid graph (Figure 1d) is a classical non-tree graph with O(L) edges.
To see how C_G varies with the graph topology, we start with the simplified case when K = |S| = 1. In the bar graph (Figure 1a), only one edge is relevant to a node v ∈ V \ S and all the other edges are not relevant to any nodes. Therefore, C_G ≤ 1.
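For small graphs, such C_G values can be checked by brute-force enumeration of simple paths from each candidate seed set; a sketch (exponential time, for intuition only; the function names are ours):

```python
import math
from itertools import combinations

def n_relevant(nodes, edges, S):
    """N_{S,e}: for each directed edge e, count the nodes v outside S such
    that e lies on a simple path from some source to v avoiding other sources."""
    out = {}
    for e in edges:
        out.setdefault(e[0], []).append(e)
    N = dict.fromkeys(edges, 0)
    for v in nodes:
        if v in S:
            continue
        relevant = set()

        def dfs(u, visited, path):
            if u == v:
                relevant.update(path)  # every edge on this path is relevant to v
                return
            for e in out.get(u, ()):
                j = e[1]
                if j not in visited and j not in S:
                    dfs(j, visited | {j}, path + [e])

        for s in S:
            dfs(s, {s}, [])
        for e in relevant:
            N[e] += 1
    return N

def worst_case_cg(nodes, edges, K):
    """C_G = max over |S| = K of sqrt(sum_e N_{S,e}^2), i.e., C* at w-bar = 1."""
    return max(
        math.sqrt(sum(n * n for n in n_relevant(nodes, edges, set(S)).values()))
        for S in combinations(nodes, K)
    )
```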
In the star graph (Figure 1b), for any s, at most one edge is relevant to at most L − 1 nodes and the remaining edges are relevant to at most one node. In this case, C_G ≤ √(L^2 + L) = O(L). In the ray graph (Figure 1c), for any s, at most O(√L) edges are relevant to L − 1 nodes and the remaining edges are relevant to at most O(√L) nodes. In this case, C_G = O(√( L^{1/2} L^2 + L · L )) = O(L^{5/4}). Finally, recall that for all graphs we can bound C_G by O(L√|E|), regardless of K. Hence, for the grid graph (Figure 1d) and a general tree graph, C_G = O(L^{3/2}) since |E| = O(L); for the complete graph, C_G = O(L^2) since |E| = O(L^2). Clearly, C_G varies widely with the topology of the graph. The second column of Table 1 summarizes how C_G varies with the above-mentioned graph topologies for general K = |S|.

4.2 Regret guarantees

Consider C* defined in Section 4.1 and recall the worst-case upper bound C* ≤ (L − K)√|E|; we have the following regret guarantees for IMLinUCB.

Theorem 1 Assume that (1) w̄(e) = x_e^T θ* for all e ∈ E and (2) ORACLE is an (α, γ)-approximation algorithm.
Let D be a known upper bound on ‖θ*‖_2. If we apply IMLinUCB with σ = 1 and

  c = √( d log(1 + n|E|/d) + 2 log(n(L + 1 − K)) ) + D,   (4)

then we have

  R^{αγ}(n) ≤ (2cC*/(αγ)) √( dn|E| log^2(1 + n|E|/d) ) + 1 = Õ( dC*√(|E|n) / (αγ) )   (5)
           ≤ Õ( d(L − K)|E|√n / (αγ) ).   (6)

Moreover, if the feature matrix X = I ∈ ℝ^{|E|×|E|} (i.e., the tabular case), we have

  R^{αγ}(n) ≤ (2cC*/(αγ)) √( n|E| log^2(1 + n) ) + 1 = Õ( |E|C*√n / (αγ) )   (7)
           ≤ Õ( (L − K)|E|^{3/2}√n / (αγ) ).   (8)

Please refer to Appendix A for the proof of Theorem 1, which we outline in Section 4.3. We now briefly comment on the regret bounds in Theorem 1.
Topology-dependent bounds: Since C* is topology-dependent, the regret bounds in Equations 5 and 7 are also topology-dependent. Table 1 summarizes the regret bounds for each topology discussed in Section 4.1. Since the regret bounds in Table 1 are the worst-case regret bounds for a given topology, more general topologies have larger regret bounds. For instance, the regret bounds for the tree are larger than their counterparts for the star and ray, since the star and ray are special trees. The grid and tree can also be viewed as special complete graphs obtained by setting w̄(e) = 0 for some e ∈ E; hence the complete graph has larger regret bounds.
Again, in practice we expect C* to be far smaller due to the activation probabilities.

Footnote 4: The regret bound for the bar graph is based on Theorem 2 in the appendix, which is a stronger version of Theorem 1 for disconnected graphs.

  topology        | C_G (worst-case C*) | R^{αγ}(n) for general X   | R^{αγ}(n) for X = I
  bar graph       | O(√K)               | Õ( dK√n / (αγ) )          | Õ( K√(Kn) / (αγ) )
  star graph      | O(L√K)              | Õ( dL^{3/2}√(Kn) / (αγ) ) | Õ( L^2√(Kn) / (αγ) )
  ray graph       | O(L^{5/4}√K)        | Õ( dL^{7/4}√(Kn) / (αγ) ) | Õ( L^{9/4}√(Kn) / (αγ) )
  tree graph      | O(L^{3/2})          | Õ( dL^2√n / (αγ) )        | Õ( L^{5/2}√n / (αγ) )
  grid graph      | O(L^{3/2})          | Õ( dL^2√n / (αγ) )        | Õ( L^{5/2}√n / (αγ) )
  complete graph  | O(L^2)              | Õ( dL^3√n / (αγ) )        | Õ( L^4√n / (αγ) )

Table 1: C_G and worst-case regret bounds for different graph topologies.

Tighter bounds in the tabular case and under an exact oracle: Notice that for the tabular case with feature matrix X = I and d = |E|, Õ(√|E|)-tighter regret bounds are obtained in Equations 7 and 8. Also notice that the Õ(1/(αγ)) factor is due to the fact that ORACLE is an (α, γ)-approximation oracle. If ORACLE solves the IM problem exactly (i.e., α = γ = 1), then R^{αγ}(n) = R(n).
Tightness of our regret bounds: First, note that our regret bound in the bar case with K = 1 matches the regret bound of the classic LinUCB algorithm.
Specifically, with perfect linear generalization, this case is equivalent to a linear bandit problem with L arms and feature dimension d. From Table 1, our regret bound in this case is Õ(d√n), which matches the known regret bound of LinUCB that can be obtained by the technique of [1]. Second, we briefly discuss the tightness of the regret bound in Equation 6 for a general graph with L nodes and |E| edges. Note that the Õ(√n)-dependence on time is near-optimal, and the Õ(d)-dependence on the feature dimension is standard in linear bandits [1, 33], since Õ(√d) results are only known for impractical algorithms. The Õ(L − K) factor is due to the fact that the reward in this problem ranges from K to L, rather than from 0 to 1. To explain the Õ(|E|) factor in this bound, notice that one Õ(√|E|) factor is due to the fact that at most Õ(|E|) edges might be observed at each round (see Theorem 3), and is intrinsic to the problem, similarly to combinatorial semi-bandits [19]; the other Õ(√|E|) factor is due to linear generalization (see Lemma 1) and might be removed by a better analysis. We conjecture that our Õ( d(L − K)|E|√n / (αγ) ) regret bound in this case is at most Õ(√(|E|d)) away from being tight.

4.3 Proof sketch

We now outline the proof of Theorem 1. For each round t ≤ n, we define the favorable event ξ_{t−1} = { |x_e^T(θ_{τ−1} − θ*)| ≤ c √( x_e^T M_{τ−1}^{−1} x_e ), ∀e ∈ E, ∀τ ≤ t }, and the unfavorable event ξ̄_{t−1} as the complement of ξ_{t−1}.
If we decompose E[R^{αγ}_t], the (αγ)-scaled expected regret at round t, over the events ξ_{t−1} and ξ̄_{t−1}, and bound R^{αγ}_t on the event ξ̄_{t−1} using the naive bound R^{αγ}_t ≤ L − K, then

  E[R^{αγ}_t] ≤ P(ξ_{t−1}) E[R^{αγ}_t | ξ_{t−1}] + P(ξ̄_{t−1}) [L − K].

By choosing c as specified by Equation 4, we have P(ξ̄_{t−1}) [L − K] < 1/n (see Lemma 2 in the appendix). On the other hand, notice that by the definition of ξ_{t−1}, we have w̄(e) ≤ U_t(e) for all e ∈ E under the event ξ_{t−1}. Using the monotonicity of f in the probability weight, and the fact that ORACLE is an (α, γ)-approximation algorithm, we have

  E[R^{αγ}_t | ξ_{t−1}] ≤ E[ f(S_t, U_t) − f(S_t, w̄) | ξ_{t−1} ] / (αγ).

The next observation is that, from the linearity of expectation, the gap f(S_t, U_t) − f(S_t, w̄) decomposes over the nodes v ∈ V \ S_t. Specifically, for any source node set S ⊆ V, any probability weight function w̄ : E → [0, 1], and any node v ∈ V, we define f(S, w̄, v) as the probability that node v is influenced when the source node set is S and the probability weight is w̄. Hence, we have

  f(S_t, U_t) − f(S_t, w̄) = Σ_{v ∈ V\S_t} [ f(S_t, U_t, v) − f(S_t, w̄, v) ].

Figure 2: Experimental results. (a) Stars and rays: log-log plots of the n-step regret of IMLinUCB in two graph topologies after n = 10^4 steps; we vary the number of nodes L and the mean edge weight ω. (b) Subgraph of Facebook network.

In the appendix, we show that under any weight function, the diffusion process from the source node set S_t to the target node v can be modeled as a Markov chain.
Hence, the weight functions $U_t$ and $w$ give us two Markov chains with the same state space but different transition probabilities, and $f(S_t, U_t, v) - f(S_t, w, v)$ can be recursively bounded based on the state diagram of the Markov chain under the weight function $w$. With some algebra, Theorem 3 in Appendix A bounds $f(S_t, U_t, v) - f(S_t, w, v)$ by the edge-level gaps $U_t(e) - w(e)$ on the observed relevant edges for node $v$:

$f(S_t, U_t, v) - f(S_t, w, v) \le \sum_{e \in E_{S_t, v}} \mathbb{E}\left[ \mathbf{1}\{O_t(e)\} \, [U_t(e) - w(e)] \mid H_{t-1}, S_t \right], \quad (9)$

for any $t$, any "history" (past observations) $H_{t-1}$ and $S_t$ such that $\xi_{t-1}$ holds, and any $v \in V \setminus S_t$, where $E_{S_t, v}$ is the set of edges relevant to $v$ and $O_t(e)$ is the event that edge $e$ is observed at round $t$. Based on Equation 9, we can prove Theorem 1 using standard linear-bandit techniques (see Appendix A).

5 Experiments

In this section, we first present a synthetic experiment that empirically validates our upper bounds on the regret. We then evaluate our algorithm on a real-world Facebook subgraph.

5.1 Stars and rays

In the first experiment, we evaluate IMLinUCB on undirected stars and rays (Figure 1) and validate that the regret grows with the number of nodes $L$ and the maximum observed relevance $C_*$ as shown in Table 1. We focus on the tabular case ($X = I$) with $K = |S| = 1$, where the IM problem can be solved exactly. We vary the number of nodes $L$ and the edge weight $w(e) = \omega$, which is the same for all edges $e$. We run IMLinUCB for $n = 10^4$ steps and verify that it converges to the optimal solution in each experiment. We report the $n$-step regret of IMLinUCB for $8 \le L \le 32$ in Figure 2a. Recall from Table 1 that $R(n) = \tilde{O}(L^2)$ for the star and $R(n) = \tilde{O}(L^{9/4})$ for the ray.

We numerically estimate the growth of the regret in $L$, that is, the exponent of $L$, in the log-log space of $L$ and the regret.
In particular, since $\log f(L) = p \log L + \log c$ for any $f(L) = cL^p$ with $c > 0$, both $p$ and $\log c$ can be estimated by linear regression in the new space. For star graphs with $\omega = 0.8$ and $\omega = 0.7$, our estimated growth rates are $O(L^{2.040})$ and $O(L^{2.056})$, respectively, which are close to the expected $\tilde{O}(L^2)$. For ray graphs with $\omega = 0.8$ and $\omega = 0.7$, our estimated growth rates are $O(L^{2.488})$ and $O(L^{2.467})$, respectively, which are again close to the expected $\tilde{O}(L^{9/4})$. This shows that the maximum observed relevance $C_*$ proposed in Section 4.1 is a reasonable complexity metric for these two topologies.

5.2 Subgraph of Facebook network

In the second experiment, we demonstrate the potential performance gain of IMLinUCB in real-world influence maximization semi-bandit problems by exploiting linear generalization across edges. Specifically, we compare IMLinUCB with CUCB on a subgraph of the Facebook network from [22]. The subgraph has $L = |V| = 327$ nodes and $|E| = 5038$ directed edges. Since the true probability weight function $w$ is not available, we independently sample the $w(e)$'s from the uniform distribution $U(0, 0.1)$ and treat them as the ground truth. Note that this range of probabilities is guided by empirical evidence in [14, 3]. We set $n = 5000$ and $K = 10$ in this experiment. For IMLinUCB, we choose $d = 10$ and generate the edge features $x_e$ as follows: we first use the node2vec algorithm [15] to generate a node feature in $\Re^d$ for each node $v \in V$; then, for each edge $e$, we generate $x_e$ as the element-wise product of the node features of the two nodes connected by $e$.
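The edge-feature construction just described can be sketched as follows (our own illustration; we substitute random vectors for the node2vec embeddings, so `node_emb` is a hypothetical stand-in for the learned node features):

```python
import numpy as np

def edge_features(edges, node_emb):
    """Build each edge feature x_e as the element-wise (Hadamard) product
    of the embeddings of the edge's two endpoints."""
    return {(u, v): node_emb[u] * node_emb[v] for (u, v) in edges}

# Hypothetical stand-in for node2vec output: random d-dimensional node vectors.
d = 10
rng = np.random.default_rng(0)
node_emb = {v: rng.standard_normal(d) for v in range(4)}
x = edge_features([(0, 1), (1, 2)], node_emb)
```

Each edge feature then lives in the same $d$-dimensional space as the node embeddings, so a single parameter vector $\theta \in \Re^d$ suffices to generalize across all edges.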
Note that the linear generalization in this experiment is imperfect, in the sense that $\min_{\theta \in \Re^d} \max_{e \in E} |w(e) - x_e^T \theta| > 0$. For both CUCB and IMLinUCB, we choose ORACLE to be the state-of-the-art offline IM algorithm proposed in [27]. To compute the cumulative regret, we compare against a fixed seed set $S^*$ obtained by using the true $w$ as input to the oracle proposed in [27]. We average the empirical cumulative regret over 10 independent runs and plot the results in Figure 2b. The experimental results show that, compared with CUCB, IMLinUCB can significantly reduce the cumulative regret by exploiting linear generalization across the $w(e)$'s.

6 Related Work

There exist prior results on IM semi-bandits [21, 8, 31]. First, Lei et al. [21] gave algorithms for the same feedback model as ours. Their algorithms are not analyzed and cannot solve large-scale problems because they estimate each edge weight independently. Second, our setting is a special case of a stochastic combinatorial semi-bandit with a submodular reward function and stochastically observed edges [8], which is the closest related work. Their gap-dependent and gap-free bounds are both problematic because they depend on the reciprocal of the minimum observation probability $p^*$ of an edge: consider a line graph with $|E|$ edges where all edge weights are 0.5; then $1/p^*$ is $2^{|E|-1}$. In contrast, our regret bounds in Theorem 1 are polynomial in all quantities of interest. A very recent result of Wang and Chen [32] removes the $1/p^*$ factor in [8] for the tabular case and presents a worst-case bound of $\tilde{O}(L|E|\sqrt{n})$, which in the tabular complete-graph case improves over our result by $\tilde{O}(L)$.
On the other hand, their analysis does not provide the structural guarantees that we obtain with the maximum observed relevance $C_*$, which yields potentially much better results for the case at hand and gives insights into the complexity of IM bandits. Moreover, neither Chen et al. [8] nor Wang and Chen [32] consider generalization models across edges or nodes, and therefore their proposed algorithms are unlikely to be practical for real-world social networks. In contrast, our proposed algorithm scales to large problems by exploiting linear generalization across edges.

IM bandits for different influence models and settings: There exist a number of extensions and related results for IM bandits; we only mention the most related ones (see [28] for a recent survey). Vaswani et al. [31] proposed a learning algorithm for a different and more challenging feedback model, in which the learning agent observes the influenced nodes but not the edges, but they do not give any guarantees. Carpentier and Valko [5] give a minimax-optimal algorithm for IM bandits, but consider only a local model of influence with a single source, in which a cascade of influences never happens. In related networked bandits [11], the learner chooses a node, and its reward is the sum of the rewards of the chosen node and its neighborhood. The problem becomes more challenging when the influence probabilities are allowed to change [2], when the seed set is chosen adaptively [30], or when a continuous model is considered [12]. Furthermore, Singla et al. [26] treat the IM setting with additional observability constraints, where there is a restriction on which nodes can be chosen at each round; this setting is also related to volatile multi-armed bandits, where the set of possible arms changes [4]. Vaswani et al. [29] proposed a diffusion-independent algorithm for IM semi-bandits that handles a wide range of diffusion models, based on the maximum-reachability approximation.
Despite its wide applicability, the maximum-reachability approximation introduces an additional approximation factor into the scaled regret bounds; as the authors discuss, this factor can be large in some cases. Lagrée et al. [20] treat a persistent extension of IM bandits in which some nodes become persistent over the rounds and no longer yield rewards. This work is also a generalization and extension of recent work on cascading bandits [17, 18, 34], since cascading bandits can be viewed as variants of online influence maximization problems with special topologies (chains).

Acknowledgements: The research presented was supported by the French Ministry of Higher Education and Research, the Nord-Pas-de-Calais Regional Council, and the French National Research Agency projects ExTra-Learn (n.ANR-14-CE24-0010-01) and BoB (n.ANR-16-CE23-0003). We would also like to thank Dr. Wei Chen and Mr. Qinshi Wang for pointing out a mistake in an earlier version of this paper.

References

[1] Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. Improved algorithms for linear stochastic bandits. In Neural Information Processing Systems, 2011.

[2] Yixin Bao, Xiaoke Wang, Zhi Wang, Chuan Wu, and Francis C. M. Lau. Online influence maximization in non-stationary social networks. In International Symposium on Quality of Service, April 2016.

[3] Nicola Barbieri, Francesco Bonchi, and Giuseppe Manco. Topic-aware social influence propagation models. Knowledge and Information Systems, 37(3):555–584, 2013.

[4] Zahy Bnaya, Rami Puzis, Roni Stern, and Ariel Felner. Social network search as a volatile multi-armed bandit problem. Human Journal, 2(2):84–98, 2013.

[5] Alexandra Carpentier and Michal Valko. Revealing graph bandits for maximizing local influence. In International Conference on Artificial Intelligence and Statistics, 2016.

[6] Wei Chen, Chi Wang, and Yajun Wang.
Scalable influence maximization for prevalent viral marketing in large-scale social networks. In Knowledge Discovery and Data Mining, 2010.

[7] Wei Chen, Yajun Wang, and Yang Yuan. Combinatorial multi-armed bandit: General framework, results and applications. In International Conference on Machine Learning, 2013.

[8] Wei Chen, Yajun Wang, and Yang Yuan. Combinatorial multi-armed bandit and its extension to probabilistically triggered arms. Journal of Machine Learning Research, 17, 2016.

[9] Varsha Dani, Thomas P. Hayes, and Sham M. Kakade. Stochastic linear optimization under bandit feedback. In Conference on Learning Theory, 2008.

[10] David Easley and Jon Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, 2010.

[11] Meng Fang and Dacheng Tao. Networked bandits with disjoint linear payoffs. In International Conference on Knowledge Discovery and Data Mining, 2014.

[12] Mehrdad Farajtabar, Xiaojing Ye, Sahar Harati, Le Song, and Hongyuan Zha. Multistage campaigning in social networks. In Neural Information Processing Systems, 2016.

[13] Manuel Gomez-Rodriguez and Bernhard Schölkopf. Influence maximization in continuous time diffusion networks. In International Conference on Machine Learning, 2012.

[14] Amit Goyal, Francesco Bonchi, and Laks V. S. Lakshmanan. Learning influence probabilities in social networks. In Proceedings of the Third ACM International Conference on Web Search and Data Mining, pages 241–250. ACM, 2010.

[15] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Knowledge Discovery and Data Mining. ACM, 2016.

[16] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In Knowledge Discovery and Data Mining, pages 137–146, 2003.

[17] Branislav Kveton, Csaba Szepesvari, Zheng Wen, and Azin Ashkan.
Cascading bandits: Learning to rank in the cascade model. In Proceedings of the 32nd International Conference on Machine Learning, 2015.

[18] Branislav Kveton, Zheng Wen, Azin Ashkan, and Csaba Szepesvari. Combinatorial cascading bandits. In Advances in Neural Information Processing Systems 28, pages 1450–1458, 2015.

[19] Branislav Kveton, Zheng Wen, Azin Ashkan, and Csaba Szepesvari. Tight regret bounds for stochastic combinatorial semi-bandits. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics, 2015.

[20] Paul Lagrée, Olivier Cappé, Bogdan Cautis, and Silviu Maniu. Effective large-scale online influence maximization. In International Conference on Data Mining, 2017.

[21] Siyu Lei, Silviu Maniu, Luyi Mo, Reynold Cheng, and Pierre Senellart. Online influence maximization. In Knowledge Discovery and Data Mining, 2015.

[22] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.

[23] Yanhua Li, Wei Chen, Yajun Wang, and Zhi-Li Zhang. Influence diffusion dynamics and influence maximization in social networks with friend and foe relationships. In ACM International Conference on Web Search and Data Mining. ACM, 2013.

[24] Praneeth Netrapalli and Sujay Sanghavi. Learning the graph of epidemic cascades. In ACM SIGMETRICS Performance Evaluation Review, volume 40, pages 211–222. ACM, 2012.

[25] Kazumi Saito, Ryohei Nakano, and Masahiro Kimura. Prediction of information diffusion probabilities for independent cascade model. In Knowledge-Based Intelligent Information and Engineering Systems, pages 67–75, 2008.

[26] Adish Singla, Eric Horvitz, Pushmeet Kohli, Ryen White, and Andreas Krause. Information gathering in networks via active exploration.
In International Joint Conference on Artificial Intelligence, 2015.

[27] Youze Tang, Xiaokui Xiao, and Yanchen Shi. Influence maximization: Near-optimal time complexity meets practical efficiency. In ACM SIGMOD International Conference on Management of Data, 2014.

[28] Michal Valko. Bandits on graphs and structures. Habilitation thesis, École normale supérieure de Cachan, 2016.

[29] Sharan Vaswani, Branislav Kveton, Zheng Wen, Mohammad Ghavamzadeh, Laks V. S. Lakshmanan, and Mark Schmidt. Model-independent online learning for influence maximization. In International Conference on Machine Learning, 2017.

[30] Sharan Vaswani and Laks V. S. Lakshmanan. Adaptive influence maximization in social networks: Why commit when you can adapt? Technical report, 2016.

[31] Sharan Vaswani, Laks V. S. Lakshmanan, and Mark Schmidt. Influence maximization with bandits. In NIPS Workshop on Networks in the Social and Information Sciences, 2015.

[32] Qinshi Wang and Wei Chen. Improving regret bounds for combinatorial semi-bandits with probabilistically triggered arms and its applications. In Neural Information Processing Systems, 2017.

[33] Zheng Wen, Branislav Kveton, and Azin Ashkan. Efficient learning in large-scale combinatorial semi-bandits. In International Conference on Machine Learning, 2015.

[34] Shi Zong, Hao Ni, Kenny Sung, Nan Rosemary Ke, Zheng Wen, and Branislav Kveton. Cascading bandits for large-scale recommendation problems. In Uncertainty in Artificial Intelligence, 2016.