LA-LDA: A Limited Attention Topic Model for Social Recommendation

Jeon-Hyung Kang1, Kristina Lerman1, Lise Getoor2

1 USC Information Sciences Institute, 4676 Admiralty Way, Marina del Rey, CA
2 University of Maryland, Computer Science Department, College Park, MD

arXiv:1301.6277v1 [cs.SI] 26 Jan 2013

Abstract. Social media users have finite attention, which limits the number of incoming messages from friends they can process. Moreover, they pay more attention to the opinions and recommendations of some friends than of others. In this paper, we propose LA-LDA, a latent topic model that incorporates limited, non-uniformly divided attention into the diffusion process by which opinions and information spread on the social network. We show that our proposed model learns more accurate user models from users' social network and item adoption behavior than models that do not take limited attention into account. We analyze voting on news items on the social news aggregator Digg and show that our proposed model is better able to predict held-out votes than alternative models. Our study demonstrates that psycho-socially motivated models are better able to describe and predict observed behavior than models that only consider topics.

Keywords: social media, diffusion, link analysis, influence

1 Introduction

Information overload has been drastically exacerbated by social media. On sites such as Twitter, YouTube and Facebook, more videos and images are uploaded, blog posts written, and new messages posted than people are able to process. Social media sites attempt to mitigate this problem by allowing users to subscribe to, or follow, updates from specific users only. However, as the number of friends people follow grows and the amount of information shared expands, the information overload problem returns.
Though social media contributes to the information overload problem, it also creates opportunities for solutions. We can apply statistical techniques to social media data to learn user preferences and interests from observations of their behavior. The learned preferences could then be used to more accurately filter and personalize streams of new information. Consider social recommendation: when a user shares an item, e.g., by posting a link to a news story on Digg or Twitter, he broadcasts it to all his followers. Those followers may in turn share the item with their own followers, and so on, creating a cascade through which information and ideas diffuse through the social network. By analyzing these cascades, i.e., who shares what items and when, we can learn what users are interested in and use this knowledge to filter and rank incoming information.

The generic diffusion process described above ignores two important elements: (i) users have finite attention, which limits their ability to process recommended items, and (ii) users divide their attention non-uniformly over their friends and interests. Attention is the psychological mechanism that integrates perceptual and cognitive factors to select the small fraction of input to be processed in real time [8,12]. Attention has been shown to be an important factor in explaining online interactions [17,7]. Attentive acts, e.g., reading a tweet, browsing the web, or responding to email, require mental effort, and since the brain's capacity for mental effort is limited, so is attention. Attention has been shown to impact the popularity of memes [18,17], what people retweet [3,7], and the number of meaningful conversations they can have [5]. Attention matters here because most sites, including Digg and Twitter, display items from friends as a chronologically sorted list, with the newest items at the top of the list.
The more friends a user follows, the longer the list, on average. A user scans the list, beginning at the top, and if he finds an item interesting, he may share it with his followers. He will continue scanning the list until he gets bored or distracted, which is likely to happen before he has had a chance to inspect all new items. While a user must divide his limited attention among his friends, he does not divide it uniformly. Some friends are closer or more influential [6,4]; therefore, their recommendations may receive more attention, making them more likely to be adopted. Users may also pay more or less attention to a given friend depending on the topic.

In the next section we describe a diffusion mechanism that takes into consideration the limited, non-uniformly divided attention of social media users. We use this mechanism to motivate LA-LDA, the probabilistic topic model we introduce. Next, we analyze voting on news items on the social news aggregator Digg and show that our model is better able to predict held-out votes than alternative models that do not take limited attention into account. Our study demonstrates that psycho-socially motivated models are better able to describe and predict observed user behavior in social media, and may lead to better tools for solving the information overload problem.

2 LA-LDA

Social Recommendation Setting We begin by describing the social recommendation scenario we are modeling. We assume an idealized social media setting, with U users who recommend items A to each other and adopt them. Users have interests X, and items have topics Z, with users more likely to adopt items whose topics match their interests. In addition, each user u has N_frds(u) friends frds(u) and can see the items these friends adopted. The social recommendation model we propose is dynamic and describes a number of user actions. A user u can share an item i at time t.
An item could be a link to an online resource that a user shares by tweeting it on Twitter or submitting it to Digg. We assume that when an item is shared by u, the recommendation is broadcast to all of u's followers. A user u can share a recommended item i at time t, for example, by retweeting the link on Twitter or voting for it on Digg.

We also introduce the notion of a seed, the user who introduced the item into the social network. For any item i, there is a set of seed users whose adoptions diffuse through the social network along follower links, based on users' interests.

Finally, what sets our model apart from previous models for social recommendation is that we also model users' attention. Users have limited attention and may not attend to all the items their friends recommend. After attending to an item, they may decide to adopt and share it. Once an item is shared, the limited attention diffusion process continues to unfold.

In summary, in the context of social recommendation, limited attention implies that users may not process all items their friends recommend. How they limit their attention depends on both their interests and their social network.

Probabilistic Model We now introduce a topic model, LA-LDA, that captures the salient elements of social recommendation, including the limited attention of users. Our model consists of four key components, which describe users' interests (θ(u)), items' topics (ψ(i)), users' attention to friends on different interests (τ(u)), and users' limited attention (φ(u)). We assume there are N_u users and N_i items, and each user u follows N_frds(u) friends. Moreover, each user has N_x interests, and each item has N_z topics.

The LA-LDA model is presented in graphical form in Figure 1(a). There are four parts to the model representation: user level (θ, τ, φ), item level (ψ), interest × topic level (π), and global hyperparameters (α, β, ρ, and η).
Each adoption of an item i by a user u has an associated item topic z and user interest x; Y denotes the friend(s) whose recommendations for i were adopted by u. Variables A and Y are observed, while X and Z are hidden. User u's interest profile θ(u) is a distribution over N_x interests. Similarly, item i's topic profile ψ(i) is a distribution over N_z topics. Each user pays attention to different friends depending on interests, so that for user u and interest x, there is an interest-specific distribution τ(u,x) over frds(u). The distribution of user u's attention over both the N_x interests and frds(u) is captured by φ(u). Finally, each interest x and topic z pair has an adoption probability π(x,z) for items. The generative process for item adoption through a social network is shown in Figure 1(b).

Fig. 1(a): graphical model [image not reproduced in this text].
Fig. 1(b): generative process:
  For each user u
    Generate θ(u) ∼ Dirichlet(α)
    For each interest x
      Generate τ(u,x) ∼ Dirichlet(ρ)
  For each item i
    Generate ψ(i) ∼ Dirichlet(β)
  For each interest x
    For each topic z
      Generate π(x,z) ∼ Dirichlet(η)
  For each user u
    For each adopted item i
      Choose interest x ∼ Multinomial(θ(u))
      Choose friend to pay attention to y ∼ Multinomial(τ(u,x))
      Choose topic z ∼ Multinomial(ψ(i))
      Choose item i ∼ Multinomial(π(x,z))
Fig. 1. The LA-LDA model (user interest profiles (θ), interest-specific attention profiles (τ), item topic profiles (ψ), and adoption probabilities (π)).

Inference The inference procedure for our model follows the derivation of the equations for collapsed Gibbs sampling, since we cannot compute the posterior distribution directly because of the summation in the denominator. By constructing a Markov chain, we can sample sequentially until the sampled parameters approach the target posterior distributions. In particular, we sample each variable from its distribution conditioned on the currently assigned values of all other variables. To apply this algorithm, we need the full conditional distributions, which can be obtained by a probabilistic argument. The Gibbs sampling formulas for the variables are:

$$P(Z_{(u,v)}=k \mid Z_{-(u,v)}, X, Y, A_u) \propto \frac{n^{k}_{-(u,v)}+\beta}{n^{(\cdot)}_{-(u,v)}+\beta N_z}\cdot\frac{n^{x,k}_{-(u,v)}+\eta}{n^{x,k}_{-(\cdot,\cdot)}+\eta N_i}$$

$$P(X_{(u,v)}=j \mid X_{-(u,v)}, Y, Z, A_u) \propto \frac{n^{j}_{-(u,\cdot)}+\alpha}{n^{(\cdot)}_{-(u,\cdot)}+\alpha N_x}\cdot\frac{n^{y}_{-(u,j)}+\rho}{n^{(\cdot)}_{-(u,j)}+\rho N_{\mathrm{frds}(u)}}\cdot\frac{n^{j,z}_{-(u,v)}+\eta}{n^{j,z}_{-(\cdot,\cdot)}+\eta N_i} \tag{1}$$

where n^k_{-(u,v)} is the number of times topic k is assigned to item (u,v), excluding the current assignment of Z_(u,v); n^{x,k}_{-(u,v)} is the number of adoptions of item (u,v) under item topic assignment k and user interest assignment x, excluding the current item topic assignment Z_(u,v); a dot (·) in a count denotes summation over the corresponding index; A_u is the set of items adopted by user u; and v ranges over the items in A_u, so that (u,v) denotes the index of the v-th item adopted by user u. In the first equation, the first ratio expresses the probability of topic k for item (u,v), and the second ratio expresses the probability of item (u,v)'s adoption under item topic assignment k and user interest assignment x. In the second equation, n^j_{-(u,·)} is the number of times user u pays attention to interest j, excluding the current assignment of X_(u,v), and n^y_{-(u,j)} is the number of times user u pays attention to friend y on interest j, excluding the current assignment of X_(u,v). The first ratio expresses the probability of user u paying attention to interest j, the second ratio expresses the probability that user u pays attention to friend y on interest j, and the third ratio is the adoption term of the first equation evaluated at interest j and the item's current topic z. Our model thus allows the algorithm to learn each user's interests by taking into account, from a local perspective, the limited attention the user pays to friends on certain interests, while adoption is governed, from a global perspective, by the user's interest and the item's topic assignments. To keep the model simple, we use symmetric Dirichlet priors. We estimate θ, ψ, π, and φ from the sampled values in the standard manner.
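To make Eq. (1) concrete, here is a minimal plain-Python sketch of the two full conditionals. It is not the authors' implementation: the nested-list count layout and all function names are hypothetical, we read a dot in a subscript as summation over that index, and every count passed in is assumed to already exclude the assignment being resampled. A full sampler would decrement the counts for (u,v), call these functions, draw new values, and increment the counts again.

```python
import random


def _normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def topic_conditional(i, x, n_ik, n_xki, n_xk, beta, eta):
    """First line of Eq. (1): P(Z_(u,v) = k | rest) for an adoption of item i
    whose current interest assignment is x. Counts must already EXCLUDE the
    assignment being resampled (hypothetical layout):
      n_ik[i][k]      n^k      topic k assigned on item i
      n_xki[x][k][i]  n^{x,k}  adoptions of item i under (interest x, topic k)
      n_xk[x][k]      n^{x,k}_{-(.,.)}, the same count summed over all items
    """
    n_topics, n_items = len(n_ik[i]), len(n_xki[x][0])
    n_i = sum(n_ik[i])
    return _normalize([
        (n_ik[i][k] + beta) / (n_i + beta * n_topics)
        * (n_xki[x][k][i] + eta) / (n_xk[x][k] + eta * n_items)
        for k in range(n_topics)])


def interest_conditional(u, y, z, i, n_uj, n_ujy, n_xki, n_xk, alpha, rho, eta):
    """Second line of Eq. (1): P(X_(u,v) = j | rest) for user u who attended
    friend index y and adopted item i with current topic z.
      n_uj[u][j]      n^j  attention of u to interest j
      n_ujy[u][j][f]  n^y  attention of u to friend f on interest j
    """
    n_interests, n_items = len(n_uj[u]), len(n_xki[0][z])
    n_u = sum(n_uj[u])
    weights = []
    for j in range(n_interests):
        friends_j = n_ujy[u][j]
        interest = (n_uj[u][j] + alpha) / (n_u + alpha * n_interests)
        friend = (friends_j[y] + rho) / (sum(friends_j) + rho * len(friends_j))
        adoption = (n_xki[j][z][i] + eta) / (n_xk[j][z] + eta * n_items)
        weights.append(interest * friend * adoption)
    return _normalize(weights)


def draw(probs, rng):
    """Sample the new assignment from a normalized conditional."""
    return rng.choices(range(len(probs)), weights=probs)[0]
```

With all counts at zero both conditionals are uniform, as expected under symmetric priors; counts pull the mass toward topics and interests that already explain the user's and item's other adoptions.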
3 Evaluation on Synthetic Data

Our first set of experiments illustrates the properties of the LA-LDA model on synthetic data. We used the social network links among the 5,000 most active users in the 2009 Digg dataset (Sect. 4), who are followed by 81.8 other users on average (max 984, median 11). We begin generating synthetic data by creating N_i items and N_u users according to the generative model.

We model the propagation of items through the social network over a period of N_day days. We first choose a set of seeders (S% of the N_u users). Seeders are able to introduce new items into the network. We introduce a special source node, which contains all of the items; seeders have the source node as one of their friends. Every user u is assigned a fixed attention budget V_u, which determines the total number of items from friends that u can attend to in a day. For simplicity, we represent V_u as a function of a global attention limit parameter v_g and the number of friends the user has. This is motivated by the observation that, at least on Digg, user activity is correlated with the number of friends a user follows (the correlation coefficient is 0.1626–0.1701). Intuitively, the number of items a user adopts is some fraction of the number of stories to which the user attends; here, to simplify matters, we assume that a user's attention budget is simply proportional to the number of friends she follows.

Synthetic cascades are generated as follows. Each day, every user, within her allotted attention budget, checks whether her friends have any items that match her interests:

  function Generate Synthetic Data
    for day = 1 → N_day do
      for u = 1 → N_u do
        for attention = 1 → V_u do
          choose interest x ∼ Mult(θ(u))
          choose friend y ∼ Mult(τ(u,x))
          choose an item i from y
          choose topic z ∼ Mult(ψ(i))
          adopt and share item i with probability π(x,z)
        end for
      end for
    end for
  end function
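The listing above can be turned into a small runnable simulation. This is a sketch under our own assumptions, not the authors' generator: the budget is taken to be V_u = v_g × |frds(u)| (the text only says it is proportional to the number of friends), an attention slot spent on a friend with no unattended items is lost, a user adopts a given item at most once, and the function and parameter names are hypothetical. Seeders are modeled by listing the special SOURCE node among their friends.

```python
import random

SOURCE = "source"  # hypothetical id of the special node that holds every item


def draw(weights, rng):
    """Sample an index from an (unnormalized) categorical distribution."""
    return rng.choices(range(len(weights)), weights=weights)[0]


def simulate_cascades(theta, tau, psi, pi, friends, n_items, n_days, v_g, rng):
    """Sketch of the synthetic cascade loop (all names are hypothetical).

    theta[u][x]    user u's interest distribution
    tau[u][x][f]   u's attention over friends[u][f] for interest x
    psi[i][z]      item i's topic distribution
    pi[x][z]       probability of adopting an attended item given (x, z)
    friends[u]     users that u follows; seeders also list SOURCE
    Returns a list of (day, user, friend, item) adoption events.
    """
    shared = {u: [] for u in friends}        # items each user has shared so far
    shared[SOURCE] = list(range(n_items))    # the source node holds every item
    attended = {u: {} for u in friends}      # attended[u][y]: items seen via y
    events = []
    for day in range(n_days):
        for u in friends:
            budget = v_g * len(friends[u])   # ASSUMPTION: V_u = v_g * |frds(u)|
            for _ in range(budget):
                x = draw(theta[u], rng)
                y = friends[u][draw(tau[u][x], rng)]
                seen = attended[u].setdefault(y, set())
                fresh = [i for i in shared[y] if i not in seen]
                if not fresh:
                    continue                 # ASSUMPTION: the slot is simply lost
                i = rng.choice(fresh)        # without replacement per friend y
                seen.add(i)
                z = draw(psi[i], rng)
                # ASSUMPTION: a user adopts (and re-shares) an item at most once
                if rng.random() < pi[x][z] and i not in shared[u]:
                    shared[u].append(i)
                    events.append((day, u, y, i))
    return events
```

For example, with friends = {"a": [SOURCE], "b": ["a"]}, user a acts as a seeder and b can only adopt what a has already shared. Drawing θ, τ, and ψ from the Dirichlet priors (e.g., via normalized random.gammavariate draws) would complete the generative side.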
Initially, when the cascade starts, the source node is the only friend that has items, so only seed nodes are able to adopt and share items. As time progresses, however, items begin flowing through the network. Eventually users exhaust their attention budget without being able to attend to all the items that their friends shared with them. When a user chooses to attend to an item i that has been shared by a friend y, she chooses without replacement, so that an item is attended to only once from a particular friend y. However, we do allow a user to attend to the same item from different friends. Once an item has been chosen, the user adopts (and shares) it with probability π(x,z).

By varying the parameters (S and v_g) and hyperparameters (α, β, η, and ρ), we can create different synthetic datasets, and we investigate how well we are able to recover the user interests from the generated data using the LA-LDA (or LDA) model. We evaluate the models by measuring how similar the learned and actual distributions are, using the average deviation computed from the Jensen-Shannon divergence between their vectors. The average deviation is small when two vectors are similar, regardless of how the interests are indexed. For comparison, we learned two different LDA models, one for user interests and one for item topics.

[Fig. 2 plots not reproduced: four panels (a)–(d) showing the average deviation ("avg delta", axis 0–60) of LA-LDA and LDA against ρ in panels (a) and (c) and against α in panels (b) and (d), each over the range 0–1.]
Fig. 2. The average deviation of user interests (θ) and item topics (ψ) for different limited-attention hyperparameter values (ρ and α) on synthetic data. The top two panels show the average deviation between learned and actual θ when (a) α = 0.05 and ρ = 0.05, 0.1, 0.5, and 1.0, and (b) ρ = 0.05 and α = 0.05, 0.1, 0.5, and 1.0. The bottom two panels show the average deviation between learned and actual ψ when (c) α = 0.05 and (d) ρ = 0.05.
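One plausible way to compute a permutation-invariant deviation between learned and actual distributions is sketched below. The text does not specify how the interest indices are aligned or how the reported numbers are scaled, so this is an assumption and is not expected to reproduce the figures: we use base-2 Jensen-Shannon divergence, average it over users (or items), and take the minimum over all relabelings of the latent dimensions. The function names are hypothetical.

```python
import math
from itertools import permutations


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_k = 0 contribute nothing."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (bounded by 1)."""
    m = [(pk + qk) / 2 for pk, qk in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def average_deviation(learned, actual):
    """Mean JS divergence between learned and actual rows (one row per user
    or item), minimized over relabelings of the latent dimensions so that
    the arbitrary indexing of interests does not matter. Brute force over
    permutations: only practical for a handful of dimensions."""
    k = len(actual[0])
    best = math.inf
    for perm in permutations(range(k)):
        total = sum(js_divergence([row[j] for j in perm], truth)
                    for row, truth in zip(learned, actual))
        best = min(best, total / len(actual))
    return best
```

The brute-force search grows factorially with N_x; for more than a few interests, an assignment solver over a dimension-by-dimension cost matrix would be the natural substitute.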
We learn the LDA for the interest distributions of users θ by viewing a user as a document and items as terms in the document, and we learn the LDA for the topic distributions of items ψ by treating an item as a document and users as terms in the document. We also ran LA-LDA to learn both θ and ψ in accordance with that model. To generate the synthetic data, we set v_g = 2, β = 0.1, η = 0.1, and S = 30%, and varied α (0.05, 0.1, 0.5, and 1.0) and ρ (0.05, 0.1, 0.5, and 1.0). In the models, we used the same hyperparameters that were used to generate the synthetic data.

The average deviations between the learned and actual user interests and item topics in the synthetic datasets are shown in Fig. 2. With large values of α, users allocate their attention uniformly over interests, so they are more likely to adopt items on a variety of interests. Because of this adoption tendency, it is hard to distinguish their interests. For small values of α, users pay attention to a limited number of interests, and more can be learned from their adoption behavior. That is why both LDA and LA-LDA perform better for small α values. Similarly, large values of ρ cause users to pay attention to their friends uniformly, while small values focus users' attention on a smaller subset of their friends. With large ρ values, the average deviations of both models are high, whereas for lower values both models perform better. In all four cases, LA-LDA is superior to LDA in learning the interest distributions of users and the topic distributions of items for all α and ρ values.

4 Evaluation on Digg

We evaluate LA-LDA on real-world data from the social news aggregator Digg, which allows users to submit links to news stories and other users to vote for (or "digg") stories they find interesting. Digg also allows users to follow the activity of other users to see the stories they submitted or dugg recently. When a user votes for a story, this recommendation is broadcast to all his followers.
At the time the data was collected, users were submitting many thousands of stories, from which Digg selected a handful to promote to its popular front page. We evaluated two datasets. The 2009 dataset [9] contains the voting history of 70K active users (with 1.7M social links) on 3.5K stories promoted to Digg's front page in June, and contains 2.1M votes. At the time, Digg assigned stories to one of eight topics (Entertainment, Lifestyle, Science, Technology, World & Business, Sports, Offbeat, and Gaming). The 2010 dataset [15] contains the voting histories of 12K users (with 1.3M social links) over a six-month period (July to December). It includes 48K stories with 1.9M votes. At the time this data was collected, Digg assigned stories to 10 topics, replacing the "World & Business" topic with "World News," "Business," and "Politics."

Before a story is promoted to the front page, it is visible in the upcoming stories queue and to the submitter's followers. With each new vote, the story becomes visible to that voter's followers. We examine only the votes that a story accrued before promotion to the front page, during which time it propagated mainly via friends' recommendations. In the 2009 dataset, 28K users voted for 3K stories, and in the 2010 dataset, 4K users voted for 36K stories before promotion. We focused the data further by selecting those users who voted at least 10 times, resulting in 2,390 users (who voted for 3,553 stories) in the 2009 dataset and 2,330 users (who voted on 22,483 stories) in the 2010 dataset.

LA-LDA has six parameters: the number of interests (N_x) and topics (N_z), and the hyperparameters α, β, η, and ρ. The choice of hyperparameters can affect the inference results.
While our algorithm can be extended to learn hyperparameters, here we fix them (to 0.1) and focus on the consequences of varying the number of topics and interests (from 5 to 800). We estimate the performance of the model by computing the likelihood of the training set given the model for different combinations of parameters. We took samples at a lag of 100 iterations after discarding the first 1000 iterations; both algorithms stabilize within 2000 iterations. For both ITM and LA-LDA, the best performance is obtained for N_x = 10 interests and N_z = 200 topics in the 2009 dataset, and for N_x = 30 interests and N_z = 200 topics in the 2010 dataset. LDA performs best with 200 interests in the 2009 dataset and 500 interests in the 2010 dataset.

Evaluation of Learned User Interests The topics assigned to stories by Digg provide useful evidence for evaluating topic models. We represent user u's preferences by constructing an empirical interest vector that gives the fraction of votes made by u on each topic. The empirical interest vector serves as the gold standard for evaluating the user interests learned by different topic models. We measure the similarity of the distributions using the average Jensen-Shannon divergence. In both datasets, LA-LDA (2009 dataset: 15.11; 2010 dataset: 28.71) outperforms ITM [11] (36.38 and 36.01) and LDA [1] (37.72 and 55.43) by learning user interests that are closer to the gold standard.

Evaluation on Vote Prediction We evaluate our proposed topic models by measuring how well they allow us to predict individual votes. There are 257K pre-promotion votes in the 2009 dataset and 1.5M votes in the 2010 dataset, with 72.34 and 68.20 votes per story on average, respectively. For our evaluation, we randomly split the data into training and test sets and performed five-fold cross-validation. To generate the test set, we use the held-out votes (positive examples) and augment them with stories that friends of users shared but that were not adopted by the user.
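The split just described can be sketched as follows. We assume the five folds are drawn over individual (user, story) votes and that the negatives are all friend-shared stories the user never voted for; the text does not specify either detail, and the function name and data layout are hypothetical.

```python
import random


def five_fold_splits(votes, friend_shares, n_folds=5, seed=0):
    """Yield (train, test) pairs for k-fold cross-validation (sketch).

    votes[u]          set of stories user u voted for
    friend_shares[u]  set of stories shared by u's friends
    train[u]          set of u's training votes
    test[u]           {story: label}: 1 for held-out votes, 0 for stories
                      friends shared that u never adopted
    """
    pairs = [(u, s) for u in votes for s in sorted(votes[u])]
    random.Random(seed).shuffle(pairs)
    for fold in range(n_folds):
        held_out = set(pairs[fold::n_folds])   # every n_folds-th shuffled vote
        train = {u: {s for s in votes[u] if (u, s) not in held_out} for u in votes}
        test = {}
        for u in votes:
            negatives = friend_shares.get(u, set()) - votes[u]
            test[u] = {s: 0 for s in negatives}
            test[u].update({s: 1 for s in votes[u] if (u, s) in held_out})
        yield train, test
```

Each vote lands in exactly one test fold, so every held-out positive is ranked against the same pool of friend-shared negatives.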
Depending on a user's and their friends' activities, there are different numbers of positive examples (N^u_pos) in the test set. The average percentage of positives N^u_pos in the test set is 0.73% (max 18%, min 0.02%, median 0.13%), suggesting that friends share many stories that users do not end up voting for. This makes the prediction task extremely challenging, with less than a one-in-a-hundred chance of successfully predicting votes if stories are picked randomly.

We train the models on the data in the training set. Then, for each story i in the test set, we compute the probability that user u votes for it, given the training data D. For LDA, the probability of the vote on i is the probability of adopting a_i:

$$P(a_i \mid D) = \int_{\theta} \sum_{x} P(a_i \mid x)\, P(x \mid \theta)\, P(\theta \mid D)\, d\theta \tag{2}$$

For ITM, the probability that user u votes for story i is obtained by integrating over the posterior Dirichlet distributions of θ and ψ:

$$P(a_i \mid D) = \int_{\psi}\int_{\theta} \sum_{x,z} P(a_i \mid z, x)\, P(z \mid \psi)\, P(x \mid \theta)\, P(\psi \mid D)\, P(\theta \mid D)\, d\theta\, d\psi \tag{3}$$

Finally, in the LA-LDA model, the probability that user u votes for story i is:

$$P(a_i \mid D) = \int_{\psi}\int_{\phi} \sum_{x,y,z} P(a_i \mid x, z)\, P(z \mid \psi)\, P(x, y \mid \phi)\, P(\psi \mid D)\, P(\phi \mid D)\, d\phi\, d\psi \tag{4}$$

where the probability of a user's vote is determined by the distribution φ of the user's limited attention over friends and interests and by the story's topic profile ψ. We evaluate the performance of the models on the prediction task using average precision. The average precision at N^u_pos for each user is

$$\mathrm{AP}^u = \frac{1}{N^u_{pos}} \sum_{k=1}^{n} \mathrm{Prec}(k),$$

where Prec(k) is the precision at cut-off k in the list of votes ordered by their likelihood.

We divide users into categories based on their activity in the training set. The first category includes all users, and the remaining categories include users who voted for at least 7.5%, 15%, and 25% of the stories in the training set. While LA-LDA outperforms the baseline methods in all cases, its comparative advantage grows with user activity.
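A minimal sketch of how the held-out candidates could be scored and ranked. la_lda_score is a point-estimate reading of Eq. (4) that plugs in estimated φ, ψ, and π instead of integrating over their posteriors; we assume that y ranges only over the friends of u who shared the story (summing y over all friends would collapse φ to its interest marginal). average_precision follows the usual definition in which Prec(k) is averaged over the ranks k that hold held-out votes, which the formula above leaves implicit. The data layout and function names are hypothetical.

```python
def la_lda_score(u, story, sharers, phi, psi, pi):
    """Point-estimate reading of Eq. (4), hypothetical layout:
    phi[u][x][y]      u's joint attention over interest x and friend y (dict)
    psi[story][z]     topic profile of the story
    pi[x][z][story]   adoption probability of the story under (x, z)
    ASSUMPTION: y ranges only over friends of u who shared the story."""
    return sum(phi[u][x].get(y, 0.0) * psi[story][z] * pi[x][z][story]
               for x in range(len(phi[u]))
               for y in sharers
               for z in range(len(psi[story])))


def average_precision(ranked_labels):
    """AP of a ranked list of 1/0 labels (1 = held-out vote): Prec(k) is
    summed over the ranks k that hold a vote and divided by N_pos."""
    n_pos = sum(ranked_labels)
    if n_pos == 0:
        return 0.0
    hits, total = 0, 0.0
    for k, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / k                  # Prec(k) at this vote's rank
    return total / n_pos


def user_average_precision(test_cases, score):
    """test_cases: {story: label}; score: story -> predicted vote probability."""
    ranked = sorted(test_cases, key=score, reverse=True)
    return average_precision([test_cases[s] for s in ranked])
```

Averaging user_average_precision over the users in each activity category would give one entry per column of the results table.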
When there is little information about user interests, the precision of all methods ranges from 1% to 3%. As the amount of information about user interests, as expressed through the votes users make, grows, the performance of all models improves, but that of LA-LDA improves much faster. LA-LDA correctly predicts more than 30% of the votes made by the most active users, compared to 11% for random guessing.

Average Precision   2009 Data                             2010 Data
                    All users  ≥7.5%   ≥15%    ≥25%       All users  ≥7.5%    ≥15%    ≥25%
random              0.0192     0.0477  0.0617  0.1092     0.0111     0.03619  0.0557  0.1054
LDA                 0.0209     0.0440  0.0621  0.1107     0.0182     0.0415   0.0562  0.1117
ITM                 0.0220     0.1100  0.1526  0.2693     0.0244     0.1363   0.1763  0.2370
LA-LDA              0.0224     0.1164  0.1677  0.3204     0.0376     0.1368   0.1881  0.3154
Submitter           0.0379     0.0873  0.1138  0.1517     0.0283     0.0483   0.0746  0.1257
Max                 0.0789     0.0964  0.1240  0.1707     0.0702     0.0733   0.1080  0.1616
ITM+Submitter       0.0241     0.0904  0.1311  0.1889     0.0381     0.0845   0.1121  0.1816
ITM+Max             0.0257     0.0977  0.1471  0.2365     0.0482     0.1243   0.1645  0.2436

One may ask whether a simple attention allocation heuristic could predict votes as well as LA-LDA, but at a reduced computational cost. We answer this question by presenting the results of four experiments studying the effect of the influence heuristic on the prediction task. In the first experiment, predicted votes for each user are sorted based on the influence of the submitter, the first user to post the story on Digg. In the second experiment, they are sorted based on the influence of the most influential (max) voter. The third and fourth experiments investigate the effect of including either influence heuristic in the ITM model. In this case, the vote probability given by Eq. 3 is multiplied by the relative influence (with respect to the most influential user in the network) of the submitter or max voter. When there is little information from which to learn user interests, a simple heuristic, namely that a user votes for a story if a very influential user recommended it, works well to predict votes, three to four times better than random guessing.
However, as LA-LDA receives more data about user interests, it is able to learn a model that outperforms the simpler influence-based models.

5 Conclusion

Traditional topic models have been extended to a networked setting to model hyperlinks between documents [10] and the varying vocabularies and styles of different authors [13]. Collaborative filtering methods examine item recommendations made by many users to discover their preferences and recommend new items that were liked by similar users [14,2]; LDA has also been extended to improve the explanatory power of recommendations [16].

We introduced LA-LDA, a novel hidden topic model that takes into account social media users' limited attention. Our work demonstrates the importance of modeling psychological factors, such as attention, in social media analysis. These results may apply beyond social media and point to the fundamental role that psychosocial and cognitive factors play in social communication. People do not have infinite time and patience to read all status updates or scientific articles on topics they are interested in, see all the movies, or read all the books. Attention acts as an "information bottleneck," selecting a small fraction of available input for further processing. Since human attention is finite, the mechanisms that guide it become ever more important. Uncovering the factors that guide attention will be the focus of our future work.

References

1. D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. The Journal of Machine Learning Research, 3:993–1022, 2003.
2. F. C. T. Chua, H. W. Lauw, and E.-P. Lim. Generative models for item adoptions using social correlation. In TKDE, 2012.
3. S. Counts and K. Fisher. Taking it all in? Visual attention in microblog consumption. In ICWSM, 2011.
4. E. Gilbert and K. Karahalios. Predicting tie strength with social media. In CHI, 2009.
5. B. Goncalves, N. Perra, and A. Vespignani. Validation of Dunbar's number in Twitter conversations. arXiv.org, 2011.
6. M. S. Granovetter. The strength of weak ties. American Journal of Sociology, 78(6):1360–1380, 1973.
7. N. Hodas and K. Lerman. How limited visibility and divided attention constrain social contagion. In SocialCom, 2012.
8. D. Kahneman. Attention and Effort. Prentice Hall, 1973.
9. K. Lerman and R. Ghosh. Information contagion: an empirical study of spread of news on Digg and Twitter social networks. In ICWSM, 2010.
10. R. Nallapati and W. Cohen. Link-PLSA-LDA: A new unsupervised model for topics and influence of blogs. In ICWSM, 2008.
11. A. Plangprasopchok and K. Lerman. Modeling social annotation: a Bayesian approach. ACM Transactions on Knowledge Discovery from Data, 5(1):4, 2010.
12. R. Rensink, J. O'Regan, and J. Clark. To see or not to see: The need for attention to perceive changes in scenes. Psychological Science, 8(5):368, 1997.
13. M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth. The author-topic model for authors and documents. In UAI, 2004.
14. B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In WWW, 2001.
15. H. Sharara, W. Rand, and L. Getoor. Differential adaptive diffusion: Understanding diversity and learning whom to trust in viral marketing. In ICWSM, 2011.
16. C. Wang and D. M. Blei. Collaborative topic modeling for recommending scientific articles. In KDD, 2011.
17. L. Weng, A. Flammini, A. Vespignani, and F. Menczer. Competition among memes in a world with limited attention. Scientific Reports, 2, 2012.
18. F. Wu and B. A. Huberman. Novelty and collective attention. Proceedings of the National Academy of Sciences, 104(45):17599–17601, Nov. 2007.