Studies in Big Data, Volume 3

Katsutoshi Yada, Editor

Data Mining for Service

Series editor: Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
e-mail: [email protected]

For further volumes: http://www.springer.com/series/11970

About this Series

The series "Studies in Big Data" (SBD) publishes new developments and advances in the various areas of Big Data, quickly and with a high quality. The intent is to cover the theory, research, development, and applications of Big Data, as embedded in the fields of engineering, computer science, physics, economics and life sciences. The books of the series refer to the analysis and understanding of large, complex, and/or distributed data sets generated from recent digital sources coming from sensors or other physical instruments as well as simulations, crowd sourcing, social networks or other internet transactions, such as emails or video click streams and others. The series contains monographs, lecture notes and edited volumes in Big Data spanning the areas of computational intelligence incl. neural networks, evolutionary computation, soft computing, fuzzy systems, as well as artificial intelligence, data mining, modern statistics and operations research, as well as self-organizing systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output.
Editor
Katsutoshi Yada
Faculty of Commerce
Kansai University
Osaka, Japan

ISSN 2197-6503, ISSN 2197-6511 (electronic)
ISBN 978-3-642-45251-2, ISBN 978-3-642-45252-9 (eBook)
DOI 10.1007/978-3-642-45252-9
Springer Heidelberg New York Dordrecht London
Library of Congress Control Number: 2013957989
© Springer-Verlag Berlin Heidelberg 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

In the globalized economy, the service sector is expanding rapidly and becoming more and more important. Many researchers have studied services from various points of view and offered insights to business owners. Practitioners and researchers alike recognize that one of the most important challenges in service research is how to improve service productivity and efficiently ensure customer satisfaction with limited natural and human resources. By their nature, services used to be difficult to study with a scientific approach, but the spread of digital devices has led to the accumulation of a variety of data, which is gradually enabling researchers to analyze services scientifically.

Data mining is one of the most important steps toward the scientific analysis of service processes. It is a series of processes that includes collecting and accumulating data, modeling phenomena, and discovering new information. Numerous technical papers and studies on data mining have been published in computer science. Using calculation speed and prediction accuracy as the evaluation criteria, many of these studies have contributed to the efficient processing of large amounts of data. However, when data mining is applied to the analysis of services, calculation speed and prediction accuracy do not suffice; instead, algorithms and techniques that are appropriate for a particular service must be adopted or developed. Expertise in the service domain is therefore crucial in applying data mining to services.

This book shows how data mining can be applied to the service sector through a variety of service-related examples. Understanding how expertise in services and data mining techniques fit together will provide insights into the extended use of data mining in other service domains.

I would like to thank everyone who has supported me in the publishing of this book.
I would like to address my special thanks to the authors of all the chapters, who offered new ideas and valuable perspectives; the staff members at Springer, for their continued guidance in the editing process; and the secretaries at the Kansai University Data Mining Laboratory. This work was supported by the program for the Strategic Research Foundation at Private Universities from the Ministry of Education, Culture, Sports, Science and Technology (MEXT), 2009–2013. Finally, I hope this book will stimulate interest in the relation between data mining technology and its application to other fields and provide important insights for many researchers and practitioners involved in the service sector.

Osaka, October 2013
Katsutoshi Yada

Contents

Part I Fundamental Technologies Supporting Service Science

Data Mining for Service
Katsutoshi Yada

Feature Selection Over Distributed Data Streams
Jacob Kogan

Learning Hidden Markov Models Using Probabilistic Matrix Factorization
Ashutosh Tewari and Michael J. Giering

Dimensionality Reduction for Information Retrieval Using Vector Replacement of Rare Terms
Tobias Berka and Marian Vajteršic

Panel Data Analysis via Variable Selection and Subject Clustering
Haibing Lu, Shengsheng Huang, Yingjiu Li and Yanjiang Yang

Part II Knowledge Discovery from Text

A Weighted Density-Based Approach for Identifying Standardized Items that are Significantly Related to the Biological Literature
Omar Al-Azzam, Jianfei Wu, Loai Al-Nimer, Charith Chitraranjan and Anne M. Denton

Nonnegative Tensor Factorization of Biomedical Literature for Analysis of Genomic Data
Sujoy Roy, Ramin Homayouni, Michael W. Berry and Andrey A. Puretskiy

Text Mining of Business-Oriented Conversations at a Call Center
Hironori Takeuchi and Takahira Yamaguchi

Part III Approach for New Services in Social Media

Scam Detection in Twitter
Xiaoling Chen, Rajarathnam Chandramouli and Koduvayur P. Subbalakshmi

A Matrix Factorization Framework for Jointly Analyzing Multiple Nonnegative Data Sources
Sunil Kumar Gupta, Dinh Phung, Brett Adams and Svetha Venkatesh

Recommendation Systems for Web 2.0 Marketing
Chen Wei, Richard Khoury and Simon Fong

Part IV Data Mining Spreading into Various Service Fields

Handling Imbalanced and Overlapping Classes in Smart Environments Prompting Dataset
Barnan Das, Narayanan C. Krishnan and Diane J. Cook

Change Detection from Heterogeneous Data Sources
Tsuyoshi Idé

Interesting Subset Discovery and Its Application on Service Processes
Maitreya Natu and Girish Keshav Palshikar

Text Document Cluster Analysis Through Visualization of 3D Projections
Masaki Aono and Mei Kobayashi

Part I
Fundamental Technologies Supporting Service Science