
Web and Network Data Science: Modeling Techniques in Predictive Analytics PDF

725 Pages·2014·28.93 MB·English

Preview Web and Network Data Science: Modeling Techniques in Predictive Analytics

About This eBook

ePUB is an open, industry-standard format for eBooks. However, support of ePUB and its many features varies across reading devices and applications. Use your device or app settings to customize the presentation to your liking. Settings that you can customize often include font, font size, single or double column, landscape or portrait mode, and figures that you can click or tap to enlarge. For additional information about the settings and features on your reading device or app, visit the device manufacturer’s Web site.

Many titles include programming code or configuration examples. To optimize the presentation of these elements, view the eBook in single-column, landscape mode and adjust the font size to the smallest setting. In addition to presenting code and configurations in the reflowable text format, we have included images of the code that mimic the presentation found in the print book; therefore, where the reflowable format may compromise the presentation of the code listing, you will see a “Click here to view code image” link. Click the link to view the print-fidelity code image. To return to the previous page viewed, click the Back button on your device or app.

Web and Network Data Science
Modeling Techniques in Predictive Analytics
THOMAS W. MILLER

Editor-in-Chief: Amy Neidlinger
Executive Editor: Jeanne Glasser
Operations Specialist: Jodi Kemper
Cover Designer: Alan Clements
Managing Editor: Kristy Hart
Project Editor: Andy Beaster
Senior Compositor: Gloria Schurick
Manufacturing Buyer: Dan Uhrig

©2015 by Thomas W. Miller
Published by Pearson Education, Inc.
Upper Saddle River, New Jersey 07458

Pearson offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales. For more information, please contact U.S. Corporate and Government Sales, 1-800-382-3419, [email protected]. For sales outside the U.S., please contact International Sales at [email protected].

Company and product names mentioned herein are the trademarks or registered trademarks of their respective owners.

All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.

Printed in the United States of America
First Printing December 2014
ISBN-10: 0-13-388644-1
ISBN-13: 978-0-13-388644-3

Pearson Education LTD.
Pearson Education Australia PTY, Limited.
Pearson Education Singapore, Pte. Ltd.
Pearson Education Asia, Ltd.
Pearson Education Canada, Ltd.
Pearson Educación de Mexico, S.A. de C.V.
Pearson Education—Japan
Pearson Education Malaysia, Pte. Ltd.

Library of Congress Control Number: 2014956958

Contents

Preface
Figures
Tables
Exhibits
1 Being Technically Inclined
2 Delivering a Message Online
3 Crawling and Scraping the Web
4 Testing Links, Look, and Feel
5 Watching Competitors
6 Visualizing Networks
7 Understanding Communities
8 Measuring Sentiment
9 Discovering Common Themes
10 Making Recommendations
11 Playing Network Games
12 What’s Next for the Web?
A Data Science Methods
A.1 Databases and Data Preparation
A.2 Classical and Bayesian Statistics
A.3 Regression and Classification
A.4 Machine Learning
A.5 Data Visualization
A.6 Text Analytics
B Primary Research Online
C Case Studies
C.1 E-Mail or Spam?
C.2 ToutBay Begins
C.3 Keyword Games: Dodgers and Angels
C.4 Enron E-Mail Corpus and Network
C.5 Wikipedia Votes
C.6 Quake Talk
C.7 POTUS Speeches
C.8 Anonymous Microsoft Web Data
D Code and Utilities
E Glossary
Bibliography
Index

Preface

“Scotty, beam me up.”
—WILLIAM SHATNER AS CAPTAIN KIRK IN Star Trek IV: The Voyage Home (1986)

The web is a network of linked pages. The web is a communication medium. The web is the locus of the world’s information. We spend much of our time searching the web, extracting relevant data, and analyzing those data. Our lives are easier when we can work efficiently on the web. This book shows how.

The book emerged from a course I teach at Northwestern University. The course started as an introduction to website analytics, looking at usage statistics and performance in search. Then I added concepts from network science and social media. After teaching the course for two years, I realized that gathering information from the web provided a unifying theme.

There is much to learn about web and network data science. This book, like the course, provides a guide. Web and network data science is data science and network science combined, focusing on the web as an information resource. And the best way to learn about it is to work through examples. We include many examples in this book.

We help researchers and analysts by providing a ready resource and reference guide for modeling techniques. We show programmers how to build on a foundation of code that works to solve real business problems.

The truth about what we do is in the programs we write. It is there for everyone to see and for some to debug. To promote student learning, each program includes step-by-step comments and suggestions for taking the analysis further. Data sets and computer programs are available from the book’s website at http://www.ftpress.com/miller/.

Python gets its name from Monty Python. We see packages with devious names such as Twisted and Scrapy. R has its lubridate and zoo. Good things come from people who work and have fun at the same time. It is fun rather than profit or fame that motivates contributors to open source, and I am happy to be part of the Python and R communities. Let the fun begin.

When working on web and network problems, some things are more easily accomplished with Python, others with R. And there are times when it is good to offer solutions in both languages, checking one against the other. Together, Python and R are good at gathering web and network data and analyzing those data.

There is a long list of programming tools we mention only in passing.
Web masters, charged with the task of making things happen on the web, rely on additional languages and technologies, including JavaScript, Apache and .Net web services, and database systems. We discuss these technologies but do not provide programming code.

Most of the data in the book were obtained from public domain data sources. Supporting data for the cases come from the University of California–Irvine Machine Learning Repository and the Stanford Large Network Dataset Collection. Movie information was obtained courtesy of The Internet Movie Database, used with permission. IMDb movie reviews data were organized by Andrew L. Maas and his colleagues at Stanford University. William W. Cohen of Carnegie Mellon University maintains the data for the Enron case. Maksim Tsvetovat maintains the data for the Quake Talk case. We are most thankful to these scholars for providing access to rich data sets for research.

Many have influenced my intellectual development over the years. There were those good thinkers and good people, teachers and mentors for whom I will be forever grateful. Sadly, no longer with us are Gerald Hahn Hinkle in philosophy and Allan Lake Rice in languages at Ursinus College, and Herbert Feigl in philosophy at the University of Minnesota. I am also most thankful to David J. Weiss in psychometrics at the University of Minnesota and Kelly Eakin in economics, formerly at the University of Oregon. Good teachers—yes, great teachers—are valued for a lifetime.

Thanks to Stan Narusiewcz who gave me my first job in business as a network engineer and to Tom Obinger who showed me how to be successful in selling computer systems as well as networks. Along with Bill JoBush and Brian Hill, they served as able managers and colleagues across various parts of my career as an information systems professional.

Thanks to Michael L. Rothschild, Neal M. Ford, Peter R. Dickson, and Janet Christopher who provided invaluable support during our years together at the University of Wisconsin–Madison. I am most grateful to the students and executive advisory board members of the A. C. Nielsen Center for Marketing Research and to Jeff Walkowski and Neli Esipova who worked with me in exploring online surveys and focus groups when those methods were just starting to be used

Description:
Master modern web and network data modeling: both theory and applications. In Web and Network Data Science, a top faculty member of Northwestern University’s prestigious analytics program presents the first fully-integrated treatment of both the business and academic elements of web and network mo