Introduction to Data Science
CS 5963 / Math 3900
Alexander Lex, Braxton Osting

[xkcd]

What is Data Science?
"The sexiest job of the 21st century" (Harvard Business Review)
"A data scientist is a statistician who lives in San Francisco."
"Data Science is statistics on a Mac."
"A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician."
https://twitter.com/jeremyjarvis/status/428848527226437632/photo/1

What is Data Science?
Source: datascience.berkeley.edu

What is Data Science?
Source: Drew Conway blog

What is Data Science?
Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms. (Wikipedia)
Data science closes the circle: from collecting real-world data, to processing and analyzing it, to influencing the real world again. (DDS, p. 41)

Data Science vs. Machine Learning vs. Statistics?
Read "50 Years of Data Science" by David Donoho.

What is Data Science?
"The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it— that's going to be a hugely important skill in the next decades, … because now we really do have essentially free and ubiquitous data."
Hal Varian, Google's Chief Economist, The McKinsey Quarterly, Jan 2009

Big Data
15 exabytes in punch cards: a layer 4.5 km deep over New England
2010: 1,200 exabytes, largely unstructured
Google stores ~10 exabytes (2013)
The hard disk industry ships ~8 exabytes/year
2.5 exabytes (2.5 billion gigabytes) generated every day in 2012
http://onesecond.designly.com/

How can we leverage data?
Improve your fitness through targeted training
Improve your product by targeting your audience, e.g., by considering semantics
Make better decisions: an exact diagnosis, the right medication, a good restaurant
Predict elections, events, crowd behavior, etc.
… and many more applications

Example: Personal Data