
Data Science Handbook

472 Pages·2022·151.501 MB·English


Data Science Handbook

Scrivener Publishing
100 Cummings Center, Suite 541J
Beverly, MA 01915-6106

Next-Generation Computing and Communication Engineering

Series Editors: Dr. G. R. Kanagachidambaresan and Dr. Kolla Bhanu Prakash

Developments in artificial intelligence are made more challenging because the involvement of multi-domain technology creates new problems for researchers. Therefore, in order to help meet the challenge, this book series concentrates on next generation computing and communication methodologies involving smart and ambient environment design. It is an effective publishing platform for monographs, handbooks, and edited volumes on Industry 4.0, agriculture, smart city development, new computing and communication paradigms. Although the series mainly focuses on design, it also addresses analytics and investigation of industry-related real-time problems.

Publishers at Scrivener
Martin Scrivener ([email protected])
Phillip Carmical ([email protected])

Data Science Handbook
A Practical Approach

Kolla Bhanu Prakash
K. L. University, Vaddeswaram, Andhra Pradesh, India

This edition first published 2022 by John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA and Scrivener Publishing LLC, 100 Cummings Center, Suite 541J, Beverly, MA 01915, USA
© 2022 Scrivener Publishing LLC
For more information about Scrivener publications please visit www.scrivenerpublishing.com.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

Wiley Global Headquarters
111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials, or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read.

Library of Congress Cataloging-in-Publication Data
ISBN 978-1-119-85733-4

Cover image: Pixabay.Com
Cover design by Russell Richardson

Set in size of 11pt and Minion Pro by Manila Typesetting Company, Makati, Philippines

Printed in the USA

10 9 8 7 6 5 4 3 2 1

Dedication
Dedicated to
My Parents
Sri. Kolla Narayana Rao
Smt. Kolla Uma Maheswari
And my Wife
Mrs. M. V. Prasanna Lakshmi

Contents
Acknowledgment
Preface
1 Data Munging Basics
Introduction
1.1 Filtering and Selecting Data
1.2 Treating Missing Values
1.3 Removing Duplicates
1.4 Concatenating and Transforming Data
1.5 Grouping and Data Aggregation
References
2 Data Visualization
2.1 Creating Standard Plots (Line, Bar, Pie)
2.2 Defining Elements of a Plot
2.3 Plot Formatting
2.4 Creating Labels and Annotations
2.5 Creating Visualizations from Time Series Data
2.6 Constructing Histograms, Box Plots, and Scatter Plots
References
3 Basic Math and Statistics
3.1 Linear Algebra
3.2 Calculus
3.2.1 Differential Calculus
3.2.2 Integral Calculus
3.3 Inferential Statistics
3.3.1 Central Limit Theorem
3.3.2 Hypothesis Testing
3.3.3 ANOVA
3.3.4 Qualitative Data Analysis
3.4 Using NumPy to Perform Arithmetic Operations on Data
3.5 Generating Summary Statistics Using Pandas and SciPy
3.6 Summarizing Categorical Data Using Pandas
3.7 Starting with Parametric Methods in Pandas and SciPy
3.8 Delving Into Non-Parametric Methods Using Pandas and SciPy
3.9 Transforming Dataset Distributions
References
4 Introduction to Machine Learning
4.1 Introduction to Machine Learning
4.2 Types of Machine Learning Algorithms
4.3 Explanatory Factor Analysis
4.4 Principal Component Analysis (PCA)
References
5 Outlier Analysis
5.1 Extreme Value Analysis Using Univariate Methods
5.2 Multivariate Analysis for Outlier Detection
5.3 DBSCAN Clustering to Identify Outliers
References
6 Cluster Analysis
6.1 K-Means Algorithm
6.2 Hierarchical Methods
6.3 Instance-Based Learning with k-Nearest Neighbor
References
7 Network Analysis with NetworkX
7.1 Working with Graph Objects
7.2 Simulating a Social Network (i.e., Directed Network Analysis)
7.3 Analyzing a Social Network
References
8 Basic Algorithmic Learning
8.1 Linear Regression
8.2 Logistic Regression
8.3 Naive Bayes Classifiers
References
