
Applied Univariate, Bivariate, and Multivariate Statistics Using Python: A Beginner's Guide to Advanced Data Analysis

300 Pages·2021·21.66 MB·English

Applied Univariate, Bivariate, and Multivariate Statistics Using Python

A Beginner’s Guide to Advanced Data Analysis

Daniel J. Denis

This edition first published 2021
© 2021 John Wiley & Sons, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Daniel J. Denis to be identified as the author of this work has been asserted in accordance with law.

Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office
111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty
The contents of this work are intended to further general scientific research, understanding, and discussion only and are not intended and should not be relied upon as recommending or promoting scientific method, diagnosis, or treatment by physicians for any particular patient. In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of medicines, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each medicine, equipment, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data
Names: Denis, Daniel J., 1974- author. | John Wiley & Sons, Inc., publisher.
Title: Applied univariate, bivariate, and multivariate statistics using Python
Subtitle: A beginner’s guide to advanced data analysis / Daniel J. Denis, University of Montana, Missoula, MT.
Description: Hoboken, NJ : John Wiley & Sons, Inc., 2021. | Includes bibliographical references and index.
Identifiers: LCCN 2020050202 (print) | LCCN 2020050203 (ebook) | ISBN 9781119578147 (hardback) | ISBN 9781119578178 (pdf) | ISBN 9781119578185 (epub) | ISBN 9781119578208 (ebook)
Subjects: LCSH: Statistics--Software. | Multivariate analysis. | Python (Computer program language).
Classification: LCC QA276.45.P98 D46 2021 (print) | LCC QA276.45.P98 (ebook) | DDC 519.5/302855133--dc23
LC record available at https://lccn.loc.gov/2020050202
LC ebook record available at https://lccn.loc.gov/2020050203

Cover image: © MR.Cole_Photographer/Getty Images
Cover design by Wiley
Set in 9.5/12.5 STIXTwoText by Integra Software Services, Pondicherry, India

To Kaiser

Contents

Preface

1 A Brief Introduction and Overview of Applied Statistics
  1.1 How Statistical Inference Works
  1.2 Statistics and Decision-Making
  1.3 Quantifying Error Rates in Decision-Making: Type I and Type II Errors
  1.4 Estimation of Parameters
  1.5 Essential Philosophical Principles for Applied Statistics
  1.6 Continuous vs. Discrete Variables
    1.6.1 Continuity Is Not Always Clear-Cut
  1.7 Using Abstract Systems to Describe Physical Phenomena: Understanding Numerical vs. Physical Differences
  1.8 Data Analysis, Data Science, Machine Learning, Big Data
  1.9 “Training” and “Testing” Models: What “Statistical Learning” Means in the Age of Machine Learning and Data Science
  1.10 Where We Are Going From Here: How to Use This Book
  Review Exercises

2 Introduction to Python and the Field of Computational Statistics
  2.1 The Importance of Specializing in Statistics and Research, Not Python: Advice for Prioritizing Your Hierarchy
  2.2 How to Obtain Python
  2.3 Python Packages
  2.4 Installing a New Package in Python
  2.5 Computing z-Scores in Python
  2.6 Building a Dataframe in Python: And Computing Some Statistical Functions
  2.7 Importing a .txt or .csv File
  2.8 Loading Data into Python
  2.9 Creating Random Data in Python
  2.10 Exploring Mathematics in Python
  2.11 Linear and Matrix Algebra in Python: Mechanics of Statistical Analyses
    2.11.1 Operations on Matrices
    2.11.2 Eigenvalues and Eigenvectors
  Review Exercises

3 Visualization in Python: Introduction to Graphs and Plots
  3.1 Aim for Simplicity and Clarity in Tables and Graphs: Complexity is for Fools!
  3.2 State Population Change Data
  3.3 What Do the Numbers Tell Us? Clues to Substantive Theory
  3.4 The Scatterplot
  3.5 Correlograms
  3.6 Histograms and Bar Graphs
  3.7 Plotting Side-by-Side Histograms
  3.8 Bubble Plots
  3.9 Pie Plots
  3.10 Heatmaps
  3.11 Line Charts
  3.12 Closing Thoughts
  Review Exercises

4 Simple Statistical Techniques for Univariate and Bivariate Analyses
  4.1 Pearson Product-Moment Correlation
  4.2 A Pearson Correlation Does Not (Necessarily) Imply Zero Relationship
  4.3 Spearman’s Rho
  4.4 More General Comments on Correlation: Don’t Let a Correlation Impress You Too Much!
  4.5 Computing Correlation in Python
  4.6 T-Tests for Comparing Means
  4.7 Paired-Samples t-Test in Python
  4.8 Binomial Test
  4.9 The Chi-Squared Distribution and Goodness-of-Fit Test
  4.10 Contingency Tables
  Review Exercises

5 Power, Effect Size, P-Values, and Estimating Required Sample Size Using Python
  5.1 What Determines the Size of a P-Value?
  5.2 How P-Values Are a Function of Sample Size
  5.3 What is Effect Size?
  5.4 Understanding Population Variability in the Context of Experimental Design
  5.5 Where Does Power Fit into All of This?
  5.6 Can You Have Too Much Power? Can a Sample Be Too Large?
  5.7 Demonstrating Power Principles in Python: Estimating Power or Sample Size
  5.8 Demonstrating the Influence of Effect Size
  5.9 The Influence of Significance Levels on Statistical Power
  5.10 What About Power and Hypothesis Testing in the Age of “Big Data”?
  5.11 Concluding Comments on Power, Effect Size, and Significance Testing
  Review Exercises

6 Analysis of Variance
  6.1 T-Tests for Means as a “Special Case” of ANOVA
  6.2 Why Not Do Several t-Tests?
  6.3 Understanding ANOVA Through an Example
  6.4 Evaluating Assumptions in ANOVA
  6.5 ANOVA in Python
  6.6 Effect Size for Teacher
  6.7 Post-Hoc Tests Following the ANOVA F-Test
  6.8 A Myriad of Post-Hoc Tests
  6.9 Factorial ANOVA
  6.10 Statistical Interactions
  6.11 Interactions in the Sample Are a Virtual Guarantee: Interactions in the Population Are Not
  6.12 Modeling the Interaction Term
  6.13 Plotting Residuals
  6.14 Randomized Block Designs and Repeated Measures
  6.15 Nonparametric Alternatives
    6.15.1 Revisiting What “Satisfying Assumptions” Means: A Brief Discussion and Suggestion of How to Approach the Decision Regarding Nonparametrics
    6.15.2 Your Experience in the Area Counts
    6.15.3 What If Assumptions Are Truly Violated?
    6.15.4 Mann-Whitney U Test
    6.15.5 Kruskal-Wallis Test as a Nonparametric Alternative to ANOVA
  Review Exercises

7 Simple and Multiple Linear Regression
  7.1 Why Use Regression?
  7.2 The Least-Squares Principle
  7.3 Regression as a “New” Least-Squares Line
  7.4 The Population Least-Squares Regression Line
  7.5 How to Estimate Parameters in Regression
  7.6 How to Assess Goodness of Fit?
  7.7 R² – Coefficient of Determination
  7.8 Adjusted R²
  7.9 Regression in Python
  7.10 Multiple Linear Regression
  7.11 Defining the Multiple Regression Model
  7.12 Model Specification Error
  7.13 Multiple Regression in Python
  7.14 Model-Building Strategies: Forward, Backward, Stepwise
  7.15 Computer-Intensive “Algorithmic” Approaches
  7.16 Which Approach Should You Adopt?
  7.17 Concluding Remarks and Further Directions: Polynomial Regression
  Review Exercises
