
Machine Learning and Security: Protecting Systems with Data and Algorithms PDF

385 Pages·2018·4.22 MB·English

Preview Machine Learning and Security: Protecting Systems with Data and Algorithms

Machine Learning & Security
PROTECTING SYSTEMS WITH DATA AND ALGORITHMS
Clarence Chio & David Freeman

Praise for Machine Learning and Security

The future of security and safety online is going to be defined by the ability of defenders to deploy machine learning to find and stop malicious activity at Internet scale and speed. Chio and Freeman have written the definitive book on this topic, capturing the latest in academic thinking as well as hard-learned lessons deploying ML to keep people safe in the field.
—Alex Stamos, Chief Security Officer, Facebook

An excellent practical guide for anyone looking to learn how machine learning techniques are used to secure computer systems, from detecting anomalies to protecting end users.
—Dan Boneh, Professor of Computer Science, Stanford University

If you’ve ever wondered what machine learning in security looks like, this book gives you an HD silhouette.
—Nwokedi C. Idika, PhD, Software Engineer, Google, Security & Privacy Organization

Machine Learning and Security
Protecting Systems with Data and Algorithms
Clarence Chio and David Freeman

Beijing Boston Farnham Sebastopol Tokyo

Machine Learning and Security
by Clarence Chio and David Freeman

Copyright © 2018 Clarence Chio and David Freeman. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected].

Editor: Courtney Allen
Interior Designer: David Futato
Production Editor: Kristen Brown
Cover Designer: Karen Montgomery
Copyeditor: Octal Publishing, Inc.
Illustrator: Rebecca Demarest
Proofreader: Rachel Head
Tech Reviewers: Joshua Saxe, Hyrum Anderson, Jess Males, and Alex Pinto
Indexer: WordCo Indexing Services, Inc.

February 2018: First Edition

Revision History for the First Edition
2018-01-26: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781491979907 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Machine Learning and Security, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-97990-7
[LSI]

Table of Contents

Preface

1. Why Machine Learning and Security?
  Cyber Threat Landscape
  The Cyber Attacker’s Economy
  A Marketplace for Hacking Skills
  Indirect Monetization
  The Upshot
  What Is Machine Learning?
  What Machine Learning Is Not
  Adversaries Using Machine Learning
  Real-World Uses of Machine Learning in Security
  Spam Fighting: An Iterative Approach
  Limitations of Machine Learning in Security

2. Classifying and Clustering
  Machine Learning: Problems and Approaches
  Machine Learning in Practice: A Worked Example
  Training Algorithms to Learn
  Model Families
  Loss Functions
  Optimization
  Supervised Classification Algorithms
  Logistic Regression
  Decision Trees
  Decision Forests
  Support Vector Machines
  Naive Bayes
  k-Nearest Neighbors
  Neural Networks
  Practical Considerations in Classification
  Selecting a Model Family
  Training Data Construction
  Feature Selection
  Overfitting and Underfitting
  Choosing Thresholds and Comparing Models
  Clustering
  Clustering Algorithms
  Evaluating Clustering Results
  Conclusion

3. Anomaly Detection
  When to Use Anomaly Detection Versus Supervised Learning
  Intrusion Detection with Heuristics
  Data-Driven Methods
  Feature Engineering for Anomaly Detection
  Host Intrusion Detection
  Network Intrusion Detection
  Web Application Intrusion Detection
  In Summary
  Anomaly Detection with Data and Algorithms
  Forecasting (Supervised Machine Learning)
  Statistical Metrics
  Goodness-of-Fit
  Unsupervised Machine Learning Algorithms
  Density-Based Methods
  In Summary
  Challenges of Using Machine Learning in Anomaly Detection
  Response and Mitigation
  Practical System Design Concerns
  Optimizing for Explainability
  Maintainability of Anomaly Detection Systems
  Integrating Human Feedback
  Mitigating Adversarial Effects
  Conclusion

4. Malware Analysis
  Understanding Malware
  Defining Malware Classification
  Malware: Behind the Scenes
  Feature Generation
  Data Collection
  Generating Features
  Feature Selection
  From Features to Classification
  How to Get Malware Samples and Labels
  Conclusion

5. Network Traffic Analysis
  Theory of Network Defense
  Access Control and Authentication
  Intrusion Detection
  Detecting In-Network Attackers
  Data-Centric Security
  Honeypots
  Summary
  Machine Learning and Network Security
  From Captures to Features
  Threats in the Network
  Botnets and You
  Building a Predictive Model to Classify Network Attacks
  Exploring the Data
  Data Preparation
  Classification
  Supervised Learning
  Semi-Supervised Learning
  Unsupervised Learning
  Advanced Ensembling
  Conclusion

6. Protecting the Consumer Web
  Monetizing the Consumer Web
  Types of Abuse and the Data That Can Stop Them
  Authentication and Account Takeover
  Account Creation
  Financial Fraud
  Bot Activity
  Supervised Learning for Abuse Problems
  Labeling Data
  Cold Start Versus Warm Start
  False Positives and False Negatives
  Multiple Responses
  Large Attacks
  Clustering Abuse
  Example: Clustering Spam Domains
  Generating Clusters
  Scoring Clusters
  Further Directions in Clustering
  Conclusion

7. Production Systems
  Defining Machine Learning System Maturity and Scalability
  What’s Important for Security Machine Learning Systems?
  Data Quality
  Problem: Bias in Datasets
  Problem: Label Inaccuracy
  Solutions: Data Quality
  Problem: Missing Data
  Solutions: Missing Data
  Model Quality
  Problem: Hyperparameter Optimization
  Solutions: Hyperparameter Optimization
  Feature: Feedback Loops, A/B Testing of Models
  Feature: Repeatable and Explainable Results
  Performance
  Goal: Low Latency, High Scalability
  Performance Optimization
  Horizontal Scaling with Distributed Computing Frameworks
  Using Cloud Services
  Maintainability
  Problem: Checkpointing, Versioning, and Deploying Models
  Goal: Graceful Degradation
  Goal: Easily Tunable and Configurable
  Monitoring and Alerting
  Security and Reliability
  Feature: Robustness in Adversarial Contexts
  Feature: Data Privacy Safeguards and Guarantees
  Feedback and Usability
  Conclusion

8. Adversarial Machine Learning
  Terminology
  The Importance of Adversarial ML
  Security Vulnerabilities in Machine Learning Algorithms

Description:
Can machine learning techniques solve our computer security problems and finally put an end to the cat-and-mouse game between attackers and defenders? Or is this hope merely hype? Now you can dive into the science and answer this question for yourself. With this practical guide, you'll explore ways

