From the Library of Martin Spilovsky

DevOps Troubleshooting
Linux® Server Best Practices

Kyle Rankin

Upper Saddle River, NJ • Boston • Indianapolis • San Francisco
New York • Toronto • Montreal • London • Munich • Paris • Madrid
Capetown • Sydney • Tokyo • Singapore • Mexico City

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

The author and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact:

U.S. Corporate and Government Sales
(800) 382-3419
[email protected]

For sales outside the United States, please contact:

International Sales
[email protected]

Visit us on the Web: informit.com/aw

Cataloging-in-Publication Data is on file with the Library of Congress.

Copyright © 2013 Pearson Education, Inc.

All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, One Lake Street, Upper Saddle River, New Jersey 07458, or you may fax your request to (201) 236-3290.

ISBN-13: 978-0-321-83204-7
ISBN-10: 0-321-83204-3

Text printed in the United States on recycled paper at RR Donnelley in Crawfordsville, Indiana.
First printing, November 2012

Editor-in-Chief: Mark Taub
Executive Editor: Debra Williams Cauley
Development Editor: Michael Thurston
Managing Editor: John Fuller
Project Editor: Elizabeth Ryan
Copy Editor: Rebecca Rider
Indexer: Richard Evans
Proofreader: Diane Freed
Technical Reviewer: Bill Childers
Publishing Coordinator: Kim Boedigheimer
Compositor: Kim Arney

This book wouldn’t be possible without the support of my wife, Joy, who once again helped me manage my time so I could complete the book, only this time while carrying our first child, Gideon. I’d also like to dedicate this book to my son, Gideon, who so far is easier to troubleshoot than any server.
Contents

Preface
Acknowledgments
About the Author

CHAPTER 1 Troubleshooting Best Practices
    Divide the Problem Space
    Practice Good Communication When Collaborating
    Conference Calls
    Direct Conversation
    Email
    Real-Time Chat Rooms
    Have a Backup Communication Method
    Favor Quick, Simple Tests over Slow, Complex Tests
    Favor Past Solutions
    Document Your Problems and Solutions
    Know What Changed
    Understand How Systems Work
    Use the Internet, but Carefully
    Resist Rebooting

CHAPTER 2 Why Is the Server So Slow? Running Out of CPU, RAM, and Disk I/O
    System Load
    What Is a High Load Average?
    Diagnose Load Problems with top
    Make Sense of top Output
    Diagnose High User Time
    Diagnose Out-of-Memory Issues
    Diagnose High I/O Wait
    Troubleshoot High Load after the Fact
    Configure sysstat
    View CPU Statistics
    View RAM Statistics
    View Disk Statistics
    View Statistics from Previous Days

CHAPTER 3 Why Won’t the System Boot? Solving Boot Problems
    The Linux Boot Process
    The BIOS
    GRUB and Linux Boot Loaders
    The Kernel and Initrd
    /sbin/init
    BIOS Boot Order
    Fix GRUB
    No GRUB Prompt
    Stage 1.5 GRUB Prompt
    Misconfigured GRUB Prompt
    Repair GRUB from the Live System
    Repair GRUB with a Rescue Disk
    Disable Splash Screens
    Can’t Mount the Root File System
    The Root Kernel Argument
    The Root Device Changed
    The Root Partition Is Corrupt or Failed
    Can’t Mount Secondary File Systems

CHAPTER 4 Why Can’t I Write to the Disk? Solving Full or Corrupt Disk Issues
    When the Disk Is Full
    Reserved Blocks
    Track Down the Largest Directories
    Out of Inodes
    The File System Is Read-Only
    Repair Corrupted File Systems
    Repair Software RAID

CHAPTER 5 Is the Server Down? Tracking Down the Source of Network Problems
    Server A Can’t Talk to Server B
    Client or Server Problem
    Is It Plugged In?
    Is the Interface Up?
    Is It on the Local Network?
    Is DNS Working?
    Can I Route to the Remote Host?
    Is the Remote Port Open?
    Test the Remote Host Locally
    Troubleshoot Slow Networks
    DNS Issues
    Find the Network Slowdown with traceroute
    Find What Is Using Your Bandwidth with iftop
    Packet Captures
    Use the tcpdump Tool
    Use Wireshark

CHAPTER 6 Why Won’t the Hostnames Resolve? Solving DNS Server Issues
    DNS Client Troubleshooting
    No Name Server Configured or Inaccessible Name Server
    Missing Search Path or Name Server Problem
    DNS Server Troubleshooting
    Understanding dig Output
    Trace a DNS Query
    Recursive Name Server Problems
    When Updates Don’t Take

CHAPTER 7 Why Didn’t My Email Go Through? Tracing Email Problems
    Trace an Email Request
    Understand Email Headers
    Problems Sending Email
    Client Can’t Communicate with the Outbound Mail Server
    Outbound Mail Server Won’t Allow Relay
    Outbound Mail Server Can’t Communicate with the Destination
    Problems Receiving Email
    Telnet Test Can’t Connect
    Telnet Can Connect, but the Message Is Rejected
    Pore Through the Mail Logs