OpenStack Operations Guide
Set Up and Manage Your OpenStack Cloud

by Tom Fifield, Diane Fleming, Anne Gentle, Lorin Hochstein, Jonathan Proulx, Everett Toews, and Joe Topjian

Copyright © 2014 OpenStack Foundation. All rights reserved.

Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected].

Editors: Andy Oram and Brian Anderson
Interior Designer: David Futato
Cover Designer: Karen Montgomery

March 2014: First Edition

See http://oreilly.com/catalog/errata.csp?isbn=9781491946954 for release details.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly Media, Inc. OpenStack Operations Guide, the image of a Crested Agouti, and related trade dress are trademarks of O'Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

978-1-491-94695-4
[LSI]

Table of Contents

Acknowledgments
Preface

1. Provisioning and Deployment
    Automated Deployment
    Disk Partitioning and RAID
    Network Configuration
    Automated Configuration
    Remote Management

2. Cloud Controller Design
    Hardware Considerations
    Separation of Services
    Database
    Message Queue
    Application Programming Interface (API)
    Extensions
    Scheduler
    Images
    Dashboard
    Authentication and Authorization
    Network Considerations

3. Scaling
    The Starting Point
    Adding Controller Nodes
    Segregating Your Cloud
    Cells and Regions
    Availability Zones and Host Aggregates
    Scalable Hardware
    Hardware Procurement
    Capacity Planning
    Burn-in Testing

4. Compute Nodes
    CPU Choice
    Hypervisor Choice
    Instance Storage Solutions
    Off Compute Node Storage – Shared File System
    On Compute Node Storage – Shared File System
    On Compute Node Storage – Non-shared File System
    Issues with Live Migration
    Choice of File System
    Overcommitting
    Logging
    Networking

5. Storage Decisions
    OpenStack Storage Concepts
    Object Storage
    Block Storage
    File-level Storage
    Choosing Storage Back-ends
    Commodity Storage Back-end Technologies
    Notes on OpenStack Object Storage

6. Network Design
    Management Network
    Public Addressing Options
    IP Address Planning
    Network Topology
    VLANs
    Multi-NIC
    Multi-host and Single-host Networking
    Services for Networking
    NTP
    DNS

7. Example Architecture
    Overview
    Rationale
    Why Not Use the OpenStack Network Service (quantum)?
    Why Use Multi-host Networking?
    Detailed Description
    Optional Extensions

8. Lay of the Land
    Client Command Line Tools
    Installing the Tools
    Administrative Command Line Tools
    Getting Credentials
    Command Line Tricks and Traps
    Servers and Services
    Diagnose your compute nodes
    Network
    Users and Projects
    Running Instances

9. Managing Projects and Users
    Projects or Tenants?
    Managing Projects
    Adding Projects
    Quotas
    Set Compute Service Quotas
    Set Block Storage quotas
    User Management
    Creating New Users
    Associating Users with Projects
    Customizing Authorization
    Users that Disrupt Other Users

10. User-facing Operations
    Images
    Adding Images
    Deleting Images
    Other CLI Options
    The Image Service and the Database
    Example Image Service Database Queries
    Flavors
    How do I modify an existing flavor?
    Security groups
    Block Storage
    Block Storage Creation Failures
    Instances
    Starting Instances
    Instance Boot Failures
    Instance-specific Data
    Associating Security Groups
    Floating IPs
    Attaching Block Storage
    Taking Snapshots
    Ensuring snapshots are consistent
    Instances in the Database

11. Maintenance, Failures, and Debugging
    Cloud Controller and Storage Proxy Failures and Maintenance
    Planned Maintenance
    Rebooting a cloud controller or Storage Proxy
    After a Cloud Controller or Storage Proxy Reboots
    Total Cloud Controller Failure
    Compute Node Failures and Maintenance
    Planned Maintenance
    After a Compute Node Reboots
    Instances
    Inspecting and Recovering Data from Failed Instances
    Volumes
    Total Compute Node Failure
    /var/lib/nova/instances
    Storage Node Failures and Maintenance
    Rebooting a Storage Node
    Shutting Down a Storage Node
    Replacing a Swift Disk
    Handling a Complete Failure
    Configuration Management
    Working with Hardware
    Adding a Compute Node
    Adding an Object Storage Node
    Replacing Components
    Databases
    Database Connectivity
    Performance and Optimizing
    HDWMY
    Hourly
    Daily
    Weekly
    Monthly
    Quarterly
    Semi-Annually
    Determining which Component Is Broken
    Tailing Logs
    Running Daemons on the CLI
    Example of Complexity
    Upgrades
    Uninstalling

12. Network Troubleshooting
    Using “ip a” to Check Interface States
    Network Traffic in the Cloud
    Finding a Failure in the Path
    tcpdump
    iptables
    Network Configuration in the Database
    Manually De-Associating a Floating IP
    Debugging DHCP Issues
    Debugging DNS Issues

13. Logging and Monitoring
    Where Are the Logs?
    Cloud Controller
    Compute Nodes
    Block Storage Nodes
    How to Read the Logs
    Tracing Instance Requests
    Adding Custom Logging Statements
    RabbitMQ Web Management Interface or rabbitmqctl
    Centrally Managing Logs
    rsyslog Client Configuration
    rsyslog Server Configuration
    StackTach
    Monitoring
    Process Monitoring
    Resource Alerting
    OpenStack-specific Resources
    Intelligent Alerting
    Trending

14. Backup and Recovery
    What to Backup
    Database Backups
    File System Backups
    Compute
    Image Catalog and Delivery
    Identity
    Block Storage
    Object Storage
    Recovering Backups

15. Customize
    DevStack
    Middleware Example
    Nova Scheduler Example
    Dashboard

16. Upstream OpenStack
    Getting Help
    Reporting Bugs
    Confirming & Prioritizing
    Bug Fixing
    After the Change is Accepted
    Join the OpenStack Community
    Features and the Development Roadmap
    How to Contribute to the Documentation
    Security Information
    Finding Additional Information

17. Advanced Configuration
    Differences between various drivers
    Periodic tasks
    Specific configuration topics
    OpenStack Compute (Nova)

A. Use Cases

B. Tales From the Cryp^H^H^H^H Cloud