
Pro Python System Administration (2nd ed.) [Sil.. PDF

411 Pages·2014·5.16 MB·English


BOOKS FOR PROFESSIONALS BY PROFESSIONALS®

Pro Python System Administration, Second Edition
Sileika

Pro Python System Administration, Second Edition explains and shows how to apply Python scripting in practice. It will show you how to approach and resolve real-world issues that most system administrators will come across in their careers. This book has been updated using Python 2.7 and Python 3 where appropriate. It also uses various new and relevant open source projects and tools that should now be used in practice.

In this updated edition, you will find several projects in the categories of network administration, web server administration, and monitoring and database management. In each project, the author will define the problem, design the solution, and go through the more interesting implementation steps. Each project is accompanied by the source code of a fully working prototype, which you’ll be able to use immediately or adapt to your requirements and environment.

This book is primarily aimed at experienced system administrators whose day-to-day tasks involve looking after and managing small-to-medium-sized server estates. It will also be beneficial for system administrators who want to learn more about automation and want to apply their Python knowledge to solve various system administration problems. Python developers will also benefit from reading this book, especially if they are involved in developing automation and management tools.

You’ll learn how to:

• Solve real-world system administration problems using Python
• Manage devices with SNMP and SOAP
• Build a distributed monitoring system
• Manage web applications and parse complex log files
• Monitor and manage MySQL databases automatically

Shelve in: Programming Languages/General
User level: Intermediate–Advanced
ISBN 978-1-4842-0218-0
SOURCE CODE ONLINE
www.apress.com

For your convenience Apress has placed some of the front matter material after the index.
Please use the Bookmarks and Contents at a Glance links to access them.

Contents at a Glance

About the Author
About the Technical Reviewers
Acknowledgments
Introduction
■ Chapter 1: Reading and Collecting Performance Data Using SNMP
■ Chapter 2: Managing Devices Using the SOAP API
■ Chapter 3: Creating a Web Application for IP Address Accountancy
■ Chapter 4: Integrating the IP Address Application with DHCP
■ Chapter 5: Maintaining a List of Virtual Hosts in an Apache Configuration File
■ Chapter 6: Gathering and Presenting Statistical Data from Apache Log Files
■ Chapter 7: Performing Complex Searches and Reporting on Application Log Files
■ Chapter 8: A Website Availability Check Script for Nagios
■ Chapter 9: Management and Monitoring Subsystem
■ Chapter 10: Remote Monitoring Agents
■ Chapter 11: Statistics Gathering and Reporting
■ Chapter 12: Distributed Message Processing System
■ Chapter 13: Automatic MySQL Database Performance Tuning
■ Chapter 14: Using Amazon EC2/S3 as a Data Warehouse Solution
Index

Introduction

The role of the system administrator has grown dramatically over the years. The number of systems supported by a single engineer has also increased. As such, it is impractical to handcraft each installation, and there is a need to automate as many tasks as possible. The structure of systems varies from organization to organization; therefore, system administrators must be able to create their own management tools.

Historically, the most popular programming languages for these tasks were UNIX shell and Perl. They served their purposes well, and I doubt they will ever cease to exist. However, the complexity of current systems requires new tools, and the Python programming language is one of them.

Python is an object-oriented programming language suitable for developing large-scale applications. Its syntax and structure make it very easy to read, so much so that the language is sometimes referred to as “executable pseudocode.” The Python interpreter allows for interactive execution, so in some situations an administrator can use it instead of a standard UNIX shell. Although Python is primarily an object-oriented language, it is easily adapted to procedural and functional styles of programming. Given all that, Python makes a perfect fit as a new language for implementing system administration applications. There are a large number of Linux system utilities already written in Python, such as the Yum package manager and Anaconda, the Linux installation program.

The Prerequisites for Using this Book

This book is about using the Python programming language to solve specific system administration tasks.
We look at four distinct system administration areas: network management, web server and web application management, database system management, and system monitoring. Although I explain in detail most of the technologies used in this book, bear in mind that the main goal here is to demonstrate the practical application of the Python libraries to rather specific issues. Therefore, I assume that you are a seasoned system administrator. You should be able to find additional information yourself; this book gives you a rough guide for how to reach your goal, but you must be able to work out how to adapt it to your specific system and environment.

As we discuss the examples, you will be asked to install additional packages and libraries. In most cases, I provide the commands and instructions to perform these tasks on a Fedora system, but you should be ready to adapt the instructions to the Linux distribution that you are going to use. Most of the examples also work without much modification on a recent OS X release (10.10.X).

I also assume that you have a background in the Python programming language. I introduce the specific libraries that are used in system administration tasks, as well as some lesser known or less often discussed language functionality, such as generator functions or class internal methods, but the basic language syntax is not explained here. If you want to refresh your Python skills, I recommend the following books: Pro Python by Marty Alchin and J. Burton Browning (Apress, 2012; but watch for a new edition due to be released in early 2015); Python Programming for the Absolute Beginner by Mike Dawson (Course Technology PTR, 2010); and Core Python Applications Programming by Wesley Chun (Prentice Hall, 2012).

All examples presented in this book assume Python version 2.7. This is mostly dictated by the libraries that are used in the examples. Some libraries have been ported to Python 3; however, some have not.
So if you need to run Python 3, make sure that the required libraries support it.

The Structure of this Book

This book contains 14 chapters, and each chapter solves a distinctive problem. Some examples span multiple chapters, but even then, each chapter deals with a specific aspect of the particular problem.

In addition to the chapters, several other organizational layers characterize this book. First, I grouped the chapters by the problem type. Chapters 1 to 4 deal with network management issues; Chapters 5 to 7 talk about the Apache web server and web application management; Chapters 8 to 11 are dedicated to monitoring and statistical calculations; and Chapters 12 and 13 focus on database management issues.

Second, I maintain a common pattern in all chapters. I start with the problem statement and then move on to gather requirements and proceed through the design phase before moving into the implementation section.

Third, each chapter focuses on one or more technologies and the Python libraries that provide the language interface for the particular technology. Examples of such technologies could be the SOAP protocol, application plug-in architecture, or cloud computing concepts.

More specifically, here’s a breakdown of the chapters:

Chapter 1: Reading and Collecting Performance Data Using SNMP

Most network-attached devices expose their internal counters via the Simple Network Management Protocol (SNMP). This chapter explains basic SNMP principles and the data structure. We then look at the Python libraries that provide the interface to SNMP-enabled devices. We also investigate the round robin database, which is the de facto standard for storing statistical data. Finally, we look at the Jinja2 template framework, which allows us to generate simple web pages.

Chapter 2: Managing Devices Using the SOAP API

Complicated tasks, such as managing the device configuration, cannot be easily done by using SNMP because the protocol is too simplistic. Therefore, advanced devices, such as the Citrix Netscaler load balancers, provide a SOAP API to the device management system. In this chapter, we investigate the SOAP API structure and the libraries that enable SOAP-based communication from the Python programming language. We also look at the basic logging functionality using the built-in libraries. This second edition of the book includes examples of how to use the new REST API to manage the load balancer devices.

Chapter 3: Creating a Web Application for IP Address Accountancy

In this chapter, we build a web application that maintains the list of the assigned IP addresses and the address ranges. We learn how to create web applications using the Django framework. I show you the way the Django application should be structured, tell you how to create and configure the application settings, and explain the URL structure. We also investigate how to deploy the Django application using the Apache web server.

Chapter 4: Integrating the IP Address Application with DHCP

This chapter expands on the previous chapter, and we implement the DHCP address range support. We also look at some advanced Django programming techniques, such as customizing the response MIME type and serving AJAX calls. This second edition adds new functionality to manage dynamic DHCP leases using the OMAPI protocol.

Chapter 5: Maintaining a List of Virtual Hosts in an Apache Configuration File

This is another Django application that we develop in this book, but this time our focus is on the Django administration interface. While building the Apache configuration management application, you learn how to customize the default Django administration interface with your own views and functions.

Chapter 6: Gathering and Presenting Statistical Data from Apache Log Files

In this chapter, the goal is to build an application that parses and analyzes the Apache web server log files. Instead of taking the straightforward but inflexible approach of building a monolithic application, we look at the design principles involved in building plug-in applications. You learn how to use the object and class type discovery functions and how to perform dynamic module loading. This second edition of the book shows you how to perform data visualization based on the gathered data.

Chapter 7: Performing Complex Searches and Reporting on Application Log Files

This chapter also deals with log file parsing, but this time I show you how to parse complex, multi-line log file entries. We investigate the functionality of the open-source log file parser tool called Exctractor, which you can download from http://exctractor.sourceforge.net/.

Chapter 8: A Web Site Availability Check Script for Nagios

Nagios is one of the most popular open-source monitoring systems, because its modular structure allows users to implement their own check scripts and thus customize the tool to meet their needs. In this chapter, we create two scripts that check the functionality of a website. We investigate how to use the Beautiful Soup HTML parsing library to extract information from HTML web pages.

Chapter 9: Management and Monitoring Subsystem

This chapter starts a three-chapter series in which we build a complete monitoring system. The goal of this chapter is not to replace mature monitoring systems such as Nagios or Zenoss but to show the basic principles of distributed application programming. We look at database design principles such as data normalization. We also investigate how to implement the communication mechanisms between network services using RPC calls.

Chapter 10: Remote Monitoring Agents

This is the second chapter in the monitoring series, where we implement the remote monitoring agent components. In this chapter, I also describe how to decouple the application from its configuration using the ConfigParser module.

Chapter 11: Statistics Gathering and Reporting

This is the last part of the monitoring series, where I show you how to perform basic statistical analysis on the collected performance data. We use scientific libraries: NumPy to perform the calculations and matplotlib to create the graphs. You learn how to find which performance readings fall into the comfort zone and how to calculate the boundaries of that zone. We also do the basic trend detection, which provides a good insight for the capacity planning.

Chapter 12: Distributed Message Processing System

This is a new chapter for the second edition of the book. In this chapter I show you how to convert the distributed management system to use Celery, a remote task execution framework.

Chapter 13: Automatic MySQL Database Performance Tuning

In this chapter, I show you how to obtain the MySQL database configuration variables and the internal status indicators. We build an application that makes a suggestion on how to improve the database engine performance based on the obtained data.

Chapter 14: Amazon EC2/S3 as a Data Warehouse Solution

This chapter shows you how to utilize the Amazon Elastic Compute Cloud (EC2) and offload the infrequent computation tasks to it. We build an application that automatically creates a database server where you can transfer data for further analysis. You can use this example as a basis to build an on-demand data warehouse solution.

The Example Source Code

The source code of all the examples in this book, along with any applicable sample data, can be downloaded from the Apress website by following instructions at www.apress.com/source-code/.
The source code stored at this location contains the same code that is described in the book. Most of the prototypes described in this book are also available as open-source projects. You can find these projects at the author’s website, http://www.sysadminpy.com/.

Chapter 1

Reading and Collecting Performance Data Using SNMP

Most devices that are connected to a network report their status using SNMP (the Simple Network Management Protocol). This protocol was designed primarily for managing and monitoring network-attached hardware devices, but some applications also expose their statistical data using this protocol. In this chapter we will look at how to access this information from your Python applications. We are going to store the obtained data in an RRD (round robin database), using RRDTool, a widely known and popular application and library used to store and plot performance data. Finally, we’ll investigate the Jinja2 template system, which we’ll use to generate simple web pages for our application.

Application Requirements and Design

The topic of system monitoring is very broad and usually encompasses many different areas. A complete monitoring system is rather complex and often is made up of multiple components working together. We are not going to develop a complete, self-sufficient system here, but we’ll look into two important areas of a typical monitoring system: information gathering and representation. In this chapter we’ll implement a system that queries devices using the SNMP protocol and then stores the data using the RRDTool library, which is also used to generate the graphs for visual data representation. All this is tied together into simple web pages using the Jinja2 templating library. We’ll look at each of these components in more detail as we go through the chapter.

Specifying the Requirements

Before we start designing our application we need to come up with some requirements for our system.
First of all we need to understand the functionality we expect our system to provide. This will help us to create an effective (and we hope easy-to-implement) system design. In this chapter we are going to create a system that monitors network-attached devices, such as network switches and routers, using the SNMP protocol. So the first requirement is that the system be able to query any device using SNMP.

The information gathered from the devices needs to be stored for future reference and analysis. Let’s make some assumptions about the use of this information. First, we don’t need to store it indefinitely. (I’ll talk more about permanent information storage in Chapters 9–11.) This means that the information is stored only for a predefined period of time, and once it becomes obsolete it will be erased. This presents our second requirement: the information needs to be deleted after it has “expired.”

Second, the information needs to be stored so that graphs can be produced. We are not going to use it for anything else, and therefore the data store should be optimized for the data representation tasks.

Finally, we need to generate the graphs and represent this information on easily accessible web pages. The information needs to be structured by the device names only. For example, if we are monitoring several devices for CPU and network interface utilization, this information needs to be presented on a single page. We don’t need to present this information on multiple time scales; by default the graphs should show the performance indicators for the last 24 hours.

High-Level Design Specification

Now that we have some ideas about the functionality of our system, let’s create a simple design, which we’ll use as a guide in the development phase. The basic approach is that each of the requirements we specified earlier should be covered by one or more design decisions.

The first requirement is that we need to monitor the network-attached devices, and we need to do so using SNMP. This means that we have to use an appropriate Python library that deals with the SNMP objects. The SNMP module is not included in the default Python installation, so we’ll have to use one of the external modules. I recommend using the PySNMP library (available at http://pysnmp.sourceforge.net/), which is readily available on most of the popular Linux distributions.

The perfect candidate for the data store engine is RRDTool (available at http://oss.oetiker.ch/rrdtool/). The round robin database means that the database is structured in such a way that each “table” has a limited length, and once the limit is reached, the oldest entries are dropped. In fact they are not dropped; the new ones are simply written into their position. The RRDTool library provides two distinct functionalities: the database service and the graph-generation toolkit. There is no native support for RRD databases in Python, but there is an external library available that provides an interface to the RRDTool library.

Finally, to generate the web page we will use the Jinja2 templating library (available at http://jinja.pocoo.org, or on GitHub: https://github.com/mitsuhiko/jinja2), which lets us create sophisticated templates and decouple the design and development tasks.

We are going to use a simple Windows INI-style configuration file to store the information about the devices we will be monitoring. This information will include details such as the device address, SNMP object reference, and access control details.

The application will be split into two parts: the first part is the information-gathering tool that queries all configured devices and stores the data in the RRDTool database, and the second part is the report generator, which generates the web site structure along with all required images. Both components will be instantiated from the standard UNIX scheduler application, cron.
These two scripts will be named snmp-manager.py and snmp-pages.py, respectively.

Introduction to SNMP

SNMP (Simple Network Management Protocol) is a UDP-based protocol used mostly for managing network-attached devices, such as routers, switches, computers, printers, video cameras, and so on. Some applications also allow access to internal counters via the SNMP protocol.

SNMP not only allows you to read performance statistics from the devices, it can also send control messages to instruct a device to perform some action; for example, you can restart a router remotely by using SNMP commands.

There are three main components in a system managed by SNMP:

• The management system, which is responsible for managing all devices
• The managed devices, which are all devices managed by the management system
• The SNMP agent, which is an application that runs on each of the managed devices and interacts with the management system

This relationship is illustrated in Figure 1-1.

[Figure 1-1. The SNMP network components: the management system connected to the SNMP agent software running on managed devices 1, 2, ..., X]

This approach is rather generic. The protocol defines seven basic commands, of which the most interesting to us are get, get bulk, and response. As you may have guessed, the first two are commands that the management system issues to the agent, and the last is the response from the agent software.

How does the management system know what to look for? The protocol does not define a way of exchanging this information, and therefore the management system has no way to interrogate the agents to obtain the list of available variables. The issue is resolved by using a Management Information Base (or MIB). Each device usually has an associated MIB, which describes the structure of the management data on that system.
Such a MIB would list in hierarchical order all object identifiers (OIDs) that are available on the managed device. The OID effectively represents a node in the object tree. It contains the numerical identifiers of all nodes leading to the current OID, starting from the node at the top of the tree. The node IDs are assigned and regulated by the IANA (Internet Assigned Numbers Authority). An organization can apply for an OID node, and once the node is assigned, the organization is responsible for managing the OID structure below it. Figure 1-2 illustrates a portion of the OID tree.
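
To make the idea of an OID as a path through the object tree more concrete, here is a minimal sketch in plain Python. It does not use PySNMP or any other SNMP library; the helper names parse_oid and is_in_subtree are hypothetical and exist only for this illustration. The numeric values are the standard, well-known nodes iso(1).org(3).dod(6).internet(1), the enterprises branch below it, and the sysDescr object from MIB-2.

```python
# Hypothetical helpers for illustration only; not part of PySNMP
# or of the application developed in this chapter.

def parse_oid(dotted):
    """Convert a dotted OID string such as '1.3.6.1' into a tuple of ints."""
    return tuple(int(part) for part in dotted.strip('.').split('.'))


def is_in_subtree(oid, root):
    """Return True if `oid` lies at or below the node `root` in the OID tree."""
    return oid[:len(root)] == root


# iso(1).org(3).dod(6).internet(1)
INTERNET = parse_oid('1.3.6.1')
# internet.private(4).enterprises(1): organizations are allocated nodes below here
ENTERPRISES = parse_oid('1.3.6.1.4.1')
# internet.mgmt(2).mib-2(1).system(1).sysDescr(1), instance 0
SYS_DESCR = parse_oid('1.3.6.1.2.1.1.1.0')

if __name__ == '__main__':
    print(is_in_subtree(SYS_DESCR, INTERNET))     # True
    print(is_in_subtree(SYS_DESCR, ENTERPRISES))  # False
```

Representing an OID as a tuple of node IDs mirrors the description above: every prefix of the tuple is itself a node higher up in the tree, so a subtree membership test reduces to a prefix comparison.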
