Pro Java EE 5 Performance Management and Optimization

Steven Haines

Copyright © 2006 by Steven Haines

All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher.

ISBN-13: 978-1-59059-610-4
ISBN-10: 1-59059-610-2

Printed and bound in the United States of America 9 8 7 6 5 4 3 2 1

Trademarked names may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, we use the names only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.

Lead Editor: Steve Anglin
Technical Reviewers: Mark Gowdy, Dilip Thomas
Editorial Board: Steve Anglin, Ewan Buckingham, Gary Cornell, Jason Gilmore, Jonathan Gennick, Jonathan Hassell, James Huddleston, Chris Mills, Matthew Moodie, Dominic Shakeshaft, Jim Sumser, Keir Thomas, Matt Wade
Project Manager: Beth Christmas
Copy Edit Manager: Nicole LeClerc
Copy Editors: Heather Lang, Nicole LeClerc
Assistant Production Director: Kari Brooks-Copony
Production Editor: Laura Cheu
Compositor: Susan Glinert
Proofreader: Liz Welch
Indexer: Broccoli Information Management
Artist: Kinetic Publishing Services, LLC
Cover Designer: Kurt Krames
Manufacturing Director: Tom Debolski

Distributed to the book trade worldwide by Springer-Verlag New York, Inc., 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax 201-348-4505, e-mail [email protected], or visit http://www.springeronline.com.

For information on translations, please contact Apress directly at 2560 Ninth Street, Suite 219, Berkeley, CA 94710.
Phone 510-549-5930, fax 510-549-5939, e-mail [email protected], or visit http://www.apress.com.

The information in this book is distributed on an “as is” basis, without warranty. Although every precaution has been taken in the preparation of this work, neither the author(s) nor Apress shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information contained in this work.

The source code for this book is available to readers at http://www.apress.com in the Source Code section.

CHAPTER 6

Performance Tuning Methodology

“I have been reading about performance tuning on our application server vendor’s Web site, and it looks so complicated. There are all of these ratios that I need to watch and formulas to apply them to. And which ones are the most important? What’s going on with this?”

John was getting frustrated with his tuning efforts. He had his team implementing the proactive performance testing methodology that I helped him with, but the concept of by-the-book performance tuning was evading him.

“Don’t let those ratios fool you—there is a much better approach to performance tuning. Let me ask you, when you take your car in for service, does the service technician plug your car into a computer and tell you what’s wrong with it, or does he ask you to describe your problems?”

“Well of course he asks me about my problems, otherwise how would he know where to start looking? A car is a complicated machine,” John replied.

“Exactly. There are so many moving parts that you wouldn’t want to look at each one. Similarly, when tuning an enterprise application, we want to look at its architecture and common pathways to optimize those pathways.
When we step back from application server ratios and focus on what the application does and how it uses the application server, the task becomes much easier.”

From the look on his face, I could see that he got it. He saw that the focus of tuning should be on the application, not on abstract ratios that he did not understand.

Performance Tuning Overview

Performance tuning is not a black art, but it is something that is not very well understood. When tasked with tuning an enterprise Java environment, you have three options:

• You can read through your application server’s tuning documentation.
• You can adopt the brute-force approach of trial and error.
• You can hire professional services.

The problem with the first approach is that the application server vendor documentation is usually bloated and lacks prioritization. It would be nice to have a simple step-by-step list of tasks to perform that realize the most benefit with the least amount of effort and the order in which to perform them, but alas, that does not exist. Furthermore, when consulted on best practices of tuning options, application server vendors typically advise that the optimal configuration depends on your application. This is true, but some general principles can provide a strong starting point from which to begin the tuning process.

The second approach is highly effective, but requires a lot of time and a deep understanding of performance measurements to determine the effect of your changes. Tuning is an iterative process, so some trial and error is required, but it is most effective when you know where to start and where you are going.

The final approach, paying someone else to tune your environment for you, is the most effective, but also the most expensive.
This approach has a few drawbacks:

• It is difficult to find someone who knows exactly how to handle this task.
• Unless knowledge transfer is part of the engagement, you are powerless when your application changes; you become dependent on the consultant.
• It is expensive. Talented consultants can cost thousands of dollars per day and expect to provide at least two or three weeks’ worth of services.

If you decide to go this route, when you’re looking for a reliable and knowledgeable resource, consider the consultant’s reputation and referrals. Look for someone who has worked in very complicated environments and in environments that are similar to yours. Furthermore, you want an apples-for-apples tuner: do not hire a WebSphere expert to tune WebLogic. These programs are similar but idiosyncratically different. In addition, always include knowledge transfer in the engagement statement of work. You do not want to be dependent on someone else for every little change that you make in the future. Keep in mind, though, that it is a good idea to re-engage a proven resource for substantial changes and for new applications. Encourage your team to learn from the consultant, but do not expect them to be fully trained by looking over someone’s shoulder for a couple of days.

The cost of a consultant’s services may be high, but the cost of application failure is much higher. If you are basing your business and your reputation on an application, then a $50,000 price tag to ensure its performance is not unreasonable. Perform a cost-benefit analysis and a return on investment (ROI) analysis, and see if the numbers work out. If they do, then you can consider hiring a consultant.

One alternative that I have neglected to mention is to befriend an expert in this area and ask him to guide your efforts. And that is the focus of this book and in particular this chapter.
In this chapter, I share with you my experience tuning environments ranging from small, isolated applications to huge, mission-critical applications running in complex shared environments. From these engagements, I have learned what consistently works and have distilled it into best-practice approaches to performance tuning. Each application and environment is different, but in this chapter I show you the best place to start and the 20 percent of tuning effort that will yield 80 percent of your tuning impact. It is not rocket science as long as someone explains it to you.

Load Testing Methodology

Before starting any tuning effort, you need to realize that tuning efforts are only effective for the load that your environment is tuned against. To illustrate this point, consider the patient monitoring system that I have alluded to in earlier chapters. It is database intensive in most of its functionality, and if the load generator does not test the database functionality with enough load, then you can have no confidence that your configuration will meet the demands of users when you roll out the application to production.

With that said, how do you properly design your load tests? Before the application is deployed to a production environment and you can observe real end-user behavior, you have no better option than to take your best guess. “Guess” may not be the most appropriate word to describe this activity, as you’ve spent time up-front constructing detailed use cases. If you did a good job building the use cases, then you know what you expect your users to do, and your guess is based on the distribution of use cases and their scenarios. In the following sections, we’ll examine how to construct representative load scenarios and then look at the process of applying those load scenarios against your environment.
Load Testing Design

Several times in this book I have emphasized the importance of understanding user patterns and the fact that you can attain this information through access log file analysis or an end-user monitoring device. But thus far I have not mentioned what to do in the case of a new application. When tuning a new application and environment, it is important to follow these three steps:

1. Estimate
2. Validate
3. Reflect

The first step involves estimating what you expect your users to do and how you expect your application to be used. This is where well-defined and thorough use cases really help you. Define load scenarios for each use case scenario and then conduct a meeting with the application business owner and application technical owner to discuss and assign relative weights with which to balance the distribution of each scenario. It is the application business owner’s responsibility to spend significant time interviewing customers to understand the application functionality that users find most important. The application technical owner can then translate that business functionality into the detailed application steps that implement it. Construct your test plan to exercise the production staging environment with load scripts balanced based on the results of this meeting. The environment should then be tuned to optimally satisfy this balance of requests.

■ Note If your production staging environment does not match production, then there is still value in running a balanced load test; it allows you to derive a correlation between load and resource utilization. For example, if 500 simulated users under this balanced load use 20 database connections, then you can expect 1,000 users to use approximately 40 database connections to satisfy a similar load balance.
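The scaling estimate in the note reduces to a one-line linear interpolation. The following sketch illustrates it; the class and method names are mine, purely for illustration.

```java
// Linear scaling estimate: extrapolate an observed resource measurement
// (e.g., database connections under a balanced load) to a target user load.
public class LoadInterpolation {

    // Resources needed at targetUsers, assuming usage scales linearly with load.
    public static double estimate(double observedUsers,
                                  double observedResources,
                                  double targetUsers) {
        return observedResources * (targetUsers / observedUsers);
    }

    public static void main(String[] args) {
        // 500 simulated users used 20 database connections, so 1,000 users
        // should need roughly twice that: a ballpark figure, not a guarantee.
        System.out.println(estimate(500, 20, 1000)); // prints 40.0
    }
}
```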
Unfortunately, linear interpolation is not 100 percent accurate, because increased load also affects finite resources such as CPU that degrade performance rapidly as they approach saturation. But linear interpolation gives you a ballpark estimate or best-practice start value from which to further fine-tune. In Chapter 9 I address the factors that limit interpolation algorithms and help you implement the best configurations.

After deploying an application to production and exposing it to end users, the next step is to validate usage patterns against expectations. This is the time to incorporate an access log file analyzer or end-user experience monitor to extract end-user behavior. The first week can be used to perform a sanity-check validation to identify any gross deviations from estimates, but depending on your user load, a month or even a quarter could be required before users become comfortable enough with your application to give you confidence that you have accurately captured their behavior. User requests that log file analysis or end-user experience monitors reveal need to be reconstructed into use case scenarios and then traced back to initial estimates. If they match, then your tuning efforts were effective, but if they are dramatically different, then you need to retune the application to the actual user patterns.

Finally, it is important to perform a postmortem analysis and reflect on how estimated user patterns mapped to actual user patterns. This step is typically overlooked, but it is only through this analysis that your estimates will become more accurate in the future. You need to understand where your estimates were flawed and attempt to identify why. In general, your users’ behavior is not going to change significantly over time, so your estimates should become more accurate as your application evolves.
Your workload as an enterprise Java administrator should include periodically repeating this procedure of end-user pattern validation. In the early stages of an application, you should perform this validation relatively frequently, such as every month, but as the application matures, you will perform these validation efforts less frequently, such as every quarter or six months. Applications evolve over time, and new features are added to satisfy user feedback; therefore, you cannot neglect even infrequent user pattern validation. For example, I once worked with a customer who deployed a simple Flash game into their production environment that subsequently crashed their production servers. Other procedural issues were at the core of this problem, but the practical application here is that small modifications to a production environment can dramatically affect resource utilization and contention. And, as with this particular customer, the consequences can be catastrophic.

Load Testing Process

If you want your tuning efforts to be as accurate as possible, then ideally you should maintain a production staging environment with the same configuration as your production environment. Unfortunately, most companies cannot justify the additional expense involved in doing so and therefore construct a production staging environment that is a scaled-down version of production. The following are the three main strategies used to scale down the production staging environment:

• Scale down the number of machines, but use the same class of machines.
• Scale down the class of machines.
• Scale down both the number of machines (the size of the environment) and the class of machines.

Unless financial resources dedicated to production staging are plentiful, scaling down the size of the environment is the most effective plan.
For example, if your production environment maintains eight servers, then a production staging environment with four servers is perfectly adequate to perform scaled-down tuning against. A scaled-down environment running the same class of machines (with the same CPU, memory, and so forth) is very effective because you can understand how your application should perform on a single server, and depending on the size, you can calculate the percentage of performance lost to interserver communication (such as the overhead required to replicate stateful information across a cluster).

Scaling down classes of machine, on the other hand, can be quite problematic. In many cases, it is necessary—for example, consider a production environment running on a $10 million mainframe. Chances are that this customer is not going to spend an additional $10 million on a testbed. When you scale down classes of machine, then the best your load testing can accomplish is to identify the relative balance of resource utilizations. This information is still interesting because it allows you to extract information about which service requests resolve to database or external resource calls, the relative response times of each service request, relative thread pool utilization, cache utilization, and so on. Most of these values are relative to each other, but as you deploy to a stronger production environment, you can define a relative scale of resources to one another, establishing best-guess values and scaling resources appropriately.

To perform an accurate load test, you need to quantify your projected user load and configure your load tester to generate a graduated load up to the projected user load. Each step should be graduated with enough granularity so as not to oversaturate the application if a performance problem occurs.
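A graduated ramp-up of the kind just described can be sketched as a simple schedule of user levels. The class name and step count below are assumptions for illustration, not part of any load tool’s API.

```java
import java.util.ArrayList;
import java.util.List;

// Build a graduated load schedule: climb to the projected user load in even
// increments so a performance problem surfaces before the environment saturates.
public class GraduatedLoad {

    public static List<Integer> schedule(int projectedUsers, int steps) {
        List<Integer> levels = new ArrayList<>();
        for (int i = 1; i <= steps; i++) {
            levels.add(projectedUsers * i / steps); // even increments up to the target
        }
        return levels;
    }

    public static void main(String[] args) {
        // Ramp to 1,000 projected users in 10 graduated steps of 100 users each.
        System.out.println(schedule(1000, 10));
    }
}
```

Each level would be held long enough to observe steady-state response times before stepping up to the next.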
Wait-Based Tuning

I developed the notion of wait-based tuning by drawing from two sources:

• Oracle database tuning theory
• IBM WebSphere tuning theory

I owe a debt of thanks to an associate of mine, Dan Wittry, who works in the Oracle tuning realm. Dan explained to me that in previous versions of Oracle, performance tuning was based upon observing various ratios. For example, what is the ratio of queries serviced in memory to those loaded from disk? How far and how frequently is a disk head required to move? The point is that tuning a database was based upon optimizing performance ratios. In newer releases of the Oracle database, the practice has shifted away from ratios and toward the notion of identifying wait points. No longer do we care about the specifics of performance ratio values; we’re now concerned with the performance of our queries. Chances are that a database serving content well will maintain superior performance ratios, but the ratios are not the primary focus of the tuning effort—expediting queries is.

After reading through tuning manuals for IBM WebSphere, BEA WebLogic, Oracle Application Server, and JBoss, I understood well the commonalities between their implementations and the similarity between application server tuning and database tuning: a focus on performance ratios. While IBM addressed performance ratios, it traveled down a different path: where in an application can a request wait? IBM identified four main areas:

• Web server
• Web container
• EJB container
• Database connection pools

Furthermore, IBM posed the supposition that the best place for a request to wait is as early in the process as possible. Once you have learned the capacities of each wait zone, then allow only that number of requests to be processed; force all others to wait back at the Web server.
In general, a Web server is a fairly light server: it has a very tight server socket listening process that funnels requests into a queue for processing. Threads assigned to that queue examine the request and either forward it to an application server (or other content provider) or return the requested resource. If the environment is at capacity, then it is better for the Web server to accept the burden of holding on to the pending request rather than to force that burden on the application server.

Tuning Theory

IBM’s paradigm provides better insight into the actual performance of an application server and makes as much sense as Oracle’s notion of wait points. The focus is on maximizing the performance of an application’s requests, not on ratios. Equipped with these theories, I delved a little further into the nature of application requests as they traverse an enterprise Java environment and asked the question, Where in this technology stack can requests wait? Figure 6-1 shows the common path for an application request.

Figure 6-1. Common path an application request follows through a Java EE stack

As shown in Figure 6-1, requests travel across the technology stack through request queues. When a browser submits a Web request to a Web server, the Web server receives it through a listening socket and quickly moves the request into a request queue, as only one thread can listen on a single port at any given point in time. When that thread receives the request, its primary responsibility is to return to its port and receive the next connection. If it processed requests serially, then the Web server would be capable of processing only one request at a time—not very impressive. A Web server’s listening process would look something like the following:

    public class WebServer extends Thread {
        ...
        public void run() {
            ServerSocket serverSocket = new ServerSocket( 80 ); // exception handling omitted for brevity
            while( running ) {
                Socket s = serverSocket.accept();
                Request req = new Request( s );
                addRequestToQueue( req );
            }
        }
    }

While this is a very simplistic example, it demonstrates that the thread loop is very tight and acts simply as a pass-through to another thread. Each queue has an associated thread pool that waits for requests to be added to the queue to process them. When a request is added to the queue, a thread wakes up, removes the request from the queue, and processes it, for example:

    public void addRequestToQueue( Request req ) {
        synchronized( this.requests ) { // the queue's monitor must be held to call notifyAll()
            this.requests.add( req );
            this.requests.notifyAll();
        }
    }

Threads waiting on the request queue’s monitor are notified, and the first one to reacquire the lock accepts the request for processing. The actions of the thread are dependent on the request (or in the case of separation of business tiers, the request may actually be a remote method invocation).

Consider a Web request against an application server. If the Web server and application server are separated, then the Web server forwards the request to the application server and the same process repeats. Once the request is in the application server, then the application server needs to determine the appropriate resource to invoke. In this example, it is going to be either a servlet or a JSP file. For the purpose of this discussion, we will consider JSP files to be servlets.

■ Note JSP files are convenient to build because in simple implementations you are not required to create a web.xml file containing <servlet> and <servlet-mapping> entries. But in the end, a JSP file will become a servlet. The JSP file itself is translated into an associated .java servlet file, compiled into a .class file, and then loaded into memory to service a request. If you have ever wondered why a JSP file took so much time to respond the first time, it is because it needs to be translated and compiled prior to being loaded into memory.
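Returning to the request queue shown earlier, the consumer side that the text describes, a pooled thread waking up and removing the request, might look like the following sketch. The class and method names here are mine, not the book’s listing.

```java
import java.util.LinkedList;
import java.util.Queue;

// The worker side of a request queue: pool threads block on the queue's
// monitor until a producer calls notifyAll(), then the first thread to
// reacquire the lock removes the request and processes it.
public class RequestQueue<T> {
    private final Queue<T> requests = new LinkedList<>();

    public void addRequestToQueue(T request) {
        synchronized (requests) {
            requests.add(request);
            requests.notifyAll(); // wake any waiting worker threads
        }
    }

    public T removeRequestFromQueue() throws InterruptedException {
        synchronized (requests) {
            while (requests.isEmpty()) {
                requests.wait(); // release the monitor and sleep until notified
            }
            return requests.remove();
        }
    }
}
```

A worker thread simply loops, calling removeRequestFromQueue() and dispatching each request; modern code would reach for java.util.concurrent.BlockingQueue rather than hand-rolled wait/notify, but the blocking behavior is the same.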
You do have the option to precompile JSP files, which buys you the ease of development of a JSP file and the general performance of a servlet.

The running thread loads the appropriate servlet into memory and invokes its service() method. This starts the Java EE application request processing as we tend to think of it. Depending on your use of Java EE components, your next step may be to create a stateless session bean to implement your application’s transactional business logic. Rather than your having to create a new stateless session bean for each request, they are pooled; your servlet obtains one from the pool, uses it, and then returns it to the pool. If all of the beans in the pool are in use, then the processing thread must wait for a bean to be returned to the pool.

Most business objects make use of persistent storage, in the form of either a database or a legacy system. It is expensive for a Java application to make a query across a network to persistent storage, so for certain types of objects, the persistence manager implements a cache of frequently accessed objects. The cache is queried, and if the requested object is not found, then the object must be loaded from persistent storage. While caches can provide performance an order of magnitude better than resolving all queries to persistent storage, there is danger in misusing them. Specifically, if a cache is sized too small, then the majority of requests will resolve to querying persistent storage, but we have added the overhead of checking the cache for the requested object, selecting an object to be removed from the cache to make room for the new one (typically using a least-recently-used algorithm), and adding the new object to the cache. In this case, querying persistent storage directly would perform much better.
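A least-recently-used cache of the sort just described can be sketched in a few lines with LinkedHashMap’s access-order mode. This is a generic illustration, not the persistence manager’s actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal LRU cache: LinkedHashMap reorders entries on access when the
// third constructor argument is true, and removeEldestEntry() evicts the
// least-recently-used entry once the configured capacity is exceeded.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order, not insertion order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict when the cache grows past capacity
    }
}
```

If the capacity is far smaller than the working set, entries are evicted as fast as they are added, which is the thrashing described above: every lookup then pays the cache’s bookkeeping cost on top of the eventual trip to persistent storage.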
The final trade-off is that a large cache requires storage space; if you need to maintain too many objects in a cache to avoid thrashing (that is, rapidly adding and removing objects to and from the cache), then you really need to question whether the object should be cached in the first place.

Establishing a connection to persistent storage is an expensive operation. For example, establishing a database connection can take between half a second and a second and a half on average. Because you do not want your pending request to absorb this overhead on each request, application servers establish these connections on start-up and maintain them in connection pools. When a request needs to query persistent storage, it obtains a connection from the connection pool, uses it, and then returns it to the connection pool. If no connection is available, then the request waits for a connection to be returned to the pool.

Once the request has finished processing its business logic, it needs to be forwarded to a presentation layer before returning to the caller. The most typical presentation layer implementation is to use JavaServer Pages (JSP). As previously mentioned, using JSP can incur the additional overhead of translation to servlet code and compilation, if the JSPs are not precompiled. This up-front performance hit can impact your users and should be addressed, but from a pure tuning perspective, JSP compilation does not impact the order of magnitude of the application performance: the impact is observed once, but there is no further impact as the number of users increases.
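The borrow-and-wait behavior of the connection pool described above reduces to a small monitor pattern. This generic sketch (names are mine) stands in for resources such as JDBC connections established at start-up.

```java
import java.util.LinkedList;
import java.util.Queue;

// A resource pool sketch: resources are created once (e.g., database
// connections opened at server start-up); a request that finds the pool
// empty waits until another request releases a resource back to the pool.
public class ResourcePool<T> {
    private final Queue<T> idle = new LinkedList<>();

    public ResourcePool(Iterable<T> preEstablished) {
        for (T resource : preEstablished) {
            idle.add(resource); // e.g., connections opened at start-up
        }
    }

    public synchronized T borrow() throws InterruptedException {
        while (idle.isEmpty()) {
            wait(); // the connection-pool wait point: block until a release
        }
        return idle.remove();
    }

    public synchronized void release(T resource) {
        idle.add(resource);
        notifyAll(); // wake a request waiting for a resource
    }
}
```

Sizing the pool is the tuning knob: too small and requests queue at borrow(); too large and the database absorbs needless concurrent load.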
Observing the scenario we have been discussing, we can identify the following wait points:

• Web server thread pool
• Application server or tier thread pool
• Stateless session bean or business object pool
• Cache management code
• Persistent storage or external dependency connection pool

You can feel free to add to or subtract from this list to satisfy the architecture of your application, but it is a good general framework to start with.

Tuning Backward

The order of wait points is as important as what they are waiting on. IBM’s notion of sustaining waiting requests as close to the Web server as possible has been proven to be a highly effective