Experiments with Test Case Generation and Runtime Analysis

18 Pages · 2003 · 0.89 MB · English

by Cyrille Artho
Experiments with Test Case Generation and Runtime Analysis

Cyrille Artho 1, Doron Drusinsky 2, Allen Goldberg 3, Klaus Havelund 3, Mike Lowry 4, Corina Pasareanu 3, Grigore Roşu 5, and Willem Visser 6

1 ETH Zurich, CH-8092 Zurich, Switzerland
2 Naval Postgraduate School, Monterey, California, USA, and Time Rover, Inc., Cupertino, California, USA
3 Kestrel Technology, NASA Ames Research Center, Moffett Field, California, USA
4 NASA Ames Research Center, Moffett Field, California, USA
5 Department of Computer Science, University of Illinois at Urbana-Champaign
6 RIACS, NASA Ames Research Center, Moffett Field, California, USA

Abstract. Software testing is typically an ad hoc process where human testers manually write many test inputs and expected test results, perhaps automating their execution in a regression suite. This process is cumbersome and costly. This paper reports preliminary results on an approach to further automate this process. The approach combines automated test case generation, based on systematically exploring the program's input domain, with runtime analysis, where execution traces are monitored and verified against temporal logic specifications, or analyzed using advanced algorithms for detecting concurrency errors such as data races and deadlocks. The approach suggests generating specifications dynamically per input instance rather than statically once and for all. The paper describes experiments with variants of this approach in the context of two examples, a planetary rover controller and a spacecraft fault protection system.

1 Introduction

A program is typically tested by manually creating a test suite, which in turn is a set of test cases. An individual test case is a description of a test input sequence to the program, and a description of properties that the corresponding output is expected to have.
This procedure seems complicated but ultimately unavoidable, since for real systems, writing test cases is an inherently innovative process requiring human insight into the logic of the application being tested. Discussions with robotics and spacecraft engineers at NASA seem to support this view. However, an equally widespread opinion is that a non-trivial part of the testing work can be automated. A case study is described in [3], where an 8,000-line Java application was tested by different student groups using different testing techniques. It is conjectured that the vast majority of bugs in this system that were found by the students could have been found in a fully automatic way.

The paper presents reflections and preliminary work on applying low-budget automated testing to identify bugs quickly. The paper shall be seen as a position statement of future work, based on experiments using our testing tools on case studies. We suggest a framework for generating test cases in a fully automatic way, as illustrated by Figure 1. For a particular program to be tested, one establishes a test harness consisting of four modules: a test input generator module, a property generator module, a program instrumentation module and an observer module.

Fig. 1. Test case generation (test input generation and property generation) and runtime analysis (instrumentation and observation).

The test input generator automatically generates inputs to the application, one by one. A generated input is fed to the property generator, which automatically generates a set of properties that the program should satisfy when executed on that input. The input is then fed to the program, which executes, generating an execution trace. The observer module "observes" the executing program, checking its behavior against the generated set of properties. That is, the observer takes as input an execution trace and the set of properties generated by the property generator.
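The loop formed by these four modules can be sketched as follows. Everything in this sketch — the toy program under test, the module names, and the properties — is a hypothetical illustration, not code from the tools described later.

```python
# Sketch of the Figure-1 harness: input generation, property generation,
# execution, and observation.  All names and the toy program are invented.

def generate_inputs():
    """Test input generator: produce inputs one by one."""
    return [-2, 0, 3]

def properties_for(i):
    """Property generator: properties the output must satisfy for input i."""
    return [lambda out: out >= 0,          # output is never negative
            lambda out: out in (i, -i)]    # output equals i up to sign

def run_instrumented(i):
    """The program under test; here its 'trace' is simply its output."""
    return abs(i)

def observe(trace, props):
    """Observer: check the trace against the generated properties."""
    return all(p(trace) for p in props)

failures = [i for i in generate_inputs()
            if not observe(run_instrumented(i), properties_for(i))]
print(failures)  # an empty list means no generated property was violated
```

The point of the sketch is the division of labor: only `generate_inputs` and `properties_for` are application-specific, while the runner and observer are generic.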
The program itself must be instrumented to report events that are relevant for monitoring that the properties are satisfied on a particular execution. This instrumentation can in many cases be automated. The test input generator and the property generator are both written ("hard-wired") specifically for the application that is tested. This replaces manual construction of test cases. However, the instrumentation and observer modules are generic tools that are re-used on different applications. In the rest of this paper we use the term test case generation to refer to test input generation and property generation, and the term runtime analysis to refer to instrumentation as well as observation.

The framework described above was applied to two case studies, a planetary rover controller and a spacecraft fault protection system. In each case the system properties were expressed in temporal logic. For the rover controller, test cases were generated using a model checker and the properties generated were specific to a single test case. For the fault protection system, test cases were generated by a small program, and universal correctness properties were manually constructed. Property generation is the difficult step in this process. We are investigating problem characteristics and tradeoffs between the two approaches used in the studies. The approach of generating properties specific to a single test case is more novel and will be investigated further.

The paper is organized as follows. Section 2 outlines the abstract framework for test case generation that we have tried to adhere to. Section 3 describes the runtime analysis techniques. Sections 4 and 5 describe the two case studies. Finally, Section 6 concludes the paper and outlines how this preliminary work will be continued.

2 Test Case Generation

This section presents, in abstract, the test case generation framework.
As mentioned earlier, we consider test case generation as consisting of test input generation and property generation.

2.1 Test Input Generation

In practice today, the generation of test inputs for a program under test is a time-consuming and mostly manual activity. However, test input generation naturally lends itself to automation, and has therefore been the focus of much research attention; recently it has also been adopted in industry [21,25,6,10]. There are two main approaches to generating test inputs automatically: a static approach that generates inputs from some kind of model of the system (also called model-based testing), and a dynamic approach that generates tests by executing the program repeatedly, while employing criteria to rank the quality of the tests produced [20,24]. The dynamic approach is based on the observation that test input generation can be seen as an optimization problem, where the cost function used for optimization is typically related to code coverage (e.g. statement or branch coverage).

The model-based test input (test case) generation approach is used more widely (see Hartman's survey of the field [12]). The model used for model-based testing is typically a model of expected system behavior and can be derived from a number of sources, namely a model of the requirements, use cases, or design specifications of a system [12]; even the code itself can be used to create a model (e.g. symbolic execution based approaches [19,21]). As with the dynamic approach, it is most typical to use some notion of coverage of the model to derive test inputs, i.e., generate inputs that cover all transitions (or branches, etc.) in the model. Constructing a model of the expected system behavior can, however, be a costly process. On the other hand, generating test inputs just based on a specification of the input structure and input pre-conditions can be very effective, while typically less costly.
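A minimal sketch of this cheaper style of input generation is to enumerate an input structure exhaustively and keep only the inputs that satisfy a pre-condition. The command vocabulary and the pre-condition below are invented for the example.

```python
from itertools import product

# Hypothetical input structure: a command name plus an integer argument.
COMMANDS = ["start", "stop", "reset"]
ARGS = range(4)

def precondition(cmd, arg):
    # Invented pre-condition: "reset" is only valid with argument 0.
    return cmd != "reset" or arg == 0

# Every structurally valid input satisfying the pre-condition becomes
# a test input.
test_inputs = [(cmd, arg) for cmd, arg in product(COMMANDS, ARGS)
               if precondition(cmd, arg)]
print(len(test_inputs))  # 9: four "start", four "stop", one "reset"
```

Note that no model of the system's behavior is needed here — only the shape of valid inputs.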
We propose to use a model checker to traverse the space of possible valid inputs, in order to generate test inputs. We describe the input model as a nondeterministic program that describes all valid inputs, and then we use the model checker to traverse the state space of this program. We also assert, as the property the model checker should check, that no such test input exists; this causes the model checker to produce a counterexample whenever a valid test input has been generated, and from this counterexample trace we then produce the test input. It is important that various techniques for searching the state space be available, since this gives the flexibility to generate a large array of test inputs to achieve better coverage of the behavior of the system under test. For test input generation we use the Java PathFinder model checker (JPF), which analyzes Java programs [26] and supports various heuristic search strategies (for example, based on branch coverage [11] or random search). In Section 4.2 we show how this model checker is used to generate test inputs for the Mars K9 rover. The most closely related work to ours is the Korat tool [2], which generates test inputs from Java predicates, but instead of model checking it uses a dedicated, efficient search strategy. The use of the counterexample capability of a model checker to generate test inputs has also been studied by many others (see [18] for a good survey), but most of these approaches are based on a full system model, not just the input structure and pre-conditions as suggested here.

2.2 Property Generation

As mentioned earlier, the ideal goal is, from a particular test input, to generate properties that can be checked on executions of the program on that input. More precisely, assume a particular program that we want to test and the domain Input of inputs. Then we have to construct the following objects. First of all, we need to define the domain of observable behaviors.
We shall regard the executing program as generating an execution trace in the domain Trace, where a trace is a sequence of observable events in the domain Event. We must define a function:

  execute : Input → Trace

that for a given input returns the execution trace generated by the program when applied to that input. Defining the domain Event and the function execute in practice amounts to instrumenting the program to log events of importance. The resulting execution trace will then consist of these logged events. Obviously we also need to define the domain Property of properties that are relevant for the application and a relation:

  ⊨ ⊆ Trace × Property

that determines what traces satisfy what properties. We say that (σ, φ) ∈ ⊨, also written as σ ⊨ φ, when the trace σ satisfies the property φ. Essentially what is now needed is a function translate:

  translate : Input → Property-set

that for a given input returns the set of properties that it is regarded as relevant to test on the execution of the program on that input. A well-behaved program satisfies the following formula:

  ∀i ∈ Input. ∀φ ∈ translate(i). execute(i) ⊨ φ

Our experience is that temporal logic is an appropriate notation for writing properties about the applications we have investigated, as will be seen in subsequent sections. For a particular application one needs to provide the instrumentation (execute) and the property generator (translate), which generates a set of temporal logic properties for a particular input. We shall discuss each of these aspects in connection with the case studies.

3 Runtime Analysis

Runtime analysis consists of instrumenting the program and observing the execution of the instrumented program. The runtime analysis modules consist of a code instrumentation module, which augments the program under test with code that generates an event log, and an observer module, which evaluates the event log for conformance to desired properties.
The event log can be transmitted via inter-process communication or stored as a file. This allows for running an instrumented executable remotely and with little impact on the performance of the system under test. In our case studies we used two different runtime analyzers: the commercial tool DBRover, based on an extension of the Temporal Rover tool [6], and the research tool JPaX [14,1]. These systems will be described in the following.

The architecture of the JPaX runtime analysis framework is designed to allow several different event interpreters to be easily plugged into the observer component. In the test studies two event interpreters were used: one algorithm analyzes temporal logic properties, as already discussed, and the other concurrency properties. These algorithms are discussed below.

3.1 Instrumentation Framework

Instrumentation takes a program and the properties it is expected to satisfy, and produces an instrumented version of the program that, when executed, generates an event log with the information required for the observer to determine the correctness of the properties. For JPaX we have implemented a very general and powerful instrumentation package for instrumenting Java bytecode. The requirements of the instrumentation package include power, flexibility, ease of use, portability, and efficiency. We rejected alternative approaches such as instrumenting Java source code, using the debugging interface, and modifying the Java Virtual Machine because they violated one or another of these requirements.

It is essential to minimize the impact of the instrumentation on program execution. This is especially the case for real-time applications, applications that may particularly benefit from this approach. Low-impact instrumentation may require careful trades between what is computed locally by the instrumentation and the amount of data that needs to be transmitted to the observer.
The instrumentation package allows such trades to be made by allowing seamless insertion of Java code at any program point. Code is instrumented based on an instrument specification that consists of a collection of predicate-action rules. The predicate is a predicate on the bytecode instructions that are the translation of a Java statement. These predicates are conjunctions of atomic predicates that include predicates that distinguish statement types, presence of method invocations, pattern-matched references to fields and local variables, and so on. The actions are specifications describing the inserted instrumentation code. The actions include reporting the program point (method and source statement number), a time stamp, the executing thread, the statement type, the value of variables or an expression, and invocation of auxiliary methods. Values of primitive types are recorded in the event log, but if the value is an object, a unique integer descriptor of the object is recorded. This has been implemented using Jtrek [5], a Java API that provides lower-level instrumentation functionality. In general, use of bytecode instrumentation, and use of Jtrek in particular, has worked out well, but there are some remaining challenges with respect to instrumenting the concurrency aspects of program execution.

3.2 Observer Framework

As described above, runtime analysis is divided into two parts: instrumenting and running the instrumented program, which produces a series of events, and observing these events. The second part, event observation (see Figure 2), can be split into two stages: event analysis, which reads the events and reconstructs the run-time context, and event interpretation. Note that there may be many event interpreters. To minimize impact on the program under test, log entries contain minimal contextual information, and the log entries from different threads are interleaved.
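To illustrate, interleaved log entries from two threads might look as follows. The record layout (thread, code location, event kind, value) follows the fields listed above, but the concrete format is an assumption of this sketch, not JPaX's actual log format.

```python
# Hypothetical event-log records: (thread, code location, event kind, value).
# Entries from different threads arrive interleaved, as in the real log.
log = [
    ("T1", "Main.java:10", "write",  1),
    ("T2", "Main.java:10", "read",   1),
    ("T1", "Main.java:12", "lock",   "L"),
    ("T2", "Main.java:14", "unlock", "L"),
]

# Grouping entries per thread is the kind of context reconstruction
# that must happen before interpreters can make sense of the events.
by_thread = {}
for thread, location, kind, value in log:
    by_thread.setdefault(thread, []).append((location, kind, value))

print(sorted(by_thread))      # ['T1', 'T2']
print(len(by_thread["T1"]))   # 2
```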
Event analysis disentangles the interleaved entries and reconstructs the context from them. The contextual information, transmitted in internal events, includes thread names, code locations, and reentrant acquisitions of locks (lock counts). The event analysis package maintains a database with the full context of the event log. This allows for writing simpler event interpreters, which can subscribe to particular event types made accessible through an observer interface [9] and which are completely decoupled from each other.

Each event interpreter builds its own model of the event trace, which may consist of dependency graphs or other data structures. It is up to the event interpreter to record all relevant information for keeping a history of the events, since the context maintained by the event analysis changes dynamically with the event evaluation. Any information that needs to be kept for the final output, in addition to the context information, needs to be stored by the event interpreter. If an analysis detects violations of its rules in the model, it can then show the results using the stored data and context information such as code locations, thread names, etc.

Fig. 2. The observer architecture.

3.3 Temporal Logic Monitoring

Temporal logic in general, and Linear-time Temporal Logic (LTL) in particular, has been investigated for the last twenty years as a specification language for reactive systems [22]. LTL is a propositional logic with the standard connectives ∧, ∨, → and ¬. It furthermore includes four temporal operators: □p (always p), ◇p (eventually p), p U q (p until q, where q has to eventually occur), and ○p (in the next step p), as well as four dual past-time operators (always p in the past, p some time in the past, p since q, and in the previous step p). As an example, consider the future time formula □(p → ◇q).
It states that it is always the case (□) that when p holds, then eventually (◇) q holds. LTL has the property of being intuitively similar to natural language and capable of describing many interesting properties of reactive systems.

Metric Temporal Logic (MTL) extends LTL so that every temporal operator can be augmented with a relative-time or real-time constraint. Hence, for example, p U_{c<5} q means p must be true until a future time, at most 5 real-time units in the future on some clock c, where q must hold. Similarly, p U_{<5} q requires q to be true at most 5 cycles in the future, using the underlying state- or cycle-based semantics to define the notion of a cycle. As mentioned, we have experimented with two systems that perform temporal logic monitoring: DBRover [6] and JPaX [14,1].

The DBRover Temporal Observer The DBRover is an MTL monitoring tool, based on the TemporalRover code generator [6]. DBRover extends MTL with two forms of parametrization: multi-instancing, which allows for a rule to be independently validated per instance of an object class, process, or thread, and parametrization based on time series data values [7], which enables the verification of properties such as temporal stability and monotonicity. It consists of a GUI for editing temporal assertions, a graphical LTL/MTL simulator, and an execution engine. The DBRover builds and executes temporal rules for a target program or application; at runtime, the DBRover listens for event messages from the target application and evaluates the corresponding temporal assertions.

The JPaX Temporal Observer With respect to temporal logics, we have implemented several specialized algorithms in JPaX: traversing the execution trace either forwards or backwards, based on either rewriting or automata generation, implemented in either Java or Maude [4]. We next briefly sketch one of these algorithms and refer the interested reader to more elaborate descriptions.
Efficiency of runtime analysis algorithms is always an important aspect of our research, even if the observer operates off-line. A crucial observation is that one can design more efficient algorithms if one focuses on segments of temporal logics rather than on the entire logic. Thus, we were able to develop optimal algorithms for future time and for past time temporal logics separately. We do not regard this segmentation as a problem in practice, because in our experience so far one rarely or never uses both future and past time operators in the same requirements formula.

The algorithm we are going to describe monitors future time temporal logic formulas and is entirely based on rewriting technology. The idea is to maintain a set of monitoring requirements as future time LTL formulas and modify them accordingly when a new event is emitted by the instrumented program. If one of these formulas ever becomes false, then that formula has been violated, so an error message is generated and an appropriate action is taken. Four rewriting rules, inspired by known recurrences of temporal operators, transform the formulas whenever a new nonterminal event e is received (four others, not mentioned here, are applied on terminal events):

  (○F){e} → F
  (□F){e} → F{e} ∧ □F
  (◇F){e} → F{e} ∨ ◇F
  (F U F'){e} → F'{e} ∨ (F{e} ∧ F U F')

The formula F{e}, for some formula F, is the (transformed) formula which should hold next, after receiving the event e. For example, for ◇F to hold now, where the first event in the remaining trace is e, either F must hold now (F{e}), or ◇F must again hold in the future, thus postponing the obligation. Using memoization (or hashing) techniques provided by advanced rewriting engines such as Maude, the simple rewriting algorithm above performs well in practice.
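The four rules above can be transcribed quite directly. Representing formulas as nested tuples and simplifying conjunctions and disjunctions eagerly are encoding choices of this sketch (JPaX itself uses Maude rewriting), and the terminal-event rules are omitted.

```python
# Formulas: a proposition name (str), True/False, or a tuple:
# ("next", F), ("always", F), ("eventually", F), ("until", F, G),
# ("and", F, G), ("or", F, G), ("not", F).

def land(a, b):                      # conjunction with eager simplification
    if a is False or b is False: return False
    if a is True: return b
    if b is True: return a
    return ("and", a, b)

def lor(a, b):                       # disjunction with eager simplification
    if a is True or b is True: return True
    if a is False: return b
    if b is False: return a
    return ("or", a, b)

def step(f, event):
    """F{e}: the obligation left on the rest of the trace after event e
    (an event is the set of proposition names true at that instant)."""
    if isinstance(f, bool):
        return f
    if isinstance(f, str):                       # atomic proposition
        return f in event
    op = f[0]
    if op == "next":                             # (oF){e} -> F
        return f[1]
    if op == "always":                           # ([]F){e} -> F{e} /\ []F
        return land(step(f[1], event), f)
    if op == "eventually":                       # (<>F){e} -> F{e} \/ <>F
        return lor(step(f[1], event), f)
    if op == "until":                            # (F U G){e} -> G{e} \/ (F{e} /\ F U G)
        return lor(step(f[2], event), land(step(f[1], event), f))
    if op == "and":
        return land(step(f[1], event), step(f[2], event))
    if op == "or":
        return lor(step(f[1], event), step(f[2], event))
    if op == "not":
        r = step(f[1], event)
        return (not r) if isinstance(r, bool) else ("not", r)

# The traffic-light policy from the text: always (g -> (not r) U y).
prop = ("always", ("or", ("not", "g"), ("until", ("not", "r"), "y")))

f = prop
for event in [{"g"}, set(), {"y"}]:   # green, idle, yellow: no violation
    f = step(f, event)
print(f is not False)                  # True: the formula never became false
```

On a violating trace such as green immediately followed by red, the formula rewrites to False and the monitor would flag the error.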
We were for example able to monitor 100 million events in less than 3 minutes on a formula □(g → (¬r) U y), stating a safety policy of a traffic light controller (yellow should come after green). The interested reader is referred to [15,16] for proofs of correctness, complexity analysis and evaluation of this algorithm. A second approach to building LTL observers, based on automata construction, is found in [16]. A rewriting-based algorithm for monitoring past time LTL requirements formulas has been presented in [13], which is quite different from the one for future time LTL. A dynamic algorithm for monitoring past time LTL formulas is presented in [17].

3.4 The JPaX Concurrency Analyzer

Multi-threaded programs are particularly difficult to test because they are non-deterministic. A multi-threaded program consists of several threads that execute in parallel. A main issue for a programmer of a multi-threaded application is to ensure mutual exclusion on shared objects, that is, to avoid data races, where one thread writes to an object while other threads simultaneously either write to or read the same object. Programming languages with multi-threading therefore provide constructs for ensuring mutual exclusion. To ensure mutual exclusion on a shared object, a thread can acquire a lock before accessing the object, releasing it afterwards. If other threads acquire the same lock when accessing the object, mutual exclusion is guaranteed. If threads do not acquire the same lock (or do not acquire locks at all) when accessing an object, then there is a risk of a data race. The Eraser algorithm [23] can detect such disagreements by analyzing single execution traces. The Eraser algorithm has been implemented in the JPaX tool.

Deadlocks can occur when two or more threads acquire locks in a cyclic manner. As an example of such a situation, consider two threads T1 and T2, both acquiring locks A and B. Thread T1 acquires first A and then B before releasing A.
Thread T2 acquires B and then A before releasing B. This poses a deadlock risk, since thread T1 can acquire A, whereafter thread T2 acquires B, after which neither thread can progress further. In JPaX we have implemented an algorithm for detecting such deadlock situations. It builds a lock graph, where nodes are locks and edges represent the lock hierarchy. That is, for the above example, there will be an edge from A to B and another edge from B to A. Hence, for this example the graph contains a cycle, and a cycle represents a potential deadlock situation. This algorithm yields false positives (false warnings) and false negatives (missed deadlocks). In [1] an extension to this algorithm is described that reduces the number of false positives. JPaX's concurrency analysis has been integrated with the DBRover, where deadlock results are graphically displayed as UML message sequence charts.

4 Case Study 1: A Planetary Rover Controller

Our first case study is the planetary rover controller K9, and in particular its executive subsystem, developed at NASA Ames Research Center; a full account of this case study is given in [3]. The executive receives plans of actions that the rover is requested to carry out, and executes these plans. First we present a description of the system, including a description of what plans (the input domain) look like.

  Plan      → Node
  Node      → Block | Task
  Block     → (block NodeAttr :node-list (NodeList))
  Task      → (task NodeAttr :action Symbol [:duration Duration])
  NodeAttr  → :id Symbol [:start-condition Condition] [:end-condition Condition] [:continue-on-failure]
  Condition → (time StartTime EndTime)

  (block :id plan
    :continue-on-failure
    :node-list (
      (task :id drive1
        :start-condition (time +1 +5)
        :end-condition (time +1 +30)
        :action BaseMove1
        :duration 20)
      (task :id drive2
        :end-condition (time +10 +16)
        :action BaseMove2)))

Fig. 3. Plan grammar (left) and an example of a plan (right).
Then we outline how plans (test inputs) can be automatically generated using model checking, and finally we describe how, for each plan, a set of temporal logic properties can be automatically generated that the executive must satisfy when executing the plan.

4.1 System Description

The executive is a multi-threaded system (8,000 lines of Java code) that receives flexible plans from a planner, which it executes according to a plan language semantics. A plan is a hierarchical structure of actions that the rover must perform. Traditionally, plans are deterministic sequences of actions. However, increased rover autonomy requires added flexibility. The plan language therefore allows for branching based on conditions that need to be checked, and also for flexibility with respect to the starting time and ending time of an action. We give here a short presentation of the (simplified) language used in the description of the plans that the rover executive must execute.

Plan Syntax A plan is a node; a node is either a task, corresponding to an action to be executed, or a block, corresponding to a logical group of nodes. Figure 3 (left) shows the grammar for the language; we should note that all the
