
NASA Technical Reports Server (NTRS) 20000054877: Automatic Generation of Test Oracles - From Pilot Studies to Application



Automatic Generation of Test Oracles - From Pilot Studies to Application

Martin S. Feather
Jet Propulsion Laboratory, California Institute of Technology
4800 Oak Grove Drive, Pasadena, CA 91109, USA
+1 818 354 1194
Martin.S.Feather@Jpl.Nasa.Gov

Ben Smith
Jet Propulsion Laboratory, California Institute of Technology
4800 Oak Grove Drive, Pasadena, CA 91109, USA
+1 818 353 5371
Ben.D.Smith@Jpl.Nasa.Gov

ABSTRACT
There is a trend towards the increased use of automation in V&V. Automation can yield savings in time and effort. For critical systems, where thorough V&V is required, these savings can be substantial.

We describe a progression from pilot studies to development and use of V&V automation. We used pilot studies to ascertain opportunities for, and suitability of, automating various analyses whose results would contribute to V&V. These studies culminated in the development of an automatic generator of automated test oracles. This was then applied and extended in the course of testing an AI planning system that is a key component of an autonomous spacecraft.

Keywords
Test Oracles, Verification and Validation, Analysis, Planning, NASA

1 INTRODUCTION
Cost, performance and functionality concerns are driving a trend towards use of self-sufficient autonomous systems in place of human-controlled mechanisms. Verification and validation (V&V) of such systems is particularly crucial given that they will operate for long periods with little or no human supervision. Furthermore, V&V must itself be done at low cost, rapidly and effectively, even as the systems to which it is applied grow in complexity and sophistication.

Spacecraft - especially deep space probes - exemplify these concerns. We have been involved in V&V of an AI planner that is a key component of a spacecraft's autonomous control system. In [Feather & Smith 1997] we report our use of an automated generator of automated test oracles to support these V&V activities. The paper is organized to show the progression of steps we followed leading up to this application, and the lessons we have learnt by reflecting upon our experience:

• First pilot study: rapid automated analysis (Section 2). In this study we determined the viability of a rapid analysis approach. We did case studies of two kinds of traditional design information, yielding confirmation of the viability of the analysis method for this kind of information.
• Second pilot study: application to an autonomous planner (Section 3). We needed this second study to determine suitability of the rapid analysis approach to, specifically, checking plans generated by an AI planner. Particular concerns were scalability of the approach, and investment of domain experts' time. The pilot study produced instances of automatic test oracles.
• Development of automated generator of planner test oracles (Section 4). Based on the lessons learned from the second pilot study, we committed to developing a tool to be used in actual spacecraft testing. The tool would go beyond the capabilities of the second pilot study by both extending aspects of the analyses performed, and automating the generation of the test oracles themselves.
• Application to V&V of spacecraft planner (Section 5). We applied the tool during spacecraft planner testing. Using it, we checked thousands of test cases for adherence to hundreds of flight rules. Additionally, we extended it to perform additional validation checks of particularly complex rules.
• Lessons learned (Section 6). We describe lessons learned for both software engineering:
    • Our experience re-iterates several well-understood virtues of pilot studies as a precursor to actual development.
    • When domain experts' time is a critical resource, follow an "on-demand" policy of knowledge acquisition.
  and V&V:
    • V&V can make good use of redundancy and

The architecture of the system developed in this phase is shown in Figure 3. For the remainder of this paper we will refer to this system as the "planchecker". It has the same stages as the second pilot study, but with some additional capabilities:

• Additional analyses: the planner experts asked for further analyses beyond temporal constraints, notably typechecking of plan elements, and cross-checking of plan activities against their rationale (information on which is included in the generated plans). These required loading additional information from plans into the database, and development of additional database queries.
• Automatic translation: there were over 200 temporal planner constraints (counting each lowest-level clause as one constraint). Based on the observations of the second pilot study, we recognized that manual translation of the whole set would be a tedious task. Worse yet, we expected the set of planner constraints to grow and change over time. In keeping with our overall goal of judicious use of automation, it was decided to build an automatic translator that would take any constraint expressible in the planner language and generate the equivalent database query.
• Extended output: the planner experts wanted the query results to report more than simply "OK" when a plan passed the checks. In essence, they wanted a justification for why a temporal constraint was satisfied. For example, a constraint that says every SEP-thrusting interval is followed by an SEP-idle interval would be justified by listing, for each SEP-thrusting interval, the specific SEP-idle interval found to satisfy the constraint.
• Coverage analysis: the planner experts also wanted to know which of the planner constraints had been exercised in the plan. For example, only plans that involved intervals of SEP thrusting would exercise a constraint of the form "every thrusting interval must

Insights gained from development experience
The development effort did indeed culminate in the planchecker tool (use of which is discussed in the next section). We therefore confirmed the validity of the conclusions drawn from the second pilot study. We also gained some further insights. These fell into two key areas:

• The second pilot study had suggested that the translation from planner constraints to database queries would be straightforward. In practice, automating the translation of the full planner language turned out to be more complex than the pilot study had indicated (see Appendix B for examples). While a procedural approach to programming the planchecker's translator sufficed to meet the development goals, we concluded that translation warrants further attention. We will return to this in Section 6, Lessons Learned.
• In practice, testers need analysis results with more content and structure than simply "pass" or "fail". Again, details can be found in Appendix B, and discussion is deferred to Section 6, Lessons Learned.

5 USE OF ANALYSIS TOOL
The planchecker was used by the second author (a planning expert) during testing. Interaction with the V&V expert was not required during this phase.

The planchecker was applied to check each plan generated. Its results were accumulated alongside other statistics about the plan generation, e.g., how long it took to generate the plan and how much memory was required to do so. It was easy to apply in "batch mode" to a whole series of plans. It was tolerably efficient, taking on the order of 2 minutes to complete the checking of a typical plan.

Over the course of use, several sets of changes were made
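
The extended output and coverage analysis that the planner experts asked for can be illustrated with a small database-backed check. The following is only a sketch under stated assumptions, not the planchecker itself: it uses SQL through Python's `sqlite3` module as a stand-in for the database tool the authors used, and the `activity` table, its columns, the integer time points, and the function name `check_followed_by` are all hypothetical. "Followed by" is read here as "the idle interval starts exactly where the thrusting interval ends".

```python
import sqlite3


def check_followed_by(intervals, first="thrust", second="idle"):
    """Check that every `first` interval is immediately followed by a `second` one.

    `intervals` is an iterable of (kind, start, stop) tuples (hypothetical schema).
    Returns (violations, justifications, exercised):
      violations     - ids of `first` intervals with no following `second` interval
      justifications - maps each satisfied `first` id to the `second` id found
      exercised      - False when the plan has no `first` interval at all
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE activity ("
        "id INTEGER PRIMARY KEY, kind TEXT, start INTEGER, stop INTEGER)"
    )
    db.executemany(
        "INSERT INTO activity (kind, start, stop) VALUES (?, ?, ?)", intervals
    )
    # One row per `first` interval; b.id is NULL when nothing follows it.
    rows = db.execute(
        """
        SELECT a.id, b.id
        FROM activity AS a
        LEFT JOIN activity AS b
          ON b.kind = ? AND b.start = a.stop
        WHERE a.kind = ?
        ORDER BY a.id
        """,
        (second, first),
    ).fetchall()
    db.close()

    justifications = {a: b for a, b in rows if b is not None}
    violations = [a for a, b in rows if b is None]
    exercised = len(rows) > 0
    return violations, justifications, exercised


if __name__ == "__main__":
    plan = [("thrust", 0, 10), ("idle", 10, 15), ("thrust", 15, 30)]
    print(check_followed_by(plan))
```

Returning the matching interval for each satisfied case, rather than a bare "OK", is what makes a "pass" inspectable, and the `exercised` flag separates a constraint that held from one that was never tested by the plan.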
For example, only plans that Over the course of use, several sets of changes were made activides Manual conditions PLANNER Conceptual decomposition Goals & initial fplan constraint and expression [ (natural _, Constraints Automatic language) Database schema loading of database expression DATABASE data _m I Database query ] Automatic analysis Query results Figure 4- Extended use of Planchecker (extensions shown in bold) out to be a driving concern. Our database-based approach Engineering (San Francisco: October 1996), ACM to analysis sufficed. More important to us was the Press, 106- I17. investment of effort that would be required of our domain [Dillon & Yu 19941 L. Dillon & Q. Yu. Oracles for experts, whose time was in short supply. This led us to automate the generation of test oracles from a domain- checking temporal properties of concurrent systems. Proceedings 2"a ACM SIGSOFT Symposium specific representation. Thus the domain experts' effort it would take to construct that generator became out dominant Foundations of Software Engineering (New Orleans, December 1994), ACM Press, 140-153. concern. Approaches that could reduce this kind of effort include the parameterized tableaus of [Dillon & {DS I 1998] http://nmp.jpl.nasa.gov/ds 1/ Ramakrishna 1996], or the algebraic-signature based mappings of {Reyes & Richardson 1998]. We found, {Feather1998] M.S. Feather. Rapid Application of however, the need to yield needed test results with finer Lightweight Formal Methods for Consistency Analyses. distinctions than simply "passed" or "failed." Information IEEE Transactions on Software Engineering, 24(1 I): 949-959, Nov 1998. about "passed" cases was useful to for test coverage analysis, and for ascertaining that the test had been passed [Feather & Smith 1998]. M.S. Feather & B. Smith. V&V of "for the fight reasons". Information about "failed" cases a Spacecraft's Autonomous Planner through Extended was useful to locate the relevant portions of the plan Automation. 
Proceedings of the 23"dAnnual Software contributing to those failures, and so speed the domain Engineering Workshop (NASA Goddard, MD, expert in debugging what was going wrong in the planner. December 1998). We are not aware of work on automatic generation of test [.lagadeesan et al 1997] L.J. Jagadeesan, A. Proter, C, oracles that supports this capability. Based on our practical Puchol, J.C. Ramming & L.G.Votta. Specification- experience of application of test oracle generation, we see based Testing of Reactive Software: Tools and the need for further investigation of this area. Experiments. Proceedings of the 19th International ACKNOWLEDGEMENTS Conference on Software Engineering (Boston, MA," May 1997), 525-535. The research described in this paper was carried out by the Jet Propulsion Laboratory, California Institute of [Pell 1996] B. Peil, D.E. Bernard, S.A. Chien, E. Gat, N. Technology, under acontract with the National Aeronautics Muscettola, P.P. Nayak, M.D. Wagner & B.C. and Space administration. Reference herein to any specific Williams. A Remote Agent Protoype for Spacecraft commercial product, process, or service by trade name, Autonomy. Proceedings of the SPIE conference on trademark, manufacturer, or otherwise, does not constitute Optical Science, Engineering and Instrumentation, or imply its endorsement by the United States Government 1996. or the .let Propulsion Laboratory, California Institute of Technology. {Pell 1997] B. Pell, D.E. Bernard, S.A. Chien, E. Gat, N. Muscettola, P.P. Nayak, M.D. Wagner & B.C. The authors thank the other members of the DS-1 planner Williams. An Autonomous Spacecraft Agent Prototype. team, Nicola Muscettola and Kanna Rajah, for their help. Proceedings First International Conference on REFERENCES Autonomous Agents. ACM Press, 1997. [Allen 1983] J.F. Allen. Maintaining Knowledge about [PAX 19981 Temporal Intervals. Communications of the ACM, http://rlmp.jpl.nasa.gov/dsl/tech/autora.html 26(11):832-843, 1983. 
[Reyes & Richardson 1998] A.A. Reyes & D.J. [Andrews 199.8] J.H. Andrews. Testing using Log File Richardson. Specification-Based Testing of Ada Units Analysis: Tools, Methods, and Issues. Proceedings of with Low Encapsulation. Proceedings of the 13's IEEE the 13'hIEEE International Conference on Automated International Conference on Automated Software Software Engineering (Honolulu, Hawaii, October Engineering (Honolulu, Hawaii, October 1998), IEEE 1998), IEEE Computer Society, 157-166. Computer Society, 22-31. [Cohen 1989] D. Cohen. Compiling Complex Database [Richardson, Aha & O'Malley 1992] D.J. Richardson, S.L. Transition Triggers. Proceedings of the ACM SIGMOD Aha & T.). O'Malley. Specification-based Test Oracles International Conference on the Management of Data for Reactive Systems. Proceedings of the 14th (Portland, Oregon, 1989), ACM Press, 225-234. International Conference on Software Engineering [Dillon & Ramakrishna 1996] L.K. Dillon & Y.S. (Melbourne, Australia, May 1992), 105-118. Ramakrishna. Generating Oracles from Your Favorite [SOHO 1998] SOHO Mission lnterr,_ption Preliminary. Temporal Logic Specifications. Proceedings 4'h ACM Status and Background Report - July 15. 1998 SIGSOFT Symposium Foundations of Software http://umbra.nascom, nasa.gov/soho/prelim_and backgr ound_rept.html a "compatibility." The V&V expert preferred to think of this as a "constraint," in keeping with the [Wasscrman & Blum 1997] H. Wasserman & M. Blum. terminology of the database tool. Another example is Software Reliability via Run-Time Result-Checking. the "?_any_value" term, which serves as a wildcard, JACM 44(6): 826-845, 1997. indicating any acceptable parameter value may occur [Wile 1997] D. Wile. Abstract Syntax from Concrete in the corresponding parameter position. Again, the Syntax. Proceedings of the 19th International V&V expert had the exact same concept, but preferred Conference on Software Engineering (Boston, MA, a different syntax. May 1997), 472-480. 
• Confirmation of shared understanding: there were APPENDIX A - DETAILS OF THE SECOND PILOT some areas of shared understanding, but these had to STUDY be confirmed, not taken for granted. A trivial example Example of planner constraint is "AND", which in the above is used to indicate that the The following example of one of the simpler plan constraint [compatibility] holds if all of the clauses of constraints, as expressed in the planner's special purpose this AND hold. More interesting are the terms "meets" language, will convey a feel for the challenges faced in this and "met-by," which are binary temporal relations pilot study: between intervals, drawn from the work by Allen (De fine_Compa tibi iity [Allen 1983]. ;; ;; Idle_Segment The net result was that the V&V expert required an intensive session of coaching on the meaning of the planner ;; (SINGLE ((SEP_Schedule SEP_Schedule_SV) ) notations (plans and constraint language) at the start of this (Idle Segment)) pilot study, and incremental assistance at various points :duration_bounds [i _plus infinity_] throughout. Overall this did not amount to an undue :compatibility_spec consumption of planner experts' time. (AND ;; Thrust and Idle segments must all Example of Translation from Planner Constraint to meet--no gaps Database Query (meets Consider the Idle_Segment constraint given earlier. Its (SINGLE essential core is the following: ((SEP_Schedule SEP_Schedule_SV) ) ((Thrust Segment (? any_value_ (SINGLE ((SEP_Schedule . . . (Idle_Segment)) ?_any_value_) )))) :compa tibi iity_spec (me t_by (AND (SINGLE (meets (SINGLE ((SEP_Schedule ... ((SEP_Schedule SEP_Schedule_SV) ) (Thrust_Segment (?,?))) ((Thrust_Segment (?_any value_ (met_by (SINGLE ((SEP_Schedule ... ?_any_value_) )))))) (Thrust_Segment (?,?)))) This illustrates several areas where knowledge held by the The fragments (SINGLE ((SEP_Schedule . .. 
introduce planner experts had to be acquired by the V&V expert: descriptions that are to match to activities of the SEP scheduled in the plan. The first such description is of an • Overall application domain knowledge: "SEP" is an Idle_Segment activity. For every instance of an activity acronym for "Solar Electric Propulsion," the in the plan matching that description, the constraint innovative engine that provides thrust to DS-1. requires that the logical condition (AND ... ) is true. The "Thrust" and "Idle" are the two main states this logical condition is the conjunct of two clauses. The first engine cab be in. says that the matching instance meets a Thrus t_Segment Knowledge such as this of the spacecraft domain activity, i.e., the end-point of the Idle_Segment activity provided useful intuition to the V&V expert, and this exactly coincides with the start point of some second pilot study warranted a deeper level of Thrust_Segment also in the plan. The second says that understanding than had been necessary for the first the matching instance is met_by a Thrust_Segment pilot study. activity, i.e., the start point of the former exactly coincides with the end point of the latter Pictorially, • Problem-specific terminology: "SINGLE" has a connotation specific to DS-l's planner. It introduces a 7_azust__t I Id.l.e...Segre,at I_arust_SeOrmat I description that matches a single interval. (One alternatives is "MULTIPLE," introducing a description that matches a contiguous sequence of intervals). • Terminological variants: The overall definition is of For translation, this is split into two separate constraints, All the DS-I planner constraints taketheoverall form: In an early version of theplanner, a few of the for every activity-I that matches description-I there constraints referenced information that is not stored in exists an activity-2 that matches description-2. A plans. 
In essence, this external information directed constraint of this form is trivially satisfied if the plan which one of several constraints is to apply. The contains no activities matching description-l. The planchecker's constraint translations handle these planchecker separates trivial and non-trivial cases in its circumstances by checking each alternative. If all fail, reports of constraint satisfaction. it is an anomaly. If the plan is found to satisfy one of the alternatives, again, a special kind of constraint The DS- 1planner generates plans for asegment of the satisfaction is reported, which included the deduction entire mission (e.g., one week). Thus aplan is bounded of what the external information must he to direct the within some "horizon"- it has a start and an end. Yet, choice of the satisfied constraint. the constraints may extend across this planning horizon. Such an instance is reported as a special kind The details are domain-specific, but we see a recurring of constraint satisfaction in which the plan satisfies the need to make distinctions among classes of "pass" reports, constraint within its horizon, but defers some residual and structure the analysis results accordingly. checking for the next plan. The details of all such deferred checks are included within the planchecker's report. II
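
The distinctions among classes of "pass" reports described above (trivial satisfaction, ordinary satisfaction, and satisfaction within the horizon with a deferred residual check) can be sketched as follows. This is a minimal illustration under assumptions, not the planchecker's actual logic: the `Activity` record, the `Verdict` names, the integer time points, and the rule that a check is deferred when the activity ends at or beyond `horizon_end` are all hypothetical. The relation checked is Allen's "meets" (the first interval's end coincides with the second's start).

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    TRIVIAL_PASS = "trivially satisfied: nothing matched description-1"
    PASS = "satisfied"
    DEFERRED_PASS = "satisfied within horizon; residual check deferred"
    FAIL = "violated"


@dataclass(frozen=True)
class Activity:
    kind: str
    start: int
    end: int


def check_meets(plan, kind1, kind2, horizon_end):
    """For every kind1 activity there must exist a kind2 activity that it meets.

    Returns (verdict, activities): the failing activities for FAIL, the
    activities whose check is left for the next plan for DEFERRED_PASS,
    and an empty list otherwise.
    """
    subjects = [a for a in plan if a.kind == kind1]
    if not subjects:
        return Verdict.TRIVIAL_PASS, []

    deferred, failed = [], []
    for a in subjects:
        if any(b.kind == kind2 and b.start == a.end for b in plan):
            continue  # a meets some kind2 activity inside this plan
        if a.end >= horizon_end:
            deferred.append(a)  # the successor would lie beyond the horizon
        else:
            failed.append(a)

    if failed:
        return Verdict.FAIL, failed
    if deferred:
        return Verdict.DEFERRED_PASS, deferred
    return Verdict.PASS, []


if __name__ == "__main__":
    plan = [
        Activity("Thrust_Segment", 0, 10),
        Activity("Idle_Segment", 10, 20),
        Activity("Thrust_Segment", 20, 30),
    ]
    print(check_meets(plan, "Thrust_Segment", "Idle_Segment", horizon_end=30))
```

Returning the deferred activities alongside the verdict mirrors the idea that the details of deferred checks belong in the report, so that they can be picked up when the next plan is checked.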
