ebook img

Data Mining Using Grammar Based Genetic Programming and Applications PDF

228 Pages·2000·1.868 MB·English
by  WongCheung.
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Data Mining Using Grammar Based Genetic Programming and Applications

DATA MINING USING GRAMMAR BASED GENETIC PROGRAMMING AND APPLICATIONS GENETIC PROGRAMMING SERIES SeriesEditor JohnKoza Stanford University Also in the series: GENETIC PROGRAMMING AND DATA STRUCTURES: Genetic Programming+ Data Structures = Automatic Programming! William B. Langdon;ISBN: 0-7923-8135-1 AUTOMATIC RE-ENGINEERING OF SOFTWARE USING GENETIC PROGRAMMING, Conor Ryan; ISBN: 0-7923-8653-1 The cover image was generated using Genetic Programming and interactive selection. Anargyros Sarafopoulos created the image, and the GP interactive selectionsoftware. DATA MINING USING GRAMMAR BASED GENETIC PROGRAMMING AND APPLICATIONS by Man Leung Wong Lingnan University, Hong Kong KwongSakLeung The Chinese University of Hong Kong KLUWER ACADEMIC PUBLISHERS NEW YORK / BOSTON / DORDRECHT / LONDON / MOSCOW eBookISBN: 0-306-47012-8 Print ISBN: 0-792-37746-X ©2002 Kluwer Academic Publishers New York, Boston, Dordrecht, London, Moscow All rights reserved No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher Created in the United States of America Visit Kluwer Online at: http://www.kluweronline.com and Kluwer's eBookstore at: http://www.ebooks.kluweronline.com Contents LIST OF FIGURES ............................................................................................ ix LIST OF TABLES .............................................................................................. xi PREFACE..........................................................................................................xiii CHAPTER 1 INTRODUCTION ....................................................................... 1 1.1. DATAMINING......................................................................................... 1 1.2. MOTIVATION........................................................................................... 3 1.3. CONTRIBUTIONS OF THE BOOK............................................................... 5 1.4. OUTLINE OF THE BOOK........................................................................... 7 CHAPTER 2 AN OVERVIEW OF DATA MINING ...................................... 9 2.1. DECISIONTREEAPPROACH..................................................................... 9 2.1.1. ID3............................................................................................. 10 2.1.2. C4.5.......................................................................................... 11 2.2. CLASSIFICATIONRULE............................................................................ 12 2.2.1. AQ Algorithm .......................................................................... 13 2.2.2. CN2 ................................................................................................. 14 2.2.3. C4.5RULES................................................................................ 15 2.3. ASSOCIATIONRULE........................................................................... 16 2.3.1. Apriori ............................................................................................ 17 2.3.2. Quantitative Association Rule Mining ........................................... 18 2.4 STATISTICALAPPROACH........................................................................... 19 2.4.1. Bayesian Classifier ........................................................................ 19 2.4.2. FORTY-NINER .............................................................................. 20 2.4.3. EXPLORA..........................................................................................21 2.5 BAYESIANNETWORKLEARNING............................................................. 22 2.6. OTHERAPPROACHES............................................................................25 CHAPTER 3 AN OVERVIEW ON EVOLUTIONARY ALGORITHMS .. 27 3.1. EVOLUTIONARYALGORITHMS..............................................................27 3.2. GENETICALGORITHMS(GAs) ..............................................................29 3.2.1. The Canonical Genetic Algorithm ................................................30 3.2.1.1. Selection Methods ................................................................. 34 3.2.1.2. Recombination Methods ....................................................... 36 3.2.1.3. Inversion and Reordering ...................................................... 39 3.2.2. Steady State Genetic Alg .............................................................. 40 3.2.3. Hybrid Algorithms ....................................................................... 41 3.3. GENETICPROGRAMMING(GP)............................................................. 41 3.3.1. Introduction to the Traditional GP .............................................. 42 3.3.2. Strongly Typed Genetic Programming (STGP) ........................... 47 vi Contents 3.4. EVOLUTIONSTRATEGIES(ES).............................................................. 48 3.5. EVOLUTIONARYPROGRAMMING(EP)................................................... 53 CHAPTER 4 INDUCTIVE LOGIC PROGRAMMING ............................... 57 4.1. INDUCTIVECONCEPTLEARNING........................................................... 57 4.2. INDUCTIVELOGICPROGRAMMING(ILP).............................................. 59 4.2.1. Interactive ILP ............................................................................. 61 4.2.2. Empirical ILP ............................................................................... 62 4.3. TECHNIQUESANDMETHODSOFILP..................................................... 64 4.3.1. Bottom-up ILP Systems ................................................................ 64 4.3.2. Top-down ILP Systems ................................................................. 65 4.3.2.1. FOIL........................................................................................ 65 4.3.2.2.mFOIL..................................................................................... 68 CHAPTER 5 THE LOGIC GRAMMARS BASED GENETIC PROGRAMMING SYSTEM(LOGENPRO).................................................. 71 5.1. LOGIC GRAMMARS ............................................................................... 72 5.2. REPRESENTATIONS OF PROGRAMS ........................................................ 74 5.3. CROSSOVER OF PROGRAMS................................................................... 81 5.4. MUTATION OF PROGRAMS ..................................................................... 94 5.5. THEEVOLUTIONPROCESSOFLOGENPRO......................................... 97 5.6. DISCUSSION .......................................................................................... 99 CHAPTER 6 DATA MINING APPLICATIONS USING LOGENPRO ... 101 6.1. LEARNINGFUNCTIONALPROGRAMS ................................................... 101 6.1.1. Learning S-expressions Using LOGENPRO ..............................102 6.1.2. The DOT PRODUCT Problem .......................................... 104 6.1.3. Learning Sub-functions Using Explicit Knowledge ....................110 6.2. INDUCINGDECISIONTREESUSINGLOGENPRO...............................115 6.2.1. Representing Decision Trees as S-expressions ....................... 115 6.2.2. The Credit Screening Problem ................................................. 117 6.2.3. The Experiment ..................................................................... 119 6.3. LEARNINGLOGICPROGRAMFROMIMPERFECTDATA........................ 125 6.3.1. The Chess Endgame Problem ....................................................127 6.3.2. The Setup of Experiments ......................................................128 6.3.3. Comparison of LOGENPRO With FOIL ....................................131 6.3.4. Comparison of LOGENPRO With BEAM-FOIL........................133 6.3.5. Comparison of LOGENPRO With mFOIL1 ...............................133 6.3.6. Comparison of LOGENPRO With mFOIL2 ...............................134 6.3.7. Comparison of LOGENPRO With mFOIL3 ...............................135 6.3.8. Comparison of LOGENPRO With mFOIL4 ...............................135 6.3.9. Discussion..................................................................................136 CHAPTER 7 APPLYING LOGENPRO FOR RULE LEARNING ........... 137 7.1. GRAMMAR..........................................................................................137 7.2. GENETICOPERATORS.......................................................................... 141 vii 7.3. EVALUATION OF RULES......................................................................143 7.4. LEARNINGMULTIPLERULESFROMDATA..........................................145 7.4.1. Previous Approaches .................................................................. 146 7.4.1.1. Pre-selection.................................................................. 146 7.4.1.2. Crowding............................................................................ 146 7.4.1.3. Deterministic Crowding .................................................... 147 7.4.1.4. Fitness Sharing .................................................................. 147 7.4.2. Token Competition .....................................................................148 7.4.3. The Complete Rule Learning Approach .....................................150 7.4.4. Experiments With Machine Learning Databases ...................... 152 7.4.4.1. Experimental Results on the Iris Plant Database ...................... 153 7.4.4.2. Experimental Results on the Monk Database .......................... 156 CHAPTER 8 MEDICAL DATA MINING ...................................................161 8.1. A CASESTUDY ON THE FRACTUREDATABASE...................................161 8.2. A CASESTUDY ON THE SCOLIOSISDATABASE...................................164 8.2.1. Rules for Scoliosis Classification.............................................165 8.2.2. Rules About Treatment ...............................................................166 CHAPTER 9 CONCLUSION AND FUTURE WORK ...............................169 9.1. CONCLUSION.......................................................................................169 9.2. FUTUREWORK....................................................................................172 APPENDIX A THE RULE SETS DISCOVERED .......................................177 A.1. THEBESTRULESETLEARNED FROM THE IRISDATABASE................. 177 A.2. THEBESTRULESETLEARNED FROM THE MONKDATABASE............. 178 A.2.1. Monk1................................................................................ 178 A.2.2. Monk2........................................................................................... 179 A.2.3. Monk3..................................................................................... 182 A.3. THEBESTRULESETLEARNED FROM THE FRACTUREDATABASE.............. 183 A.3.1. Type I Rules: About Diagnosis ...................................................183 A.3.2. Type II Rules: About Operation/Surgeon ........................................ 184 A.3.3. Type III Rules: About Stay ......................................................... 186 A.4. THEBESTRULESETLEARNED FROM THE SCOLIOSISDATABASE.............. 189 A.4.1. Rules for Classification ..............................................................189 A.4.1.1. King-I................................................................................... 189 A.4.1.2. King-II.................................................................................. 190 A.4.1.3. King-III............................................................................. 191 A.4.1.4. King-IV................................................................................ 191 A.4.1.5. King-V................................................................................. 192 A.4.1.6. TL......................................................................................... 192 A.4.1.7. L........................................................................................... 193 A.4.2. Rules for Treatment ......................................................................... 194 A.4.2.1. Observation......................................................................... 194 A.4.2.2. Bracing............................................................................ 194 viii Contents APPENDIX B THE GRAMMAR USED FOR THE FRACTURE AND SCOLIOSIS DATABASES .......................................................................... 197 B.1. THE GRAMMAR FOR THE FRACTURE DATABASE ................................ 197 B.2. THE GRAMMAR FOR THE SCOLIOSIS DATABASE ................................ 198 REFERENCES ............................................................................................. 199 INDEX .......................................................................................................... 211 List of figures FIGURE 2.1: ADECISION TREE ..........................................................................10 FIGURE 2.2: A BAYESIAN NETWORK EXAMPLE ................................................. 23 FIGURE3.1 : CROSSOVER OF CGA. A ONE-POINT CROSSOVER OPERATION IS PERFORMED ON TWO PARENT, 1100110011 AND 0101010101, AT THE FIFTH CROSSOVER LOCATION. TWO OFFSPRING, 1100110101 AND 0101010011 ARE PRODUCED....................................................................................................32 FIGURE 3.2: MUTATION OF CGA. A MUTATION OPERATION IS PERFORMED ON A PARENT 1100110101 AT THE FIRST AND THE LAST BITS. THE OFFSPRING 0100110100IS PRODUCED ............................................................................33 FIGURE 3.3: THE EFFECTS OF A TWO-POINT (MULTI-POINT) CROSSOVER. A TWO- POINT CROSSOVER OPERATION IS PERFORMED ON TWO PARENT, 11001100 AND 01010101, BETWEEN THE SECOND AND THE SIXTH LOCATIONS. TWO OFFSPRING, 11010100 AND01001101,ARE PRODUCED ................................37 FIGURE3.4: THE EFFECTS OF A UNIFORM CROSSOVER. A UNIFORM CROSSOVER OPERATION IS PERFORMED ON TWO PARENST, 1100110011 AND 0101010101, AND TWO OFFSPRING WILL BE GENERATED. THIS FIGURE ONLY SHOWS ONE OF THEM(1101110001).....................................................................................38 FIGURE 3.5: THE EFFECTS OF AN INVERSION OPERATION. AN INVERSION OPERATION IS PERFORMED ON THE PARENT, 1100110101, BETWEEN THE SECOND AND THE SIXTH LOCATIONS. AN OFFSPRING, 1111000101, IS PRODUCED....................................................................................................40 FIGURE3.6: A PARSE TREE OF THE PROGRAM (* (+ X (/ Y 1.5)) (- z 0.3))..................................................................................................43 FIGURE3.7: THE EFFECTS OF CROSSOVER OPERATION. A CROSSOVER OPERATION IS PERFORMED ON TWO PARENTAL PROGRAMS, (* (* 0.5 X) (+ X Y) AND (/ (+ X Y) (* (-X Z) X)). THE SHADED AREAS ARE EXCHANGED AND TWO OFFSPRING GENERATED ARE: (* (-X Z) (t X Y)) AND(/ (+ X Y) (* (* 0.5 X) X)) ......................................................................................................46 FIGURE3.8: THE EFFECTS OF A MUTATION OPERATION. A MUTATION OPERATION IS PERFORMED ON THE PROGRAM (* (* 0.5 X) (+ X Y)).T HE SHADED AREA OF THE PARENTAL PROGRAM IS CHANGED TO A PROGRAM FRAGMENT ( / ( + Y 4 ) Z ) AND THE OFFSPRING PROGRAM (* (/ (+ Y 4) Z) (+ X Y)) IS PRODUCED. ................................... 47 FIGURE5.1 : A DERIVATION TREE OF THE S-EXPRESSION IN LISP (* (/W1.5) (/W1.5) (/W1.5)) ..................................................75 FIGURE5.2: ANOTHER DERIVATION TREE OF THE S-EXPRESSION (* (/W1.5) (/W1.5) (/W1.5)) ..................................................80 FIGURE5.3 : THE DERIVATIONS TREE OF THE PRIMARY PARENTAL PROGRAM (+ (-Z 3.5) (-Z 3.8) (/ Z 1.5))....................................... 87 FIGURE5.4: THE DERIVATIONS TREE OF THE SECONDARY PARENTAL PROGRAM (* (/ W 1. 5) (+ (-W 11) 12) (-W 3.5))......................... 87

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.