Data analysis, machine learning and applications: proceedings of the 31st Annual Conference of the Gesellschaft für Klassifikation e.V., Albert-Ludwigs-Universität Freiburg, March 7-9, 2007

2008 · English


Studies in Classification, Data Analysis, and Knowledge Organization

Managing Editors
H.-H. Bock, Aachen
W. Gaul, Karlsruhe
M. Vichi, Rome

Editorial Board
Ph. Arabie, Newark
D. Baier, Cottbus
F. Critchley, Milton Keynes
R. Decker, Bielefeld
E. Diday, Paris
M. Greenacre, Barcelona
C. Lauro, Naples
J. Meulman, Leiden
P. Monari, Bologna
S. Nishisato, Toronto
N. Ohsumi, Tokyo
O. Opitz, Augsburg
G. Ritter, Passau
M. Schader, Mannheim
C. Weihs, Dortmund

Titles in the Series:

E. Diday, Y. Lechevallier, and O. Opitz (Eds.) Ordinal and Symbolic Data Analysis. 1996
R. Klar and O. Opitz (Eds.) Classification and Knowledge Organization. 1997
C. Hayashi, N. Ohsumi, K. Yajima, Y. Tanaka, H.-H. Bock, and Y. Baba (Eds.) Data Science, Classification, and Related Methods. 1998
I. Balderjahn, R. Mathar, and M. Schader (Eds.) Classification, Data Analysis, and Data Highways. 1998
A. Rizzi, M. Vichi, and H.-H. Bock (Eds.) Advances in Data Science and Classification. 1998
M. Vichi and O. Opitz (Eds.) Classification and Data Analysis. 1999
W. Gaul and H. Locarek-Junge (Eds.) Classification in the Information Age. 1999
H.-H. Bock and E. Diday (Eds.) Analysis of Symbolic Data. 2000
H. A. L. Kiers, J.-P. Rasson, P. J. F. Groenen, and M. Schader (Eds.) Data Analysis, Classification, and Related Methods. 2000
W. Gaul, O. Opitz, M. Schader (Eds.) Data Analysis. 2000
R. Decker and W. Gaul (Eds.) Classification and Information Processing at the Turn of the Millennium. 2000
S. Borra, R. Rocci, M. Vichi, and M. Schader (Eds.) Advances in Classification and Data Analysis. 2000
W. Gaul and G. Ritter (Eds.) Classification, Automation, and New Media. 2002
K. Jajuga, A. Sokolowski, and H.-H. Bock (Eds.) Classification, Clustering and Data Analysis. 2002
M. Schwaiger and O. Opitz (Eds.) Exploratory Data Analysis in Empirical Research. 2003
M. Schader, W. Gaul, and M. Vichi (Eds.) Between Data Science and Applied Data Analysis. 2003
H.-H. Bock, M. Chiodi, and A. Mineo (Eds.) Advances in Multivariate Data Analysis. 2004
D. Banks, L. House, F. R. McMorris, P. Arabie, and W. Gaul (Eds.) Classification, Clustering, and Data Mining Applications. 2004
D. Baier and K.-D. Wernecke (Eds.) Innovations in Classification, Data Science, and Information Systems. 2005
M. Vichi, P. Monari, S. Mignani, and A. Montanari (Eds.) New Developments in Classification and Data Analysis. 2005
D. Baier, R. Decker, and L. Schmidt-Thieme (Eds.) Data Analysis and Decision Support. 2005
C. Weihs and W. Gaul (Eds.) Classification - the Ubiquitous Challenge. 2005
M. Spiliopoulou, R. Kruse, C. Borgelt, A. Nürnberger, and W. Gaul (Eds.) From Data and Information Analysis to Knowledge Engineering. 2006
V. Batagelj, H.-H. Bock, A. Ferligoj, and A. Žiberna (Eds.) Data Science and Classification. 2006
S. Zani, A. Cerioli, M. Riani, M. Vichi (Eds.) Data Analysis, Classification and the Forward Search. 2006
P. Brito, P. Bertrand, G. Cucumel, F. de Carvalho (Eds.) Selected Contributions in Data Analysis and Classification. 2007
R. Decker, H.-J. Lenz (Eds.) Advances in Data Analysis. 2007
C. Preisach, H. Burkhardt, L. Schmidt-Thieme, R. Decker (Eds.) Data Analysis, Machine Learning and Applications. 2008

Christine Preisach · Hans Burkhardt · Lars Schmidt-Thieme · Reinhold Decker (Editors)

Data Analysis, Machine Learning and Applications

Proceedings of the 31st Annual Conference of the Gesellschaft für Klassifikation e.V., Albert-Ludwigs-Universität Freiburg, March 7–9, 2007

With 226 figures and 96 tables

Editors

Christine Preisach
Institute of Computer Science and Institute of Business Economics and Information Systems
University of Hildesheim
Marienburgerplatz 22
31141 Hildesheim
Germany

Professor Dr. Hans Burkhardt
Lehrstuhl für Mustererkennung und Bildverarbeitung
Universität Freiburg
Gebäude 052
79110 Freiburg i. Br.
Germany

Professor Dr. Dr. Lars Schmidt-Thieme
Institute of Computer Science and Institute of Business Economics and Information Systems
Marienburgerplatz 22
31141 Hildesheim
Germany

Professor Dr. Reinhold Decker
Fakultät für Wirtschaftswissenschaften
Lehrstuhl für Betriebswirtschaftslehre, insbes. Marketing
Universitätsstraße 25
33615 Bielefeld
Germany

ISBN: 978-3-540-78239-1
e-ISBN: 978-3-540-78246-9
Library of Congress Control Number: 2008925870

© 2008 Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable for prosecution under the German Copyright Law.

The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover Design: WMX Design GmbH, Heidelberg, Germany
Printed on acid-free paper
springer.com

Preface

This volume contains the revised versions of selected papers presented during the 31st Annual Conference of the German Classification Society (Gesellschaft für Klassifikation – GfKl). The conference was held at the Albert-Ludwigs-University in Freiburg, Germany, in March 2007. The focus of the conference was on Data Analysis, Machine Learning, and Applications; it comprised 200 talks in 36 sessions. Additionally, 11 plenary and semi-plenary talks were held by outstanding researchers.
With 292 participants from 19 countries in Europe and overseas, this GfKl Conference once again provided an international forum for discussions and mutual exchange of knowledge with colleagues from different fields of interest. Of the altogether 120 full papers that had been submitted for this volume, 82 were finally accepted.

On the occasion of the 30th anniversary of the German Classification Society, the associated societies Sekcja Klasyfikacji i Analizy Danych PTS (SKAD), Vereniging voor Ordinatie en Classificatie (VOC), Japanese Classification Society (JCS) and Classification and Data Analysis Group (CLADAG) have sponsored the following invited talks: Paul Eilers - Statistical Classification for Reliable High-volume Genetic Measurements (VOC); Eugeniusz Gatnar - Fusion of Multiple Statistical Classifiers (SKAD); Akinori Okada - Two-Dimensional Centrality of a Social Network (JCS); Donatella Vicari - Unsupervised Multivariate Prediction Including Dimensionality Reduction (CLADAG).

The scientific program included a broad range of topics. Besides the main theme of the conference, methods and applications of data analysis and machine learning in particular were considered. The following sessions were established:

I. Theory and Methods

Supervised Classification, Discrimination, and Pattern Recognition (G. Ritter); Cluster Analysis and Similarity Structures (H.-H. Bock and J. Buhmann); Classification and Regression (C. Bailer-Jones and C. Hennig); Frequent Pattern Mining (C. Borgelt); Data Visualization and Scaling Methods (P. Groenen, T. Imaizumi, and A. Okada); Exploratory Data Analysis and Data Mining (M. Meyer and M. Schwaiger); Mixture Analysis in Clustering (S. Ingrassia, D. Karlis, P. Schlattmann and W. Seidel); Knowledge Representation and Knowledge Discovery (A. Ultsch); Statistical Relational Learning (H. Blockeel and K. Kersting); Online Algorithms and Data Streams (C. Sohler); Analysis of Time Series, Longitudinal and Panel Data (S. Lang); Tools for Intelligent Data Analysis (M. Hahsler and K. Hornik); Data Preprocessing and Information Extraction (H.-J. Lenz); Typing for Modeling (W. Esswein).

II. Applications

Marketing and Management Science (D. Baier, Y. Boztug, and W. Steiner); Banking and Finance (K. Jajuga and H. Locarek-Junge); Business Intelligence and Personalization (A. Geyer-Schulz and L. Schmidt-Thieme); Data Analysis in Retailing (T. Reutterer); Econometrics and Operations Research (W. Polasek); Image and Signal Analysis (H. Burkhardt); Biostatistics and Bioinformatics (R. Backofen, H.-P. Klenk and B. Lausen); Medical and Health Sciences (K.-D. Wernecke); Text Mining, Web Mining, and the Semantic Web (A. Nürnberger and M. Spiliopoulou); Statistical Natural Language Processing (P. Cimiano); Linguistics (H. Goebl and P. Grzybek); Subject Indexing and Library Science (H.-J. Hermes and B. Lorenz); Statistical Musicology (C. Weihs); Archaeology and Archaeometry (M. Helfert and I. Herzog); Psychology (S. Krolak-Schwerdt); Data Analysis in Higher Education (A. Hilbert).

Contributed Sessions (by CLADAG and SKAD)

Latent class models for classification (A. Montanari and A. Cerioli); Classification and models for interval-valued data (F. Palumbo); Selected Problems in Classification (E. Gatnar); Recent Developments in Multidimensional Data Analysis between research and practice I (L. D'Ambra); Recent Developments in Multidimensional Data Analysis between research and practice II (B. Simonetti).

The editors would like to emphatically thank all the section chairs for doing such a great job regarding the organization of their sections and the associated paper reviews.

Cordial thanks also go to the members of the scientific program committee for their conceptual and practical support as well as for the paper reviews: D. Baier (Cottbus), H.-H. Bock (Aachen), H. Bozdogan (Tennessee), J. Buhmann (Zürich), H. Burkhardt (Freiburg), A. Cerioli (Parma), R. Decker (Bielefeld), W. Gaul (Karlsruhe), A. Geyer-Schulz (Karlsruhe), P. Groenen (Rotterdam), T. Imaizumi (Tokyo), K. Jajuga (Wroclaw), R. Kruse (Magdeburg), S. Lang (Innsbruck), B. Lausen (Erlangen-Nürnberg), H.-J. Lenz (Berlin), F. Murtagh (London), H. Ney (Aachen), A. Okada (Tokyo), L. Schmidt-Thieme (Hildesheim), C. Schnoerr (Mannheim), M. Spiliopoulou (Magdeburg), C. Weihs (Dortmund), D. A. Zighed (Lyon).

Furthermore, we would like to thank the additional reviewers: A. Hotho, L. Marinho, C. Preisach, S. Rendle, S. Scholz, K. Tso.

The great success of this conference would not have been possible without the support of many people working mainly behind the scenes. We would particularly like to thank M. Temerinac (Freiburg), J. Fehr (Freiburg), C. Findlay (Freiburg), E. Patschke (Freiburg), A. Busche (Hildesheim), K. Tso (Hildesheim), L. Marinho (Hildesheim) and the student support team for their hard work in the preparation of this conference, for the support during the event and the post-processing of the conference.

The GfKl Conference 2007 would not have been possible in the way it took place without the financial and/or material support of the following institutions and companies (in alphabetical order): Albert-Ludwigs-University Freiburg – Faculty of Applied Sciences, Gesellschaft für Klassifikation e.V., Microsoft München and Springer Verlag. We express our gratitude to all of them. Finally, we would like to thank Dr. Martina Bihn from Springer Verlag, Heidelberg, for her support and dedication to the production of this volume.

Hildesheim, Freiburg and Bielefeld, February 2008

Christine Preisach
Hans Burkhardt
Lars Schmidt-Thieme
Reinhold Decker

Contents

Part I Classification

Distance-based Kernels for Real-valued Data
Lluís Belanche, Jean Luis Vázquez, Miguel Vázquez

Fast Support Vector Machine Classification of Very Large Datasets
Janis Fehr, Karina Zapién Arreola, Hans Burkhardt

Fusion of Multiple Statistical Classifiers
Eugeniusz Gatnar

Calibrating Margin-based Classifier Scores into Polychotomous Probabilities
Martin Gebel, Claus Weihs

Classification with Invariant Distance Substitution Kernels
Bernard Haasdonk, Hans Burkhardt

Applying the Kohonen Self-organizing Map Networks to Select Variables
Kamila Migdał Najman, Krzysztof Najman

Computer Assisted Classification of Brain Tumors
Norbert Röhrl, José R. Iglesias-Rozas, Galia Weidl

Model Selection in Mixture Regression Analysis – A Monte Carlo Simulation Study
Marko Sarstedt, Manfred Schwaiger

Comparison of Local Classification Methods
Julia Schiffner, Claus Weihs

Incorporating Domain Specific Information into Gaia Source Classification
Kester W. Smith, Carola Tiede, Coryn A. L. Bailer-Jones

Identification of Noisy Variables for Nonmetric and Symbolic Data in Cluster Analysis
Marek Walesiak, Andrzej Dudek

Part II Clustering

Families of Dendrograms
Patrick Erik Bradley

Mixture Models in Forward Search Methods for Outlier Detection
Daniela G. Calò

On Multiple Imputation Through Finite Gaussian Mixture Models
Marco Di Zio, Ugo Guarnera

Mixture Model Based Group Inference in Fused Genotype and Phenotype Data
Benjamin Georgi, M. Anne Spence, Pamela Flodman, Alexander Schliep

The Noise Component in Model-based Cluster Analysis
Christian Hennig, Pietro Coretto

An Artificial Life Approach for Semi-supervised Learning
Lutz Herrmann, Alfred Ultsch

Hard and Soft Euclidean Consensus Partitions
Kurt Hornik, Walter Böhm

Rationale Models for Conceptual Modeling
Sina Lehrmann, Werner Esswein

Measures of Dispersion and Cluster-Trees for Categorical Data
Ulrich Müller-Funk

Information Integration of Partially Labeled Data
Steffen Rendle, Lars Schmidt-Thieme
