The anonymous 1821 translation of Goethe’s Faustus: A cluster analytic approach

By Refat A. Ali

A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

School of English Literature, Language and Linguistics
Newcastle University
September, 2015

Abstract

This study tests the hypothesis proposed by Frederick Burwick and James McKusick in 2007 that Samuel Taylor Coleridge was the author of the anonymous translation of Goethe's Faust published by Thomas Boosey in 1821. The approach to hypothesis testing is stylometric. Specifically, function word usage is selected as the stylometric criterion, and 80 function words are used to define an 80-dimensional function word frequency profile vector for each text in the corpus of Coleridge's literary works and for a selection of works by a range of contemporary English authors. Each profile vector is a point in 80-dimensional vector space, and cluster analytic methods are used to determine the distribution of profile vectors in the space. If the hypothesis being tested is valid, then the profile for the 1821 translation should be closer in the space to works known to be by Coleridge than to works by the other authors. The cluster analytic results show, however, that this is not the case, and the conclusion is that the Burwick and McKusick hypothesis is falsified relative to the stylometric criterion and analytic methodology used.

Dedication

To my aged mother,
Without your blessings on me none of my success would be possible

Acknowledgments

Thanking every individual without omitting anyone is a daunting task, but I would like to express my sincere gratitude to all those who have supported me during these years and who have become invaluable to me along this Ph.D. journey, only some of whom it is possible to mention here. Above all, this thesis would be unthinkable without the commitment of Dr. Hermann Moisl, my retired teacher and principal supervisor.
He pushed me to a higher level of stylometric research by emphasizing the importance of quantitative methodology and innovation, and also by having confidence in me. He was always there when needed with mathematical training and computational techniques, always ready with solutions to problems, and always tolerant through the downs in the research, which God Almighty knows were frequent. I cannot thank him enough for all that he has done for me.

I will forever be thankful to my second teacher and supervisor, Prof. Michael Rossington, firstly for suggesting the topic of this thesis and secondly for his insightful comments on the material related to the literary side of the thesis and for tracking down some elusive information which led to the development of the ideas presented in it.

My deepest heartfelt appreciation goes to Prof. Charles Romesburg of Utah State University and Prof. James C. McKusick of the University of Montana for taking time out of their busy schedules to answer my inquiries and provide me with valuable remarks.

I am also greatly indebted to all my APR panelists who provided me with useful suggestions and guidelines throughout my Ph.D. research project, particularly Prof. Noel Burton Roberts, a retired teacher, and Dr. Geoffrey Poole.

I owe a very important debt to Prof. Anders Holmberg, who was truly an influential part of my whole Ph.D. application process: I would not be here without his indispensable comments on my early preliminary proposal for the thesis and on my application, which led to the interviews and then to admission to the Ph.D. course.

Thanks are due to the librarians of the Robinson Library at Newcastle University, Middlesbrough Central Library, Grimsby Town Library, and Immingham Library for their help in providing me with the valuable references and sources needed for the research. Additionally, I would like to thank the directors of postgraduate studies, formerly Dr. James Procter and currently Dr. Anne Whitehead, for their advice and assistance.

Last, but by no means least, I would like to thank three important groups of people for their support, encouragement, and love. First and foremost, special thanks to my mother, whose prayers contributed a lot to my entire life and to the completion of this project in particular. I owe a debt of gratitude to my brother, Mr. Wajdi, and to my best friends, Mr. Nick Plummer and Pauline McLaughlin. I am lucky to have met Shelley Gibson here, and I thank her for her love, support, and unyielding encouragement.

Table of Contents

Abstract
Dedication
Acknowledgments
Table of Contents
List of Figures
List of Tables
List of Appendices
List of Abbreviations

Introduction

Chapter One: Motivation, History and Current State of the 1821 Faustus Translation Authorship Debate
1.1 Motivation
1.2 Bibliographic overview of translations of Faustus (Part I) in the early nineteenth century
1.3 Existing attributions of Boosey’s 1821 Faustus to Coleridge
  1.3.1 The circumstantial historical argument
  1.3.2 The qualitative stylistic argument
  1.3.3 The quantitative stylistic argument
1.4 Assessment
  1.4.1 The present discussion’s own reaction
  1.4.2 Other reactions
    1.4.2.1 The circumstantial historical argument
    1.4.2.2 The qualitative stylistic argument
    1.4.2.3 The quantitative stylistic argument

Chapter Two: Research Question and Methodology
2.1 Research question
2.2 Methodology
  2.2.1 The authorship identification problem
  2.2.2 Literature review
    2.2.2.1 Older works
    2.2.2.2 Recent developments
  2.2.3 The methodology used in the present study
    2.2.3.1 Hypothesis testing
    2.2.3.2 Vector space methods
    2.2.3.3 Data creation
    2.2.3.4 Data analysis

Chapter Three: Analysis
3.1 Data creation: Function words frequency in Coleridge’s works
3.2 Coleridge’s usage of function words
3.3 Comparison of Coleridge’s usage of function words with contemporary authors
3.4 Where Faustus fits
3.5 Coleridge and the other translators of Faustus

Chapter Four: Interpretation

Chapter Five: Conclusions, Limitations, and Further Research
4.1 Conclusions
4.2 Limitations
4.3 Further research

Appendices
Bibliography

List of Figures

Figure (1.1) Word length measurements for Faustus 1821 and Remorse
Figure (1.2) Word length measurements for Faustus 1821 and Anster’s translation
Figure (1.3) Word length measurements for Faustus 1821 and Gower’s translation
Figure (2.1) An example of a vector
Figure (2.2) Data items and variables in a data matrix m x n
Figure (2.3) 2- and 3-dimensional vector space
Figure (2.4) A vector in space
Figure (2.5) Vector length
Figure (2.6) The angle between vectors
Figure (2.7) Vector distances
Figure (2.8) Figure
Figure (2.9) Figure
Figure (2.10) Figure
Figure (2.11) Euclidean distance measure
Figure (2.12) Euclidean distance between V1 and V2
Figure (2.13) Text-length based clustering
Figure (2.14) Categories of manifold definition
Figure (2.15) Effect of dimensionality increase on the size of a cube
Figure (2.16) Data set of 1000 vectors in 3-dimensional space
Figure (2.17) Plots of very large vectors in 2-dimensional space
Figure (2.18) Five 2-dimensional vectors in space
Figure (2.19) Two 3-dimensional vectors in space
Figure (2.20) Sparse data in the space
Figure (2.21) Concentrations of distance among vectors in space
Figure (2.22) Scatter plots of 2-dimensional data
Figure (2.23) Two-dimensional data distribution with orthogonal basis
Figure (2.24) Alternative orthogonal basis
Figure (2.25) Highly correlated two-dimensional vectors with orthogonal basis
Figure (2.26) Alternative orthogonal basis for vectors
Figure (2.27) Three-dimensional data distribution with orthogonal basis
Figure (2.28) N x N covariance matrix of 6 phonetic segments for DMC
Figure (2.29) Linear and non-linear distance between points on the Earth’s surface
Figure (2.30) A manifold embedded in metric space
Figure (2.31) Neighborhoods in metric space
Figure (2.32) Scatter plot of randomly generated two-dimensional matrix M
Figure (2.33) Graph interpretation of the neighborhood matrix
Figure (2.34) Structure of a self-organizing map
Figure (2.35) SOM input lattice
Figure (2.36) SOM lattice
Figure (2.37) An example of SOM trained on 20 vectors
Figure (2.38) Hierarchical clustering tree
Figure (2.39) Single linkage clustering
Figure (2.40) Complete linkage clustering
Figure (2.41) Average linkage clustering
Figure (3.1) Variation in the lengths of the texts in Coleridge’s Matrix D
Figure (3.2) Ward’s analysis of Coleridge’s Matrix D
Figure (3.3) The distribution of function words in frequency matrix F1
Figure (3.4) Single linkage with cophenetic correlation
Figure (3.5) Complete linkage with cophenetic correlation
Figure (3.6) Average linkage with cophenetic correlation
Figure (3.7) Ward linkage with cophenetic correlation