Optimization of character coding and a stepwise execution of cladistic analyses

J. C. von Vaupel Klein

Vaupel Klein, J.C. von. Optimization of character coding and a stepwise execution of cladistic analyses. Zool. Med. Leiden 83 (20), 9.vii.2009: 741-758, figs 1-3. ISSN 0024-0672.

J.C. von Vaupel Klein, Division of Systematic Zoology at Leiden University [retired; current postal address: Beetslaan 32, NL-3723 DX Bilthoven, The Netherlands] ([email protected]. nl / [email protected]).

Key words: a/p coding; absence/presence coding; binarization; character coding; character linkage; cladistic analysis; datamatrix reduction; homoplasy bias; optimization; phylogeny reconstruction; redundancy; stepwise analysis; successive outgroup comparison.

The absence/presence routine of character coding is examined with regard to minimizing its inherent redundancy effects, with the purpose of optimizing the structure of a comprehensive, binary datamatrix. Cladistic analytical procedures are then evaluated with respect to the successive use of such a datamatrix at different hierarchical levels. It is concluded that performing a stepwise analysis has various advantages over the more often employed techniques, i.e., the 'total analysis' routines and the 'partitioned' approaches.

Introduction

Reconstructing the historical course of evolution is a grand goal but by no means an easy task, as systematists are well aware. Since Hennig's (1950) landmark study of phylogenetic systematics became available in English (Hennig, 1966), systematics has earned its place among the natural sciences by adopting a strict and objectively verifiable protocol for reconstructing The Natural System.
In the first 30-odd years following that date, i.e., well into the 1990s, the methods of analysing a datamatrix and constructing cladograms constituted the main focus of phylogenetics, and optimization of the various routines was vigorously pursued (e.g., Felsenstein, 1982; Huelsenbeck & Hillis, 1993). Although this part of cladistic pattern recognition has by no means been explored exhaustively yet, we may nonetheless note a shift in focus, from the early 1990s (e.g., Hauser & Presch, 1991; Wilkinson, 1992, 1995; Slowinski, 1993; Meier, 1994) until the present (e.g., Goloboff et al., 2006; Lawing et al., 2008), towards the data that form the basis of that matrix, i.e., the proper way of coding the characters recognized. In particular, the First Biennial International Conference of The Systematics Association, held at Oxford in 1997 (Scotland & Pennington, 2000), may be credited with having truly boosted theoretical developments in the realm of character coding. However, in this field, too, no consensus has been reached so far, and consequently no dominant approach has yet emerged.

Nevertheless, both a proper way of representing character states in a primary datamatrix and a proper analysis of the structure of that matrix constitute crucial stages in the analytical procedure. Only if those stages can be optimized may we expect an optimal use of the information contained in the distribution of the character states over the taxa, as well as an optimal representation of that information in the eventual cladogram. Hence, the absence of reliable, universal formats for these operations still constitutes an impediment to producing objective, reproducible results in cladistics, which obviously detracts from the confidence we may place in the fundamental objectivity of its application.
Thus, in the present paper I want to discuss an aspect of character coding, i.e., alleged redundancy, that might be solved, or at least mitigated, through a stepwise execution of the analytical and synthetical phases employed in finding the (natural) order hidden in the datamatrix.

Coding and use of characters in the analytical sequence

The phylogenetic method of historical pattern analysis has been in use for over 40 years now, and many results have been published. Yet there are no signs of general agreement among systematists with regard to the detailed implementation of the various steps in the analytical procedure. As a consequence, it is necessary first to recite the sequence at issue, in order to make clear precisely at which stages the points raised here are supposed to fit.

The key word in taxonomy is variation, i.e., variation among the members of natural, monophyletic groups, conceived in any applicable sense, morphological or otherwise. Following careful analysis of the taxa at issue with regard to the variation observed, the attributes or features found, whether morphological, physiological, ecological, ethological, genetic, chemical, molecular, or otherwise, are examined in order to recognize meaningful characters that can be used in phylogeny reconstruction. In this first truly taxonomic step, allegedly homologous individual features are aggregated to form characters, which are defined as sets of character states that are linked through a priori hypotheses of homology. The compilation of a datamatrix for use in the construction of a hierarchical scheme, to be interpreted as describing the historical pathways along successive speciation events, next requires that the characters and their states be properly coded.
In doing so, the character states are aligned into transformation series; the binary characters are defined per se, and the multistate characters can either be used as they are (but see below) or be broken down into series of binary characters. What follows is the analytical procedure sensu stricto, in which the resulting transformation series with more than two elements are, at some stage, ordered by acknowledging the significance of a certain sequence, e.g., that 0-1-2 should indeed be placed in that order, whether as 0-1-2 or as 2-1-0, while binary characters are already ordered by definition, as these can only yield 0-1 or 1-0. In a subsequent stage the series are also polarized through outgroup comparison, which recognizes the character states comprised as either plesiomorphous or apomorphous at a certain level. Based on the distributions of their apomorphous character states, the taxa are next clustered hierarchically onto the branches of an essentially dichotomous cladogram that takes into account their connection with other taxa in series of sister-group relationships, which are based on the synapomorphous possession of character states. The final, a posteriori step in the procedure then involves recognizing true homologies on the one hand and, on the other, relegating those a priori hypotheses that apparently describe non-homologous character states to ad hoc statements of homoplasy or character reversal. After the analytic and synthetic routines have been completed, the ultimate stage encompasses interpreting the cladogram as (an approximation of) a historically correct phylogenetic tree, at least through the addition of a time scale on the vertical y-axis and, possibly but not necessarily, by plotting some measure of similarity on the horizontal x-axis.

We all know this sequence, and it has been adequately described in textbooks of phylogenetic inference, such as Wiley (1981), Forey et al. (1992), and Kitching et al. (1998), to name only the more prominent. Yet I recite this procedure here in full, both as a reminder of all the steps involved and in order to expose the vulnerability of any cladistic analysis. Each of those successive steps in the analytical sequence is equally crucial, and an improper execution of any step will introduce information that is bound to be erroneous to some unknown extent, hence flawing the finally resulting cladogram to an unpredictable degree and in an equally unpredictable direction (or directions). Indeed, most steps can be approached via more than one method, and in by far the majority of cases the choice of an alternative method for any step will yield an incongruent cladogram, hence a different hypothesis about the true historical course of evolution. Thus, determining which method may be considered the 'best' at every stage is a matter that has to be taken most seriously.

Binary versus multistate characters

The primary issue to be discussed herein is how to incorporate multistate characters into a datamatrix in a maximally pure and unbiased way. This concerns all three kinds of multistate characters, i.e., the continuous characters, the meristic ones, and the so-called classes. The two former categories, continuous and meristic, can also be characterized as quantitative, whereas the classes, just like the purely binary characters, may be acknowledged as being qualitative in nature. Those truly binary characters are not at issue here: they comprise features that can be described in full by only two states, i.e., 0 and 1, like the presence or absence of an attribute, e.g., an external shell, or else the straight or twisted structure of, e.g., a spine.
Data like these may be incorporated in the primary matrix as such, by simply coding '0' for absent and '1' for present, or, e.g., '0' for not twisted and '1' for twisted. By contrast, what concerns us here are characters described as so-called multistate classes, i.e., characters with states to which values other than 0 or 1 can be attributed in an often well-considered but nonetheless arbitrary way.

In this respect, it is evident that in a cladistic analysis, in contrast to the situation in a phenetic approach, continuous or meristic characters usually cannot be included in the datamatrix as they are: they have to be converted to 'classes' first. In the case of continuous characters, like the length of a wing varying, e.g., between 1.31 and 3.09 mm, the states in the resulting transformation series will have to be defined as falling into, e.g., classes 0 = wings absent, 1 = 1.00-1.99 mm, 2 = 2.00-2.99 mm, and 3 = 3.00-3.99 mm, or, of course, any alternative scheme that would be more relevant in the case at issue. Where meristic characters are concerned, like the number of spines on a given part of the body, varying, e.g., between 0 and 140, classes could be defined in a similar way by (at least to some degree arbitrarily) dividing the range of 0-140 into a relevant number of partitions, like 0, 1-35, 36-70, 71-105, 106-140, or by any other scheme that may be interpreted as adequately representing the variation observed. The classes so recognized may then be coded as, e.g., 0, 1, 2, 3, 4.

A comparable yet fundamentally different situation may be recognized in a case of, e.g., a small number of spines with discrete positions: if 1-4 spines are present, i.e., on positions I-IV, and if cases of '3 spines in total' may be distinguished by the absence of a fourth spine either in position no. II or in position no. III, it is more relevant to code 'spine in position II absent or present' = 0 - 1, 'spine in position III absent or present' = 0 - 1, etc., rather than simply resorting to 1 spine = 1, 2 spines = 2, 3 spines = 3, and 4 spines = 4, since the available information about the configuration of the spines would then be lost. This may immediately be apparent from the observations (see table 1).

Table 1. Example of two taxa with an equal number of spines that are, however, in part present at different positions (s, spine present; -, spine absent).

Taxon / Position of spine    I    II    III    IV
Taxon A, 3 spines present    s    -     s      s
Taxon B, 3 spines present    s    s     -      s

The final category of multistate characters is that in which classes are inevitable from the start, as the various states cannot otherwise be represented numerically. For instance, when colour is used as a character, this may, e.g., comprise the colours red, yellow, and blue, which can be represented as classes through coding as, e.g., red = 1, yellow = 2, blue = 3 (and 0 may be used for, e.g., 'other colour'). It may thus be evident from the above that all basically non-binary characters in a cladistic analysis will have to be represented somehow as classes, whether in a primary or in a secondary sense.

Binary representation of multistate characters

The justification, or even necessity, of representing multistate characters in discrete units already emerges from the basic principle of phylogenetic systematics as formulated by Hennig (1966) himself. Character states can be either plesiomorphous or apomorphous, but nothing in between: they can have no such status as 'largely apomorphous' or 'for 0.33 plesiomorphous' or anything like it. Character states will invariably have to be recognized as either plesiomorphous or apomorphous (at least at the hierarchical level at issue), implying that only two possible ratings can be assigned to any character state.
In binary characters these are either 0 or 1, but in transformation series with more than two elements there is the obvious possibility of assigning, besides 0 and 1, also states coded as 2, 3, 4, etc. It is here that, at a later stage, the ordering referred to above is to be applied.

Thus, although in linear or branched transformation series of three or more elements any element can only be apomorphous at a single level, the possibility of coding states with values above 1 remains intact. Yet high values may flaw the analytical procedure to some degree by gaining disproportionate preponderance in comparison with characters that have values limited to 0 and 1. In addition, any character state incorporated in a multistate transformation series, e.g., '3' in a series from 0 to 5, is prone to be influenced by the behaviour of the other elements in the series and thus may not be completely 'free' to manifest itself as apomorphous at a certain level in the analytical procedure. This is because the outcome of an analysis depends on the distribution of all character states throughout the matrix and, although character states 'linked' together in a single multistate character will tend to have a greater influence on the final result as a group, it is generally considered that this may at the same time obscure their own individual merits to some (though hardly quantifiable) degree. Moreover, any 'greater influence as a group' is fundamentally undesirable, since any such influence has an a priori chance of directing the final result, thus reducing the independent, objective character of the analysis.

It has thus been considered that the purest way of representing multistate characters would be to break these down into series of binary characters.
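The class codes of the spine example and the weight problem raised here can be illustrated together. This is a minimal sketch: the function names are hypothetical, the class boundaries are the ones given in the text, and the step counting follows a common convention that the text does not itself prescribe, namely that in an ordered (additive) series a change between classes i and j counts as |i - j| steps, whereas in an unordered series any change counts as one step.

```python
from bisect import bisect_left

# Spine-count classes from the text: 0, 1-35, 36-70, 71-105, 106-140.
# Each entry is the inclusive upper bound of one class (codes 0-4).
SPINE_CLASS_BOUNDS = [0, 35, 70, 105, 140]

def spine_count_class(count):
    """Map a spine count (0-140) to its class code 0-4."""
    if not 0 <= count <= SPINE_CLASS_BOUNDS[-1]:
        raise ValueError(f"count {count} lies outside the coded range")
    return bisect_left(SPINE_CLASS_BOUNDS, count)

def min_steps_ordered(codes):
    """Fewest steps an ordered (additive) character needs on any tree."""
    return max(codes) - min(codes)

def min_steps_unordered(codes):
    """Fewest steps an unordered character needs on any tree."""
    return len(set(codes)) - 1

def positional_coding(occupied, positions=("I", "II", "III", "IV")):
    """Table 1 alternative: one absent(0)/present(1) character per position."""
    return [1 if p in occupied else 0 for p in positions]

counts = [0, 12, 140]                          # three hypothetical taxa
codes = [spine_count_class(c) for c in counts]
print(codes)                                   # [0, 1, 4]
print(min_steps_ordered(codes))                # 4, against at most 1 for a binary character
print(min_steps_unordered(codes))              # 2
# Table 1: equal spine numbers, different configurations.
print(positional_coding({"I", "III", "IV"}))   # [1, 0, 1, 1]
print(positional_coding({"I", "II", "IV"}))    # [1, 1, 0, 1]
```

Under the ordered convention the single meristic character already demands four steps, which is the kind of preponderance over 0/1 characters described above; the positional coding, by contrast, keeps every column binary.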
In executing the binarization, each state is recognized as a separate character that can be either present or absent. In the above example of three colours, this then leads to:

not red vs. red          0 - 1
not yellow vs. yellow    0 - 1
not blue vs. blue        0 - 1

Thus, the equivalence of the colours is guaranteed, and each may be valued according to its own merits and behaviour in the course of the pattern analysis, literally as an equal of the other states occurring for that character. Hence, as we may presume, in this way the fundamental objectivity of this part of the procedure can be maximized.

The implementation of a/p character coding

In an attempt at formalizing the process of coding characters initially recognized as multistate classes in the form of binary sequences, a seminal paper was produced by Pleijel (1995). In that study, the author investigated various ways of implementing such an operation, and his ultimate conclusion acknowledged the fundamental superiority of the so-called absence/presence (or a/p) type of coding. In this protocol, all individual states into which a transformation series may be dismembered are treated as potential apomorphies (cf. Pleijel, 1995: 315), thus maximizing the chances for those character states to be recognized according to their true, historical status at any relevant level in the final cladogram.

Pleijel's (1995: 310, his fig. 1) paradigm comprised a feature X that is found in five conditions, or expressions: (1) absent; (2) round and black; (3) round and striped; (4) square and black; and (5) square and striped, which he coded in a binary way by implementing this a/p procedure (type 'D' in his paper). The scheme advocated to that end was, using his own example (compare also fig. 1 herein), to code the five conditions by means of the following five binary characters:

(1) Feature X: absent (0) / present (1)
(2) Rounded shape of feature X: absent (0) / present (1)
(3) Square shape of feature X: absent (0) / present (1)
(4) Black pigmentation of feature X: absent (0) / present (1)
(5) Striped pigmentation of feature X: absent (0) / present (1)

Soon after the publication of Pleijel's (1995) paper, systematists became intrigued by this method, as the impact of the paper at the 1997 Oxford conference may testify: in the proceedings (Scotland & Pennington, 2000), seven out of ten papers (including the Introduction) cite his article, published hardly two years before the congress took place. Having attended the Oxford meeting myself, I can state that various participants learned about the method described by Pleijel (1995) only shortly before, or even at, the conference; even so, some had quickly reworked their presentations according to the a/p coding scheme in an attempt at improving, or at least better corroborating, their results and conclusions.

Advantages of a/p character coding

The most prominent advantage of the transcription scheme recommended by Pleijel (1995) unquestionably is the possibility of assessing the phylogenetic significance of each individual character state according to its own behaviour in the analytical procedure. As each character state is entered as a separate character, this will allegedly ensure its maximal freedom to emerge as a (syn)apomorphy in the resulting cladogram, provided, of course, that such a status is embodied in the datamatrix as a whole.

The basic principle is that all character states are essentially equal: compare the example of the colours above. Why should blue have a higher value (3) than either yellow (2) or red (1)? In what way does the arbitrary assignment of those values influence the resulting cladogram?
With a/p coding, such questions cannot even be asked, for the various states are treated as equal from the start. Although this by no means implies that a/p coding would invariably emerge as the 'best performer' among coding schemes (cf. Forey & Kitching, 2000; Hawkins, 2000), it certainly may be recognized as carrying the least, even minimal, inherent bias, as no weight is assigned (not even inadvertently) to any character state in particular. With a/p coding, it thus would seem, the judgment of the investigator would tend to be minimized, and hence the result is based on a maximal influence of the character-state distributions over the datamatrix per se.

Another, also quite convenient trait of a/p coding is that, as Pleijel (1995: 312) observes, '... the problem with inapplicable character states disappears; ...'. Indeed, where simply 'absence' or 'presence' is coded, without any a priori interpretation, the true status of each state at every level may be expected to emerge eventually from the analysis by itself. This means that no special requirements are necessary to deal with missing entries or inapplicable data (compare, e.g., table 2).

Table 2. Datamatrices corresponding to the configuration depicted in fig. 1a: a, the initial matrix; and, b, the reduced matrix as adapted following determination of the first, i.e., basal dichotomy in the ingroup, (A)-(B-G). As character (1) is no longer variable in the new ingroup (B-G), it carries no phylogenetic information relevant for the structure of that group and has hence been omitted. Note that there is no difference in polarity of the states in (a) and (b), as the new outgroup (A) in matrix (b) shows the same character states as the original outgroup (OG) in matrix (a).

a, Initial datamatrix:
Taxon / Character    (1) X    (2) Ro    (3) Sq    (4) Bl    (5) St
OG                   0        0         0         0         0
A                    0        0         0         0         0
B                    1        0         1         1         0
C                    1        0         1         0         1
D                    1        1         0         1         0
E                    1        1         0         1         0
F                    1        1         0         0         1
G                    1        1         0         0         1

b, Secondary datamatrix:
Taxon / Character    (2) Ro    (3) Sq    (4) Bl    (5) St
A (= new OG)         0         0         0         0
B                    0         1         1         0
C                    0         1         0         1
D                    1         0         1         0
E                    1         0         1         0
F                    1         0         0         1
G                    1         0         0         1

X, feature X; Ro, rounded; Sq, square; Bl, black; St, striped.

Disadvantages of a/p coding examined

When advocating a/p coding as an optimized way of binarizing multistate characters (i.e., transformation series with more than two elements), Pleijel (1995) already admitted that three serious problems remain with this kind of coding, the first of which is considered the most important: (a) an effect of redundancy, i.e., more elements are introduced into the datamatrix than absolutely necessary. Indeed, the presence of feature X will already be apparent from two of the characters nos. (2)-(5) scoring '1', whereas the absence of X will immediately be revealed by four '0's for those same characters. This obviously means that scoring 'Feature X absent or present' as a separate character seems superfluous. In addition, two more undesired effects may occur: (b) the phenomenon of character linkage, i.e., if dissected that far, the various elements of a transformation series can no longer be regarded as independent variables: presence of a square shape automatically implies that 'rounded shape' scores '0', and then also one of the colours will score '1', the other '0'. Finally, (c) the effect of homoplasy bias, i.e., the disproportionate weight that binary-coded multistate characters may get over 'purely' binary characters in a mixed matrix: the above five-state character makes four or five binary characters, whereas a truly binary character will, by its very nature, never make more than one character. The recoded multistate characters will thus tend to outweigh the purely binary characters by the sheer number of their states, all coded as separate characters.
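Pleijel's (1995) a/p scheme and the three effects (a)-(c) just listed can be checked mechanically. The minimal Python sketch below uses a hypothetical dictionary layout and hypothetical function names (not Pleijel's own notation); the rows it produces can be compared with those of taxa B-G in table 2a.

```python
# Pleijel's (1995) feature X in five conditions; None stands for 'X absent'.
CONDITIONS = {
    "absent": None,
    "round, black": ("rounded", "black"),
    "round, striped": ("rounded", "striped"),
    "square, black": ("square", "black"),
    "square, striped": ("square", "striped"),
}
# Characters (1)-(5) of the a/p scheme.
AP_CHARACTERS = ["X", "rounded", "square", "black", "striped"]

def ap_code(condition):
    """Full a/p coding: one absent(0)/present(1) column per character."""
    parts = CONDITIONS[condition]
    if parts is None:
        return [0] * len(AP_CHARACTERS)
    return [1] + [1 if name in parts else 0 for name in AP_CHARACTERS[1:]]

def reduced_ap_code(condition):
    """'Reduced a/p coding': character (1) is dropped."""
    return ap_code(condition)[1:]

for condition in CONDITIONS:
    row = ap_code(condition)
    # (a) redundancy: column (1) can be derived from columns (2)-(5).
    assert row[0] == int(any(row[1:]))
    # (b) linkage: if X is present, exactly one shape and one colour score 1.
    assert row[1] + row[2] == row[0] and row[3] + row[4] == row[0]
    print(f"{condition:16} {row}  reduced: {reduced_ap_code(condition)}")

# (c) homoplasy bias: one five-state character becomes five (reduced: four)
# binary columns, whereas a truly binary character stays a single column.
print(len(AP_CHARACTERS), len(reduced_ap_code("absent")))  # 5 4
```

The two in-loop assertions pass for every condition, which is exactly why column (1) adds no information once (2)-(5) are scored, and why those columns cannot vary independently of one another.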
Next, any redundant information as noted under (a), above, may enlarge the already inevitable effect of imbalance in numbers, again at the expense of the (effective) significance of the purely qualitative, binary characters. These effects were also immediately recognized at the 1997 Oxford meeting, and participants agreed with the author that the scheme inherently suffered from a certain redundancy: if X is present, one shape and one colour will each score '1', so adding another '1' for the mere presence of X does not convey any additional information. Thus, the procedure most users of a/p coding soon adopted consisted of deleting character no. (1) in the above series, i.e., the primary recognition of the presence or absence of feature X as such. This 'reduced a/p coding' would allegedly remedy the acknowledged redundancy.

As the author himself already pointed out (Pleijel, 1995), the disadvantages noted above may unduly affect the results of cladistic analyses if not properly dealt with. This is why I herein suggest a correction to the actual application of a/p coding as described, in order to minimize the redundancy effect, viz., by performing cladistic analyses in a stepwise manner. Presumably, this will generally imply that all three negative aspects of the procedure can potentially be restrained within reasonable limits, since the effects mentioned above eventually all come down to some sort of redundancy, i.e., to inadvertent weighting.

Performing pattern analysis stepwise

Cladistic pattern analysis is usually performed with the aid of computerized algorithms embedded in computer programs and, as a rule, in one go: the 'total analysis' approach. The datamatrix as a whole is analysed to find the 'best' structure, which, according to the program at issue, is indicated as the 'most parsimonious' solution, or the solution 'of best fit', or an equivalent term.
Usually, a large but restricted number of possible cladograms is probed, and from those, the one(s) requiring the fewest character transformations is/are presented as the result(s). Whether initially rooted or unrooted (the latter yielding a network only), eventually all results will be rooted to produce one or more 'maximally parsimonious' cladograms, from which (if more than one) the investigator has to choose the one that is judged most appropriate (for instance, based on additional, qualitative arguments).

Alternatively, the (super)datamatrix can be analysed in parts, the so-called 'partitioned' approach, in which, e.g., morphological data are examined separately from molecular data, and the resulting trees will have to be united into a single, integrated cladogram through the use of 'consensus' techniques.

Afterwards, the 'most parsimonious' cladogram eventually chosen is usually supported, or corroborated in a statistical sense, by indicating (often branch by branch) the robustness of the procedure (i.e., mostly based on the percentages of original hypotheses of homology that have not been discarded) by referring to a 'consistency index' and/or a 'retention index', or else through the application of additional techniques like 'jackknifing', 'bootstrapping', and the like. These routines all purport to demonstrate that the result ultimately accepted indeed represents the best choice from the, usually many, possible options, and all basically resort to the use of the initial datamatrix.
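The tree length (the number of character transformations a cladogram requires) and the consistency index mentioned above can be illustrated on the initial matrix of table 2a. The text names no particular algorithm; as an assumption, the sketch uses Fitch's small-parsimony count for unordered characters on one fixed, fully resolved tree (it scores a tree and does not search for one). Only the split (A)-(B-G) is taken from the text; the structure inside (B-G) is hypothetical. The ensemble consistency index is computed as the summed minimum numbers of steps divided by the summed observed steps.

```python
# Table 2a (characters X, Ro, Sq, Bl, St).
MATRIX = {
    "OG": [0, 0, 0, 0, 0], "A": [0, 0, 0, 0, 0],
    "B": [1, 0, 1, 1, 0], "C": [1, 0, 1, 0, 1],
    "D": [1, 1, 0, 1, 0], "E": [1, 1, 0, 1, 0],
    "F": [1, 1, 0, 0, 1], "G": [1, 1, 0, 0, 1],
}
# Hypothetical fully resolved tree: only (A)-(B-G) comes from the text.
TREE = ("OG", ("A", (("B", "C"), (("D", "E"), ("F", "G")))))

def fitch(tree, states):
    """Fitch small-parsimony pass for one unordered character.

    tree:   a taxon name or a 2-tuple of subtrees
    states: taxon name -> state code
    Returns (candidate state set at this node, number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1

def steps_and_ci(tree, matrix):
    """Observed steps per character and the ensemble consistency index."""
    n_chars = len(next(iter(matrix.values())))
    observed, minimum = [], []
    for i in range(n_chars):
        column = {taxon: row[i] for taxon, row in matrix.items()}
        observed.append(fitch(tree, column)[1])
        # an unordered character needs at least (distinct states - 1) steps
        minimum.append(len(set(column.values())) - 1)
    return observed, sum(minimum) / sum(observed)

steps, ci = steps_and_ci(TREE, MATRIX)
print(steps, sum(steps))   # [1, 1, 1, 2, 2] 7
print(round(ci, 3))        # 0.714
```

On this assumed topology the two pigmentation characters each need one extra step, which is what lowers the index below 1; both figures are computed once, from the initial datamatrix, as in the routines described above.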
Disregarding the technical (i.e., mathematical) details of the algorithms and of the analytical procedure, the process of rooting, whether beforehand or afterwards, usually involves applying the criterion of outgroup comparison: the character states in the ingroup are compared to the states present in the outgroup, which are by definition considered plesiomorphous, whereby the alternative states are labelled as apomorphous for the purpose of the analysis. In the majority of cases, so it would seem, those initial labels 'plesiomorphous' and 'apomorphous' are retained in the course of the entire analysis, which effectively means: in structuring the cladogram as a whole.

My prime concern with these methods is that they all apparently use the initial datamatrix to find the underlying phylogenetic structure throughout the entire cladogram. However, each and every character state is, in fact, of phylogenetic relevance only at a single level, i.e., exactly at the level where that state once developed as an evolutionary novelty and thus now constitutes a synapomorphy for the taxa that have since evolved from the common ancestor that developed that apomorphous state. At all other levels in the cladogram, that state does not convey phylogenetic information and thus is, at best, neutral there with regard to the performance of the analytical procedure.

In the scenario of a stepwise analysis, however, the initial datamatrix is only used to find the first dichotomy that follows the root. To resolve the structure of the higher levels of the tree, the matrix is adapted (a) by removing those characters that, from that point onwards, are no longer variable; (b) by a renewed polarization of the remaining characters and their states according to that, now accepted, first dichotomy, thereby employing the (newly found) sister group(s) as (an) outgroup(s); and finally (c) by removing the initial (or subsequent) outgroup 'after use'.

Fig. 1a, based on the various states of feature X of Pleijel (1995), shows a suite of seven ingroup taxa, A-G, plus an outgroup OG. According to classical outgroup comparison, the presence of feature X constitutes an apomorphous state at the split (A) vs. (B-G). The most parsimonious explanation of the distribution of feature X over the cladogram is to assume that X developed in the common ancestor of clade (B-G) and hence constitutes a synapomorphy for that group, forming an argument for its monophyly. From the above it follows that character (1) in the scheme of Pleijel (1995), 'Feature X absent or present', is informative only there, and becomes redundant when analysing the structure of clade (B-G), since all members of that clade by definition possess feature X. Thus, it would be desirable to omit character (1) in the analysis of that clade. On the other hand, the presence of X as such provides one argument to recognize clade (B-G), whence it is by no means a redundant character at level (A)-(B-G).

This is where the advantage of a stepwise procedure becomes clear: character (1), X absent/present, is to be included in the primary datamatrix, in which all character states have been coded as separate characters, but once the first dichotomy in the structure of the ingroup has been determined, this has to be recognized as the basis of the (here already rooted) cladogram. Then the datamatrix needs to be 'cleaned' of redundant characters, i.e., those characters that are invariable from that level upward must be deleted to yield a reduced matrix, and the former outgroup has to be discarded as well. Next, a successive outgroup comparison has to be performed in which the remaining terminal clade (B-G) figures as the ingroup and its sister group, (A), is now taken into account as the outgroup.
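The three adaptations (a)-(c) of the matrix can be sketched as a single reduction step. The helper below is hypothetical, not a feature of any particular cladistics program, and its step (b) is deliberately naive: it merely reports the outgroup state as the provisional plesiomorphous condition (or '?' where the outgroup taxa disagree), which cannot by itself decide polarity when, as for the shapes and colours of feature X, the structure is absent in the outgroup. Applied to table 2a it reproduces table 2b; applied to table 3a it keeps character (1), which is still variable within (B-G).

```python
CHARACTERS = ["X", "Ro", "Sq", "Bl", "St"]

TABLE_2A = {
    "OG": [0, 0, 0, 0, 0], "A": [0, 0, 0, 0, 0],
    "B": [1, 0, 1, 1, 0], "C": [1, 0, 1, 0, 1],
    "D": [1, 1, 0, 1, 0], "E": [1, 1, 0, 1, 0],
    "F": [1, 1, 0, 0, 1], "G": [1, 1, 0, 0, 1],
}
TABLE_3A = {
    "OG": [1, 0, 1, 1, 0], "A": [1, 0, 1, 1, 0],
    "B": [1, 0, 1, 0, 1], "C": [1, 0, 1, 0, 1],
    "D": [0, 0, 0, 0, 0], "E": [0, 0, 0, 0, 0],
    "F": [1, 1, 0, 1, 0], "G": [1, 1, 0, 0, 1],
}

def stepwise_reduce(matrix, characters, ingroup, outgroup):
    """One reduction step of a stepwise analysis (hypothetical helper).

    ingroup:  taxa of the clade to be resolved next, e.g. B-G
    outgroup: its sister group, e.g. ["A"]; all other rows, including
              the former outgroup, are discarded (step c)
    """
    # (a) keep only characters that are still variable within the ingroup
    kept = [i for i in range(len(characters))
            if len({matrix[t][i] for t in ingroup}) > 1]
    # (c) retain only the rows of the new outgroup and the new ingroup
    reduced = {t: [matrix[t][i] for i in kept]
               for t in list(outgroup) + list(ingroup)}
    # (b) provisional polarity: the outgroup state, '?' if outgroup taxa differ
    polarity = []
    for i in kept:
        states = {matrix[t][i] for t in outgroup}
        polarity.append(states.pop() if len(states) == 1 else "?")
    return [characters[i] for i in kept], reduced, polarity

chars, reduced, polarity = stepwise_reduce(TABLE_2A, CHARACTERS, list("BCDEFG"), ["A"])
print(chars)      # ['Ro', 'Sq', 'Bl', 'St']
print(reduced)    # the rows of table 2b, with A as the new outgroup
print(polarity)   # [0, 0, 0, 0]

chars3, reduced3, polarity3 = stepwise_reduce(TABLE_3A, CHARACTERS, list("BCDEFG"), ["A"])
print(chars3)     # ['X', 'Ro', 'Sq', 'Bl', 'St']
print(polarity3)  # [1, 0, 1, 1, 0]
```

Repeating the call with a deeper clade as ingroup and its sister group as outgroup gives the successive stages of the procedure; whether a retained character is then truly plesiomorphous in the new outgroup remains a matter of interpretation, as the text stresses.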
Generally, several characters may have to be repolarized, as the character states of the new outgroup A are now by definition considered plesiomorphous (though this is not always at issue, see below). A point that should also be noted is that the OG comparison in the program will have to decide for each character state which of the conditions '0' or '1' is to be considered plesiomorphous in any given case: since '0' always codes absence and '1' presence, rather than polarity, '0' is not automatically interpreted as plesiomorphous, nor '1' as apomorphous.

Fig. 1. Some hypothetical examples of alternative possibilities with regard to the apomorphous or plesiomorphous status of the expressions of 'Feature X'; the shape of the cladograms has been kept congruent for easy comparison. The absence of X as such is plesiomorphous only in (a), apomorphous only in (b), and plesio- as well as apomorphous in (c), i.e., 'absence' is not homologous throughout the cladogram. Likewise, the striped pigmentation would be synapomorphous in (c), while partly homoplastic and partly synapomorphous in (a) and (b). The black pigmentation would be plesiomorphous in (a), partly plesiomorphous with a reversal in (b), and symplesiomorphous in (c). The shapes are shown as homoplastic in (c), whereas in (a) and (b) 'rounded' would be synapomorphous versus 'square' being symplesiomorphous.

The above means that the original matrix for feature X as in table 2a will be reduced to the scheme in table 2b: here only characters (2)-(5) are included, while (1) has now been removed, as has group OG. That no repolarization has been performed is due solely to the fact that feature X is also absent in the new outgroup, and hence its derivatives, (2)-(5), score '0' there. Neither the possession of the shapes, square or rounded, nor that of the colours, black or striped, can be polarized in (B-G) through mere outgroup comparison, since in outgroup A feature X is absent and thus has neither a shape nor a colour. However, the initial datamatrix already implied that same condition, and retaining character (1) would make no difference in this respect. Obviously, determining the structure of clade (B-G) in this example will depend on other characters and their states, and the true status of both the shapes and the colours of X will emerge afterwards by interpreting the distribution found according to the most parsimonious hypothesis of character transformation.

Thus, with respect to the status of 'Feature X present', this state was assigned as apomorphous in the beginning and would have retained that status throughout the analysis had character (1) not been removed after the first dichotomy had been determined: it would have been uninformative and hence redundant from level (A)-(B-G) onward.

In another hypothetical scenario, depicted in fig. 1b, the situation is different: here the absence of X constitutes an apomorphous loss, because X as such was already present as a plesiomorphous state in the whole ingroup (A-G) according to the initial outgroup comparison. Here, the respective matrices will be shaped as in table 3a-c. As regards the absence of X, this state was thus qualified as apomorphous in the initial matrix and will retain that status during several successive stages in the stepwise procedure.

Table 3. Datamatrices corresponding to the configuration depicted in fig. 1b: a, the initial matrix, with 'Feature X present' as the plesiomorphous state; b, the reduced matrix as adapted following determination of the first, i.e., basal dichotomy in the ingroup, (A)-(B-G), by omitting OG but retaining character (1), as this is variable here; and, c, the partial matrix for groups (B-C) and (D-G). Again, there is no difference in polarity of the states in (a) and (b); in (c) outgroup comparison is only possible for the pigmentation characters, but whether 'black' in F constitutes a new apomorphy or a reversal, or is due to retaining the ancestral condition from (OG-A), cannot reliably be established; OG comparison is not directly possible for shape 'rounded', which is, however, provisionally considered apomorphous as in (b), while the absence of feature X in (D-E) is interpreted as an apomorphous loss.

a, Initial datamatrix:
Taxon / Character    (1) X    (2) Ro    (3) Sq    (4) Bl    (5) St
OG                   1        0         1         1         0
A                    1        0         1         1         0
B                    1        0         1         0         1
C                    1        0         1         0         1
D                    0        0         0         0         0
E                    0        0         0         0         0
F                    1        1         0         1         0
G                    1        1         0         0         1

b, Secondary datamatrix:
Taxon / Character    (1) X    (2) Ro    (3) Sq    (4) Bl    (5) St
A (= new OG)         1        0         1         1         0
B                    1        0         1         0         1
C                    1        0         1         0         1
D                    0        0         0         0         0
E                    0        0         0         0         0
F                    1        1         0         1         0
G                    1        1         0         0         1

c, Tertiary, partial datamatrix for (D-G) with (B-C) as outgroup, and vice versa:
Taxon / Character    (1) X    (2) Ro    (3) Sq    (4) Bl    (5) St
B-C (= new OG)       1        0         1         0         1
D                    0        0         0         0         0
E                    0        0         0         0         0
F                    1        1         0         1         0
G                    1        1         0         0         1

X, feature X; Ro, rounded; Sq, square; Bl, black; St, striped.

Fig. 1c presents a condition in which the absence of feature X presumably comprises both a plesiomorphous state (as in OG and A) and an apomorphous loss, as in