An Exploratory Study of Rater Orientations on Pre-sessional Academic Conference Presentations through Verbal Report Protocols

by Alasdair Braid

Dissertation submitted to the University of Nottingham in partial fulfilment of the requirements for the Degree of Master of Arts

September 2016

Word Count: 14,951

Acknowledgements

I would like to thank all the participants who kindly gave me their time for the verbal reports and follow-up interviews. Their work is very stressful and, at the time, they had probably had their full share of assessing academic presentations, but they were patient and helpful to me throughout. I owe a particular debt of gratitude to the pre-sessional co-ordinator and the Pre-sessional 3 co-ordinator at the EAP Unit where I conducted my research. These two individuals were my contacts with the EAP Unit: they put me in contact with the students and the participants and gave me useful suggestions. Frankly, I am not sure how this dissertation would have been possible without them, and I hope it is worthy of their altruism. I would also like to thank my supervisor, Martha Jones, for her helpful feedback and suggestions and for her patience during the early stages of the process.

Abstract

This exploratory study focuses on the Academic Conference Presentation (ACP), a frequent form of speaking assessment in EAP (English for Academic Purposes), but one that is under-researched from a rater's perspective. The study analyses raters' perceptions of ACP performance from four angles: which types of features raters heed most (genre, criticality or language features), the construct-relevance of those heeded features (Fulcher, 2003), the clarity and processability of the rating scale, and the types of strategy raters use to cope with the task. The study uses a retrospective verbal report methodology, as exemplified in the studies of Orr (2002), Brown (2000) and May (2006). It focuses on an ACP summative assessment from an interim pre-sessional course delivered by the EAP Unit of a British university. The verbal reports were carried out with five trained raters who work in the EAP Unit, and the data elicited from the verbal reports was triangulated with follow-up interviews with each rater (Dörnyei, 2007). The research found that the raters heeded predominantly construct-relevant features of performance and attended to criticality and genre features more than language features. The rating scale was found to put a processing strain on the raters and used a substantial amount of relativistic (Knoch, 2009) wording. The research also found that several raters made impressionistic judgements of the overall score while the performance was in progress, but then checked these against the criteria or against the details of the unfolding performance. There were differences in the philosophies of two raters: one rewarded positive features of performance to counterbalance negative features, while the other adhered more strictly to the rating scale. However, the rating approaches exhibited by the raters were complex and did not fit neatly into the synthetic and analytic types outlined by Pollitt and Murray (1996).

Contents

1. Introduction
2. Literature Review
2.1. The Construct of the Test: the Academic Conference Presentation
2.2. Discourse Analytical Studies of ACPs
2.3. Critical Thinking
2.4. Rater Moderation
2.5. Rater Training
2.6. Rating Scales
2.7. Rating Scale Design
2.8. Issues in Rating Scale Development
2.9. Studies of Rater Strategies
2.10. Studies of Rater Orientations
3. Methodology
3.1. The Verbal Report: Definitions, Strengths and Weaknesses
3.2. Participants
3.3. The Stimulus
3.4. Piloting
3.5. Procedure
3.6. Training
3.7. Transcription
3.8. Coding
3.9. Follow-up Interviews
3.10. Key to Results
4. Results and Discussion
4.1. Research Question 1: To what extent do the raters attend to criterion features of performance?
4.2. Research Question 2: Do the raters focus on language features, such as fluency and accuracy, or are genre and/or criticality equally or more heeded?
4.2.1. Criticality Judgements
4.2.2. Genre Judgements
4.2.3. Criticality and Genre Judgements: Summary
4.3. Research Question 3: Are there any issues in how the raters use the rating scale, such as manageability and clarity?
4.4. Research Question 4: Can any patterns in rater strategy be construed across the data?
4.4.1. A Comparison of Two Raters
5. Limitations
6. Conclusion
7. Bibliography

Figures

Adapted from Fulcher's (2003) model of speaking test performance
Fulcher's expanded model of speaking test performance
A generic structure of an ACP
Bloom's taxonomy of the cognitive domain
The segmentation process
The coding process
Macro-processes and micro-processes for inferring rater strategies

Tables

The procedure for the verbal report
Total thought units in the review turns which express a judgement
The five most frequently evaluated criteria according to each rater
Rater processes
Total self-monitoring and reflecting micro-processes across raters

1. Introduction

In the two years that I have worked on EAP pre-sessionals, I have worried much more about my consistency as a rater of spoken assessment than as a rater of written assessment. This may be due to the real-time nature of rating speaking exams. Although many speaking exams are video- or audio-recorded, the need to assign scores to a large number of students in a short time usually prevents the rater from reviewing aspects of the performance. Consequently, raters have to make scoring decisions either on the spot or in the few minutes between one performance and the next (Weir, 2005).

As a result, I was interested in reading research on raters' orientations to speaking exams to see whether researchers found the kind of subjectivity that I felt prone to in my own rating. If we define subjectivity in terms of raters attending to non-criterion features of performance, i.e. features that are not in the scoring rubric, then the research did show that trained raters often diverged from the rating scale in their evaluations of student performance (Orr, 2002; Brown, 2000; May, 2006). The research also showed that raters brought different foci, or 'individual frames of reference' (May, 2006, p.34), to the rating experience. For example, in May's study of two trained EAP raters, one rater focused their attention on accuracy features and the other on fluency (ibid.). In the same study, the two raters saw students' use of intertextuality in different lights: one saw it as detracting from fluency; the other saw it as showing an ability to synthesise sources (ibid.).

I wanted to investigate whether some of the same idiosyncrasies could be found in trained raters of an academic genre that is commonly tested in EAP: the academic conference presentation (ACP). This genre has been studied from a student's perspective (Zappa-Holman, 2007) and from a discourse analytic perspective (particularly in the collection of papers edited by Ventola, Shalom and Thompson, 2002), but, to my knowledge, it has not been studied from a rater's perspective. I wanted to apply to the ACP genre the same verbal report methodology that Orr (2002), Brown (2000) and May (2006) had applied to paired speaking tasks (Orr and May) and oral interview tasks (Brown).

The setting of the research is the EAP Unit of a British university. The EAP Unit provides four main pre-sessional courses, which students join according to their IELTS score.
These courses follow a developmental path from Pre-sessional 1 at the lower levels to Pre-sessional 4, after which students should be ready to join their departments. The focus of the present study is Pre-sessional 3 and the summative assessment for this course: that is, the test of what students have grasped during the year (Brown, 2010). This assessment is divided into two parts. The first part is a research paper written on a topic of the student's choice from their subject specialism; the topic should be linked to the theme of the course, which this year is power. The main points from this research paper, or one particular area of interest within it, are then expanded into a twenty-minute presentation.

The main focus of my research will be verbal reports with five trained raters from the EAP Unit's staff, through which I will analyse three main areas: the 'criterion-ness' of their rating, whether they focus on genre, criticality or linguistic aspects of the ACP performance, and how they interpret the rating scale. From my investigation of these areas, I aim to draw implications for rating scale design which I hope will be useful to a tutor from the EAP Unit. As can be seen in Fulcher's schematic (Figure 1), rating scale design, rater training and rater characteristics are closely intertwined in the assessment process, and these three strands will run through my literature review, results and discussion.

Figure 1: Adapted from Fulcher's (2003) Model of Speaking Test Performance (p.115).

Another thread of my research will be to investigate what strategies raters use to assess the presentations. This should shed further light on the manageability of the rating scale, as well as give useful information to rater trainers, and to inexperienced raters like myself, about the kinds of strategies experienced raters use to mediate this multi-semiotic speech event (Yang, 2014a).

2. Literature Review

In the first part of the literature review, I look at some discourse analyses of the ACP genre and some theories of critical thinking, which will give me a language to describe the ACP performance in the results section. In the second half, I discuss research into more practical aspects of speaking assessment, such as moderation, rater training and rating scale design. Finally, I look at studies of rater orientations to speaking assessment and studies of rater strategies, which have informed my research design. First, however, I briefly explain the concept of 'construct', as it is fundamental to everything that follows.

2.1. The Construct of the Test: the Academic Conference Presentation

A construct is defined by Richards and Schmitt (2010) as a 'concept that is not observed directly but is inferred on the basis of observable phenomena' (p.122). In the case of speaking tests, the observable phenomena are the behaviours exhibited by the test-taker while enacting the speaking task. These allow the examiner to make an inference about the test-taker's speaking ability in general (an ability construct) or the test-taker's ability to perform a similar task in the real world (a performance construct) (Chapelle and Brindley, 2010). The construct validity of a test, therefore, is how well the test items reflect the theoretical underpinnings, or the construct, of the test (Richards and Schmitt, 2010).
As illustrated in Fulcher's (2003) expanded model of speaking test performance (Figure 2), defining the construct occupies a central position in the test development process, informing 'the mode of delivery, tasks and method of scoring' (ibid., p.19). The close link between the rating scale and the test construct is particularly salient, as the criteria of the rating scale are operationalisations of the test construct (ibid.). This means that rating scale development needs to be a carefully thought-through process.