Journal of Education and Learning; Vol. 7, No. 1; 2018 ISSN 1927-5250 E-ISSN 1927-5269
Published by Canadian Center of Science and Education

Identifying New Jersey Teachers' Assessment Literacy as Precondition for Implementing Student Growth Objectives

Victoria Prizovskaya1
1 Elizabeth Public Schools District, Elizabeth, New Jersey, USA
Correspondence: Victoria Prizovskaya, 411 Cynthia Court, Princeton, NJ, 08540, USA. E-mail: [email protected]

Received: October 4, 2017    Accepted: October 24, 2017    Online Published: October 30, 2017
doi:10.5539/jel.v7n1p184    URL: http://doi.org/10.5539/jel.v7n1p184

Abstract
The Student Growth Objectives are assessments created locally or by commercial educational organizations. Students' scores on the Student Growth Objectives are included in teacher summative evaluation as one of the measures of teacher effectiveness. The breadth of the requirements in teacher evaluation raised the concern of whether New Jersey public school teachers were competent enough in assessment theory to effectively utilize the state-mandated tests. The purpose of this quantitative study was to identify New Jersey teachers' competence in student educational assessment. The researcher compared teachers' assessment literacy levels between groups based on subject taught, years of experience, school assignment, and educational degree attained. The data collection occurred via e-mail. Seven hundred ninety-eight teachers received the Assessment Literacy Inventory survey developed by Mertler and Campbell; eighty-two teachers fully completed the survey (N = 82). The inferential analysis included an independent-samples t test, one-way analyses of variance, a post hoc Tukey test, and the Welch and Brown-Forsythe tests. The results of this study indicated an overall teacher score of 51% on the entire instrument. The highest overall score, 61%, was for Standard 1, Choosing Appropriate Assessment Methods. The lowest overall score, 39%, was for Standard 2, Developing Appropriate Assessment Methods. The conclusion of this study was that New Jersey teachers demonstrated a low level of competence in student educational assessment and that, in general, teacher assessment literacy has not improved during the last two decades.

Keywords: assessment literacy, student assessment, teacher evaluation

1. Introduction
In an attempt to reinforce educators' accountability for student learning, the federal and state educational agencies mandate school districts to include student achievement data in teacher evaluation. The number of school districts that in some way incorporate students' outcomes from assessments in teacher evaluation is growing across the states. In 2011, the New Jersey Department of Education (NJDOE) adopted ACHIEVENJ, an educator evaluation and support system, under the Teacher Effectiveness and Accountability for the Children of New Jersey (TEACHNJ) policy. The ACHIEVENJ model consists of three components: the Student Growth Percentile (SGP), Student Growth Objectives (SGO), and several classroom observations by the school administrator using one of the rubrics approved by the NJDOE (New Jersey Department of Education [NJDOE], 2015). The SGPs are student scores from the standardized tests and are available only for teachers teaching tested grades or subjects such as mathematics and English. The SGOs are student assessments designed locally by educators or by commercial organizations and implemented by teachers (Riordan, Lacireno-Paquet, Shakman, Bocala, & Chang, 2015).
The students' outcomes from the SGP and SGO tests are included in teacher summative evaluation. Together with the classroom observations, these components provide three measures of teacher effectiveness. The breadth of the requirements in teacher evaluation raised the concern of whether New Jersey public school teachers were competent enough in assessment theory to effectively utilize the state-mandated tests, especially the SGO assessments. The rationale for this inquiry was that designing any type of assessment is a complex process: an educator who undertakes this task needs to know the fundamental principles of assessment theory, and poorly developed or incorrectly implemented assessments produce unreliable results and inaccurate inferences about teaching and learning. To address this concern, the researcher posed the question: What is the level of competency in student educational assessment of New Jersey public school teachers? To advance the prior studies, the comparison of teachers' competence level occurred among different groups of teachers. The following research questions facilitated the development of this study:

What is the statistical comparison of assessment literacy level between teachers from high and low achieving public schools in the state of New Jersey?
What is the statistical comparison of assessment literacy level between elementary, middle, and high public school teachers in the state of New Jersey?
What is the statistical comparison of assessment literacy level between groups of teachers who taught 0-4 years, 5-10 years, 11-20 years, and more than 21 years?
What is the statistical comparison of assessment literacy level between tested and nontested teachers?
Does a statistically significant difference exist in the level of assessment literacy between groups of teachers based on level of education attained?

The study followed a quantitative nonexperimental methodology. The data collection occurred via e-mail by utilizing the Assessment Literacy Inventory (ALI) developed by Mertler and Campbell (2005). The inferential analysis included an independent-samples t test and a One-Way Analysis of Variance (ANOVA) test at the α = 0.05 level. Eighty-two (N = 82) teachers from demographically different schools participated in this study, forming a purposive sample. The applications of this study may lead to better-informed teacher professional development courses and to improved administrative decisions pertaining to SGO tests and district-wide systems of student assessment. On a larger scale, this study may influence the development of teacher evaluation policy in the state of New Jersey. The most important change may occur in teachers' assessment practices.

2. Review of the Literature
Preparing students for life, career, and college became a mission for public schools in the United States. The Smarter Balanced Assessments (SBA) consortium and the Partnership for Assessment of Readiness for College and Careers (PARCC) consortium assembled a common set of K-12 assessments in core academic subjects to create a pathway to college and career readiness for all students (Herman & Linn, 2013; Rentner & Kober, 2014). School participation in PARCC or SBA testing compelled classroom teachers to raise students' scores on standardized tests in accordance with the federal and state policies. Assessments became catalysts for improving student achievement and strengthening the bonds between curriculum and instruction.
Simultaneously, teacher assessment literacy became a precondition for effective instruction. As a result, the federal and state educational agencies pressured school districts to adopt and implement evaluation systems that measure teacher effectiveness based on students' performance on standardized tests (Baker, Oluwole, Green, & Preston, 2013).

2.1 Students' Test Scores in Teacher Evaluation
The most recent school improvement initiative, Race to the Top (RTTT), encouraged school districts across the country to include students' achievement on standardized tests in teacher evaluation in exchange for RTTT federal monies (Mathis, 2011; Onosko, 2011). The USDOE allocated 350 million dollars to support states in which 50% of a teacher's evaluation was based on student achievement scores (Onosko, 2011). To execute the federal requirements, statisticians developed the Value-Added Models (VAM) to measure teachers' effectiveness. In VAMs, students' previous test scores are used to predict their future scores on the assumption that students perform approximately the same each year. The difference between the predicted and actual scores is treated as the teacher's or school's contribution to students' learning (Gitomer, 2011; Groen, 2012; Marder, 2012). The other statistical approach to estimating teacher effectiveness is the SGP, which is similar to the VAMs. The SGP measures a student's academic growth from one year to the next compared to academic peers, students with a similar performance history from across the state (NJDOE, 2015). The utilization of the SGP scores occurs in the following order: statisticians assign a percentile rank to each student; teachers receive the percentile ranks of all students they taught that year; and each teacher receives a score for the year equal to the median of his or her students' percentile ranks (Gill, English, Furgeson, & McCullough, 2014). The SGP scores are available only for teachers who teach mathematics and English from grade three to eight, the years in which students take the state standardized tests in mathematics and English.

The inclusion of students' scores in teacher evaluation resonated strongly in research and practice. Two polarized views of teacher evaluation exist in the educational community. Some stakeholders believe that students' achievements indicate how well teachers perform (Hanushek & Haycock, 2010). Others think that teachers should not be evaluated based on students' test scores because teachers do not control many of the factors affecting students' learning (Gitomer, 2011).

Proponents of the VAM argued that focusing on students' achievement gains helps to eliminate ineffective teachers from low-performing schools (Hanushek & Haycock, 2010). According to Darling-Hammond (2014), the elimination of 5% to 10% of ineffective teachers every year would increase student academic achievement, and the United States would catch up with the high-performing nations. Opponents of the VAMs argued that firing teachers will not solve the problems in teacher evaluation and will not raise students' scores on standardized tests. Hendrickson (2012) noted that, in Finland, which has one of the strongest education systems in the world, educators pay little attention to evaluating teachers.
Instead, the Finnish educators devote more resources to developing collaborative relationships and collective learning among colleagues to promote student learning (Hendrickson, 2012). The substantial research on the VAMs and the SGP raised questions about the validity of the statistical formulas used to estimate teacher effectiveness. One question was whether VAM-like estimates accurately measure the contributions of special education or English Language Learner teachers (Baker et al., 2013). Gitomer (2011) argued that the VAM does not support the theory of teaching and learning because changes in scores do not explain what a teacher did or did not do to improve students' learning. As an example of this controversy, Gitomer pointed to the assumption that a 10-point gain at the lower end of the scale is equivalent to a 10-point gain at the higher end; a student with a baseline score of 95% may show no increase, and that student's teacher would be estimated as ineffective by the VAM formulas (Gitomer, 2011). Variables other than the teacher alone influence the VAM estimates (Baker et al., 2013). According to Baker et al., the VAMs produced ratings for teachers different from those of other evaluation instruments. Student mobility and missing data compromised VAM outcomes (Baker et al., 2013). These facts disrupted the link between the teacher and the student. Students' prior knowledge, enrichment summer classes, private tutoring, family background, and socioeconomic status introduced unfairness into the VAM outcomes (Baker et al., 2013). Finally, the nonrandom assignment of students to teachers, classroom composition, and school functioning added biases to the statistical formulas (Marder, 2012; Onosko, 2011).

The SGO assessment is an alternative way to measure teachers' contribution to students' learning, especially for teachers teaching nontested grades or subjects (Riordan et al., 2015). The SGOs are "academic goals for different groups of students that are aligned to the state standards and can be tracked using objective measures" (NJDOE, 2015, p. 3). SGO development varies from district to district and state to state. The SGO tests may be developed using local resources or commercial educational organizations, and the state or the school districts decide which type of SGO to use for teacher evaluation. The teacher-developed SGO was the most common type: according to Lacireno-Paquet, Morgan, and Mello's (2014) study, twenty-three states mandated teachers to develop the SGO tests individually. In general, SGO implementation begins with developing or selecting appropriate assessments, followed by preassessment or diagnostic tests (Gill et al., 2014). Based on the preassessment results, the teacher sets learning targets for the entire class, for groups of students, or for each student individually (Gill et al., 2014). Then the teacher chooses measures to evaluate each student's or group's proficiency level. At the end of the school year, teachers administer the post SGO test to measure students' growth (Gill et al., 2014).

2.2 The SGO Assessments in Teacher Evaluation
The architects of the SGO asserted that this test supplies data on students' growth attributable to the teacher factor. The SGO was first introduced in the state of New Jersey in 2011, yet the validity and reliability of this test, and the way schools utilize it, are not fully understood. Gill et al.
(2014) believed that the locally developed SGOs compromise the comparability and reliability of the test. In an effort to validate the SGO test, Hu (2015) conducted a quantitative study in Charlotte-Mecklenburg, North Carolina. One hundred fifty-nine schools participated in Hu's study, yielding 18,800 teachers. Notably, teacher evaluation in North Carolina is similar to ACHIEVENJ: both evaluations include the SGO test and classroom observation by the school principal. The difference between the two evaluations is that ACHIEVENJ includes the SGP scores while the North Carolina evaluation includes the VAM scores. Hu's (2015) goal was to find the correlation between SGO quality and VAM scores. Hu hypothesized that the VAM and SGO scores should be similar in estimating teachers' effectiveness. Hu found a 67% positive and 33% negative correlation between the VAM and quality SGO across years and grades in mathematics, and a 73% positive and 27% negative correlation across grades and years in reading. Student race and ethnicity were significant predictors in the models for both mathematics and reading across the grades and years (Hu, 2015). Class size as a factor varied across the grades (Hu, 2015).

Hu (2015) stated that, after controlling for extraneous variables such as class size, prior student achievement, and background characteristics, higher quality SGOs corresponded to higher teacher VAM scores in mathematics and reading. No statistically significant relationship existed between the VAM scores and SGO attainment status. Hu explained this finding by noting that some effective teachers lacked skills in designing quality SGOs, while some ineffective teachers were skilled in developing them. Hu suggested that school districts should not use the VAM or the SGO to make high-stakes decisions about teacher practices because many factors influence instruction and learning.

Pollins (2014) explored the SGO as a process and its implications for teachers and school administrators. Pollins found a positive impact of the SGO on elementary and middle school teachers' practices in Rhode Island public schools. According to Pollins, teachers and administrators agreed that the SGO increased collaboration among colleagues and opened a dialogue about students' learning and common assessments. Additionally, Pollins reported that teachers encountered obstacles during the SGO process. The Rhode Island teachers stated that they rarely used quality assessments for SGO purposes, and teachers needed direction on how to create or choose student assessments for the SGO (Pollins, 2014). Likewise, Pollins suggested not using the outcomes from the SGO for decisions related to teacher retention, pay, and evaluation.

The theory behind the SGO tests is effective teaching through quality assessment. The SGOs are assessments with multiple roles: to communicate learning goals and expectations to students, to provide students with feedback, to help teachers monitor students' learning, to adjust instruction, and to measure students' academic growth (Lacireno-Paquet et al., 2014; Gill et al., 2014; NJDOE, 2015). Instructional planning represents the SGO's main role. The inclusion of students' outcomes from the SGO assessments in teacher evaluation conflicts with this primary role.
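The SGP procedure described in Section 2.1 reduces to a simple computation: each student receives a percentile rank relative to academic peers with a similar score history, and the teacher's annual SGP score is the median of those ranks. The following minimal Python sketch, using hypothetical percentile ranks, illustrates the final step; it is an illustration of the procedure as reported by Gill et al. (2014), not the state's actual implementation.

```python
from statistics import median

def teacher_sgp(student_percentile_ranks):
    """A teacher's annual SGP score: the median of the SGP percentile
    ranks of the students he or she taught that year (Gill et al., 2014)."""
    return median(student_percentile_ranks)

# Hypothetical percentile ranks for one teacher's students
print(teacher_sgp([35, 62, 48, 71, 55, 40, 66]))  # -> 55
```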
2.3 Teacher Evaluation in the State of New Jersey
In the state of New Jersey, the ACHIEVENJ teacher evaluation consists of three measures: the SGP scores, the SGO scores, and the classroom observation by the school principal using one of the teacher evaluation models approved by the NJDOE. According to the NJDOE (2015), the Danielson Model was the most popular in the state of New Jersey: 136 of the 571 school districts in the state utilized the Danielson Model. The second most popular was the Stronge Teacher and Leader Effectiveness Performance System, used by 65 school districts (NJDOE, 2015). The third most popular was the Marzano Causal Teacher Evaluation Model, used by 53 school districts (NJDOE, 2015). The Danielson Model has a 4-tiered rubric: highly effective, effective, partially effective, and ineffective. The scores of highly effective teachers range from 3.5 to 4.0; the scores of effective teachers range from 2.65 to 3.49; the scores of partially effective teachers range from 1.85 to 2.64; and the scores of ineffective teachers range from 1.0 to 1.84 (NJDOE, 2015). The SGO score has a 4-tiered rubric with the same score ranges for the four levels of performance. Teachers without an SGP score receive a summative score that combines 20% of the SGO and 80% of the classroom observations. Teachers with SGP scores receive a summative score of 10% of the SGP median, 20% of the SGO, and 70% of the classroom observations.

Callahan and Sadeghi (2015) conducted a statewide study to investigate three phenomena: New Jersey teachers' perceptions of ACHIEVENJ, the level of communication between teachers and administrators, and the availability, frequency, and effectiveness of professional development opportunities. Callahan and Sadeghi reported that teachers perceived the ACHIEVENJ model as unfair and as not accurately evaluating their teaching abilities. According to Callahan and Sadeghi, teachers reported that the number of classroom observations increased, which resulted in increased professional dialogue about instruction, students' assessments, and students' learning. Furthermore, teachers stated that the quality of the observations decreased because administrators spent more time entering evidence and information into laptops than actually observing teachers (Callahan & Sadeghi, 2015). In 2014, 56% of teachers wanted more professional development related to their areas of need, and only 5% of teachers reported that they were satisfied with the training they received (Callahan & Sadeghi, 2015). According to New Jersey teachers, the implementation of ACHIEVENJ did not address poor practice, excellent teachers were not recognized, novice teachers did not receive support, and professional learning was not tailored to teachers' needs (Callahan & Sadeghi, 2015).

In support of the SGO method, the NJDOE maintains that thoughtfully developed and collaboratively implemented SGOs improve the quality of discussion about student growth, learning, and instruction. The SGO method increases teacher engagement in assessment practices, enriches teacher knowledge of curriculum standards, and fosters teacher leadership (Gill et al., 2014; NJDOE, 2015).
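The weighting scheme described above can be expressed as a short computation. The sketch below (Python, with hypothetical component scores on the 1.0-4.0 rubric scale) combines the components and maps the result to the four-tier rubric; it illustrates the weights reported by the NJDOE (2015) rather than any official scoring tool.

```python
def summative_score(observation, sgo, sgp=None):
    """ACHIEVENJ summative rating as a weighted sum of components.

    With an SGP score:    10% SGP + 20% SGO + 70% observations.
    Without an SGP score:           20% SGO + 80% observations.
    All components are on the 1.0-4.0 rubric scale (NJDOE, 2015).
    """
    if sgp is None:
        return 0.20 * sgo + 0.80 * observation
    return 0.10 * sgp + 0.20 * sgo + 0.70 * observation

def rating(score):
    """Map a summative score to the four-tier rubric ranges."""
    if score >= 3.5:
        return "highly effective"
    if score >= 2.65:
        return "effective"
    if score >= 1.85:
        return "partially effective"
    return "ineffective"

# Hypothetical teacher with SGP median 2.8, SGO 3.0, observations 3.2
s = summative_score(observation=3.2, sgo=3.0, sgp=2.8)
print(round(s, 2), rating(s))  # 3.12 effective
```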
The introduction of a new measure into teacher evaluation intensified the role of educational assessments in the process of improving learning outcomes, and teacher assessment literacy became a policy consideration. Popham (2011) underlined two reasons for teachers to become assessment literate: to understand how the accountability assessments determine an educator's professional quality, and to understand how assessments improve students' learning and instruction. Modern schools, according to Popham, need educators who are competent in assessment theory and who use assessments effectively to make instructional and administrative decisions.

2.4 Teacher Assessment Literacy
Teacher assessment literacy continues to attract public interest. Stakeholders in education want to know how teachers utilize assessments to evaluate students' learning. Educational specialists defined an assessment-literate teacher as an expert who is able to select or develop and administer different types of assessments, use the data from the assessments to inform instructional decisions, and communicate the assessment results to students and their parents (Mertler & Campbell, 2005; Popham, 2011; Stiggins, 2002).

The measurement of teacher assessment literacy began shortly after a joint committee of the American Federation of Teachers (AFT), the National Council on Measurement in Education (NCME), and the National Education Association (NEA) developed the Standards for Teacher Competence in Educational Assessment of Students in 1990 (American Federation of Teachers [AFT], National Council on Measurement in Education [NCME], & National Education Association [NEA], 1990). Plake et al. (2005) used the Standards to develop the Teacher Assessment Literacy Questionnaire (TALQ) instrument. Five hundred fifty-five teachers from 98 school districts in 45 states completed the TALQ survey. Plake et al. reported an overall teacher score of 66% on the instrument; according to Plake et al., teachers answered 23 items out of 35 correctly. In 2003, Mertler and Campbell (2005) replicated Plake et al.'s study using the TALQ. Two hundred twenty undergraduate preservice teachers participated in the study. Mertler and Campbell reported that teachers answered 21 items out of 35 correctly, yielding an overall score of 60%. Mertler and Campbell believed that the TALQ's questions were difficult and lengthy to read, and they modified the TALQ into the Classroom Assessment Literacy Inventory (CALI) and later into the Assessment Literacy Inventory (ALI).

Davidheiser (2013) measured the assessment literacy of 180 high school teachers in the state of Pennsylvania, reporting that teachers answered fewer than 25 items out of 35 correctly. Around the same time, Perry (2013) measured the assessment literacy of 14 teachers and 32 principals in the state of Montana. Perry reported that, on the same instrument, teachers on average answered 22 items out of 35 correctly while principals answered 21 items out of 35 correctly. Barone (2012) investigated Albany Middle School and Albany High School teachers' assessment literacy before and after professional development, as measured by the ALI. Barone concluded that as teachers became more knowledgeable in assessment, their correct scores on the ALI increased, on average, from 73% to 77%. Barone reported a strong correlation between the two administrations of the test (r = 0.97, p < 0.05) and attributed the increase in teachers' scores to the effect of the professional development.
Research in the field of education has produced substantial knowledge about preservice teachers' assessment literacy. Siegel and Wissehr (2011) explored the assessment literacy of 11 preservice teachers enrolled in the secondary science methods course of their teacher preparation program. Siegel and Wissehr found that during the methods course the preservice teachers identified 19 assessment tools, described their advantages and disadvantages, and demonstrated how to align instructional goals to specific assessments. Siegel and Wissehr concluded that the knowledge of assessment tools that teachers attain does not guarantee its realization in their future classrooms; the working conditions and reality of the classroom may impede application of the acquired knowledge.

Wallace and White (2015) investigated how secondary mathematics preservice teachers' perspectives on assessment practice evolved during one year of a reform-based preparation program. Six preservice teachers in the state of California participated in the study. The findings of Wallace and White's study showed that the preservice teachers' perceptions of assessment practice evolved through three stages: as the program unfolded, the teachers' viewpoints progressed from test oriented to task oriented to tool oriented (Wallace & White, 2015). According to Wallace and White, during the test-oriented stage teachers used a limited number of assessments for a single purpose: to evaluate and grade students' work. During this stage teachers utilized criterion-referenced tests that included strictly procedural items with no connections between the concepts (Wallace & White, 2015). During the task-oriented stage teachers began to recognize that a test could be utilized for different purposes. The tests became criterion referenced and student referenced, with some connections between the concepts (Wallace & White, 2015). During the tool-oriented stage teachers realized that the main purpose of assessment is to improve teaching and learning. The assessments transformed significantly from traditional, with measurement as the only purpose, to reform based, serving learning purposes. The format of the tool-oriented tests included items that required students to apply exploration, reasoning, and analysis to solve problems (Wallace & White, 2015).

Odo (2015) believed that developing teachers' assessment literacy requires significant attention during teacher preparation programs. According to Odo, teachers' familiarity with the assortment of assessments available for their use improves instruction in diverse classrooms. Teachers' understanding of fundamental characteristics of assessments such as validity, reliability, and bias will help them interpret the standardized tests proliferating in public schools (Odo, 2015). Odo's results suggested that teacher education related to student assessment should continue after teachers enter the classroom and should be given thoughtful attention throughout a teacher's entire career.

3. Theoretical Foundation
The contemporary teacher evaluation models are based on the Professional Teaching Standards developed by the National Board for Professional Teaching Standards (NBPTS) organization in 1987 (Darling-Hammond, 1999).
The NBPTS's policy statement, What Teachers Should Know and Be Able to Do, clearly communicates professional standards for teachers to stakeholders. The Professional Teaching Standards provide educators a common language to discuss teaching and learning. The authors of the NBPTS formulated five core propositions as follows:
• Teachers are committed to students and their learning.
• Teachers know the subjects they teach and how to teach those subjects to students.
• Teachers are responsible for managing and monitoring student learning.
• Teachers think systematically about their practice and learn from experience.
• Teachers are members of learning communities.

Parallel to the Standards for the Teaching Profession, the AFT, NCME, and NEA committee declared that good teaching occurs through effective methods of assessing students' learning (AFT, NCME, & NEA, 1990). The AFT, NCME, and NEA committee developed the Standards for Teacher Competence in Educational Assessment of Students (AFT, NCME, & NEA, 1990). The goal was to establish standards that would guide educators in designing and implementing student assessments, in identifying needs for professional development, and in designing professional development for inservice teachers (AFT, NCME, & NEA, 1990). The committee suggested incorporating the standards into teacher training and certification programs before including them in teacher evaluation systems. The AFT, NCME, and NEA committee formulated the Standards for Teacher Competence in Educational Assessment of Students as follows:
• Standard 1: Teachers should be skilled in choosing assessment methods appropriate for instructional decisions.
• Standard 2: Teachers should be skilled in developing assessment methods appropriate for instructional decisions.
• Standard 3: Teachers should be skilled in administering, scoring, and interpreting the results of both externally produced and teacher-produced assessment methods.
• Standard 4: Teachers should be skilled in using assessment results when making decisions about individual students, planning teaching, developing curriculum, and school improvement.
• Standard 5: Teachers should be skilled in developing valid pupil grading procedures that use pupil assessments.
• Standard 6: Teachers should be skilled in communicating assessment results to students, parents, other lay audiences, and other educators.
• Standard 7: Teachers should be skilled in recognizing unethical, illegal, and otherwise inappropriate assessment methods and uses of assessment information.

The Standards for the Teaching Profession and the Standards for Teacher Competence in Educational Assessment of Students served as the theoretical framework for this study. Student educational assessment and teacher assessment literacy were discussed within the scope of standardized teacher evaluation. The Danielson Model for Teaching and Learning also served as a theoretical foundation for this study because almost every domain of teacher practice in the Danielson Model includes assessment as an element of effective instruction. All domains in the Danielson Model are rooted in the Professional Teaching Standards and reflect the Standards for Teacher Competence in Educational Assessment of Students. According to Danielson (2013), teachers need to know how to design quality assessments that can provide a wide range of evidence of students' learning.
The Danielson Model consists of four domains and 72 elements of teacher practice: Domain 1, Planning and Preparation; Domain 2, Classroom Environment; Domain 3, Instruction; and Domain 4, Professional Responsibilities (Danielson, 2013). The model incorporates a four-tier rubric to rank teacher performance in each domain: Unsatisfactory, Basic, Proficient, and Distinguished (Danielson, 2013).

4. Methodology
This study applied a nonexperimental quantitative methodology and followed a causal-comparative design. The research questions, data collection, sample size, instrumentation, and variables determined the method and design. Each research question had the following hypotheses:

RQ1: What is the statistical comparison of assessment literacy level between teachers from high and low achieving public schools in the state of New Jersey?
Ho: No significant statistical difference exists in assessment literacy level between teachers from high and low achieving public schools in the state of New Jersey.
Ha: A significant statistical difference exists in assessment literacy level between teachers from high and low achieving public schools in the state of New Jersey.

RQ2: What is the statistical comparison of assessment literacy level between elementary, middle, and high public school teachers in the state of New Jersey?
Ho: No significant statistical difference exists in assessment literacy level between elementary, middle, and high public school teachers in the state of New Jersey.
Ha: A significant statistical difference exists in assessment literacy level between elementary, middle, and high public school teachers in the state of New Jersey.

RQ3: What is the statistical comparison of assessment literacy level between groups of teachers who taught 0-4 years, 5-10 years, 11-20 years, and more than 21 years?
Ho: No significant statistical difference exists in assessment literacy level between groups of teachers who taught 0-4 years, 5-10 years, 11-20 years, and more than 21 years.
Ha: A significant statistical difference exists in assessment literacy level between groups of teachers who taught 0-4 years, 5-10 years, 11-20 years, and more than 21 years.

RQ4: What is the statistical comparison of assessment literacy level between tested and nontested teachers?
Ho: No significant statistical difference exists in assessment literacy level between tested and nontested teachers.
Ha: A significant statistical difference exists in assessment literacy level between teachers of tested and nontested subjects.

RQ5: Does a statistically significant difference exist in the level of assessment literacy between groups of teachers based on level of education attained?
Ho: No significant statistical difference exists in assessment literacy level between groups of teachers based on level of education attained.
Ha: A significant statistical difference exists in assessment literacy level between groups of teachers based on level of education attained.

The inferences about group differences stated in the research questions required statistical analysis based on the theory of probability. The hypothesis-testing approach assisted the analysis of each question (Bock, Velleman, & DeVeaux, 2010). The null hypothesis for each question stated that there is no difference in assessment literacy level between the group means (μ1 = μ2), while the alternative hypothesis stated that there is a difference between the means (μ1 ≠ μ2, two-tailed).
The null hypothesis was rejected when the probability of obtaining the observed value under the null hypothesis was less than 5% (at α = 0.05), leading to the conclusion that there was evidence that statistically significant differences in group means exist in the population.

The other reason for the quantitative approach was the data collection method. In quantitative studies, data are gathered with structured instruments (Johnson & Christensen, 2014). The data collection for this study occurred by utilizing the ALI instrument, developed by Mertler and Campbell (2005) specifically for quantitative analysis. Furthermore, the researcher collected data from a large target population of 798 school teachers, which is another trait of the quantitative approach. The sample size of 82 is considered large enough to compute sample statistics that accurately reflect the population parameters (Bock et al., 2010). Another trait of the quantitative method was the nature of the variables: the dependent variables were quantitative, namely teachers' composite scores on the Standards. Finally, this study may be replicated by future researchers, which is considered an important characteristic of a quantitative methodology (Bock et al., 2010; Johnson & Christensen, 2014).

4.1 Population and Sample
The general population of this study was 139,699 certified school teachers employed in 694 operating school districts in the state of New Jersey (NJDOE, 2015). The target population combined school teachers from three school districts and one high school. The school districts and the high school received the pseudonyms SD#1, SD#2, SD#3, and HS#4 to maintain confidentiality regarding participation. Until 2015, students' outcomes on two standardized tests, the High School Proficiency Assessment (HSPA) and, in elementary and middle schools, the New Jersey Assessment of Skills and Knowledge (NJASK), determined school district performance in the state of New Jersey. At the time of the study, based on HSPA and NJASK results, SD#1 and SD#2 were suburban high achieving school districts and SD#3 and HS#4 were low achieving urban schools. SD#1 employed 201 teachers, SD#2 employed 335 teachers, SD#3 employed 175 teachers, and HS#4 employed 69 teachers. The target population was 798 participants. In total, 169 teachers (21%) responded to the survey. Only the 82 fully completed responses (N = 82) defined the purposive sample.

Sullivan and Feinn (2012) suggested calculating the effect size to estimate a reasonable sample size before a study is carried out. Cohen's d values for the independent-samples t tests and the ANOVA tests aided the analysis of effect size. According to Sullivan and Feinn, Cohen classified effect sizes as small (d = 0.2), medium (d = 0.5), and large (d = 0.8). To estimate an effect size, Sullivan and Feinn suggested piloting the study, using estimates from similar work published by other researchers, or using the minimum difference considered important by experts. G*Power 3.1 was used to calculate the minimum sample size required to produce statistically significant results. Based on an anticipated effect size of d = 0.45 and a confidence level of 95% (α = 0.05), the power for the F tests was 0.9541217 for a total sample size of 81 participants. IBM SPSS Statistics Base 21 (SPSS) software facilitated the analysis of the collected data.
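The a priori power analysis reported above can be approximated outside G*Power. The sketch below uses Python's statsmodels package; the conversion from Cohen's d to the f metric required by the ANOVA power routine (f ≈ d/2) is an assumption of this illustration, so its output need not match the reported 0.9541217 exactly.

```python
from statsmodels.stats.power import FTestAnovaPower, TTestIndPower

alpha = 0.05
d = 0.45   # anticipated effect size (Cohen's d), as reported above
f = d / 2  # rough conversion to Cohen's f; exact only for two equal groups

# Achieved power of a one-way ANOVA with four experience groups, N = 81
anova_power = FTestAnovaPower().solve_power(
    effect_size=f, nobs=81, alpha=alpha, k_groups=4)
print(f"ANOVA power at N = 81: {anova_power:.3f}")

# Per-group n needed for an independent-samples t test at 80% power
n_per_group = TTestIndPower().solve_power(
    effect_size=d, alpha=alpha, power=0.80)
print(f"t test: n per group for 80% power: {n_per_group:.1f}")
```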
Descriptive statistics were applied to describe the study sample, and inferential statistics were applied to answer the study questions.

4.2 Data Collection Method
During the winter and spring of 2016, 165 superintendents of public school districts received invitations to participate in this study. The researcher made phone calls to school superintendents' offices in the alphabetical order in which the districts were listed on the NJDOE's public website. Three superintendents and one high school principal agreed to participate in the study. In May 2016, 798 public school teachers from the participating schools received the online ALI survey (Mertler & Campbell, 2005) via school e-mail. Two weeks later the researcher e-mailed teachers a reminder to complete the survey. Access to the survey remained open for five weeks.

4.3 Instrumentation
Members of the American Educational Research Association (AERA) reviewed the ALI instrument in Montreal in 2005 and concluded that it is a reliable tool to measure teacher assessment literacy (Mertler & Campbell, 2005). After the AERA review, educational researchers used the ALI instrument widely. Hamilton (2014) conducted a quantitative study to investigate to what extent teacher assessment literacy, as measured by the ALI, was associated with teacher knowledge of Curriculum-Based Measurement (CBM). The CBM is another research-based instrument measuring educators' skills in assessing students' knowledge of the curriculum taught in the classroom. The study took place in elementary schools in Rhode Island. Hamilton found a significant positive correlation between the two instruments (r = 0.505, p < 0.01): teachers with high scores on the ALI tended to have high scores on the CBM, and vice versa. Hamilton concluded that the two instruments measured the same construct.

After testing the ALI on 152 preservice teachers, Mertler and Campbell (2005) reported the instrument's reliability (KR-20) as 0.75, the mean item difficulty as 0.64, and the mean item discrimination as 0.32. These values, according to Mertler and Campbell, describe the ALI as a reliable instrument from a psychometric point of view. The Cronbach's alpha test provides a measure of an instrument's internal consistency, the extent to which all items in the instrument measure the same construct or concept (Tavakol & Dennick, 2011). Davidheiser (2013) reported the Cronbach's alpha coefficient for the ALI survey as 0.824 on 35 items. For this study the Cronbach's alpha was 0.772 on 35 items; for results see Figure 6 in Appendix P. Tavakol and Dennick (2011) suggested that the value of Cronbach's alpha should be in the range between 0.70 and 0.90.

The ALI instrument consists of two sections. In the first section participants provided information regarding the number of years teaching, the subject matter and grade level taught, and the educational degree attained. The second part of the survey presented five scenarios, each followed by seven questions; teachers read each scenario and answered 35 questions in total. The 35 questions in the ALI instrument reflect the seven Standards for Teacher Competence in the Educational Assessment of Students.
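Internal consistency of the kind reported above can be computed directly from the response matrix. The following minimal Python sketch implements the standard Cronbach's alpha formula; the simulated 0/1 item scores are hypothetical stand-ins for the study's actual data, and for dichotomous items such as the ALI's, alpha is equivalent to the KR-20 coefficient mentioned earlier.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items (35 for the ALI)
    item_variances = scores.var(axis=0, ddof=1)  # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses: 82 teachers x 35 dichotomously scored items,
# correlated through a latent "assessment literacy" factor
rng = np.random.default_rng(0)
ability = rng.normal(size=(82, 1))
responses = (ability + rng.normal(size=(82, 35)) > 0).astype(int)
print(round(cronbach_alpha(responses), 3))
```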
Table 1 demonstrates the alignment between the Standards and the ALI items.

Table 1. Alignment of Standards with ALI Items

Standards for Teacher Competence: ALI Item Numbers
Standard 1, Choosing Appropriate Assessment Methods: Items 1, 8, 15, 22, 29
Standard 2, Developing Appropriate Assessment Methods: Items 2, 9, 16, 23, 30
Standard 3, Administering, Scoring, and Interpreting the Results of Assessments: Items 3, 10, 17, 24, 31
Standard 4, Using Assessment Results to Make Decisions: Items 4, 11, 18, 25, 32
Standard 5, Developing Valid Pupil Grading Procedures: Items 5, 12, 19, 26, 33
Standard 6, Communicating Assessment Results: Items 6, 13, 20, 27, 34
Standard 7, Recognizing Unethical or Illegal Practices: Items 7, 14, 21, 28, 35

5. Results and Discussion
5.1 Research Question 1
What is the statistical comparison of assessment literacy level between public school teachers from high and low achieving schools in the state of New Jersey? On average, teachers from high achieving schools performed better on every standard compared to teachers from low achieving schools. Figure 1 shows New Jersey teachers' composite scores based on school assignment.

Figure 1. NJ teachers' performance based on school assignment

Based on descriptive statistics, the high achieving district SD#1 had the highest average score, 55%. SD#3, a low achieving district, demonstrated the next highest score, 54%. The lowest average score, 42%, belonged to the high school teachers from the low achieving school HS#4. The high achieving district SD#2 demonstrated a score of 52% on the entire instrument.

A pattern in performance related to the standards emerged as follows. Regardless of school assignment, the highest average score was for Standard 1, which refers to teachers' skills and knowledge in selecting assessment methods pertinent to instructional decisions. Danielson (2013) defined a distinguished teacher as an expert who utilizes different methods to assess students' learning and who uses assessment outcomes to design instruction. It is important to note that the SGO process requires teachers to select high quality student assessments. On average, teachers' score for Standard 1 was 61% (M = 0.61, SD = 0.23). The high achieving school district SD#1 showed the highest average score of 71% (M = 0.71, SD = 0.24). The lowest average score of 56% occurred for HS#4 (M = 0.56, SD = 0.28) and SD#3 (M = 0.56, SD = 0.16). The average score for Standard 1 for SD#2 was 59% (M = 0.59, SD = 0.21).

Regardless of school assignment, the lowest average score was for Standard 2. Notably, the SGO process requires teachers to develop quality assessments. SD#2 demonstrated the highest score of 44% (M = 0.44, SD = 0.23). The next highest average score of 42% was for SD#1 (M = 0.42, SD = 0.22). HS#4 demonstrated the lowest score of 25% (M = 0.25, SD = 0.26). The SD#3 score was 41% (M = 0.41, SD = 0.21). For results see Table 2 in Appendix A.

The inferential analysis did not provide evidence that statistically significant differences exist between the teachers from high and low achieving schools. In comparison to previous studies, New Jersey teachers demonstrated different results. Perry (2013) reported that the highest performance was for Standard 4 (M = 4.07; maximum possible score = 5) and the lowest performance was for Standard 7 (M = 1.29; maximum possible score = 5). Davidheiser's (2013) findings related to Standard 2 were parallel to the findings of this study.
Davidheiser reported that the lowest performance occurred for Standard 2 (M = 0.57; maximum possible score = 1) and the highest performance for Standard 7 (M = 0.79; maximum possible score = 1).

5.2 Research Question 2
What is the statistical comparison of assessment literacy level between elementary, middle, and high public school teachers in the state of New Jersey? On average, middle and elementary school teachers outperformed their peers from high schools. Figure 2 demonstrates New Jersey teachers' composite scores on the standards based on grade level taught.

Figure 2. NJ teachers' performance based on grade level

A pattern in teachers' performance emerged as follows. The middle school teachers performed higher on Standards 1, 2, and 6: their average score on Standard 1 was 65% (M = 0.65, SD = 0.22), on Standard 2 was 49% (M = 0.49, SD = 0.25), and on Standard 6 was 63% (M = 0.63, SD = 0.26). The elementary school teachers performed higher on Standards 3 and 4: their average score on Standard 3 was 53% (M = 0.53, SD = 0.20) and on Standard 4 was 67% (M = 0.67, SD = 0.23). The high school teachers demonstrated the lowest scores on every standard except Standard 7, for which their average score was 53% (M = 0.53, SD = 0.27). For results see Table 3 in Appendix B.

Inferential analysis provided evidence that a statistically significant difference exists between the teachers from elementary, middle, and high schools for Standard 2. The p value for the ANOVA test at the α = 0.05 level was 0.019 (p = 0.019, p < 0.05). The p value for the post hoc Tukey's HSD test at the α = 0.05 level was 0.016 (p = 0.016, p < 0.05). The Tukey's HSD test detected the difference between the middle and high school teachers: the middle school teachers performed higher compared to the high school teachers.
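The one-way ANOVA followed by Tukey's HSD reported above is a standard procedure, sketched below in Python with scipy and statsmodels. The group means and standard deviations loosely echo the Standard 2 values reported in this section, but the generated scores themselves are hypothetical, so the resulting F and p values are illustrative only.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical Standard 2 scores for three grade-level groups
rng = np.random.default_rng(1)
elementary = rng.normal(0.45, 0.22, 30)
middle = rng.normal(0.49, 0.25, 26)
high = rng.normal(0.33, 0.24, 26)

# One-way ANOVA across the three groups
f_stat, p_value = stats.f_oneway(elementary, middle, high)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.3f}")

# Post hoc Tukey HSD to locate which pair(s) of groups differ
scores = np.concatenate([elementary, middle, high])
groups = ["elementary"] * 30 + ["middle"] * 26 + ["high"] * 26
print(pairwise_tukeyhsd(scores, groups, alpha=0.05))
```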