Journal of Criminal Law and Criminology, Volume 14, Issue 3, Article 5 (1924)

Recommended Citation: Edward M. Martin, Aptitude Test for Policemen, 14 J. Am. Inst. Crim. L. & Criminology 376 (May 1923 to February 1924)

AN APTITUDE TEST FOR POLICEMEN

EDWARD M. MARTIN¹

This study was undertaken by the National Institute of Public Administration with the purpose of determining, by actual experiment, the feasibility of applying to a large body of municipal employees a method of personnel selection which has already demonstrated its advantages in business, industry and in certain other branches of the civil service. Next to education, the largest item in the budget of the average American city is for the police department. To make available a method for devising an entrance examination which will secure better selection of policemen is to make a contribution towards increasing the efficiency of one of the largest city departments.

The study was carried out with the co-operation of Mr. Charles P. Messick, secretary and chief examiner of the New Jersey State Civil Service Commission, State Commissioner Edward H. Wright in charge of the Newark district, Captain P. J. Troy of the Fifth Precinct, Newark, New Jersey, and Captain James Meehan of the Newark Police Training School. Further, it was made possible by the technical advice and direction of Dr. Herbert A. Toops of the Institute of Educational Research, Teachers College, Columbia University.
The procedure for evaluating a selective scale, worked out by Dr. Toops, together with formulae and methods for facilitating computation in the various steps, was followed. We wish gratefully to acknowledge and to express a deep sense of obligation for all such co-operation and direction.

¹ National Institute of Public Administration, 261 Broadway, New York City.

At the present time the selection of policemen by civil service commissions is accomplished through tests of physical and mental fitness. Physical qualifications are determined in terms of standards derived from experience, usually adopted as part of the regulations of the commission. Reliance for the determination of mental qualifications is placed by most commissions in the so-called academic type of examination. While this form is of value as a method of determining one's knowledge of a limited range of specific facts and the ability to express one's self in writing, it has certain defects and disadvantages when used to measure trade aptitude in policemen.

These limitations are revealed when the examination form is analyzed in certain important aspects. Whether such analysis is based on considerations of theory or experimental investigation, a very definite check on the effectiveness of the form can be had by correlating the ranking of a group in the present entrance examination with a ranking of the same group in a criterion of police ability after a period of service in the department. Such a comparison in the present experimental investigation involving 30 cases yielded a correlation coefficient of -.03 between the criterion and an entrance rating which includes credits allowed by law for war service. A comparison which would be fairer to the present test form would be to correlate the criterion with the entrance rating with these war service credits omitted. The comparison in this particular instance yielded a coefficient of -.01.
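The check described above, correlating an entrance-examination ranking with a criterion ranking, can be sketched in a few lines. This is an illustration only. The article does not say which coefficient was computed; Spearman's rank correlation is assumed here because the text speaks of correlating rankings. The function names and the six sample scores are hypothetical and are not data from the study.

```python
def ranks(values):
    """Return 1-based ranks (1 = highest value); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rank_correlation(entrance_scores, criterion_scores):
    """Spearman coefficient: the product-moment correlation of the two rankings."""
    return pearson(ranks(entrance_scores), ranks(criterion_scores))


# Hypothetical scores for six men (NOT data from the study).
entrance = [88.0, 92.5, 79.0, 85.0, 90.0, 81.5]
criterion = [41.0, 37.0, 45.0, 52.0, 39.0, 48.0]
print(round(rank_correlation(entrance, criterion), 2))
```

A coefficient near zero, such as the -.03 and -.01 reported above, means that a man's standing on the entrance examination says almost nothing about his standing on the criterion.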
One aspect of the problem which has undergone extensive analysis is the unreliability of scores in the present examination form. The researches of Starch,² Kelley,³ Inglis,⁴ and Ruggles⁵ have shown that wide variations in judgment are obtained when a given paper is marked, presumably on the same scale, by two or more examiners. Even when conscious attempts have been made to eliminate these variations, it has been found most difficult to render the scoring of the present examination form objective. The seriousness of this limitation becomes apparent when it is recalled that civil service examination papers are scored by several examiners, and that much depends on a proper differentiation between individuals. It can be said that the present form is often a better measure of handwriting, the physical form of the paper, punctuation and general grammatical form than of individual police aptitude.

² Starch, Daniel, "Educational Measurements," p. 9 ff.
³ Kelley, F. J., "Teachers' Marks," Teachers College Contributions to Education No. 66.
⁴ Inglis, Alexander, "Variability of Judgments in Equalizing Values in Grading," Educational Administration and Supervision, v. 2, pp. 25-30.
⁵ Ruggles, Allen M., Study of Unreliability of Raters' Judgments, conducted while Service Examiner of the Wisconsin State Civil Service Commission.

Men attracted to police work are frequently ill-qualified by previous occupation and training to express themselves in the essay form of answer. Men who otherwise may possess the desired traits may thus be put at an unfair disadvantage. The mental test form of examination, on the other hand, does not eliminate this feature entirely, but does provide a more common basis, since individuals are more alike in their speed of checking, crossing out, underlining, writing single words or short phrases than they are in penmanship or extended written expression.

The range of subject matter covered by examinations may be seen from the accompanying table. This compilation was made from official announcements of examinations or sample forms. Enough cities are represented to indicate the type of information which it is believed will indicate police ability. In none of the cities, so far as is known, has the validity of the examinations as measures of police aptitude been determined. It is assumed that some relationship exists between the recruit's knowledge of these subjects, together with his capacity to express that knowledge on paper, and his police ability. Such a relationship may exist, but until it has been demonstrated, the basis remains supposition and conjecture instead of scientific evaluation. A commission's only basis for judging an examination's merits is the judgments of police department officials as to the quality of the men certified. Such estimates are liable to the error inherent in purely subjective personal opinion. As such they are too indirect and too crude a means by which the commission may check the effectiveness of its selective scale when more exact and more effective methods are now available.

TABLE I
Subject Content of Civil Service Examinations for Policemen in Various Cities

Subjects listed: City Information; Police Duty, etc.; Practical Questions; Arithmetic; Memory Test; Report Writing; Knowledge of Laws and Ordinances; Spelling; Geography and Civil Government; Penmanship; Rules and Regulations; General Intelligence.
[The city columns and the marks entered for each city are not legible.]
* Examination for cities by state commission.
# Includes duties, terminology, some of laws and ordinances.

THE EXPERIMENTAL GROUP

The co-operation of the Newark police department was enlisted to secure a group of policemen for the experimental evaluation.
Newark being the second largest city in the metropolitan area, it was thought that a group could be secured which would be large enough to satisfy the conditions of the experiment. Also, in view of the fact that the New Jersey State Civil Service Commission was co-operating in the undertaking as a trial of a new examination method, it was felt that evidence would have considerable value if secured from a representative precinct in the largest city of the state.

The original objective was a group of 100 policemen. If service ratings for each member of the department had been available, it would have been possible to secure such a number. In order to carry out the study it was necessary to build up a criterion, and the conditions encountered precluded taking such a large group. Arrangements were made, however, through Chief of Police Michael F. Long, Captain James Meehan of the training school, and Captain Patrick Troy of the Fifth Precinct to secure groups from the school and from the precinct.

Two different training school groups were tested. These data could not be included in the experimental study, but are valuable as additional information on the reliability of the tests. These groups, further, served an important function in allaying suspicion and in spreading knowledge as to the general nature and purport of the tests. The result was that the subjects in the experimental groups from the Fifth Precinct entered the examination with more interest and a better spirit of co-operation than could otherwise have been secured.

Forty men were chosen by Captain Troy from his command as subjects for the test group. They were tested in two groups, one on June 20 and the other on June 30, 1922, in the classroom of the training school. The school has quarters in the Fifth Precinct station house and the accommodations were both convenient and well adapted for examination purposes. The size of the group was limited by several factors.
The forty subjects examined comprised the entire night force of the precinct on regular patrol duty at the time the study was made; it was not expedient for the department for the men from the day force to be included in the study; and a prime consideration in constructing the criterion was that all the subjects should be known intimately to each rating officer. This last condition prevented the inclusion of men from other precincts in the test group.

Later it was found advisable to limit the group further because of certain conditions revealed by an examination of all available data. It was considered important to confine the study to the following: patrolmen in active service, men typical of a department, and those who had joined the force under civil service. One subject was found to be a "turnkey" who had done street duty, but had been given a less rigorous assignment with advancing age; an atypical case was found in one man who had been called before the trial board twelve times in ten years because of neglect of duty; and eight men were found to have joined the force before civil service became effective. Data of these ten men were, therefore, discarded and the experimental group was reduced to thirty subjects.

While it was recognized that conclusions of general application cannot be drawn from so limited a group, it was thought that the data could be used to fill a threefold purpose:

(a) To indicate the feasibility, though on the basis of a limited random sample, of mental tests being used by civil service commissions as selective examinations for policemen;
(b) To serve as material to demonstrate the particular method of research employed in this study;
(c) To make a contribution to the subject of police personnel selection in the hope of leading others to study the problem so that ultimately sufficient data may be had on which to base sound principles of general application.
Establishing the Criterion

A criterion ranking may be secured with relative ease where the individual trade product can be measured quantitatively. But the task is more difficult where the product comprises both quantity and quality. In police work, for example, a patrolman's relative value depends not only on how well he performs routine tasks such as making arrests, box "pulls" or reporting faulty conditions, but also on his attitude towards police work and on the performance of a varied list of functions recognized as police duties. The second group would include such things as: ability to size up a crime situation, influence of his physical presence and personal effectiveness in keeping the district "quiet," acts of service rendered citizens, instruction as to minor law violations, "big brother" influence on the neighborhood children, etc.

Routine acts may easily be measured in definite amount; but although other phases of police duty can be measured quantitatively, they are so infrequent in occurrence, often produce results so intangible in character, and are so difficult to confirm, that it is only occasionally that they appear on the official records of the department. An account could be kept, for example, of the number of questions answered, instructions given or boys kept "straight" through the friendly intervention of the "cop" on the "beat," but such a tabulation would not be feasible from an administrative point of view. Information is usually available as to the number of arrests, regularity of box calls, or reports made by each patrolman, but these data give too incomplete a picture for them to be considered an accurate measure of police ability. When such official records are lacking, recourse must be had to the judgment of commanding officers as the criterion of the ability of individual policemen.
The Newark department maintains a record of charges preferred against individual members for neglect of duty or violation of regulations. This record is the nearest approach available in the department to a service rating on members of the force. It is an inadequate criterion of individual ability in that it takes account only of observed neglect of duty and does not consider acts or services of a constructive nature which are performed but do not come to official notice. Also, it is only infrequently that charges are preferred against individual policemen, so that the measure was neither distributed over the group nor present in sufficient quantity to permit its use as a basis for ranking individuals in an experimental group. It was necessary, therefore, to build up a criterion using the judgment of commanding officers as a basis for the scale.

Judgments were secured from four commanding officers: three desk lieutenants and the captain in charge of the precinct. The lieutenants had direct charge of the routine work of the precinct, conducted the daily roll calls, assigned patrolmen to the various beats and received reports from the men on post. They rotated shifts every two weeks so that they had ample opportunity to get an intimate knowledge of the men's qualifications. The captain was continually in contact with the precinct and was in the best possible position to pass an opinion on the value of each man.

Two types of estimates were secured on each of the forty subjects from each of the four rating officers. First, a rating scale was devised for judging the men on four distinct qualifications. Later, a simple ranking in order of ability was secured. The two methods were used as checks for accuracy and as a means of securing an index of reliability for the scale finally adopted.

Moore⁶ has summarized under the following headings principles which should be followed in setting up a rating scale:

1. The ability being rated should be analyzed into its component essential abilities or traits and each trait rated independently. It is better to concentrate on a few essential traits rather than to have too many and have the scale break down from its own size and complexity.
2. The traits determined upon must really be different, and as distinct from one another as possible.
3. The rater must be acquainted with the one being rated.
4. The traits must be as sharply defined as possible, so that different raters will rate the same trait.
5. The basis of comparison used as a scale should be as concrete and as familiar as possible.
6. Where more than one individual is to be rated in more than one trait, more comparable results are obtained by rating all individuals in one trait before going to the next trait.
7. More reliable ratings result from ratings made independently by more than one person.

⁶ B. V. Moore, "Personnel Selection of Graduate Engineers," pp. 22-23.

The rating scale devised endeavored to apply these maxims and, at the same time, to provide a mode of expression which would secure a proper distribution of judgments. A service rating classification of traits used by the Detroit department was adapted to that purpose. It breaks up the composite "police ability" into the four component traits:

I. Appearance
Physique: Athletic or corpulent.
Neatness: Consider person and dress.
Bearing: Military attitude and carriage.

II. Intelligence
Ability to write a clear and legible report.
Does he act with good judgment without instructions?
Does he answer questions intelligently?

III. Discipline
Is he punctual?
Is he respectful to commanding officers?
Does he obey orders promptly and cheerfully?

IV. Efficiency
Does he keep his beat in good condition without arousing ill-feeling among the residents?
Does he keep his head in an emergency?
Is he courteous to the public?
Does he notice violations of ordinances?
To have enumerated an exhaustive list of qualities under each of the four main headings would have served only to confuse the rater; the scale would be too complex and would break down from its own weight. Instead, only outstanding phases were included in order to indicate to the rater the type of qualities to be considered. Thus the general capacity "police ability" is analyzed into four component traits which are not only sharply defined, but are points commonly used by officials in estimating police ability of individual patrolmen.

The rater's estimate is commonly indicated on either a numerical or letter basis. When a percentage scale is used there is an inclination to place everyone in the upper range. It is felt that an injustice is done an individual if he is marked below 70, the usual "passing" grade. If the straight letter basis is used, a grouping results which is too coarse for accurate statistical purposes. Since neither method insures securing a proper distribution, and results free from preconceived assumptions, it was decided to work out a plan which would as nearly as possible achieve the desired ends. The "human scale" method of estimating individual ability, devised and used by the army for rating officers, was adopted and a stencil device was employed for registering the rater's estimates.

The distinctive feature of the "human scale" is that it utilizes, in systematic and concrete form, the process used more or less consciously in estimating human traits or abilities. Comparisons may be made with abstract standards, but more commonly are made in terms of individuals. Evidence of this practice is found in the expressions, "He is taller than ......," "she is prettier than ......," "he is more intelligent than ......," etc.
In other words, the estimate is reached by comparing the person being judged with some one well known to the rater and whose possession of a particular trait is used by him as a standard in such judgments. In short, the values are determined on a human scale. In a casual judgment there are not likely to be more than one or two points on the scale, representing either a single instance or extremes of the trait in question. The scale, however, may be amplified or made precise to any extent desired. Five gradations were used in the present scale and designated as follows: lowest, low, middle, high and highest.

Each rater constructs his own scale by selecting from his acquaintances the individual for each of the five gradations who best represents the particular degree of the trait for which he was chosen. Then, in judging a person, the rater compares him with each of the five men on his own scale. The same scale is used in rating any number of individuals in a given trait. The ratings thus secured can be compared with one another on an equal basis and can be regarded as precise estimates (in terms of the given rater's judgment) of the individual abilities of the several members of a group.

The device employed for registering judgments enabled the raters to disregard entirely any percentage or numerical scale. Each was provided with a cardboard stencil on which are spaces for the construction of a "human scale" for each of the component traits to be rated. The scales on the stencil are arranged to correspond with the listing of the four traits on an individual service rating sheet. The traits, designated I, II, III, IV, are listed down the left hand side of the sheet and on the right hand side, opposite each numeral, is a three-inch horizontal line upon which the rater registers his judgment. The stencil is just large enough to cover the right half of the rating sheet and has slots or windows three inches long and one-eighth inch wide cut in it so that when lined up with the rating sheet these slots exactly correspond with the four lines on the paper. Each slot is designated with the proper numeral and is spaced off to represent varying gradations of the trait. The points to the extreme left and extreme right of the slot are designated "lowest" and "highest" respectively; the midpoint, "middle"; and the midpoints between the center and the extremes, "low" and "high." A fulcrum is used to indicate each of these points on the scale. Spaces are then provided under each fulcrum for the name of the rater's choice for the particular degree of the trait.

In order to rate an individual, the rater first constructs the four "human scales" on the cardboard stencil. The same individual might appear on more than one scale; the point is emphasized that only the best examples should be used for the respective scales. The stencil is then superimposed on the service rating sheet, care being taken to line it up so that the slots correspond exactly with the three-inch lines on the sheet. With the qualifications of the subject being rated in mind, the rater makes a man-to-man comparison to determine where to place him on the particular human scale. When the range has been narrowed to two men, the rater uses his judgment as to just where the subject falls in the three-fourths-inch space between the upper and lower limits of the range. When the exact spot has been determined, the rater inserts a pencil in the slot and marks the distance along the horizontal line on the rating sheet beneath. The process is repeated for each of the other three traits. When the stencil is removed, the subject's degree of qualification in each of the traits is shown by the length of the horizontal line, measured from left to right.
The length of line is then evaluated by measuring the distance over a three-inch gauge divided into twenty units. The length of line, as measured in units, thus represents the relative standing of the subject in the trait. The number of points given each trait is indicated along the right hand margin; the separate items for each trait are then totaled and entered at the bottom of the column. It will be noted that the rater does not need to be concerned with numerical scales of any sort. He has only to make the man-to-man comparison and indicate his judgment in the manner prescribed. The evaluation of points assigned each trait is made by the investigator, and once the rater has registered his judgment, the various traits may be weighted in any manner deemed justified by their relative importance.
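The evaluation step can be illustrated with a short sketch. It assumes only what the text states: a three-inch line read against a gauge of twenty units, four traits designated I to IV, and weighting left to the investigator. The function names, the sample line lengths and the sample weights are hypothetical, and fractional units are kept because the text does not say how readings between gauge marks were handled.

```python
GAUGE_INCHES = 3.0  # length of each rating line, as stated in the text
GAUGE_UNITS = 20    # number of units on the measuring gauge, as stated in the text


def line_to_units(length_inches):
    """Convert a pencilled line length in inches to units on the twenty-unit gauge."""
    if not 0.0 <= length_inches <= GAUGE_INCHES:
        raise ValueError("line length must lie between 0 and 3 inches")
    return length_inches / GAUGE_INCHES * GAUGE_UNITS


def total_score(lengths, weights=None):
    """Total the unit scores for all traits, applying optional per-trait weights.

    `lengths` maps a trait label to its line length in inches.
    `weights` is a hypothetical mapping of trait label to multiplier;
    any trait not listed is given a weight of 1.
    """
    total = 0.0
    for trait, length in lengths.items():
        weight = 1.0 if weights is None else weights.get(trait, 1.0)
        total += weight * line_to_units(length)
    return total


# Hypothetical rating sheet: the marks fall on the "highest", "middle",
# "low" and "lowest" points of the four scales respectively.
sheet = {"I": 3.0, "II": 1.5, "III": 0.75, "IV": 0.0}
print(total_score(sheet))                        # unweighted total
print(total_score(sheet, weights={"II": 2.0}))   # Intelligence counted double
```

Because the five scale points are spaced three-fourths of an inch apart, they fall at 0, 5, 10, 15 and 20 units on the gauge, and a mark between two of the men on the scale falls between the corresponding values.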