9 GROUPING MECHANISMS IN MUSIC

DIANA DEUTSCH
Department of Psychology
University of California, San Diego
La Jolla, California

I. INTRODUCTION

Music provides us with a complex, rapidly changing acoustic spectrum, often derived from the superposition of sounds from many different sources. Our auditory system has the task of analyzing this spectrum so as to reconstruct the originating sound events. This is analogous to the task performed by our visual system when it interprets the mosaic of light impinging on the retina in terms of visually perceived objects. Such a view of perception as a process of “unconscious inference” was proposed in the last century by Helmholtz (1909–1911/1925), and we shall see that many phenomena of music perception can be viewed in this way.

Two types of issue can be considered here. First, given that our auditory system is presented with a set of first-order elements, we can explore the ways in which these are combined so as to form separate groupings. If all first-order elements were indiscriminately linked together, auditory shape recognition operations could not be performed. There must, therefore, be a set of mechanisms that enable us to form linkages between some elements and that inhibit us from forming linkages between others. Simple mechanisms underlying such linkages are examined in the present chapter. The second issue concerns the ways in which higher order abstractions are derived from combinations of first-order elements so as to give rise to perceptual equivalences and similarities. This issue is explored in Chapter 10, and we shall see that higher order abstractions are also used as bases for grouping.

In considering the mechanisms whereby we combine musical elements into groupings, we can also follow two lines of inquiry. The first concerns the dimensions along which grouping principles operate.
When presented with a complex pattern, the auditory system groups elements together according to some rule based on frequency, amplitude, temporal position, spatial location, or some multidimensional attribute such as timbre. As we shall see, any of these attributes can be used as a basis for grouping, but the conditions determining which attribute is used are complex ones.

Copyright © 1999 by Academic Press. The Psychology of Music, Second Edition. All rights of reproduction in any form reserved.

Second, assuming that organization takes place on the basis of some dimension such as frequency, we can inquire into the principles that govern grouping along this dimension. The early Gestalt psychologists proposed that we group elements into configurations on the basis of various simple rules (see, for example, Wertheimer, 1923). One is proximity: closer elements are grouped together in preference to those that are spaced further apart. An example is shown in Figure 1a, where the closer dots are perceptually grouped together in pairs. Another is similarity: in viewing Figure 1b we perceive one set of vertical rows formed by the filled circles and another formed by the unfilled circles. A third, good continuation, states that elements that follow each other in a given direction are perceptually linked together: we group the dots in Figure 1c so as to form the two lines AB and CD. A fourth, common fate, states that elements that change in the same way are perceptually linked together. As a fifth principle, we tend to form groupings so as to perceive configurations that are familiar to us.

FIGURE 1 Illustrations of the Gestalt principles of proximity, similarity, and good continuation.

It has been shown that such laws operate in the perception of visual arrays, and we shall see that this is true of music also. It seems reasonable to assume (as argued by R. L. Gregory, 1970; Sutherland, 1973; Hochberg, 1974; Deutsch, 1975a; Bregman, 1978, 1990; and Rock, 1986) that grouping in conformity with such principles enables us to interpret our environment most effectively. In the case of vision, elements that are close together in space are more likely to belong to the same object than are elements that are spaced further apart. The same line of reasoning holds for elements that are similar rather than those that are dissimilar. In the case of hearing, similar sounds are likely to have originated from a common source, and dissimilar sounds from different sources. A sequence that changes smoothly in frequency is likely to have originated from a single source, whereas an abrupt frequency transition may reflect the presence of a new source. Components of a complex spectrum that arise in synchrony are likely to have emanated from the same source, and the sudden addition of a new component may signal the emergence of a new source.

Another general question to be considered is whether perceptual grouping results from the action of a single decision mechanism or whether multiple decision mechanisms are involved, each with its own grouping criteria. There is convincing physiological evidence that the subsystems underlying the attribution of various characteristics of sound become separate very early in the processing system (Edelman, Gall, & Cowan, 1988). Such evidence would lead us to hypothesize that auditory grouping is not carried out by a single mechanism but rather by a number of mechanisms, which at some stage act independently of each other. As we shall see, the perceptual evidence strongly supports this hypothesis, and further indicates that the different mechanisms often come to inconsistent conclusions.
For example, the parameters that govern grouping to determine perceived pitch can differ from those that determine perceived timbre, location, or number of sources (Darwin & Carlyon, 1995; Hukin & Darwin, 1995a). Further evidence comes from various illusions that result from incorrect conjunctions of different attribute values (Deutsch, 1974, 1975a, 1975b, 1980a, 1981, 1983a, 1983b, 1987, 1995). From such findings we shall conclude that perceptual organization in music involves a process in which elements are first grouped together so as to assign values to different attributes separately, and that this is followed by a process of perceptual synthesis in which the different attribute values are combined, either correctly or incorrectly.

II. FUSION AND SEPARATION OF SPECTRAL COMPONENTS

In this section, we consider the relationships between the components of a sound spectrum that lead us to fuse them into a unitary sound image and those that lead us to separate them into multiple sound images. In particular, we shall be exploring two types of relationship. The first is harmonicity. Natural sustained sounds, such as those produced by musical instruments and the human voice, are made up of components that stand in harmonic, or near-harmonic, relation (i.e., their frequencies are integer, or near-integer, multiples of the fundamental). It is reasonable to expect, therefore, that the auditory system would exploit this feature so as to combine a set of harmonically related components into a single sound image. To take an everyday example, when we listen to two instrument tones playing simultaneously, we perceive two pitches, each derived from one of the two harmonic series that together form the complex.

A second relationship that we shall be exploring is onset synchronicity.
When components of a sound complex begin at the same time, it is likely that they have originated from the same source; conversely, when they begin at different times, it is likely that they have originated from different sources. As an associated issue, we shall be exploring temporal correspondences in the fluctuations of components in the steady-state portion of a sound.

The importance of temporal relationships for perceptual fusion and separation was recognized by Helmholtz in his treatise On the Sensations of Tone (1859/1954), in which he wrote:

Now there are many circumstances which assist us first in separating the musical tones arising from different sources, and secondly, in keeping together the partial tones of each separate source. Thus when one musical tone is heard for some time before being joined by the second, and then the second continues after the first has ceased, the separation in sound is facilitated by the succession in time. We have already heard the first musical tone by itself and hence know immediately what we have to deduct from the compound effect for the effect of this first tone. Even when several parts proceed in the same rhythm in polyphonic music, the mode in which the tones of the different instruments and voices commence, the nature of their increase in force, the certainty with which they are held and the manner in which they die off, are generally slightly different for each.… When a compound tone commences to sound, all its partial tones commence with the same comparative strength; when it swells, all of them generally swell uniformly; when it ceases, all cease simultaneously. Hence no opportunity is generally given for hearing them separately and independently. (pp. 59–60)

A. HARMONICITY

Musical instrument tones provide us with many informal examples of perceptual grouping by harmonicity.
Stringed and blown instruments produce tones whose partials are harmonic, or close to harmonic, and these give rise to strongly fused pitch impressions. In contrast, bells and gongs, which produce tones whose partials are nonharmonic, give rise to diffuse pitch impressions (Mathews & Pierce, 1980).

Formal experiments using synthesized tones have confirmed this conclusion. De Boer (1976) found that tone complexes whose components stood in simple harmonic relation tended to produce single pitches, whereas nonharmonic complexes tended instead to produce multiple pitches. Bregman and Doehring (1984) reported that placing simultaneous gliding tones in simple harmonic relation enhanced their perceptual fusion. They presented subjects with three simultaneous glides and found that the middle glide was more easily captured into a separate melodic stream when its slope differed from that of the other two. Furthermore, when the slope of the middle glide was the same as the others, it was less easily captured into a separate melodic stream when it stood in harmonic relationship with them.

How far can a single component of a complex tone deviate from harmonicity and still be grouped with the others to determine perceived pitch? Moore, Glasberg, and Peters (1985) had subjects judge the pitches of harmonic complex tones and examined the effects of mistuning one of the components to various extents. When the component was mistuned by less than 3%, it contributed fully to the pitch of the complex. As the degree of mistuning increased beyond 3%, the contribution made by the mistuned component gradually decreased, and at a mistuning of 8%, the component made virtually no contribution to the pitch of the complex.

Darwin and Gardner (1986) obtained analogous effects in the perception of vowel quality.
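
The mistuning figures reported by Moore et al. (1985) can be made concrete with a short sketch. The function names are hypothetical, and only the 3% and 8% anchor points come from the text; the straight-line fall between them is an assumption made purely for illustration, since the study reports only that the contribution "gradually decreased".

```python
def mistuned_frequency(f0_hz, harmonic_number, mistuning_percent):
    """Frequency of one harmonic after shifting it by a percentage of its nominal value."""
    nominal_hz = f0_hz * harmonic_number
    return nominal_hz * (1.0 + mistuning_percent / 100.0)


def pitch_contribution(mistuning_percent, full_below=3.0, none_at=8.0):
    """Illustrative weight (1 = full, 0 = none) of a mistuned component in the pitch estimate.

    ASSUMPTION: the linear interpolation between the two anchor points is not
    taken from the source; it only stands in for a gradual decrease.
    """
    m = abs(mistuning_percent)
    if m <= full_below:
        return 1.0
    if m >= none_at:
        return 0.0
    return (none_at - m) / (none_at - full_below)
```

For instance, under these assumptions the fourth harmonic of a 200-Hz fundamental mistuned by 3% lies near 824 Hz and still receives full weight, whereas the same harmonic mistuned by 8% (near 864 Hz) receives none.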
Mistuning a harmonic in the first formant region of a vowel produced shifts in its perceived quality, with increasing shifts as the amount of mistuning increased. For mistunings of around 8%, the direction of the shift was such as would be expected had the component been perceptually removed from the calculation of the formant.

Other investigators have studied the perception of simultaneous complexes that were built on different fundamentals. They varied the relationships between the fundamentals, and examined how well listeners could separate out the complexes perceptually, as a function of these relationships. For example, Rasch (1978) used a basic pattern that consisted of a pair of two-tone chords that were presented in succession. All the tones were composed of a fundamental together with a series of harmonics. The lower tones of each chord were built on the same fundamental, whereas the higher tones differed by a fifth, in either the upward or the downward direction. The subject judged on each trial whether the higher tones formed an ascending or a descending pattern. The threshold amplitude for obtaining reliable judgments was taken as a measure of the degree to which the subject could separate out the tones forming each chord. As shown in Figure 2, as the higher tones were mistuned from simple harmonic relation with the lower ones, detection thresholds fell accordingly, reflecting an enhanced ability to separate out the pitches of the tones comprising the chords.

FIGURE 2 Detection thresholds for higher tones in the presence of lower ones. Two chords were presented in sequence. The lower tones of the chords were identical while the higher tones differed by a fifth, in either the upward or the downward direction. Subjects judged whether the higher tones formed a “high-low” or a “low-high” sequence. Detection thresholds fell as the higher tones deviated from simple harmonic relation with the lower ones. (Adapted from Rasch, 1978.) [Axes: level of higher tones (dB), from 0 to –30, against deviation of frequencies of higher tones from 500 and 750 Hz (%), from –12.8 to 12.8.]

Huron (1991b) has related such findings on harmonicity and spectral fusion to polyphonic music. One objective of such music is to maintain the perceptual independence of concurrent voices. In an analysis of a sample of polyphonic keyboard works by J. S. Bach, Huron showed that harmonic intervals were avoided in proportion to the strength with which they promoted tonal fusion, and he concluded that Bach had used this compositional strategy in order to optimize the salience of the individual voices.

Other composers have focused on the creation of perceptual fusion rather than separation. Particularly in recent times, there has been much experimentation with sounds that were produced by several instruments playing simultaneously, and were configured so that the individual instruments would lose their perceptual identities and together produce a single sound impression. For example, Debussy and Ravel in their orchestral works made extensive use of chords that approached timbres. Later composers such as Schoenberg, Stravinsky, Webern, and Varèse often used highly individualized structures, which Varèse termed “sound masses” (Erickson, 1975). Here the use of tone combinations that stood in simple harmonic relation proved particularly useful.

To return to the laboratory experiments, findings related to those of Rasch (1978) have also been obtained for speech perception. A number of studies have shown that simultaneous speech patterns could be more easily separated out perceptually when they were built on different fundamentals; in general, the amount of perceptual separation reached its maximum when the fundamentals differed by roughly one to three semitones (Assmann & Summerfield, 1990; Brokx & Nootebohm, 1982; Scheffers, 1983).
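
To give a sense of scale for a difference of "one to three semitones" between fundamentals, the following sketch converts a semitone separation into a frequency ratio. It assumes equal temperament (a ratio of 2^(1/12) per semitone), and the 100-Hz starting fundamental is a hypothetical value chosen for illustration, not one taken from the studies cited.

```python
def semitones_to_ratio(semitones):
    """Frequency ratio for a given number of equal-tempered semitones (assumed tuning)."""
    return 2.0 ** (semitones / 12.0)


def shifted_fundamental(f0_hz, semitones):
    """Fundamental frequency lying the given number of semitones above f0_hz."""
    return f0_hz * semitones_to_ratio(semitones)


if __name__ == "__main__":
    base_hz = 100.0  # hypothetical fundamental, for illustration only
    for n in (1, 2, 3):
        print(f"{n} semitone(s) above {base_hz:.0f} Hz: {shifted_fundamental(base_hz, n):.2f} Hz")
```

On this assumption, a separation of one to three semitones corresponds to fundamentals that differ by roughly 6% to 19%.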
Furthermore, formants built on the same fundamental tended to be grouped together so as to produce a single phonetic percept, whereas a formant built on a different fundamental tended to be perceived as distinct from the others (Darwin, 1981; see also Gardner, Gaskill, & Darwin, 1989).

The number of sources perceived by the listener provides a further measure of grouping. Moore, Glasberg, and Peters (1986) reported that when a single component of a harmonic complex was mistuned from the others, it was heard as standing apart from them. In other studies, simultaneous speech sounds were perceived as coming from a larger number of sources when they were built on different fundamentals (Broadbent & Ladefoged, 1957; Cutting, 1976; Darwin, 1981; Gardner et al., 1989).

Interestingly, less mistuning is required to produce the impression of multiple sources than to produce other effects. For example, a slightly mistuned component of a tone complex might be heard as distinct from the others, yet still be grouped with them in determining perceived pitch (Moore et al., 1986) or vowel quality (Darwin, 1981; Gardner et al., 1989). As argued by Darwin and Carlyon (1995), this type of disparity indicates that perceptual grouping involves a number of different mechanisms, which depend on the attribute being evaluated, and these mechanisms do not necessarily use the same criteria.

B. ONSET SYNCHRONICITY

So far we have been considering sounds whose components begin and end at the same time, and we have explored the spectral relationships between them that are conducive to perceptual fusion. In real musical situations, temporal factors also come into play. One such factor is onset synchronicity. The importance of this factor can be shown in a simple demonstration, in which a harmonic series is presented in such a way that its components enter at different times. For example, take a series that is built on a 200-Hz fundamental.
We can begin with the 200-Hz component sounding alone, then 1 sec later add the 400-Hz component, then 1 sec later add the 600-Hz component, and so on until all the components are sounding together. As each component enters, its pitch is initially heard as a distinct entity, and then it gradually fades from perception, so that finally the only pitch that is heard corresponds to the fundamental.

Even a transient change in the amplitude of a component can enhance its perceptual salience. This was shown by Kubovy (1976), who generated an eight-tone chord whose components were turned off and on again abruptly, each at a different time. On listening to this chord, subjects perceived a melody that corresponded to the order in which the amplitude drops occurred.

Darwin and Ciocca (1992) have shown that onset asynchrony can influence the contribution made by a mistuned harmonic to the pitch of a complex. They found that a mistuned component made less of a contribution to perceived pitch when it led the others by more than 80 msec, and it made no contribution when it led the others by 300 msec.

Onset asynchrony can also affect the contribution of a component to perceived timbre. Darwin (1984) found that when a single harmonic of a vowel that was close in frequency to the first formant led the others by roughly 30 msec, there resulted an alteration in the way the formant frequency was perceived; this alteration was similar to the one that occurred when the harmonic was removed from the calculation of the formant (see also Darwin & Sutherland, 1984).

Interestingly, Darwin and colleagues have found that the amount of onset asynchrony that was needed to alter the contribution of a component to perceived pitch was greater than was needed to alter its contribution to perceived vowel quality.
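
The staggered-onset demonstration described earlier in this section (a 200-Hz series whose components enter 1 sec apart) can be laid out as a simple schedule. This is a minimal sketch: the function names are hypothetical, and the number of components is an assumption, since the text says only "and so on".

```python
def staggered_onsets(f0_hz=200.0, n_components=5, step_s=1.0):
    """Return (onset_time_s, frequency_hz) pairs, one per harmonic.

    The k-th harmonic (counting from zero) enters k * step_s seconds after the
    fundamental. n_components is an illustrative assumption.
    """
    return [(k * step_s, f0_hz * (k + 1)) for k in range(n_components)]


def sounding_at(schedule, t_s):
    """Frequencies of the components that have already entered at time t_s."""
    return [freq for onset, freq in schedule if onset <= t_s]


if __name__ == "__main__":
    schedule = staggered_onsets()
    for onset, freq in schedule:
        print(f"t = {onset:.0f} s: add {freq:.0f} Hz")
```

Such a schedule specifies only when each component enters; rendering it as audio would require a synthesis step that is not sketched here.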
Hukin and Darwin (1995a) showed that this discrepancy could not be attributed to differences in signal parameters but stemmed instead from the nature of the perceptual task in which the listener was engaged. They argued, as did Darwin and Carlyon (1995), that such disparities reflect the operation of multiple decision mechanisms in the grouping process.

Onset asynchrony has been found to have higher level effects also. In one experiment, Bregman and Pinker (1978) presented listeners with a two-tone complex in alternation with a third tone, and they studied the effects of onset-offset asynchrony between the simultaneous tones. As the degree of onset asynchrony increased, the timbre of the complex tone was judged to be purer, and it became more probable that one of the tones in the complex would form a melodic stream with the third tone (see also Dannenbring & Bregman, 1978).

Using yet a different paradigm, Deutsch (1979) presented subjects with rapid melodic patterns whose components switched from ear to ear, and with each component accompanied by a drone in the contralateral ear. An onset asynchrony of 15 msec between the melody component and the drone significantly improved identification of the melody, indicating that the melody components were more easily combined together sequentially when they did not occur synchronously with other tones.

When two complex tones are played together, they are perceptually more distinct when their onsets are asynchronous than when they begin to sound at the same time. Rasch (1978) demonstrated this effect using the basic patterns and detection task described earlier. He showed that detection of higher tones in the presence of lower ones was strongly affected by onset asynchrony: Each 10 msec of delay of the lower tones was associated with roughly a 10-dB reduction in detection threshold.
At a delay of 30 msec, the threshold for perception of the higher tones was roughly the same as when they were presented alone. Rasch further observed that the subjective effect of this onset asynchrony was very pronounced. When the onsets of the tones were synchronous, a single fused sound was heard; however, when onset disparities were introduced, the tones sounded very distinct perceptually. This, as Rasch pointed out, is an example of the continuity effect (see Section II,C).

Rasch (1988) later applied the results of this study to live ensemble performances. He made recordings of three different trio ensembles (string, reed, and recorder) and calculated the onset relations between tones when they were nominally simultaneous. He found that asynchrony values ranged from 30 to 50 msec, with a mean asynchrony of 36 msec. Relating these findings to his earlier perceptual ones, Rasch concluded that such onset asynchronies enabled the listener to hear the simultaneous sounds as distinct from each other. According to this line of argument, such asynchronies should not be considered as performance failures, but rather as characteristics that are useful in enabling listeners to hear concurrent voices distinctly.

On this line of reasoning, larger amounts of asynchrony should produce even better and more reliable separation of voices. One might hypothesize, then, that compositional practice would exploit this effect, at least in polyphonic music, where it is intended that the individual voices should be distinctly heard. Evidence for this hypothesis was found by Huron (1993) in an analysis of J. S. Bach’s 15 two-part inventions. He found that for 11 of these inventions, values of onset asynchrony were such that there were no other permutations of the rhythms of the voices (with duration, rhythmic order, and meter controlled for) that produced more onset asynchrony than occurred in Bach’s actual music.
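
One simple way to make the notion of onset asynchrony between two voices concrete is to count the moments at which only one voice begins a note. The sketch below is an illustrative assumption and is not Huron's (1993) actual measure or procedure; the function name and the representation of onsets as integer positions on a metrical grid (for example, sixteenth-note indices) are hypothetical.

```python
def onset_asynchrony(voice_a, voice_b):
    """Proportion of distinct onset moments at which only one of two voices starts a note.

    Onsets are given as integer grid positions. ASSUMPTION: this ratio is an
    illustrative measure only, not the one used in the analysis cited above.
    Returns 0.0 when every onset is shared and 1.0 when none is shared.
    """
    a, b = set(voice_a), set(voice_b)
    all_onsets = a | b
    if not all_onsets:
        return 0.0
    unshared = a ^ b  # onsets that occur in exactly one voice
    return len(unshared) / len(all_onsets)


if __name__ == "__main__":
    upper = [0, 4, 8, 12]     # hypothetical rhythm: one note per beat
    lower = [0, 2, 6, 8, 10]  # hypothetical rhythm: partly offset from the upper voice
    print(onset_asynchrony(upper, lower))
```

A measure of this kind could be recomputed for rearranged versions of the same rhythms, which is the general spirit of comparing actual music against alternative permutations.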
For the remaining four inventions, values of asynchrony were still significantly higher than would be expected by chance. Huron concluded that Bach had deliberately produced such onset asynchronies so as to optimize the perceptual salience of the individual voices.

C. AUDITORY CONTINUITY

Auditory continuity is perhaps the most dramatic effect to result from temporal disparities within tone complexes. Consider the visual analogue shown in the upper portion of Figure 3, which was adapted from Vicario (1982). Line A could, in principle, be viewed in terms of three components: a line to the left of the rectangle, a line to its right, and a line that forms part of the rectangle itself. However, our visual system instead treats all three components as a single line, which is independent of the remaining parts of the rectangle.

FIGURE 3 Visual analogue of an auditory continuity effect. Line A in the upper illustration could, in principle, be seen as having three components (a line to the left of the rectangle, a line to its right, and a line that forms part of the rectangle itself). However, it is instead seen as a single, continuous line. This effect is weaker in the lower illustration, in which the rectangle is wider, and the lines to its left and right are shorter. (Adapted from Vicario, 1982.)

Vicario produced a musical equivalent of this demonstration. He generated a chord that consisted of components corresponding to C4, D♯4, F♯4, A4, C5, D♯5, and F♯5; with A4 both preceding and following the other components of the chord. Just as line A in Figure 3 is seen as continuing through the rectangle, so the listener heard a pitch corresponding to A4 continue right through the chord.

This continuity effect is sensitive to the precise temporal parameters of the various components.
To return to Vicario’s visual analogue, when the lines forming the rectangle are lengthened and the lines to its left and right are shortened, as in the lower portion of Figure 3, the impression of continuity is reduced. Similarly, when the duration of the lengthened component of the chord is reduced, and the duration of the full chord is lengthened, the impression of auditory continuity is diminished.

In general, demonstrations of auditory continuity have existed for some time (see Warren, 1984, for a review). In an early study, Miller and Licklider (1950) rapidly alternated a tone with a noise burst, and subjects reported that the tone appeared to continue right through the noise. The authors called this the “picket fence effect,” because in observing a landscape through a picket fence we see it as continuous rather than as broken up by the pickets. Vicario (1960) independently reported a similar phenomenon, which he called the “acoustic tunnel effect.”

A different type of continuity effect was described by Warren, Obusek, and Ackroff (1972). When a broadband noise was repeatedly presented at different intensity levels, listeners heard the fainter noise as persisting without interruption, while the louder noise appeared to come on and off periodically. The authors found that analogous effects occurred with other signals also, such as narrowband noise, and pure and complex tones.

More elaborate continuity effects have also been reported. Dannenbring (1976) generated a pure-tone glide that rose and fell repeatedly. In some conditions, the glide was periodically interrupted by a loud broadband noise; however, it was perceived as though continuous. In contrast, when the glide was periodically broken, leaving only silent intervals during the breaks, listeners heard a disjunct series of rising and falling glides. Visual analogues of these two conditions, and their perceptual consequences, are shown in Figure 4.
Sudden amplitude drops between signals and intervening noise bursts may reduce, or even destroy, continuity effects. For example, Bregman and Dannenbring (1977) presented subjects with a gliding tone such as just described, and found that brief amplitude drops before and after the intervening noise bursts decreased the tendency to perceive the glide as continuous. Similarly, Warren et al. (1972), using noise bursts of alternating loudnesses, found that brief silences between the different bursts reduced the impression of continuity.

FIGURE 4 Visual illustration of an auditory continuity effect using gliding tones. See text for details. (Adapted from Bregman, 1990, which illustrates an experiment by Dannenbring, 1976.)