From CNBH Acoustic Scale Wiki
Roy Patterson Etienne Gaudrain, Tom Walters,
The text and figures that appear on this page were subsequently published in: Patterson, R. D., Gaudrain, E. and Walters, T. C. (2010). “The perception of family and register in musical tones,” In: Music Perception. Jones, M.R., Fay, R.R. and Popper, A.N., (eds). New York, Springer-Verlag. [DOI 10.1007/978-1-4419-6114-3_2]13-50.
The discussion of pitch and timbre in the remainder of the chapter is more readily understood when presented in terms of 'melodies' in which the acoustic scale values of the notes, Ss and Sf, vary according to the diatonic scale of Western music. The melodies are shown in Figure 1; all of them have four bars containing a total of eight notes. The melodies are in ¾ time, with the fourth and eight notes extended to give the sequence a musical feel. The black and grey notes show the progression of intervals for Ss and Sf, respectively, as each melody proceeds. The Ss component of the melodies is presented in the key of C major as it is the simplest to read. The melodies in the demonstration waves are actually in the key of G major. The sound files for the melodies are available at http://www.acousticscale.org/wiki/index.php/SHAR_PGW_2009_Demo_Sounds.
Melody 1: The first example simulates a normal melody in which the VTL of the singer is fixed and the GPR varies, so in the figure, the grey notes are fixed and the black notes vary. The singer is an adult male and the pitch of the voice drops by an octave over the course of the melody from about 200 to 100 pps [Figure 1, Staff (1)]. This descending melody is within the normal range for a tenor, and the melody sounds natural. The melody is presented as a sequence of syllables to emphasize the speech-like qualities of the source; the 'libretto' is 'pi, pe, ko, kuuu; ni, ne, mo, muuu.' In auditory terms, this phonological song is a complex sequence of distinctive timbres produced by a sequence of different spectral envelope shapes. The timbre changes engage the phonological system and emphasize the role of envelope shape in conveying the libretto of a song. As the melody proceeds, the fine-structure of the spectrum (Ss) shifts, as a unit, with each change in GPR, and over the course of the melody, it shifts an octave towards the origin. The definition of timbre indicates that these relatively large GPR changes, which produce large pitch changes, do not produce timbre changes, and this seems entirely compatible with what we hear in this melody. This illustrates the claim that pitch is largely separable from timbre, much as duration and loudness are, and much as the definition of timbre implies.
Melody 2: But problems arise when we extend the example and synthesize a version of the same melody but with a singer that has a much shorter vocal tract, like that of a small child [Figure 1, Staff (2)]. [This can be accomplished with the vocoder STRAIGHT (Kawahara and Irino, 2004) which can manipulate the acoustic scale variables independently, and produce versions of notes with a wide range of combinations of Ss and Sf.] There is no problem at the start of the melody; it just sounds like a child singing the melody. The starting pulse rate is low for the voice of a small child but not impossibly so. As the melody proceeds, however, the pitch decreases by a full octave, which is beyond the normal range for a child. In this case, the voice quality seems to change and the child comes to sound rather more like a dwarf. The ANSI definition of timbre suggests that the voice quality change from a child to a dwarf is not a timbre change, it is just a pitch change. But traditionally, voice quality changes are thought to be timbre changes. This is the first form of problem with the standard definition of timbre - changes that are nominally pitch changes producing what would normally be classified as a timbre change.
Melody 3: The next example [Figure 1, Staff (3)] involves reversing the roles of the variables Ss and Sf, and using STRAIGHT to manipulate the position of the spectral envelope, Sf, while holding Ss, and thus the pitch, fixed. Over the course of the melody, the position of the envelope, Sf, shifts by an octave towards the origin. This simulates a doubling of the singer's VTL, from about 10 to 20 cm. As with the previous Ss melodies, the specific values of Sf are determined by the diatonic musical scale of Western music. In other words, the sequence of Sf ratios have the same numerical values as the sequence of Ss ratios used to produce the first two melodies. This effectively extends the domain of notes from a diatonic musical scale to a diatonic musical plane, like that illustrated in Figure 2. The abscissa of the plane is Ss or GPR; the ordinate is Sf or VTL. Both of the axes are logarithmic.
The syllables of the libretto were originally spoken by an adult male (author RP) with a VTL of about 16.5 cm and an average GPR of about 120 pps. Then STRAIGHT was used to generate versions of each syllable for each combination of Ss and Sf in the musical plane. The note corresponding to the original singer is [E, E] on this version of the Ss-Sf plane; so what we refer to as a 'C' is acoustically a 'G' (123 pps). Melody 1 was synthesized with the VTL of the original singer, that is, with notes from the E row of the plane. Melody 2 was synthesized with the VTL of a child, that is, with notes from the upper C row of the plane. Melody 3 was synthesized with a fixed pitch, upper C, and notes from the upper C column of the plane.
Perceptually, as Melody 3 proceeds and the envelope shifts down by an octave, the child seems to get larger and the voice comes to sound something like that of a counter tenor, that is, a tall person with an inordinately high pitch. The definition of timbre does not say anything specific about how changes in the spectral envelope affect timbre; the acoustic scale variable, Sf, was not recognized when the standard was written. Nevertheless, the definition gives the impression that any change in the spectrum that produces an audible change in the perception of the sound produces a change in timbre, provided it is not simply a change in duration, loudness or pitch, which are all fixed in the current example. Experiments with scaled vowels and syllables show that the just noticeable change in Sf is about 7% for vowels (Smith et al., 2005) and 5% for syllables (Ives et al., 2005), so all but the smallest intervals in the melody would be expected to produce perceptible Sf changes. Since traditionally, voice quality changes are thought to be timbre changes, the perception of the Sf melody in this example seems entirely compatible with the definition of timbre; both voice quality and timbre are changing. However, we are left with the problem that large changes in Ss and Sf both seem to produce changes in voice quality, but whereas the perceptual changes associated with large shifts of the fine-structure along the log-frequency axis are not timbre changes, the perceptual changes associated with large shifts of the envelope along the same log-frequency axis are timbre changes according to the standard definition. They both produce changes in the relative amplitudes of the spectral components, but neither changes the shape of the envelope and neither form of shift alters the libretto.
Melody 4: The problems involved in attempting to unify the perception of voice quality with the definition of timbre become more complex when we consider melodies where both Ss and Sf change as the melody proceeds. Consider the melody produced by co-varying Ss and Sf to produce the notes along the diagonal of the Ss-Sf plane. The musical notation for the melody is shown in Figure 1, Staff (4). The melody is perceived to descend an octave as the sequence proceeds, and there is a progressive increase in the perceived size of the singer from a child to an adult (with one momentary reversal at the start of the second phrase). It is as if we had a set of singers varying in age from 4 to 18 in a row on stage, and we had them each sing their assigned syllable in order, and in time, to produce the melody. The example makes it clear that there is an entire plane of singers with different vocal qualities defined by combinations of the acoustic scale variables, Ss and Sf. The realization that there is a whole plane of voice qualities makes it clear just how difficult it would be to produce a clean definition of timbre that excludes one of the acoustic scale variables, Ss, and not the other, Sf. If changes in voice quality are changes in timbre, then changes in pitch (Ss) can produce changes in timbre. This would seem to undermine the utility of the current definitions of pitch and timbre.
The independence of spectral envelope shape
There is one further aspect of the perception of these melodies that should be emphasized, which is that neither of the acoustic scale manipulations causes a change in the libretto; it is always 'pi, pe, ko, kuuu; ni, ne, mo, muuu.' That is, the changes in timbre that give rise to the perception of a sequence of syllables are unaffected by changes in Ss and Sf, even when those changes are large (Smith and Patterson, 2005; Ives et al., 2005). The changes in timbre that define the libretto are associated with changes in the shape of the envelope, as opposed to the position of the envelope or the position of the fine structure. Changes in the shape of the envelope produce changes in vowel type in speech and changes in instrument family in music. Changing the position of the envelope and changing the position of the fine structure both produce substantial changes in the relative amplitudes of the components of the magnitude spectrum, but they do not change the timbre category of these sounds, that is, they do not change the vowel type in speech or the instrument family in music.