From CNBH Acoustic Scale Wiki
Roy Patterson, Etienne Gaudrain, Tom Walters
2. Pulse-Resonance Sounds and Acoustic Scale
The tones that one hears in the natural environment are typically ‘pulse-resonance’ sounds (Patterson et al. 2008), for example, the calls that mammals, birds, frogs, and fish use to declare their territories or attract mates (e.g. Fitch and Reby 2001). The vowels of speech and the sustained tones of orchestral instruments are also pulse-resonance sounds. They are thus the normal tones that one hears every day, both in the man-made environment and in the natural world.
2.1 Origin of Pulse-Resonance Sounds
The production of a pulse-resonance sound is conceptually simple. The animal just has to develop some means of producing an acoustic pulse which then resonates in one or more of the structures in the animal’s body. Once the basic mechanism arises in response to the need for communication, evolution can refine the sound with successive modifications to make it more distinctive and efficient. In present-day animals, the pulse-generating mechanism typically produces a stream of pulses that occur regularly in time, and in models of tone production, the mechanism that produces the stream of pulses is referred to as ‘the source’ of the sound. The resonances in the animal’s body are collectively referred to as ‘the filter’, and in most animals, the filter has evolved to give the animal’s call a distinctive timbre. The stream of pulses with their resonances forms a tone, and these tones provide the basis for animal communication. They also broadcast the species of the caller.
In almost all mammals, the source mechanism is the vocal folds in the larynx at the base of the throat; they produce pulses by momentarily impeding the flow of air from the lungs. The pulses of air then excite resonant cavities in the airway between the larynx and the lips, and this filter of resonant cavities is referred to as the vocal tract. A short segment of a synthetic /a/ that sounds like the vowel in ‘car’ is presented in Figure 1a. The wave shows that the sound is periodic and each cycle contains an acoustic pulse followed by a decaying resonance with a complex shape. A vowel is normally on the order of 100-300 ms in duration, so the complete waveform for the /a/ in ‘car’ would contain 20-60 of the pulse-resonance cycles shown in Figure 1a. The waveform repeats every 5 ms so the ‘repetition rate’ of the tone is 200 cycles per second (cps), and this value is used to specify its pitch.
These are the main characteristics of pulse-resonance sounds as they appear in the time domain. Many birds and frogs also excite resonances in their air passages by momentarily interrupting the flow of air from the lungs, although the details of the source and filter mechanisms are somewhat different. Fish do not have air passages but many of them have swim bladders that resonate and function as the filter. The bladder is excited by muscles in the wall of the swim bladder (e.g. in the weakfish, Cynoscion regalis) which produce brief mechanical pulses referred to as ‘sonic twitches.’ This muscle source produces twitches in regularly timed streams (Sprague 2000). A brief introduction to the pulse-resonance sounds produced by animals is presented in Patterson et al. (2008).
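The time-domain arithmetic for the synthetic /a/ described above can be sketched in a few lines of Python. This is a minimal illustration and not the synthesis method used for Figure 1a: the sample rate, the single 1 kHz resonance, the 1 ms decay constant and the function name are all assumptions made for the sketch; only the 200 cps repetition rate and the 100 ms duration are taken from the text.

```python
import math

FS = 16000  # sample rate in Hz (assumed for this sketch)

def pulse_resonance_wave(rate_cps=200.0, duration_s=0.1,
                         resonance_hz=1000.0, decay_s=0.001):
    """Regular pulses, each followed by one exponentially damped resonance.

    Simplification: the resonance restarts at every pulse, so the small
    residual tail of the previous cycle is ignored.
    """
    period = int(round(FS / rate_cps))      # samples per cycle
    n_total = int(round(duration_s * FS))   # samples in the whole tone
    wave = []
    for n in range(n_total):
        t = (n % period) / FS               # time since the latest pulse, in s
        wave.append(math.exp(-t / decay_s) *
                    math.cos(2.0 * math.pi * resonance_hz * t))
    return wave, period

wave, period = pulse_resonance_wave()
period_ms = 1000.0 * period / FS    # time between pulses in ms
cycles = len(wave) // period        # whole cycles in the 100 ms tone
```

With these settings the period is 5 ms and the 100 ms tone contains 20 cycles; a 300 ms vowel at the same rate would contain 60, which matches the range quoted above.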
Pulse-resonance tones are very different from environmental noises like wind in the trees or waves on the beach, or man-made noises like extractor fans, jet engines, or the boiling of a kettle. Noises arise from turbulent systems where the source vibrates randomly. Noise waveforms are not periodic and so they do not produce salient pitch perceptions. The filtering is incidental and evolution is not involved in tuning the filter to make the sound distinctive or improve communication. One continuous noise sounds much like another when they have the same loudness. Perceptually, pulse-resonance tones, with their pronounced pitch and distinctive timbre, tend to capture the listener’s attention, whereas continuous noises are commonly ignored.
Returning to musical sounds, the sustained tones that singers produce when the voice is used as an instrument are vowels, and so the singing voice produces pulse-resonance tones. The instruments of the brass, string and woodwind families also produce pulse-resonance tones (van Dinther and Patterson 2006). Each of the families has a source mechanism that produces regular streams of pulses which are filtered by resonances in the instrument’s body (Fletcher and Rossing 1998). Several examples are presented in Section 3. The remainder of this Section describes the acoustic properties of pulse-resonance tones as they appear in the magnitude spectra of the sounds, and how the properties do, or do not, vary with the size of the instrument or singer.
2.2 Acoustic Properties of Pulse-Resonance Sounds
The set of vertical lines in Figure 1b shows the long-term magnitude spectrum of the vowel, that is, the distribution of energy across frequency, averaged over 100 ms, or more, of time. The frequency axis is logarithmic in this case, similar to the place, or ‘tonotopic,’ dimension of the cochlea. The vertical lines show that the energy is restricted to frequencies which are integer multiples of a single, fundamental frequency, designated F0. The fundamental of this harmonic series, and the frequency spacing between the harmonics (Fig. 1b), are the spectral representation of the repetition rate of the sound, which is the inverse of the period observed in the waveform (Fig. 1a). In this example, all three of these acoustic variables have the value 200 cps. The dashed line connecting the tops of the harmonics in the lower panel shows the spectral envelope of the vowel.
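The claim that the energy of a periodic sound is confined to integer multiples of F0 can be checked numerically. The sketch below is an illustration under stated assumptions: it builds a simple pulse-resonance wave (one damped 1 kHz resonance, a 16 kHz sample rate, both hypothetical choices) with the 200 cps repetition rate from the text, and evaluates a plain discrete Fourier transform over exactly 20 periods. The helper name `dft_magnitude` is introduced here for illustration only.

```python
import cmath
import math

FS = 16000   # sample rate in Hz (assumed)
F0 = 200     # repetition rate in cps, as in the text
N = 1600     # 100 ms of signal, i.e. 20 whole periods

period = FS // F0  # 80 samples per cycle

# One damped 1 kHz resonance restarted at every pulse (a simplification).
x = [math.exp(-(n % period) / 16.0) *
     math.cos(2.0 * math.pi * 1000.0 * (n % period) / FS)
     for n in range(N)]

def dft_magnitude(signal, k):
    """Normalised magnitude of DFT bin k of the signal."""
    n_samples = len(signal)
    total = sum(s * cmath.exp(-2j * math.pi * k * n / n_samples)
                for n, s in enumerate(signal))
    return abs(total) / n_samples

bin_hz = FS / N  # frequency spacing between DFT bins (10 Hz)

# Frequencies up to 2 kHz (DC excluded) that carry non-negligible energy.
active_hz = [k * bin_hz for k in range(1, 201)
             if dft_magnitude(x, k) > 1e-6]
spacings = [b - a for a, b in zip(active_hz, active_hz[1:])]
```

Because the analysis window holds a whole number of periods, the only bins with energy fall at 200, 400, ... cps, and the spacing between neighbouring components equals the fundamental, in line with the three equal values noted above.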
The soft-shouldered peaks that appear in the spectral envelopes of speech sounds are referred to as formants. Individual formants are normally designated by the frequency of the peak in the envelope, but the concept of a formant actually includes the shape and width of the envelope in the region of the peak, as well as the peak frequency. The shape that the set of formants collectively imparts to the envelope in the spectral domain (Fig. 1b) is related to the shape of the damped resonance following each glottal pulse in the time domain (upper panel). The resonators in the bodies of musical instruments do not produce such distinctive formants as the resonances of the vocal tract, but the principles are the same for all pulse-resonance sounds. The shape of the spectral envelope corresponds to the shape of the resonance in the waveform, and these shapes determine the distinctive sound quality, or timbre, of an instrument family. The set of harmonics that constitute the magnitude spectrum of a sound will be collectively referred to as the fine-structure of the spectrum to distinguish the magnitude spectrum (solid vertical lines) from its envelope (grey line).
Now consider the changes that occur in the tones of a specific instrument family as the size of the instrument increases. For example, consider what happens to vowel sounds as children grow up into adults. When children begin to speak they are about 0.85 meters tall, and their height increases by about a factor of two as they mature. In humans (and other animals) the source and filter are components of the body and both the source and the filter increase in size as the young mature into adults. With regard to the source in humans, the glottal pulse rate (GPR) decreases by about an octave as the child grows up and the vocal cords become longer and more massive. The decrease in GPR is greater than an octave for males and less than an octave for females. With regard to the filter, vocal tract length increases in proportion to height (Fitch and Giedd 1999; Turner et al. 2009, their Fig. 4), and as a result, the formant frequencies of children’s vowels decrease by about an octave as they mature (Lee et al. 1999; Turner et al. 2009). The effects of growth on the fine-structure and envelope of the spectrum of a vowel are quite simple to characterize, provided the spectrum is plotted on a logarithmic frequency scale. In this case, the set of harmonics that define the fine-structure of the spectrum (the vertical lines in Fig. 1b) moves, as a unit, towards the origin as the child matures into an adult. In speech, the pattern of formants that defines a given vowel type remains largely unchanged as people grow up (Peterson and Barney 1952; Lee et al. 1999; Turner et al. 2009). In other words, for a given vowel, the shape of the spectral envelope does not change as a child matures; rather, the spectral envelope just shifts slowly towards the origin, moving about an octave in total as a child matures into an adult. Thus, in the current example, the vowel remains an /a/, and does not change to an /e/, an /o/ or a /u/, as a child matures into an adult.
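The reason a logarithmic frequency axis makes growth look like a simple shift can be shown with a short calculation. In the sketch below the three formant frequencies are hypothetical round numbers chosen for illustration, not measured values, and the exact factor of two stands in for the "about an octave" change described above. Multiplying every frequency by a common scale factor moves each one by the same distance on a log2 axis and leaves the ratios between formants, and hence the envelope shape, unchanged.

```python
import math

# Hypothetical formant frequencies for an adult vowel, in Hz.
# Illustrative values only; they are not taken from any database.
adult_formants_hz = [700.0, 1200.0, 2600.0]

scale_factor = 2.0  # idealised one-octave difference between child and adult

# A smaller vocal tract scales every formant up by the same factor.
child_formants_hz = [f * scale_factor for f in adult_formants_hz]

def log2_positions(freqs_hz):
    """Position of each frequency on a log2 (octave) axis."""
    return [math.log2(f) for f in freqs_hz]

def envelope_shape(freqs_hz):
    """Formant pattern expressed as ratios to the lowest formant."""
    return [f / freqs_hz[0] for f in freqs_hz]

# Shift of each formant, in octaves, between adult and child.
shifts_octaves = [c - a for c, a in zip(log2_positions(child_formants_hz),
                                        log2_positions(adult_formants_hz))]

same_shape = envelope_shape(child_formants_hz) == envelope_shape(adult_formants_hz)
```

Every formant shifts by one octave and the ratio pattern is identical, which is the sense in which the vowel type is preserved while the envelope position changes.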
The ‘position of the spectral envelope of a sound on a logarithmic frequency axis’ is a property of a sound as it occurs in the air (Cohen 1993). For a pulse-resonance tone, this property is the acoustic scale of the filter that defines the resonances, and in the case of the human voice, it is closely related to vocal tract length (a physical variable). The ‘position of the fine-structure of the spectrum on a logarithmic frequency axis’ is also a property of a sound as it occurs in the air. For a pulse-resonance tone, it is the acoustic scale of the source, and in the case of the human voice, it is closely related to glottal pulse rate (a physical variable). The two acoustic scale variables are very useful for summarizing the effects of physical variables like mass and length on the perceptions produced by instruments, and, as a result, they play a prominent role in the remainder of the Chapter. For brevity, "the scale (S) of the source (s)" will be designated Ss, and "the scale (S) of the filter (f)" will be designated Sf. Turner et al. (2009) have recently reanalysed several large databases of spoken vowels and shown that almost all of the variability in formant frequency data that is not vowel-type information is Sf information. In order to reduce confusion between the two acoustic scale variables, Ss and Sf, we use cycles per second (cps) for the units of the scale of the source, Ss, and Hertz (Hz) for the scale of the filter, Sf, since Sf is the position of the spectral envelope and the unit for the frequency dimension of the magnitude spectrum is Hertz.
In summary, the important distinctions for the remainder of the Chapter are as follows:
1. The pulse rate of the source is a physical variable (e.g., GPR). It determines the repetition rate of the wave, which is known as the acoustic scale of the source, Ss. Repetition rate and Ss are both acoustic variables, and they, in turn, determine the pitch of a pulse-resonance tone, which is a perceptual variable.
2. The size of a resonator in the body of a person or an instrument is a physical variable (like length or volume). It determines the rate at which the resonance oscillates in the waveform (van Dinther and Patterson 2006), and it determines the position of the spectral envelope along the frequency axis of the magnitude spectrum. This position is known as the acoustic scale of the filter, Sf; it is an acoustic variable that affects the perception of source size and the perception of register within an instrument family.
3. The shape of the spectral envelope determines the instrument family aspect of timbre.
4. Register is the term used to describe the joint action of the acoustic variables, Ss and Sf, on the perception of musical tones and instruments. The values of Ss and Sf reflect the physical sizes of the source and filter in the instrument, respectively, and so the perception of register is closely related to the perception of instrument size, or singer size. The vocal terms ‘soprano’, ‘alto’, ‘tenor’ and ‘bass’, are commonly used to specify register within families, as in ‘tenor sax’ or ‘bass fiddle.’
5. Finally, note that the voice differs from other instruments in one important respect with regard to timbre. When vowel type changes, say, from /a/ to /i/, the shape of the envelope changes. The shape does not change with the size of the singer from child to adult, whereas the acoustic scale values, Ss and Sf, do. So, different vowels are like different instrument families in the perception of musical tones. One useful, and reasonable, way to think of vowels is that they form a cluster of instrument families (unified by the fact that they are perceived to come from humans) and that the differing timbres of the members of this cluster are somehow more similar to each other than they are to the timbres of other musical instrument families.
There are many meanings of the word ‘source’ in the description of sounds and how they are produced. To avoid confusion when reading this chapter, focus on what the source is a source of. So, when listening to an orchestra, one specific musician, and the instrument they are playing, jointly form the ‘source’ of one of the streams of musical tones that the orchestra is producing. In contrast, the ‘source’ of the energy in these tones is the arm of the musician, in the case of string instruments, and the diaphragm of the singer in the case of a vocalist. The ‘source’ in a source-filter system is a mechanism in between the source of the energy and the complete instrument in combination with the musician. In the source-filter description of tone production, the word ‘source’ means the mechanism that produces the stream of abrupt amplitude changes, or pulses, which subsequently excite the set of resonances in the body of the instrument, or the vocal tract of the singer. It is a specialised meaning of the word ‘source,’ but it is straightforward and it is the only use of the word ‘source’ in this chapter.
Throughout the current chapter, we use the word ‘noise’ as an acoustic term which refers to the fact that the waveform is aperiodic and the amplitude varies randomly with time. These sounds are typically heard as background sounds and do not draw your attention. There is, of course, another use of the word ‘noise’ which can occur in a musical context. For example, when there are competing sounds in an environment, perhaps a Mozart symphony on the radio and a rock concert on television, an individual listener might say, ‘Turn off that noise!’, referring to the source which is interfering with the source they are trying to hear. The current chapter is not concerned with multi-source environments and so the latter use of ‘noise’ does not arise in this chapter.
In the phrase ‘acoustic scale’ the word ‘scale’ is being used in the mathematical sense, rather than the musical sense. In mathematics ‘a scale factor’ is a number that tells you how big one value is relative to another. A musical scale is a set of frequency intervals within an octave. There is a connection between the two uses of scale inasmuch as the intervals of a musical scale (like a fifth) are defined by specific scale factors (~1.5 in the case of a fifth), but acoustic scale refers to a single value rather than a set of musical scale values.
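The link between the two senses of 'scale' can be made concrete with a small calculation. The sketch below is illustrative only: the helper name `scale_factor` and the example frequencies are assumptions, and the equal-tempered value is included simply to show why the text writes the fifth as approximately, rather than exactly, 1.5.

```python
def scale_factor(value, reference):
    """How big one value is relative to another (the mathematical sense)."""
    return value / reference

# A tone at 300 cps relative to one at 200 cps: a just-intonation fifth.
just_fifth = scale_factor(300.0, 200.0)

# In twelve-tone equal temperament a fifth spans 7 of the 12 semitones
# in an octave, so its scale factor is 2 raised to the power 7/12.
tempered_fifth = 2.0 ** (7.0 / 12.0)

# An octave, for comparison, is a scale factor of exactly 2.
octave = scale_factor(400.0, 200.0)
```

Here the just fifth is exactly 1.5 and the equal-tempered fifth is about 1.498, so each musical interval corresponds to a single scale factor, whereas a musical scale is a whole set of such intervals.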