Pulse-Resonance Sounds and Acoustic Scale
From CNBH Acoustic Scale Wiki
The sounds that animals use to communicate at a distance, to declare their territories and attract mates, are typically pulse-resonance sounds. These sounds are ubiquitous in the natural world and in the human environment. They are the basis of the calls produced by most vertebrates (mammals, birds, reptiles, frogs and fish); they are also the basis of many invertebrate communication sounds, such as those of crustaceans (e.g., popping shrimp) and insects (e.g., grasshoppers and cicadas). Although the structures used to produce pulse-resonance sounds can be quite elaborate, the mechanism is conceptually very simple: the animal develops some means of producing an abrupt pulse of mechanical energy, which causes structures in its body to resonate. From a signal-processing perspective, an isolated pulse does not contain much information. It matters because it excites the resonators of the sounder, and excites them simultaneously. The resonances produced by each pulse carry distinctive information about the shape and structure of parts of the sender’s body, and thus about the species emitting the sound. The resonance has less energy than the pulse but more information; it follows directly after the pulse and behaves as though it were attached to it. So the location of the species-specific information in communication sounds is very predictable: it is tucked in behind the pulses.
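The mechanism described above (a pulse that sets a resonator ringing, repeated pulse after pulse) can be sketched in a few lines. This is a minimal illustration only. It assumes each pulse is an ideal impulse and that the body's resonance is a single exponentially damped sinusoid. The function name `pulse_resonance` and all parameter values are hypothetical and are not taken from any CNBH software.

```python
import math

def pulse_resonance(pulse_rate, res_freq, decay_rate, duration, fs=16000):
    """Sketch of a pulse-resonance sound sampled at fs Hz.

    Each pulse is an ideal impulse, and the body's resonance is modelled as a
    single exponentially damped sinusoid (a hypothetical simplification).
    Returns the waveform and the pulse period in samples.
    """
    n_samples = int(duration * fs)
    period = int(round(fs / pulse_rate))          # samples between pulses
    # Response of the resonator to one pulse: a damped sinusoid.
    ring = [math.exp(-decay_rate * i / fs) * math.sin(2 * math.pi * res_freq * i / fs)
            for i in range(n_samples)]
    wave = [0.0] * n_samples
    # Every pulse launches its own copy of the resonance; the copies add up.
    for start in range(0, n_samples, period):
        for i in range(n_samples - start):
            wave[start + i] += ring[i]
    return wave, period

# 200 pulses per second, with a 1000 Hz resonance that dies away within a few ms.
wave, period = pulse_resonance(pulse_rate=200, res_freq=1000, decay_rate=1000, duration=0.05)
```

Because the resonance has almost died away before the next pulse arrives, each cycle of `wave` is nearly a copy of the previous one. The information carried by the resonance therefore arrives right behind every pulse.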
The pulse-resonance mechanism in human speech
For humans, the most familiar communication sound is speech, and the voiced parts of speech are classic pulse-resonance sounds. The vocal folds in the larynx at the base of the throat produce pulses of acoustic energy by momentarily impeding the flow of air from the lungs, and each of these ‘glottal pulses’ then excites a complex of resonances in the vocal tract above the larynx. The principles of vowel production will be illustrated with a synthetic three-formant vowel. The upper panel of Figure 1 shows a brief segment of the waveform of this vowel, which sounds like the vowel in the word ‘paw’ as spoken by a child.
The figure illustrates what is meant by a pulse-resonance sound: a vowel is a stream of glottal pulses, each with a complex resonance attached to it. The glottal pulse rate (GPR) is 200 pulses per second (pps) in this case, so the time between glottal pulses, which is the ‘period’ of the wave, is 5 milliseconds (ms). The message of this communication sound is that the speaker's vocal tract is in the shape that produces the vowel in ‘paw’. That message is carried by the shape of the resonance, which is the same in each cycle for as long as the vowel is sustained. The lower panel of Figure 1 shows the long-term magnitude spectrum of the sound (the set of green vertical lines) and the spectral envelope (the bold blue line connecting the tops of the green lines). The peaks in the spectral envelope are the formants of the vowel; the shape of the envelope in the spectral domain corresponds to the shape of the resonance in the time domain. In source-filter terms, the glottal pulses are the source and the vocal tract is the filter. Because the pulses repeat 200 times per second, the spectrum consists of harmonics spaced 200 Hz apart (the fine structure), and the vocal-tract resonances determine the heights of those harmonics (the envelope).
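The arithmetic linking the pulse rate to the period and to the positions of the green lines can be written out as a small sketch. The function names are hypothetical. The only physics assumed is that a perfectly regular pulse train has energy only at integer multiples of its repetition rate.

```python
def period_ms(gpr_pps):
    """Time between glottal pulses, in milliseconds."""
    return 1000.0 / gpr_pps

def harmonic_frequencies(gpr_pps, max_freq_hz):
    """Frequencies of the harmonics (the fine structure) up to max_freq_hz.

    A regular pulse train repeating gpr_pps times per second has energy only
    at integer multiples of gpr_pps; the resonances only set their heights.
    """
    n_max = int(max_freq_hz // gpr_pps)
    return [n * gpr_pps for n in range(1, n_max + 1)]

print(period_ms(200))                    # 5.0
print(harmonic_frequencies(200, 1000))   # [200, 400, 600, 800, 1000]
```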
Pulse-resonance mechanisms in vertebrates
Broadly speaking, the pulse-resonance mechanism is the same in all mammals: the vocal cords provide the excitation pulses and the vocal tract provides the resonant filter. There is an analogous mechanism in birds and in frogs; both excite their air passages by momentarily interrupting the flow of air from the lungs. Fish with swim bladders often have muscles in the wall of the swim bladder (e.g., the weakfish, Cynoscion regalis) that produce brief mechanical pulses, referred to as ‘sonic twitches’ (Sprague 2000). These twitches set the walls of the swim bladder resonating, and this combination of pulse and resonance makes the call distinctive. Note that the sound-producing mechanisms in these four groups of vertebrates (fish, frogs, birds and mammals) probably all evolved separately: the swim-bladder mechanism of fish did not evolve into the vocal-tract mechanism of land animals, and the vocal-tract mechanisms of the land animals do not appear to have developed one from another. It appears that a variety of pulse-resonance mechanisms have arisen by convergent evolution. That is, when acoustic communication is advantageous, nature develops a source and a filter from whatever body parts the animal has at that time.
Most animals today produce their communication sounds in the form of what might be called pulse-resonance ‘syllables’, that is, streams of regularly timed pulses, each of which carries a copy of the resonance to the listener. The syllables are on the order of 200 to 800 ms in duration, with a pulse rate in the region of 10 to 500 Hz. The pulse rate rises a little at the onset of the sound and remains fairly steady during the central portion. During the offset it drops off along with the amplitude, and the offset is typically longer and more gradual than the onset. A selection of four of these animal syllables is presented in Figure 2. They are the calls of (a) a Mongolar drummer, or Jamaica weakfish (Cynoscion jamaicensis), (b) a North American bullfrog (Lithobates catesbeianus), (c) a macaque (Macaca mulatta), and (d) a human adult saying /ma/.
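The pulse-rate contour of a syllable described above can be sketched as a list of pulse times. The function name, the piecewise-linear shape of the contour and every parameter value below are hypothetical. They were chosen only to reproduce the qualitative pattern: a small rise at onset, a steady centre, and a longer, more gradual fall at offset.

```python
def syllable_pulse_times(duration_s, steady_rate_hz, onset_s, offset_s,
                         onset_drop=0.1, offset_drop=0.3):
    """Pulse times (s) for a hypothetical pulse-resonance 'syllable'.

    The pulse rate starts at (1 - onset_drop) * steady_rate_hz, rises linearly
    to steady_rate_hz over the onset, holds steady, then falls linearly over
    the (longer) offset to (1 - offset_drop) * steady_rate_hz at the very end.
    """
    def rate_at(t):
        if t < onset_s:                                  # rising onset
            return steady_rate_hz * (1 - onset_drop * (1 - t / onset_s))
        if t > duration_s - offset_s:                    # falling offset
            return steady_rate_hz * (1 - offset_drop * (1 - (duration_s - t) / offset_s))
        return steady_rate_hz                            # steady centre

    times, t = [], 0.0
    while t < duration_s:
        times.append(t)
        t += 1.0 / rate_at(t)       # next pulse one local period later
    return times

# A 400 ms syllable at 100 pulses/s with a 50 ms onset and a 150 ms offset.
times = syllable_pulse_times(0.4, 100.0, onset_s=0.05, offset_s=0.15)
intervals = [b - a for a, b in zip(times, times[1:])]
```

In this sketch the shortest pulse intervals (10 ms) occur in the steady centre. The intervals near the end are longer than those near the start because the offset drop is set larger than the onset drop.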
The notes of sustained-tone instruments are like animal syllables with fixed pulse rates and comparatively flat temporal envelopes. Both of these classes of sound are completely different from the sounds of inanimate sources like wind and rain, which are forms of noise. In the natural world, the detection of a pulse-resonance sound in syllable form immediately signals the presence of an animate source in the local environment.
Pulse-resonance mechanisms in musical instruments
The sustained-tone instruments of the orchestra (brass, strings and woodwinds) are also excited by non-linear processes that produce sharp pulses, and these pulses resonate in the air columns, or air cavities, of the instruments (Fletcher and Rossing 1998); so they also produce pulse-resonance sounds (van Dinther and Patterson 2006). Combustion engines produce mini-explosions that resonate in the engine block, so engines also produce pulse-resonance sounds. These are not communication sounds in the normal sense, but they show that the world around us is full of pulse-resonance sounds, which the auditory system analyses automatically and effortlessly.
Size Information and Acoustic Scale in Communication Sounds
In the majority of animals that communicate by sound, nature has adapted existing body parts to create the structures that produce the communication sounds. In humans, as in almost all mammals, the vocal apparatus is based on the tubes that carry food and air from the mouth and the nose to the stomach and lungs, respectively. As the animal grows, these tubes have to get longer to keep the nose and mouth connected to the lungs and stomach. As the tubes get longer, the resonances in the vocal tract ring more slowly. This is a general physical principle of sound production: as things get larger or more massive, they vibrate more slowly. Similarly, as the vocal tract gets wider, the vocal cords get longer and more massive, which means that the glottal pulse rate decreases as the animal grows. The sound-producing mechanism typically maintains its overall shape and structure as the individual grows. As a result, the set of messages that a species uses to communicate remains more or less the same as the animal grows, but the messages are carried by sounds whose resonance rate and pulse rate change.
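The statement that longer tubes ring more slowly can be made concrete with the textbook uniform-tube approximation of the vocal tract. In this model the tract is a tube closed at the glottis and open at the lips, and its resonances are F_n = (2n - 1) c / 4L. This is a simplification: real vocal tracts are not uniform, and c ≈ 350 m/s is only an approximate speed of sound in warm, moist air. The function below is a hypothetical sketch and was not used to synthesize the vowels in the figures.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # approximate speed of sound in warm, moist air

def tube_resonances(length_cm, n_formants=3, c=SPEED_OF_SOUND_CM_S):
    """Resonances (Hz) of a uniform tube closed at one end and open at the
    other: F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, n_formants + 1)]

print(tube_resonances(17.5))   # [500.0, 1500.0, 2500.0]
print(tube_resonances(8.75))   # [1000.0, 3000.0, 5000.0]
```

Halving the tube length doubles every resonance frequency. All the resonances change together, by the same factor, which is what keeps the message intact as the animal grows.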
The size of the source and the size of the filter
The synthetic /a/ vowel of Figure 1, which is the kind of vowel produced by a child, is presented along with three other synthetic /a/ vowels in Figures 3 and 4 to illustrate what happens to the waveforms (Figure 3) and spectra (Figure 4) of communication sounds as a mammal grows up. The vowel for the child is in panel (b) of each figure; the vowel for an adult is shown in panel (c) of each figure.
Comparison of the waveforms in panels (b) and (c) of Figure 3 shows that the GPR of the adult is lower (the period between pulses is longer); this is because the adult's vocal cords are longer and more massive. The increase in the length and mass of the vocal cords causes an increase in the acoustic scale of the excitation component of the wave, that is, an increase in the period of the glottal oscillation. The comparison also shows that the resonance rate of the adult is slower (the cycles of the resonance are longer); this is because the adult's vocal tract is longer than the child's. The increase in the length of the vocal tract causes an increase in the acoustic scale of the resonance component of the communication sound. These are the main effects of an increase in body size, as observed in the waveform.
Comparison of the spectra in panels (b) and (c) of Figure 4 shows that the set of harmonics that defines the fine structure of the spectrum moves, as a unit, towards the origin (left) as the child grows up and becomes an adult. The fundamental frequency, F0, which is also the spacing between adjacent harmonics, decreases because the vocal cords are longer and more massive in the adult. This is how a change in the acoustic scale of the excitation component appears in the spectrum: it shifts the fine structure of the spectrum (green lines), as a unit, towards the origin as the animal gets larger, and vice versa. The comparison also shows that the spectral envelope shifts towards the origin at the same time. This is how a change in the acoustic scale of the resonance component appears: it shifts the envelope of the spectrum, as a unit, towards the origin as the animal gets larger, and vice versa.
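Dividing every frequency by the same factor moves all components as a unit only on a logarithmic frequency axis. On a linear axis it compresses them towards the origin. The sketch below checks this with the child's 200 Hz harmonics and a set of hypothetical formant frequencies, each halved to represent a doubling of acoustic scale. The helper name `octave_shifts` is also hypothetical.

```python
import math

def octave_shifts(before_hz, after_hz):
    """Shift of each component along a log2 (octave) frequency axis."""
    return [math.log2(a) - math.log2(b) for b, a in zip(before_hz, after_hz)]

scale = 0.5   # every frequency halves when the acoustic scale doubles

child_harmonics = [200.0 * n for n in range(1, 11)]     # fine structure, GPR = 200 Hz
adult_harmonics = [scale * f for f in child_harmonics]  # GPR = 100 Hz
child_formants = [1000.0, 3000.0, 5000.0]               # hypothetical envelope peaks
adult_formants = [scale * f for f in child_formants]

harmonic_shifts = octave_shifts(child_harmonics, adult_harmonics)
formant_shifts = octave_shifts(child_formants, adult_formants)

# On a linear axis the harmonic spacing shrinks from 200 Hz to 100 Hz ...
print(child_harmonics[1] - child_harmonics[0], adult_harmonics[1] - adult_harmonics[0])
# ... but on a log axis every component moves by the same amount (about -1 octave).
print(harmonic_shifts[:3], formant_shifts)
```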
In summary, there are two aspects of acoustic scale in communication sounds: the scale of the pulse rate, which provides information about the length, mass and tension of the excitation source (the vocal cords), and the collective scale of the resonances in the vocal tract that are excited by the glottal pulses. In the brain, these two aspects of acoustic scale information are combined with stored knowledge about sources to determine our perception of the speaker's size.
Acoustic scale versus physical scale
In Figures 3 and 4, the adult vowel (panel c) was produced from the child’s vowel (panel b) by halving the GPR and doubling the vocal tract length (VTL), so that the pulse rate and the resonance rate both fall by the same proportion (a factor of 2). It is as if we had recorded a child’s voice and simulated the adult’s voice simply by playing the recording back at half speed; the ratio of the pulse rates of the vowels in panels (b) and (c) is the same as the ratio of their resonance rates. If the adult sound wave is AdultSound(t) and the child’s sound wave is ChildSound(t), then AdultSound(t) = ChildSound(at), where a is the ratio of the playback rates, here a = 1/2. The example illustrates that acoustic scale in animal calls is a property of sounds as they occur in air, rather than a property of the vocal cords (such as their mass or length) or of the vocal tract (such as its length or shape). There is, of course, a close correspondence between the physical variables of sound production and the acoustic variables of the sounds produced. As a result, the acoustic scale values in a sound convey information about speaker size to the listener. The acoustic variables are nevertheless distinct from the physiological variables. We need to distinguish these two classes of variables because the auditory system operates on a sound without reference to the physics of the excitation mechanism or the resonance system in the source of the sound.
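The relation AdultSound(t) = ChildSound(at) can be checked numerically. The sketch below assumes the single-damped-sinusoid model of a resonance, and its parameter values and function names are hypothetical. It takes half-speed playback (a = 1/2) and confirms that the slowed child wave matches a wave synthesized directly with the pulse rate, the resonance frequency and the decay rate all halved.

```python
import math

def pulse_resonance_at(t, pulse_rate, res_freq, decay_rate):
    """Continuous-time pulse-resonance wave at time t (s): a damped sinusoid
    is launched at every pulse time k / pulse_rate and the copies are summed."""
    total = 0.0
    k = 0
    while k / pulse_rate <= t:
        tau = t - k / pulse_rate            # time since pulse k
        total += math.exp(-decay_rate * tau) * math.sin(2 * math.pi * res_freq * tau)
        k += 1
    return total

def child_sound(t):
    return pulse_resonance_at(t, pulse_rate=200.0, res_freq=1000.0, decay_rate=800.0)

def adult_sound(t, a=0.5):
    # AdultSound(t) = ChildSound(a t): the child recording played at half speed.
    return child_sound(a * t)

def adult_direct(t):
    # The same wave synthesized directly with every rate halved.
    return pulse_resonance_at(t, pulse_rate=100.0, res_freq=500.0, decay_rate=400.0)

max_diff = max(abs(adult_sound(i * 1e-4) - adult_direct(i * 1e-4)) for i in range(300))
```

The decay rate is halved as well. Slowing a recording stretches every temporal feature, so a uniform change of acoustic scale affects the pulse and the resonance together.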
In communication sounds, the pulse rate and the resonance rate are both correlated (inversely) with the size of the source, yet they are independent variables. The one constraint is that the lowest frequency component of the resonance must remain above the excitation pulse rate if the character of the sound is to be preserved. The vowels in panels (a) and (d) of Figures 3 and 4 illustrate this independence. The waveform of the vowel in panel (a) of Figure 3 has the pulse rate of the adult vowel (as in panel c) and the resonance rate of the juvenile vowel (as in panel b); it sounds a bit like a male dwarf saying the vowel in ‘paw’. The vowel in panel (d) has the complementary combination, the pulse rate of the juvenile vowel with the resonance rate of the adult vowel; it sounds a bit like a countertenor saying the vowel in ‘paw’. So the two acoustic scales can be varied separately, again within the constraint that the lowest component of the vocal-tract resonance must remain above the pulse rate. Figure 4 shows the changes in acoustic scale as they appear in the spectra of the sounds. In panel (a), the fine structure of the spectrum is in the same position as for the adult spectrum in panel (c), and the envelope is in the same position as for the juvenile spectrum in panel (b). The spectrum in panel (d) has the complementary combination of features: the fine structure is in the same position as for the juvenile spectrum in panel (b), and the envelope is in the same position as for the adult spectrum in panel (c). So this set of four vowels illustrates that the acoustic scale of the excitation source is largely independent of the acoustic scale of the vocal-tract resonance.
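The four-vowel design can be expressed as a 2 x 2 combination of pulse rate and lowest resonance, with the constraint above enforced as a check. The child and adult values below are hypothetical and illustrative, not the parameters behind Figures 3 and 4. `make_vowel` is likewise a hypothetical helper.

```python
# Hypothetical illustrative values (not the parameters behind Figures 3 and 4).
CHILD = {"gpr_hz": 200.0, "f1_hz": 1000.0}
ADULT = {"gpr_hz": 100.0, "f1_hz": 500.0}

def make_vowel(gpr_hz, f1_hz):
    """Combine a pulse rate with a lowest resonance, enforcing the constraint
    that the lowest resonance must stay above the pulse rate."""
    if f1_hz <= gpr_hz:
        raise ValueError(f"F1 = {f1_hz} Hz does not exceed the pulse rate {gpr_hz} Hz")
    return {"gpr_hz": gpr_hz, "f1_hz": f1_hz, "f1_over_gpr": f1_hz / gpr_hz}

# Pulse rate and resonance rate chosen independently, as in panels (a) to (d).
panels = {
    "a": make_vowel(ADULT["gpr_hz"], CHILD["f1_hz"]),  # adult pulse rate, child resonance
    "b": make_vowel(CHILD["gpr_hz"], CHILD["f1_hz"]),  # child
    "c": make_vowel(ADULT["gpr_hz"], ADULT["f1_hz"]),  # adult
    "d": make_vowel(CHILD["gpr_hz"], ADULT["f1_hz"]),  # child pulse rate, adult resonance
}

for name, vowel in panels.items():
    print(name, vowel)
```

Panels (b) and (c) share the same F1/GPR ratio because one is a pure time-scaling of the other. Panels (a) and (d) break that link while still satisfying the constraint.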
Acoustic scale and body size
The pulse rate and resonance rate of a sound do not describe the size of a source in absolute terms. They are acoustic variables that describe properties of the sound wave as it travels from the sender to the listener. The acoustic variables change in a predictable way as the resonators in the sender’s body grow. However, the brain does not have the equations required to convert a pulse rate into a mass or a length. Even if it had them, there would still be a difficulty: the information about all of the physical variables involved in producing the sound has to reach the listener via only two acoustic variables, pulse rate and resonance rate. The acoustic variables often depend on the product of several physical variables, such as mass and length, so a given pulse rate could be produced by many different combinations of mass and length. What the listener receives, then, is one pulse-rate value that summarizes the aggregate effect of all of the physical variables on the vibration source, and one resonance-rate value that summarizes the aggregate effect of another set of physical variables on the resonators.
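The ambiguity can be illustrated with the textbook ideal-string approximation, sometimes used as a rough model of vocal-fold vibration. Here f0 = (1 / 2L) sqrt(T / mu) with linear density mu = m / L, which simplifies to f0 = 0.5 sqrt(T / (m L)). This is an assumption chosen for illustration, not a claim about how the wiki's vowels were produced. The parameter sets are arbitrary and the function name is hypothetical.

```python
import math

def string_f0(tension_n, mass_kg, length_m):
    """Fundamental frequency of an ideal vibrating string.

    f0 = (1 / (2 L)) * sqrt(T / mu), with linear density mu = m / L,
    which simplifies to f0 = 0.5 * sqrt(T / (m * L)).
    """
    return 0.5 * math.sqrt(tension_n / (mass_kg * length_m))

# Arbitrary, hypothetical parameter sets: (tension in N, mass in kg, length in m).
combos = [
    (0.04, 1.0e-4, 0.010),
    (0.04, 2.0e-4, 0.005),   # twice the mass, half the length
    (0.08, 2.0e-4, 0.010),   # twice the tension, twice the mass
]
for tension, mass, length in combos:
    print(f"T={tension} N, m={mass} kg, L={length} m -> f0 = {string_f0(tension, mass, length):.1f} Hz")
```

All three physically different sources give the same pulse rate (about 100 Hz). From the pulse rate alone, a listener cannot tell which combination produced it.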
Moreover, the brain is not actually interested in the mass, length or volume of the physical components of the sounder, such as the size of the vocal cords or the length of the vocal tract. What matters to the listener is the size of the sender’s body – some perceptual or cognitive combination of their height, mass and volume – and, for a given species, whether one sender is much bigger or smaller than another. In order to estimate the sender’s body size, a more central mechanism must combine the pulse-rate and resonance-rate information with stored knowledge about the distribution of body sizes for the species in question.