From CNBH Acoustic Scale Wiki
Roy Patterson, Etienne Gaudrain, Tom Walters
Recent research on the role of acoustic scale in the perception of sound suggests that the auditory system segregates the frequency information in the magnitude spectrum of a sound into three parts: the shape of the spectral envelope, the acoustic scale of the source, Ss, and the acoustic scale of the filter, Sf. The spectral envelope shape determines the basic timbre category of a sound. In music this category is the instrument family, and in the singing voice it also distinguishes the different vowel types. These timbre categories are largely independent of the acoustic scale variables, Ss and Sf. In speech, these two acoustic scale variables jointly determine much of the static voice quality of the speaker, and thus our perception of the speaker’s sex and size (e.g., Smith and Patterson 2005). This suggests that it would be useful to distinguish between the ‘what’ and the ‘who’ of timbre in speech, that is, what is being said and who is saying it. For musical tones, the distinction between envelope shape and the acoustic scale variables explains the distinction between family timbre (envelope shape) and register timbre (Ss and Sf). In both speech and music, Ss is partly independent of timbre, inasmuch as (a) varying the glottal-pulse rate (GPR) to produce prosodic distinctions does not change the perception of who is speaking, and (b) varying the pulse rate of a musical instrument to produce a melody does not change the perception of which instrument is playing. There are, however, limits to this independence: large changes in pulse rate do change the perception of who is speaking, or of which member of an instrument family is playing.
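The separation of envelope shape, Ss and Sf can be illustrated with a toy spectral model. The sketch below is not the authors' model; it assumes a hypothetical three-resonance envelope shape (`REF_RESONANCES_HZ` and `BANDWIDTH_OCT` are illustrative values, not data from the text), treats Ss as the pulse rate in Hz, and treats Sf as a dimensionless relative size of the filter. Under these assumptions, changing Sf slides the whole envelope along a logarithmic frequency axis without altering its shape, while changing Ss only changes where the harmonics sample that same envelope.

```python
import math

# Hypothetical envelope shape: three resonances at illustrative reference
# frequencies (Hz) for a filter of relative size 1.0. Not data from the text.
REF_RESONANCES_HZ = (500.0, 1500.0, 2500.0)
BANDWIDTH_OCT = 0.15  # assumed width of each resonance, in octaves


def envelope(freq_hz: float, sf: float) -> float:
    """Envelope magnitude at freq_hz for a filter of relative size sf.

    Each resonance is a Gaussian bump on a log2-frequency axis centred on
    F_ref / sf, so a larger filter (sf > 1) moves every resonance down by
    log2(sf) octaves while leaving the envelope shape unchanged.
    """
    x = math.log2(freq_hz)
    return sum(
        math.exp(-0.5 * ((x - math.log2(f_ref / sf)) / BANDWIDTH_OCT) ** 2)
        for f_ref in REF_RESONANCES_HZ
    )


def harmonic_spectrum(ss_hz: float, sf: float, fmax_hz: float = 4000.0):
    """(frequency, magnitude) pairs for harmonics of pulse rate ss_hz up to fmax_hz.

    The source scale only sets where the harmonics fall; their magnitudes
    are read off the same filter envelope.
    """
    n_harmonics = int(fmax_hz // ss_hz)
    return [(k * ss_hz, envelope(k * ss_hz, sf)) for k in range(1, n_harmonics + 1)]


if __name__ == "__main__":
    # Scale covariance: enlarging the filter by 20% is a pure shift in log frequency.
    print(math.isclose(envelope(1500.0 / 1.2, 1.2), envelope(1500.0, 1.0)))  # prints True
    # Doubling the pulse rate halves the number of harmonics below 4 kHz,
    # but the envelope they sample is the same.
    print(len(harmonic_spectrum(100.0, 1.0)), len(harmonic_spectrum(200.0, 1.0)))  # prints 40 20
```

Representing each resonance on a log-frequency axis is a deliberate simplification: it makes the dilation by Sf an exact shift, which is the property the scale argument relies on, whereas real vocal-tract and instrument resonances are only approximately scale-covariant.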
The authors were supported by the UK Medical Research Council (G0500221; G9900369) during the preparation of this chapter. They would like to acknowledge useful discussions with Jim Woodhouse on the production of notes by the violin, and on acoustic scaling in the string family.
ANSI (1994) American national standard acoustical terminology, ANSI S1.1-1994 (R1999). New York: American National Standards Institute.
Benade AH (1976) Fundamentals of Musical Acoustics. Oxford University Press.
Benade AH, Lutgen SJ (1988) The saxophone spectrum. J Acoust Soc Am 83:1900-1907.
de Cheveigné A (2005) Pitch perception models. In: Plack CJ, Oxenham AJ, Fay RR, Popper AN (eds), Pitch: Neural Coding and Perception. Springer, pp. 169-233.
Chiba T, Kajiyama M (1941) The vowel, its nature and structure. Tokyo: Tokyo-Kaiseikan Pub Co.
Cohen L (1993) The scale representation. IEEE Trans Sig Proc 41:3275-3292.
van Dinther R, Patterson RD (2006) Perception of acoustic scale and size in musical instrument sounds. J Acoust Soc Am 120:2158-2176.
Fant G (1960) Acoustic Theory of Speech Production. The Hague: Mouton De Gruyter.
Fitch WT, Giedd J (1999) Morphology and development of the human vocal tract: A study using magnetic resonance imaging. J Acoust Soc Am 106:1511-1522.
Fitch WT, Reby D (2001) The descended larynx is not uniquely human. Proc R Soc London Ser B 268:1669-1675.
Fletcher NH (1978) Mode locking in nonlinearly excited inharmonic musical oscillators. J Acoust Soc Am 64:1566-1569.
Fletcher NH, Rossing TD (1998) The Physics of Musical Instruments. New York: Springer.
Glasberg BR, Moore BCJ (1990) Derivation of auditory filter shapes from notched-noise data. Hear Res 47:103-138.
Glasberg BR, Moore BCJ (2002) A model of loudness applicable to time-varying sounds. J Audio Eng Soc 50:331-342.
Helmholtz HLF (1875) On the Sensations of Tone as a Physiological Basis for the Theory of Music. London: Longmans, Green and Co.
Hutchins CM (1967) Founding a family of fiddles. Phys Today 20:23-37.
Hutchins CM (1980) The new violin family. In: Benade AH (ed), Sound Generation in Winds, Strings, Computers. The Royal Swedish Academy of Music, pp. 182-203.
Irino T, Patterson RD (2001) A compressive gammachirp auditory filter for both physiological and psychophysical data. J Acoust Soc Am 109:2008-2022.
Irino T, Patterson RD (2006) A Dynamic Compressive Gammachirp Auditory Filterbank. IEEE Trans Audio Speech Lang Processing 14:2222-2232.
Ives DT, Patterson RD (2008) Pitch strength decreases as F0 and harmonic resolution increase in complex tones composed exclusively of high harmonics. J Acoust Soc Am 123:2670-2679.
Ives DT, Smith DRR, Patterson RD (2005) Discrimination of speaker size from syllable phrases. J Acoust Soc Am 118:3816-3822.
Kawahara H, Irino T (2004) Underlying principles of a high-quality speech manipulation system STRAIGHT and its application to speech segregation. In: Divenyi PL (ed), Speech separation by humans and machines. Kluwer Academic, pp. 167-180.
Kennedy M (1985) The Oxford dictionary of music. Oxford University Press.
Krumbholz K, Patterson RD, Pressnitzer D (2000) The lower limit of pitch as determined by rate discrimination. J Acoust Soc Am 108:1170-1180.
Krumbholz K, Patterson RD, Nobbe A, Fastl H (2003) Microsecond temporal resolution in monaural hearing without spectral cues? J Acoust Soc Am 113:2790-2800.
Lee S, Potamianos A, Narayanan S (1999) Acoustics of children’s speech: developmental changes of temporal and spectral parameters. J Acoust Soc Am 105:1455-1468.
Licklider JCR (1951) A duplex theory of pitch perception. Experientia 7:128-133.
McIntyre ME, Schumacher RT, Woodhouse J (1983) On the oscillations of musical instruments. J Acoust Soc Am 74:1325-1345.
Meddis R, Hewitt M (1991) Virtual pitch and phase sensitivity of a computer model of the auditory periphery. I: Pitch identification. J Acoust Soc Am 89:2866-2882.
Molin NE, Lindgren L-E, Jansson EV (1988) Parameters of violin plates and their influence on the plate modes. J Acoust Soc Am 83:281-291.
Moore BCJ, Glasberg BR (1983) Suggested formulae for calculating auditory-filter bandwidths and excitation patterns. J Acoust Soc Am 74:750-753.
Patterson RD (1994a) The sound of a sinusoid: Spectral models. J Acoust Soc Am 96:1409-1418.
Patterson RD (1994b) The sound of a sinusoid: Time-interval models. J Acoust Soc Am 96:1419-1428.
Patterson RD, Irino T (1998) Modeling temporal asymmetry in the auditory system. J Acoust Soc Am 104:2967-2979.
Patterson RD, Robinson K, Holdsworth J, McKeown D, Zhang C, Allerhand M (1992) Complex sounds and auditory images. In: Cazals Y, Demany L, Horner K (eds), Auditory Physiology and Perception. Oxford: Pergamon Press.
Patterson RD, Allerhand MH, Giguère C (1995) Time-domain modeling of peripheral auditory processing: A modular architecture and a software platform. J Acoust Soc Am 98:1890-1894.
Patterson RD, Yost WA, Handel S, Datta AJ (2000) The perceptual tone/noise ratio of merged iterated rippled noises. J Acoust Soc Am 107:1578-1588.
Patterson RD, Unoki M, Irino T (2003) Extending the domain of center frequencies for the compressive gammachirp auditory filter. J Acoust Soc Am 114:1529-1542.
Patterson RD, Smith DRR, van Dinther R, Walters TC (2008) Size information in the production and perception of communication sounds. In: Yost WA, Popper AN, Fay RR (eds), Auditory Perception of Sound Sources. New York: Springer, pp. 43-75.
Peterson GE, Barney HL (1952) Control Methods Used in a Study of the Vowels. J Acoust Soc Am 24:175-184.
Pressnitzer D, Patterson RD, Krumbholz K (2001) The lower limit of melodic pitch. J Acoust Soc Am 109:2074-2084.
Schelleng JC (1963) The Violin as a Circuit. J Acoust Soc Am 35:326-338.
Slaney M, Lyon RF (1990) A perceptual pitch detector. In: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 357-360.
Smith DRR, Patterson RD (2005) The interaction of glottal-pulse rate and vocal-tract length in judgements of speaker size, sex, and age. J Acoust Soc Am 118:3177-3186.
Smith DRR, Patterson RD, Turner RE, Kawahara H, Irino T (2005) The processing and perception of size information in speech sounds. J Acoust Soc Am 117:305-318.
Sprague MW (2000) The single sonic muscle twitch model for the sound-production mechanism in the weakfish, Cynoscion regalis. J Acoust Soc Am 108:2430-2437.
Turner RE, Walters TC, Monaghan JJM, Patterson RD (2009) A statistical formant-pattern model for estimating vocal-tract length from formant frequency data. J Acoust Soc Am 125:2374-2386.
Unoki M, Irino T, Glasberg B, Moore BCJ, Patterson RD (2006) Comparison of the roex and gammachirp filters as representations of the auditory filter. J Acoust Soc Am 120:1474-1492.
Yost WA (1996) Pitch of iterated rippled noise. J Acoust Soc Am 100:511-518.
Yost WA (2009) Pitch perception. Atten Percept Psychophys (in press).
Yost WA, Patterson RD, Sheft S (1998) The role of the envelope in processing iterated rippled noise. J Acoust Soc Am 104:2349-2361.
Zwicker E (1974) On the psychophysical equivalent of tuning curves. In: Zwicker E, Terhardt E (eds), Facts and Models in Hearing. New York: Springer-Verlag, pp. 132-140.