Category:Perception of Communication Sounds

From CNBH Acoustic Scale Wiki

Introduction to the content of the wiki

Auditory perceptions are constructed in the brain from sounds entering the ear canal, in conjunction with current context and information from memory. It is not possible to make direct measurements of perceptions, so all descriptions of perceptions involve explicit or implicit models of how perceptions are constructed. The category Auditory Processing of Communication Sounds focuses on how the auditory system might construct your initial experience of a sound, referred to as the 'auditory image'. It describes a computational model of how the construction might be accomplished, the Auditory Image Model (AIM). The category Perception of Communication Sounds focuses on the structures that appear in the auditory image and how we perceive them. These categories are intended to work as a pair, with the reader going back and forth as their interest shifts between the perceptions themselves and how the auditory system might construct them.

Roy Patterson

Introduction

This Perception category of the wiki focuses on our initial perception of a sound: the auditory image that the sound produces (Patterson et al., 1992; Patterson, 1994). It is assumed that sensory organs and the neural mechanisms that process sensory data together construct internal, mental models of objects in the world around us. The visual system constructs a visual object from the light the object reflects, and the auditory system constructs an auditory object from the sound the object emits. These objects are combined with any tactile and/or olfactory information (which might also be thought of as tactile and/or olfactory objects) to produce our experience of an external object. Our task as auditory neuroscientists is to characterize the auditory part of this object-modelling process.

If the sound arriving at the ears is a noise, the auditory image is filled with activity, but it lacks organization and the details are continually fluctuating. If the sound has a pulse-resonance form, an auditory figure appears in the auditory image with an elaborate structure that reflects the phase-locked neural firing pattern produced by the sound in the cochlea (Patterson et al., 1992). Extended segments of sound, like syllables or musical notes, cause auditory figures to emerge, evolve, and decay in what might be referred to as auditory events (Patterson et al., 1992), and these events characterize the acoustic gestures of the external source. All of the processing up to the level of auditory figures and events can proceed without the need for top-down processing associated with context or attention (Patterson et al., 1995). It is assumed, for example, that auditory figures and events are produced in response to sounds when we are asleep. And if we are presented with the call of an animal that we have never encountered before, the early stages of auditory processing will still produce an auditory event, even though we (the listeners) might be puzzled by the event.

Subsequently, when alert, the brain may interpret the auditory event, in conjunction with events in other sensory systems, and in conjunction with contextual information that gives the event meaning. At this point, the event with its meaning becomes an auditory object, that is, the auditory part of the perceptual model of the external object that was the source of the sound. An introduction to auditory {objects, events, figures, images and scenes} is given in the paper entitled Homage à Magritte. It is a revised transcription of a talk presented at the Auditory Objects Meeting at the Novartis Foundation in London, 1-2 October 2007. It is intended to stimulate discussion of how we use, and should use, terms like auditory {images, figures, events, objects and scenes}.

An introduction to auditory objects, events, figures, images and scenes

Magritte's painting of a pipe with the famous inscription

The perception of acoustic scale in speech sounds

Discrimination of speaker size from syllable phrases

The JND for speaker size for five standard speakers and six syllable groups

The interaction of glottal-pulse rate and vocal-tract length in judgements of speaker size, sex and age

Perceived size of a speaker as a function of GPR and VTL

The processing and perception of size information in speech sounds

The robustness of vowel recognition to variation in GPR and VTL

The robustness of speech communication to changes in acoustic scale

The robustness of bio-acoustic communication and the role of normalization

Scale-shift covariant auditory images

The robustness of human speech recognition to variation in vocal characteristics

Recognition performance: (A) for the training voice (0) and the new voices (1 – 8), (B) for CVs and VCs, and (C) for all voices. The wide bars show average performance across all syllable categories; the thin bars show performance separately for each consonant category.

Effects of voicing in the recognition of concurrent syllables

Recognition scores as a function of 1) SNR (top panels) and 2) SII (bottom panels). The left panels (A#) show syllable recognition; the middle panels (B#) show consonant recognition, and the right panels (C#) show vowel recognition. The solid lines show performance for voiced target syllables, and the dashed lines show performance for whispered syllables.

The interaction of the acoustic scale variables in speech perception

The interaction of vocal tract length and glottal pulse rate in the recognition of concurrent syllables

Surfaces showing how recognition performance improves as the GPR and VTL of the target speaker and the distracter diverge. The three surfaces show performance for three signal-to-noise ratios: +6, 0 and -6 dB

Comparison of relative and absolute judgements of speaker size

Size surface inferred from an experiment on speaker-size discrimination

The perception of acoustic scale in musical tones

The perception of family and register in musical tones

Sixteen common instruments illustrating four registers within each of four instrument families

The effect of phase in the perception of octave height

Attenuating the odd harmonics of a complex tone shifts the pitch vertically up the pitch helix

Reviewing the definition of timbre as it pertains to the perception of speech and musical sound - ISH 2009

The GPR-VTL plane with musical notation

The Domain of Tonal Melodies: Physiological limits and some new possibilities

The domain of melodic pitch

Perception of acoustic scale and size in musical instrument sounds

Size surface inferred from an experiment on instrument-size discrimination

The perception of the scale of the source as pitch

Pitch strength decreases as F0 and harmonic resolution increase in complex tones composed exclusively of high harmonics

Dual Profile images

Research projects

Revising the definition of timbre to make it useful for speech and musical sounds (BSA2008)

Vowel spectra illustrating the two components of acoustic scale in communication sounds

The role of GPR and VTL in the definition of speaker identity

Estimating the size and sex of a speaker from their speech sounds

Figure 1. Mechanisms involved in estimating speaker size. Bottom panel: Dual profile of a vowel showing the formant wavelengths and the pitch wavelength. Middle panel: Conversion of formant wavelengths to vowel type and acoustic scale of the vocal-tract filter. Top panel: Conversion of acoustic scale values to a common code for height estimation.

Obligatory streaming based on acoustic scale difference

Size judgement iso-contour


Published papers for the Category:Perception of Communication Sounds

Discrimination of Source Size

Discrimination of speaker size: Smith et al. (2005), Smith and Patterson (2005), Ives et al. (2005), Smith et al. (2007)

Discrimination of musical instrument size: van Dinther and Patterson (2006)

Robustness of Auditory Perception to Changes in Source Size

Robustness of speech recognition: Smith et al. (2005), Smith and Patterson (2005), Ives et al. (2005), Smith et al. (2007), Walters et al. (2007)

Robustness of music perception: van Dinther and Patterson (2006)

References

  • Ives, D.T., Smith, D.R.R. and Patterson, R.D. (2005). “Discrimination of speaker size from syllable phrases.” J. Acoust. Soc. Am., 118, p.3816-3822.
  • Patterson, R.D. (1994). “The sound of a sinusoid: Time-interval models.” J. Acoust. Soc. Am., 96, p.1419-1428.
  • Patterson, R.D., Allerhand, M.H. and Giguère, C. (1995). “Time-domain modeling of peripheral auditory processing: A modular architecture and a software platform.” J. Acoust. Soc. Am., 98, p.1890-1894.
  • Patterson, R.D., Robinson, K., Holdsworth, J., McKeown, D., Zhang, C. and Allerhand, M. (1992). “Complex sounds and auditory images.” In Auditory Physiology and Perception, Y. Cazals, L. Demany and K. Horner, editors (Pergamon Press, Oxford).
  • Smith, D.R.R., Patterson, R.D., Turner, R.E., Kawahara, H. and Irino, T. (2005). “The processing and perception of size information in speech sounds.” J. Acoust. Soc. Am., 117, p.305-318.
  • Smith, D.R.R., Walters, T.C. and Patterson, R.D. (2007). “Discrimination of speaker sex and size when glottal-pulse rate and vocal-tract length are controlled.” J. Acoust. Soc. Am., 122, p.3628-3639.
  • Smith, D.R.R. and Patterson, R.D. (2005). “The interaction of glottal-pulse rate and vocal-tract length in judgements of speaker size, sex, and age.” J. Acoust. Soc. Am., 118, p.3177-3186.
  • van Dinther, R. and Patterson, R.D. (2006). “Perception of acoustic scale and size in musical instrument sounds.” J. Acoust. Soc. Am., 120, p.2158-2176.
  • Walters, T.C., Gomersall, P.A., Turner, R.E. and Patterson, R.D. (2007). “Comparison of relative and absolute judgments of speaker size based on vowel sounds (A).” J. Acoust. Soc. Am., 121, p.3119.
