Source versus object and environment versus ‘scene’

An introduction to auditory objects, events, figures, images and scenes

In hearing, “source” appears to be the best analogue of what we mean by “physical object” or “external object” in vision. Sources are physical and give the correct impression of something that is localized in the external world. Accordingly, whereas the heart of the optical event, eo, in Eq. (4) is an object, o, the heart of the acoustic event, ea, in Eq. (6) is an ‘acoustic source within an object,’ sa(o). The extra level of reference is required because it does not seem entirely natural to acousticians to refer to ‘an acoustic event produced by an object’; in the acoustic domain there is too much ambiguity about where the energy comes from. It is more natural to refer to ‘an acoustic event from a source within an object,’ and typically the phrase ‘within an object’ would be omitted unless the discussion was about how the source functions within the object.

Footnote: There is an important asymmetry between vision and hearing in terms of the source of the energy that gives rise to visual events and auditory events. Visual events are typically based on light that is reflected from the surface of an external object, whereas auditory events are based on sound that is emitted from within the object. The asymmetry is discussed below in the section entitled “The energy source: reflected light; emitted sound,” where the analogy is extended to visual events based on light sources in objects (e.g., fireflies) and auditory events based on sound reflected from objects (e.g., echoes).

Similarly, the use of the word ‘scene’ to describe the acoustic part of the external world is not entirely natural in everyday language. It would be more natural to refer to an acoustic event in the environment. The distinction appears to reflect the difference between the proportion of light from the environment that enters the eye and the proportion of sound from the environment that enters the ear. The eyes are always focused on a small portion of the light in the external environment, for which the word ‘scene’, with its sense of a restricted part of the environment, seems appropriate. The ears are always receiving sound from the entire external environment, for which the word ‘environment’ seems more appropriate than ‘scene’. Accordingly, from this point onwards, ‘scene’ will be used in the description of visual experience and ‘environment’ will be used in the description of auditory experience. The concept of the ‘auditory scene’ described by Bregman (198XXX) and its counterpart, the acoustic scene, are discussed in Section XXX. These terms appear to be used in an effort to direct the reader to use the analogy with vision to understand the problems associated with listening in multi-source environments.

The notation for the call of an animal that we do not immediately recognize should therefore probably be revised to read:

    IA[ EA{ FAn(ea) } ]   |<=A|   ea[ ea{ pen(sa) } ]    (6)

In words, the acoustic environment, ea[.], contains an acoustic event, ea{.}, which is a sequence of patterns, pen(.), emitted from an acoustic source, sa (in an external object). The auditory system, |<=A|, converts the external event into an auditory image, IA[.], which contains an auditory event, EA{.}, which is a sequence of auditory figures, FAn(.), of the acoustic event.
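The nesting in Eq. (6) can be made concrete with a small data-structure sketch. This is only an illustrative reading of the notation, not part of the formalism itself: the class names, the `auditory_system` function standing in for |<=A|, and the reading of the trailing n in pen(.) and FAn(.) as a position in the sequence are all assumptions introduced here. The one-figure-per-pattern mapping is likewise a simplification chosen for the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AcousticSource:
    """sa: the acoustic source (within an external object)."""
    label: str


@dataclass
class Pattern:
    """pen(sa): the n-th pattern emitted by the source (index n is assumed)."""
    n: int
    source: AcousticSource


@dataclass
class AcousticEvent:
    """ea{.}: an acoustic event, i.e. a sequence of patterns."""
    patterns: List[Pattern]


@dataclass
class AcousticEnvironment:
    """ea[.]: the acoustic environment that contains the event."""
    event: AcousticEvent


@dataclass
class AuditoryFigure:
    """FAn(.): the n-th auditory figure, tied to the pattern it derives from."""
    n: int
    of_pattern: Pattern


@dataclass
class AuditoryEvent:
    """EA{.}: an auditory event, i.e. a sequence of auditory figures."""
    figures: List[AuditoryFigure]


@dataclass
class AuditoryImage:
    """IA[.]: the auditory image that contains the auditory event."""
    event: AuditoryEvent


def auditory_system(env: AcousticEnvironment) -> AuditoryImage:
    """Hypothetical stand-in for |<=A|: one figure per pattern, order kept."""
    figures = [AuditoryFigure(n=p.n, of_pattern=p) for p in env.event.patterns]
    return AuditoryImage(event=AuditoryEvent(figures=figures))


# Usage: an unrecognized animal call made up of three successive patterns.
call = AcousticSource(label="unidentified animal")
env = AcousticEnvironment(AcousticEvent([Pattern(n, call) for n in range(1, 4)]))
image = auditory_system(env)
print(len(image.event.figures))  # prints 3
```

The external side (environment, event, patterns, source) and the internal side (image, event, figures) are kept as separate types so that the direction of the conversion in Eq. (6), from right to left, is explicit in the function signature.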

The notation for the video of the auditory processing of the call would be as follows:

    IV[ EV{ FVn(FA) } ]   |<=V|   sl[ eo{ pen(FA) } ]    (7)

In words, beginning with the external world, a scene with light, sl[.], contains an optical event, eo{.}, which is a sequence of patterns, pen(.), emitted by a projector, each of which is a simulation of an auditory figure. The visual system, |<=V|, converts the external event into a visual image, IV[.], which contains a visual event, EV{.}, which is a sequence of visual figures, FVn(.), each of which is a simulation of the corresponding auditory figure, FA, in the brain.

The basic element of the acoustic event and the auditory figure
