Gammachirp Auditory Filters

From CNBH Acoustic Scale Wiki

The text and figures that appear on this page were subsequently published in:

Patterson, R.D., Unoki, M. and Irino, T. (2003). “Extending the domain of center frequencies for the compressive gammachirp auditory filter.” J. Acoust. Soc. Am., 114, pp. 1529-1542.

The gammatone function was introduced by Johannesma (1972) and de Boer (1975) to describe cochlear impulse responses measured in cats with the reverse-correlation, or ‘revcor’, technique. The envelope of the impulse response was approximated with the gamma distribution from statistics; the carrier, or fine structure, of the impulse response was a sinusoid, or tone, at the center frequency of the filter (Johannesma, 1972; de Boer, 1975). Subsequently, Schofield (1985) reported that the magnitude response of this gammatone filter could explain the masking data that Patterson (1976) had gathered to derive the magnitude response, or ‘shape’, of the auditory filter psychophysically. Patterson et al. (1987) then showed that the magnitude response of the gammatone filter was very similar to that of the rounded exponential, or roex, auditory filter, which a number of modelers had previously used to explain human masking data (for a review, see Patterson and Moore, 1986). In essence, this meant that the gammatone auditory filter could be expected to produce a reasonable time-domain simulation of cochlear filtering in humans. This led to the development of a succession of gammatone auditory filterbanks to simulate cochlear filtering and study the effects of phase locking on auditory perception (e.g., Meddis and Hewitt, 1991; Patterson et al., 1992; Slaney, 1993; Cooke, 1993; Patterson et al., 1995). This paper describes recent attempts to extend the dynamic range and frequency domain of the gammatone filterbank using a compressive gammachirp auditory filter.
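The description above (a gamma-distribution envelope multiplied by a tone at the center frequency) can be sketched numerically. The following is a minimal illustration only, not the implementation used in any of the cited filterbanks. The function names, the filter order of 4, the bandwidth factor b = 1.019, the sampling rate, and the use of the Glasberg and Moore (1990) ERB formula are all assumptions that do not appear in the text above.

```python
import numpy as np


def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz.

    Assumption: the Glasberg and Moore (1990) formula, which is not
    stated in the text above.
    """
    return 24.7 + 0.108 * fc_hz


def gammatone_ir(fc_hz, fs_hz=16000, dur_s=0.05, order=4, b=1.019, phase=0.0):
    """Sampled gammatone impulse response, normalized to unit peak.

    The default order, b, fs_hz and dur_s are illustrative assumptions.
    """
    n_samples = int(round(dur_s * fs_hz))
    # Start one sample after zero; the envelope is zero at t = 0 anyway.
    t = np.arange(1, n_samples + 1) / fs_hz
    # Gamma-distribution envelope: a power of t times a decaying exponential.
    envelope = t ** (order - 1) * np.exp(-2.0 * np.pi * b * erb(fc_hz) * t)
    # Carrier (fine structure): a tone at the center frequency.
    carrier = np.cos(2.0 * np.pi * fc_hz * t + phase)
    ir = envelope * carrier
    return t, ir / np.max(np.abs(ir))


if __name__ == "__main__":
    t, ir = gammatone_ir(1000.0)
    spectrum = np.abs(np.fft.rfft(ir, n=16384))
    freqs = np.fft.rfftfreq(16384, d=1.0 / 16000)
    print("spectral peak (Hz):", freqs[np.argmax(spectrum)])
```

Under these assumptions the magnitude response peaks close to the carrier frequency and is nearly symmetric about it on a linear frequency scale, which is the property that limits the linear gammatone to moderate stimulus levels.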

A. Level-dependent versions of the gammatone auditory filter

The original implementations of the gammatone auditory filterbank were linear, and the passband of the gammatone filter was essentially symmetric on a linear frequency scale. They were therefore limited to applications involving moderate stimulus levels, where the auditory filter is roughly symmetric on a linear frequency scale. The masking patterns produced by narrowband noises (e.g., Egan and Hake, 1950) showed that the auditory filter was asymmetric at high stimulus levels, with the low-frequency side shallower than the high-frequency side. However, the asymmetry observed in masking patterns overestimates the asymmetry of the auditory filter (Patterson, 1974); thresholds above the masker frequency are higher than those below it partly because the auditory filter is broader at frequencies above the masker. In general, the notched-noise method is used to restrict listening to a narrow range of auditory filters, and the asymmetric notched-noise method (Patterson and Nimmo-Smith, 1980) was extended to measure the asymmetry of the roex auditory filter at relatively high stimulus levels (e.g., Lutfi and Patterson, 1984; Moore et al., 1990; Rosen and Baker, 1994). The question, then, was how to introduce filter asymmetry within the gammatone framework and so produce a time-domain, level-dependent filterbank.

Physiological measurements of basilar membrane motion confirmed that frequency selectivity was asymmetric at high stimulus levels (e.g., Pickles, 1988; Ruggero, 1992), and several level-dependent versions of the gammatone filter have recently been developed to explain physiological data. Carney (1993) paired a linear gammatone filter with a parallel non-linear gammatone and used the system to explain revcor data. The linear gammatone had a fixed, wide bandwidth, which was intended to simulate the high-level response of the passive basilar membrane; the level-dependent gammatone had a narrow bandwidth, which was used to introduce a narrow passband into the composite filter at lower stimulus levels. It remained the case, however, that the magnitude response of the composite filter was largely symmetric in frequency. The latest version of this physiological gammatone filterbank is described in Zhang et al. (2001). Lyon (1996, 1997) developed a ‘one-zero’ version of the gammatone filter to introduce asymmetry in frequency and so explain the physiological tuning curves of Ruggero (1992). Meddis et al. (2001) and Lopez-Poveda and Meddis (2001) developed a Dual-Resonance, Non-Linear (DRNL) filter system that employs gammatone filters of different widths in its two routes to explain the compression and suppression observed in small mammals and some human masking data. Plack et al. (2002) have also demonstrated that this DRNL system can explain the release from suppression observed in forward masking.

B. The gammachirp auditory filter

The impulse response of a filter with an asymmetric magnitude response has a frequency glide, or ‘chirp’, in the carrier term, and the response of the basilar membrane is known to exhibit such a chirp (Møller and Nilsson, 1979). Accordingly, Irino and Patterson (1997) introduced a chirp into the carrier term of the gammatone function to produce a ‘gammachirp’ auditory filter, which was then able to simulate the level-dependent asymmetry as it appears in the data of Lutfi and Patterson (1984), Moore et al. (1990) and Rosen and Baker (1994). However, in this ‘analytic’ gammachirp, the rate of chirp at the start of the impulse response varies with level, whereas recent physiological data (de Boer and Nuttall, 1997, 2000; Recio et al., 1998; Carney et al., 1999) have shown that the rate of chirp does not vary with stimulus level. Moreover, the analytic gammachirp filter cannot account for the level-dependent gain and compression observed physiologically around the peak frequency (Pickles, 1988; Recio et al., 1998). To overcome these problems, Irino and Patterson (2001) modified the architecture of the gammachirp filter to produce explicit changes in the gain and compression of the filter with stimulus level in the region of the passband, while at the same time ensuring that the form of the chirp did not change with level, as required by the physiological data. They showed that this ‘compressive’ gammachirp auditory filter could explain both the chirp in the physiological revcor data of Carney et al. (1999) and the non-linearities in the human masking data of Rosen and Baker (1994). At this point, then, we turned to the two new studies on human masking by Baker et al. (1998) and Glasberg and Moore (2000).
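The link between a chirp in the carrier and an asymmetric magnitude response can be illustrated with a short sketch. It assumes the form usually given for the analytic gammachirp, in which a term c ln(t) is added to the phase of the gammatone carrier, so that the instantaneous frequency is f_r + c / (2 pi t). The function names, the chirp value c = -2, the order, the bandwidth factor b, the sampling rate and the ERB formula (Glasberg and Moore, 1990) are illustrative assumptions and are not fitted values from the studies cited above. The sketch covers only the analytic gammachirp; it does not attempt the compressive architecture.

```python
import numpy as np


def erb(f_hz):
    """ERB in Hz; the Glasberg and Moore (1990) formula is an assumption here."""
    return 24.7 + 0.108 * f_hz


def gammachirp_ir(fr_hz, c=-2.0, fs_hz=16000, dur_s=0.05,
                  order=4, b=1.019, phase=0.0):
    """Sampled analytic-gammachirp impulse response, normalized to unit peak.

    All default parameter values are illustrative assumptions.
    With c = 0 the function reduces to a gammatone.
    """
    n_samples = int(round(dur_s * fs_hz))
    # Start one sample after zero so that log(t) is finite.
    t = np.arange(1, n_samples + 1) / fs_hz
    envelope = t ** (order - 1) * np.exp(-2.0 * np.pi * b * erb(fr_hz) * t)
    # The c * ln(t) phase term produces the frequency glide in the carrier.
    carrier = np.cos(2.0 * np.pi * fr_hz * t + c * np.log(t) + phase)
    ir = envelope * carrier
    return t, ir / np.max(np.abs(ir))


def instantaneous_frequency(fr_hz, c, t):
    """Carrier frequency over time: the phase derivative divided by 2 pi."""
    return fr_hz + c / (2.0 * np.pi * t)


if __name__ == "__main__":
    fr = 1000.0
    t, ir = gammachirp_ir(fr, c=-2.0)
    spectrum = np.abs(np.fft.rfft(ir, n=16384))
    freqs = np.fft.rfftfreq(16384, d=1.0 / 16000)
    below = spectrum[np.argmin(np.abs(freqs - (fr - 200.0)))]
    above = spectrum[np.argmin(np.abs(freqs - (fr + 200.0)))]
    print("spectral peak (Hz):", freqs[np.argmax(spectrum)])
    print("gain 200 Hz below vs. above f_r:", below / above)
```

With a negative c, the carrier glides upward toward f_r, the spectral peak falls below f_r, and the low-frequency skirt is shallower than the high-frequency skirt, which matches the direction of the high-level asymmetry described in Section A.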

It is currently very difficult to compare the advantages and disadvantages of the different descendants of the gammatone filterbank because of the fundamental differences in their architectures. The gammachirp is a cascade filter system; the DRNL is a parallel filter system; and Carney’s model is a parallel filter system in which one channel is a cascade filter system. The physiological models also have a large number of parameters, which makes it difficult to do global fits and to assess quantitatively how parameter values vary with frequency, as is done in the current study. By the same token, the physiological models are more flexible and so could undoubtedly describe some data with more accuracy than the compressive gammachirp filter.
