# The Gammatone Auditory Filterbank

The gammatone auditory filter is defined in the time domain. It is essentially a section of a cosine wave, $\cos(2\pi f_c t + \phi)$, whose rate of onset is specified by a power function, $t^{n-1}$, and whose rate of offset is determined by a decaying exponential function, $e^{-2\pi b t}$. Thus the gammatone filter, $gt(t)$, is

$$gt(t) = a \, t^{n-1} \cos(2\pi f_c t + \phi) \, e^{-2\pi b t}, \qquad t > 0$$

The frequency of the cosine wave, $f_c$, is set to the centre frequency of the auditory filter. The first term, $a$, is simply a scalar that specifies the gain of the filter. The terms are normally rearranged as follows to emphasize the envelope and the carrier of the impulse response.

$$gt(t) = a \left[ t^{n-1} e^{-2\pi b t} \right] \cos(2\pi f_c t + \phi), \qquad t > 0 \qquad \text{[Eq. 2.2.1]}$$

The term in the square brackets is the envelope, and it has the form of the gamma distribution from statistics; the cosine term provides the fine structure of the impulse response. When the impulse response is convolved with a waveform, the output of the convolution emphasizes frequencies in the region of $f_c$ and attenuates components progressively as their frequency deviates from $f_c$. Since the cosine term sounds like a tone when presented as an acoustical wave, the function is referred to as a "gammatone filter" (Aertsen and Johannesma, 1980). When the parameters of the function are chosen to reflect the operation of the cochlea at a point along the basilar membrane, it is referred to as a "gammatone auditory filter".
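The impulse response defined above can be sampled directly. The following is a minimal sketch, not the implementation used in any particular auditory model: the function names, the 16 kHz sampling rate, the 25 ms duration and the example value of `b` are all assumptions chosen for illustration.

```python
import math

def gammatone_ir(fc, b, n=4, a=1.0, phase=0.0, fs=16000, duration=0.025):
    """Sample gt(t) = a * t**(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t + phase)."""
    num_samples = int(round(duration * fs))
    samples = []
    for k in range(num_samples):
        t = k / fs
        envelope = t ** (n - 1) * math.exp(-2.0 * math.pi * b * t)
        carrier = math.cos(2.0 * math.pi * fc * t + phase)
        samples.append(a * envelope * carrier)
    return samples

def envelope_peak_time(b, n=4):
    """Time (s) at which the envelope t**(n-1) * exp(-2*pi*b*t) is largest."""
    # Setting the derivative of the envelope to zero gives t = (n-1)/(2*pi*b).
    return (n - 1) / (2.0 * math.pi * b)

# Hypothetical example: a fourth-order filter centred on 1 kHz.
ir = gammatone_ir(fc=1000.0, b=135.0)
```

The power term controls how quickly the response builds up and the exponential controls how quickly it dies away, so the envelope peaks at $(n-1)/(2\pi b)$ seconds. A larger $b$ therefore gives a shorter impulse response and a wider filter.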


### Motivation

The original motivation for the gammatone function as a model of auditory frequency selectivity was threefold: physiological, psychological and practical.

1. Physiological: The gammatone function provides an excellent fit to the impulse response of the basilar membrane measured physiologically in cats. The physiological impulse response is obtained with the revcor technique developed by de Boer and described in detail in de Boer and de Jongh (1978). Briefly, the cat is presented with a wideband noise and the response of a primary fibre in the auditory nerve is recorded with a micro-electrode. The noise waveform is then correlated with the stream of neural impulses that constitute the response to the noise, and the result of this "reverse correlation," or "revcor", provides a measure of the impulse response of the basilar partition at the point where the primary fibre is located. A concise description of the technique appears in Pickles (1988, pp. 95-99). Carney and Yin (1988) fitted the gammatone function to "revcor" data from more than 150 individual fibres in cats and showed that the gammatone function does indeed provide a very good fit to "revcor" data over a wide range of centre frequencies and levels.

The dynamic range of the "revcor" technique is limited to about 25 dB and so it does not provide reliable information about the tails of the filter outside the passband. However, for everyday sounds, the output of the filterbank is largely determined by the shape of the passband of the filter, and so it is applicable to a large range of sounds. The revcor technique has the distinct advantage of being able to measure the passband of the filter at the stimulus levels where we listen to music and speech. Both physiological data (Evans, 1977; Carney et al., 1999) and psychological data (Unoki et al., 2006) indicate that the passband of the auditory filter is reasonably independent of level. The revcor technique also has the advantage of eliciting the data with a sound that has a uniform distribution of energy. By contrast, the tuning curve is elicited with a sinusoid, a point source in frequency, whose level is confounded with its distance from the centre frequency of the filter.

2. Psychological: The amplitude characteristic of the passband of the gammatone filter is very similar to that of the roex filter commonly used to summarise frequency-selectivity data measured psychoacoustically in humans.

The amplitude characteristic of the human auditory filter is commonly measured with a notched-noise technique. The listener is required to detect a sinusoidal signal presented with a broadband noise masker which has a deep notch in the region of the signal. Signal threshold is measured as the width of the notch is varied to assess the selectivity of the auditory filter centred on the signal (Patterson, 1976). Patterson and Nimmo-Smith (1980) showed that the shape of the auditory filter is well described by a pair of back-to-back exponential functions, if the sharp peak at the centre frequency of the function is rounded off, and the sharp descent of the exponentials is rounded up at frequencies outside the passband. They developed a family of rounded-exponential filters which have, subsequently, been used to predict noise masking over a wide range of frequencies and levels in simultaneous and forward masking conditions; for a review see Patterson and Moore (1986).

In order to implement an auditory filterbank, one must have a phase characteristic as well as an amplitude characteristic for the filters. Although the notched-noise technique provides a good measure of the amplitude characteristic of the auditory filter, it does not provide any information concerning the phase characteristic. In an effort to overcome this problem, Schofield (1985) noted that the amplitude characteristic of the gammatone filter provided a good fit to the data from the original notched-noise experiment (Patterson, 1976), and suggested the gammatone filter as a model of the human auditory filter. The gammatone filter has a 'minimum' phase characteristic which seemed a reasonable assumption for the human auditory system at the time. It now seems clear that the phase characteristic is not strictly minimum phase (Kohlrausch and xxx, 1992) but it remains a reasonable assumption in a wide range of conditions. Following Schofield's lead, Patterson, Nimmo-Smith, Holdsworth and Rice (1988) compared the amplitude characteristic of the gammatone filter with that of the most comprehensive roex filter, roex(p, w, t). They found that a gammatone with a low order (2-3) provides the best fit over a large dynamic range (60 dB) but that a gammatone with a slightly higher order (n=4) provides the best fit to the passband of the roex filter which is more important for explaining masking in notched noise.

3. Computational: There is a very efficient recursive, digital filter which is highly stable and which provides a particularly good approximation to the gammatone filter both in amplitude and phase.

While investigating the form of the gammatone filter in the frequency domain, Holdsworth realised (a) that an nth-order gammatone filter can be approximated by a cascade of n identical first-order gammatone filters, and (b) that the first-order gammatone filter can be approximated by a particularly efficient, recursive digital filter. The implementation of the gammatone filterbank in AIM is described in Holdsworth, Nimmo-Smith, Patterson and Rice (1988). On average, across the frequency range of speech, the recursive gammatone filter is about an order of magnitude quicker than convolution of the sound with the gammatone impulse response. The computational load of AICAP is dominated by the filterbank; the combined load of the remaining stages is less than that of the filterbank, so an order of magnitude saving in the filterbank stage has a large effect on the overall performance of the model. Extensive reviews of the gammatone function as a filter are presented in Slaney (1993 xxx), Darling (199 xxx ). The relationship between the gammatone filter and cochlear mechanics is described in and Lyon (1996) xxx.
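The cascade idea can be illustrated with a short sketch. This is not the Holdsworth et al. implementation, which the text only cites. It swaps in a plain complex one-pole recursion as the first-order section, and the function names, sampling rate and parameter values are assumptions. Running an impulse through n identical sections gives a response whose envelope is a polynomial of degree n-1 in time multiplied by a decaying exponential, which approximates the $t^{n-1} e^{-2\pi b t}$ envelope of the gammatone.

```python
import cmath
import math

def one_pole(x, pole):
    """First-order recursive filter: y[k] = x[k] + pole * y[k-1]."""
    y = []
    prev = 0j
    for sample in x:
        prev = sample + pole * prev
        y.append(prev)
    return y

def cascade_gammatone(x, fc, b, n=4, fs=16000):
    """Pass x through n identical complex one-pole sections; keep the real part."""
    # The pole has radius exp(-2*pi*b/fs) (the decay) and angle 2*pi*fc/fs (the carrier).
    pole = cmath.exp(complex(-2.0 * math.pi * b, 2.0 * math.pi * fc) / fs)
    y = [complex(v) for v in x]
    for _ in range(n):
        y = one_pole(y, pole)
    return [v.real for v in y]

# Hypothetical example: impulse response of a fourth-order cascade at 1 kHz.
impulse = [1.0] + [0.0] * 199
response = cascade_gammatone(impulse, fc=1000.0, b=135.0)
```

Each section costs one complex multiply-add per sample, so the cost per output sample grows with the filter order rather than with the length of the impulse response, which is where the saving over direct convolution comes from.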

### The Bandwidth of the Auditory Filter

There is a wealth of information in the literature concerning the bandwidth of the roex auditory filter: it increases monotonically with filter centre frequency and it is greater at high stimulus levels; it increases slowly with age and it is broader in listeners with hearing impairment of cochlear origin. A review of the roex filter in simultaneous masking is presented in Patterson and Moore (1986). The main effects are similar in forward masking conditions but the bandwidth is usually found to be a little narrower than in simultaneous masking. A review of the roex filter in forward masking is presented in O'Loughlin and Moore (1986). Glasberg and Moore (1990) have recently reviewed existing data on the roex filter for normal listeners and concluded that there is a broad middle range of stimulus levels and ages where the relationship between the Equivalent Rectangular Bandwidth of the filter and its centre frequency is well represented by

$$\mathrm{ERB} = 24.7 + 0.108 \, f_c \qquad (2)$$

In words, the bandwidth is roughly 25 Hz plus a little over 10% of the centre frequency. The relationship can also be written as

$$\mathrm{ERB} = 24.7 + \frac{f_c}{9.26} \qquad (2a)$$

to stress the fact that the auditory system is nearly a 'constant Q' system, that is, a system in which the bandwidth is a fixed proportion of the centre frequency. Physical systems often have this characteristic. In engineering texts the proportion, Q, is specified as $f_c$/bandwidth, so that more selective systems are associated with larger values of Q. Thus, in engineering terms, the auditory filterbank is a constant Q system with a restriction on minimum bandwidth at low centre frequencies, and the characteristic of the system, Q, has a value of about 10.
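A few lines of code make the 'nearly constant Q' behaviour concrete. This is a minimal sketch of Equation (2). The function names are hypothetical and the centre frequencies are arbitrary examples.

```python
def erb_hz(fc):
    """Equivalent rectangular bandwidth in Hz from Equation (2)."""
    return 24.7 + 0.108 * fc

def q_factor(fc):
    """Q defined as centre frequency divided by bandwidth."""
    return fc / erb_hz(fc)

# Q is well below its limit at low frequencies, where the 24.7 Hz
# floor dominates, and approaches 1/0.108 at high frequencies.
for fc in (100.0, 1000.0, 10000.0):
    print(fc, round(erb_hz(fc), 1), round(q_factor(fc), 2))
```

Because of the fixed 24.7 Hz term, Q rises with centre frequency and only approaches its limiting value of 1/0.108 in the upper part of the audible range.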

Returning to the gammatone filter, Holdsworth et al. (1988) have shown that, when the centre frequency is large relative to the bandwidth (which it is in the case of the auditory system), the bandwidth of the gammatone filter is proportional to $b$, the decay parameter in the exponential term of the gammatone impulse response (Equation 1), and the proportionality constant, $a_n$, depends solely on $n$, the order of the filter. That is,

$$\mathrm{ERB} = a_n \, b \qquad (3)$$

When the bandwidth of the gammatone filter is matched to that of the roex filter, the amplitude characteristics of their passbands are essentially indistinguishable. Thus, to tune the gammatone filterbank for use with normal human listeners we need only calculate $b$ from Equations 2 and 3. Specifically,

$$b = \frac{24.7}{a_n} + \frac{0.108 \, f_c}{a_n} \qquad (4)$$

Holdsworth et al. (1988) provide an analytical expression for the ERB of the gammatone filter and a table of proportionality constants for $n$ in the range 1-9. When the order is 4, $a_n$ is 0.982, $b$ is 1.019 ERB, and

$$b = 25.2 + 0.110 \, f_c \qquad (4a)$$
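The step from Equation (2) to the decay parameter can be sketched as follows. The closed form used for the proportionality constant is an assumption: it is the commonly quoted expression obtained by integrating the approximate power response $[1 + ((f - f_c)/b)^2]^{-n}$ over frequency, and it is not given in the text above. It is checked here only against the value 0.982 quoted for order 4. The function names are hypothetical.

```python
import math

def erb_constant(n):
    """Assumed closed form: a_n = pi * (2n-2)! / (2**(2n-2) * ((n-1)!)**2)."""
    numerator = math.pi * math.factorial(2 * n - 2)
    denominator = 2 ** (2 * n - 2) * math.factorial(n - 1) ** 2
    return numerator / denominator

def decay_parameter(fc, n=4):
    """b from Equation (4): the ERB of Equation (2) divided by a_n."""
    return (24.7 + 0.108 * fc) / erb_constant(n)

a4 = erb_constant(4)              # close to the 0.982 quoted for order 4
b_1k = decay_parameter(1000.0)    # close to 25.2 + 0.110 * 1000 from Equation (4a)
```

For order 4 the constant is close to 1, so $b$ and the ERB are nearly equal. For lower orders the constant is larger, so the same ERB corresponds to a smaller $b$.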

Holdsworth et al. (1988) also provide an analytical expression for the 3-dB bandwidth of the gammatone filter, and proportionality constants for calculating $b$ from 3-dB bandwidths. When the order is 4, the 3-dB bandwidth of the gammatone filter is 0.87 times the ERB. Equations 2 and 4a provide a complete specification of a gammatone filterbank for order 4, if we include the common assumption that the filters are distributed across frequency in proportion to filter bandwidth. The resulting gammatone filterbank will predict threshold for signals masked by stationary noises in the majority of cases encountered in the everyday world.

### The frequency dimension of auditory representations

The ordinate on the figures portraying simulated basilar motion specifies the centre frequencies of the gammatone filters in the filterbank. Specifically, the centre frequency is the value where the zero line of the filtered wave intersects the ordinate. The unit of measurement on the frequency axis is ERBs, so the auditory filters are distributed across frequency in proportion to their bandwidth. This is a traditional assumption introduced by Fletcher (1953) and Zwicker et al. (1957). It is based on physiological data relating the frequency of a sinusoidal stimulus to the position of maximum response on the basilar partition, and to the selectivity of the partition at that point. Greenwood (1961) reviewed the early frequency/position and selectivity data for mammals ranging in size from mice, to humans, to elephants and concluded that the integral of the critical-band curve did, indeed, provide a good fit to the frequency/position data. As a result, he used the existing selectivity data to obtain his now classic frequency/position curve.

Subsequently, Zwicker and Terhardt (1980) and Moore and Glasberg (1983) reviewed sets of selectivity data for humans and integrated their respective versions of the critical-band scale to generate what are referred to as 'critical-band-rate' functions; that is, Bark-rate and ERB-rate functions that relate auditory-filter centre frequency to frequency. These critical-band-rate functions provide, arguably, the best scales for the frequency dimension in auditory representations of sounds. Recently, Greenwood (1990) updated his review of mammalian data on frequency/position and selectivity, and the paper includes a section devoted to the new bandwidth estimates obtained with humans in filter-shape experiments. Greenwood concludes that the ERB-rate scale provides a slightly better fit to the physiological data. He also agrees with Moore and Glasberg (1983) that, in humans, each ERB corresponds to about 0.9 mm of length along the basilar partition.

Glasberg and Moore (1990) have also updated their review of auditory filter bandwidths with new data at low and high centre frequencies, and this has led them to remove the quadratic term from their earlier ERB function. The result is the very simple critical-band function presented in Equation (2), which they integrated to obtain the following ERB-rate function, with $f_c$ in Hz:

$$\mathrm{ERB\text{-}rate} = 21.4 \, \log_{10}\!\left(0.00437 \, f_c + 1\right)$$

It is this frequency scale that appears on the ordinate in figures displaying output from the normal gammatone filterbank.
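Distributing filters in proportion to bandwidth amounts to spacing the centre frequencies evenly on the ERB-rate scale. The sketch below integrates the reciprocal of Equation (2) directly, so its constants follow from 24.7 and 0.108 and may differ slightly from published ERB-rate formulae. The function names, the frequency range and the channel count are assumptions chosen for illustration.

```python
import math

def erb_rate(f):
    """Integral of 1/ERB from 0 to f Hz, with ERB = 24.7 + 0.108 * f."""
    return math.log(1.0 + 0.108 * f / 24.7) / 0.108

def erb_rate_to_hz(e):
    """Inverse of erb_rate: frequency in Hz for a given ERB-rate value."""
    return 24.7 * (math.exp(0.108 * e) - 1.0) / 0.108

def centre_frequencies(f_lo, f_hi, channels):
    """Centre frequencies evenly spaced in ERB-rate (channels >= 2)."""
    lo, hi = erb_rate(f_lo), erb_rate(f_hi)
    step = (hi - lo) / (channels - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(channels)]

# Hypothetical example: 32 channels between 100 Hz and 6 kHz.
cfs = centre_frequencies(100.0, 6000.0, 32)
```

With this spacing, adjacent filters overlap by the same fraction of their bandwidth at every centre frequency, so the channels are dense in Hz at low frequencies and sparse at high frequencies.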

### The response of a filterbank with Bark-scale bandwidths

The Fidelity/Useability Tradeoff in the Spectral Analysis: The origins of AICAP should be introduced very briefly at the end of the Intro, to establish the concept of a psychological model of hearing, and it should be noted that we will briefly discuss the fidelity/useability tradeoff at the end of each chapter. The auditory filterbank is a classic functional model; it is a signal processing module that transforms a sound into an output that is similar to the output that our physiological models tell us would come from the basilar membrane and outer hair cells in response to that sound. It performs a function that is similar to that performed by the basilar membrane, although the internal architecture of the auditory filterbank, and our conception of spectral analysis as the action of a set of digital filters, are radically different from the physiological processes they simulate. Thinking physiologically, sounds produce a travelling wave in the basilar partition. In this case, it is natural to think of the system's response in terms of partition displacement as a function of distance along the partition, and to employ a displacement/position representation of the response. Thinking functionally, the system breaks a sound down into its frequency components. In this case, it is natural to think of the system's response in terms of a spectrally ordered set of waves, and to use an amplitude/time representation of the response.

The gammatone auditory filterbank is particularly appropriate for simulating the cochlear filtering of broadband sounds like speech and music provided the sound level is in the broad middle range of hearing. The gammatone is a linear filter and the magnitude characteristic is approximately symmetric on a linear frequency scale. The auditory filter is roughly symmetric on the same scale when the sound level is moderate; however, at high levels the highpass skirt of the filter becomes shallower and the lowpass skirt becomes sharper. For broadband sounds, the shape of the surface of the filterbank output is largely determined by sound energy that passes through the passbands of the individual auditory filters. In this case, the effect of filter asymmetry is only noticeable at the very highest levels (over 85 dBA) where it causes a gradual smearing of the surface features. For narrowband sounds, the precise details of the filter shape can become important. For example, when a tonal signal is presented with a narrowband masker some distance away in frequency, the accuracy of the simulation will deteriorate as the frequency separation increases.

This is where we put the Transmission-line filterbank (Lyon, 1982). It provides a great alternative which was better motivated at the time it was introduced but which now needs modification.

## References

• Fletcher, H. (1953). Speech and hearing in communication. (Van Nostrand).
• Glasberg, B.R. and Moore, B.C.J. (1990). “Derivation of auditory filter shapes from notched-noise data.” Hear. Res., 47, p.103-138.
• Greenwood, D.D. (1961). “Critical bandwidth and the frequency coordinates of the basilar membrane.” J. Acoust. Soc. Am., 33, p.1344-1356.
• Greenwood, D.D. (1990). “A cochlear frequency-position function for several species - 29 years later.” J. Acoust. Soc. Am., 87, p.2592-2605.
• Irino, T. and Patterson, R.D. (1997). “A time-domain, level-dependent auditory filter: The gammachirp.” J. Acoust. Soc. Am., 101, p.412-419.
• Irino, T. and Patterson, R.D. (2001). “A compressive gammachirp auditory filter for both physiological and psychophysical data.” J. Acoust. Soc. Am., 109, p.2008-2022.
• Moore, B.C.J. and Glasberg, B.R. (1983). “Suggested formulae for calculating auditory-filter bandwidths and excitation patterns.” J. Acoust. Soc. Am., 74, p.750-753.
• Patterson, R.D., Robinson, K., Holdsworth, J., McKeown, D., Zhang, C. and Allerhand, M. (1992). “Complex Sounds and Auditory Images”, in Auditory Physiology and Perception, Y. Cazals, L. Demany and K. Horner, editors (Pergamon Press, Oxford).
• Patterson, R.D., Unoki, M. and Irino, T. (2003). “Extending the domain of center frequencies for the compressive gammachirp auditory filter.” J. Acoust. Soc. Am., 114, p.1529-1542.
• Unoki, M., Irino, T., Glasberg, B., Moore, B.C. and Patterson, R.D. (2006). “Comparison of the roex and gammachirp filters as representations of the auditory filter.” J. Acoust. Soc. Am., 120, p.1474-1492.
• Zwicker, E., Flottorp, G. and Stevens, S.S. (1957). “Critical bandwidth in loudness summation.” J. Acoust. Soc. Am., 29, p.548-557.
• Zwicker, E. and Terhardt, E. (1980). “Analytical expressions for critical band rate and critical bandwidth as a function of frequency.” J. Acoust. Soc. Am., 68, p.1523-1525.