AIM2006ModulesNAP

From CNBH Acoustic Scale Wiki

[Figure 8a: NAP hl(dcgc) for the four example vowels; pulse rate (pitch) 110 and 256, resonance rate (scale) 89 and 122]
Figure 8a. Neural Activity Patterns for the four example vowels produced by the dynamic compressive gammachirp filterbank and half-wave rectification (main panel in each subfigure). The tonotopic profile in the right-hand panel of each subfigure shows the average of the neural activity over time; this representation is often referred to as an excitation pattern. The subfigures are presented in the same format as in Figure 3. These plots are generated by choosing gm2002 in the PCP column and dcgc in the BMM column; the NAP column defaults to hl.

The basilar membrane motion (BMM) is converted into a simulation of the neural activity pattern (NAP) observed in the auditory nerve by one of the ‘neural transduction’ modules in the NAP column (Patterson et al., 1995).

The default NAP depends upon the choice of filterbank. For the default filterbank, dcgc, it is hl, and for the traditional filterbank, gt, it is hcl.

There are four NAP modules available:

  1. hl: Half-wave rectification and lowpass filtering (default for the dcgc filterbank),
  2. hcl: Half-wave rectification, compression (square root by default, or logarithmic) and lowpass filtering (default for the gt filterbank),
  3. 2dat: Two-dimensional adaptive threshold (Patterson & Holdsworth, 1996),
  4. none: No neural transduction.
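The default-selection rule amounts to a lookup keyed on the BMM choice. The minimal Python sketch below uses a hypothetical `choose_nap` helper and string option names copied from the text; it illustrates the rule and is not AIM2006's actual interface.

```python
# Hypothetical sketch: the option strings follow the text above, but this
# lookup is an illustration, not AIM2006's own code.
DEFAULT_NAP = {
    "dcgc": "hl",   # compressive filterbank, so the NAP stage adds no compression
    "gt": "hcl",    # linear gammatone filterbank, so the NAP stage compresses
}
NAP_MODULES = {"hl", "hcl", "2dat", "none"}


def choose_nap(bmm_module, nap_module=None):
    """Return the NAP module to run, falling back to the filterbank's default."""
    if nap_module is None:
        return DEFAULT_NAP[bmm_module]
    if nap_module not in NAP_MODULES:
        raise ValueError(f"unknown NAP module: {nap_module!r}")
    return nap_module


print(choose_nap("dcgc"))        # hl
print(choose_nap("gt"))          # hcl
print(choose_nap("gt", "2dat"))  # 2dat
```

An explicit choice always overrides the default, so pairing gt with 2dat, as in the last call, remains possible.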

Background

The default module for the gt filterbank is hcl, which consists of three sequential operations: half-wave rectification, compression and low-pass filtering.

Half-wave rectification makes the response to the BMM unipolar, like the response of the hair cell, while keeping it phase-locked to the peaks in the wave. Experiments on pitch perception indicate that the fine structure is required to predict the pitch shift of the residue (Yost, Patterson, & Sheft, 1998). Other rectification algorithms, such as squaring, full-wave rectification and the Hilbert transform, preserve only the envelope.
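To see how half-wave rectification keeps the timing information, the sketch below rectifies a 200 Hz tone sampled at 16 kHz (both values chosen arbitrarily for illustration) and counts the output peaks. The half-wave output stays unipolar and peaks once per cycle, locked to the positive peaks of the wave; full-wave rectification, shown for comparison, folds the negative half-cycles upwards and so produces two peaks per cycle.

```python
import math


def half_wave(x):
    """Half-wave rectification: keep positive excursions, zero the rest."""
    return [max(v, 0.0) for v in x]


def full_wave(x):
    """Full-wave rectification (absolute value), for comparison."""
    return [abs(v) for v in x]


def count_peaks(x):
    """Count strict local maxima, a crude proxy for the repetition rate."""
    return sum(1 for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1])


fs = 16000                # sample rate in Hz (assumed)
f0 = 200                  # test-tone frequency in Hz (assumed)
n = fs // 10              # 100 ms of signal, i.e. 20 cycles
tone = [math.sin(2 * math.pi * f0 * i / fs) for i in range(n)]

hw = half_wave(tone)
fw = full_wave(tone)
print(min(hw) >= 0.0)     # True: the output is unipolar
print(count_peaks(hw))    # 20: one peak per cycle, so the 200 Hz timing survives
print(count_peaks(fw))    # 40: the negative half-cycles become extra peaks
```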

The compression is intended to simulate the cochlear compression that is absent from the gammatone filters; it reduces the slope of the input/output function. Compression is essential for coping with the large dynamic range of natural sounds. It does, however, reduce the contrast of local features such as formants. This is evident both in the output of the dcgc filterbank, which has internal compression, and in the output of the gt/hcl combination, where the compression follows the filterbank. The default form of compression is square root, since that appears to be close to what the auditory system applies. AIM2006 also offers logarithmic compression, for compatibility with earlier versions of AIM and to support speech recognition systems that benefit from the level-independence that it imparts to the NAP.
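The effect of the two compression options on level can be illustrated with a few hypothetical channel amplitudes. In this sketch, a 20 dB increase at the input becomes a 10 dB increase after square-root compression (the slope of the input/output function is halved), while log compression turns the same level change into a constant offset across channels, which is the level-independence mentioned above. The `floor` parameter is an assumption added only to avoid taking the log of zero.

```python
import math


def sqrt_compress(x):
    """Square-root compression: halves the input/output slope in dB."""
    return [math.sqrt(v) for v in x]


def log_compress(x, floor=1e-6):
    """Log compression; `floor` (an assumed value) avoids log(0) for zeros."""
    return [math.log10(max(v, floor)) for v in x]


def db(ratio):
    """Amplitude ratio expressed in decibels."""
    return 20 * math.log10(ratio)


quiet = [0.001, 0.01, 0.1]        # hypothetical channel amplitudes
loud = [10 * v for v in quiet]    # the same pattern, 20 dB more intense

# Square root: a 20 dB step at the input becomes a 10 dB step at the output.
sq_gain = [db(l / q) for l, q in zip(sqrt_compress(loud), sqrt_compress(quiet))]
print([round(g, 6) for g in sq_gain])    # [10.0, 10.0, 10.0]

# Log: the level change becomes the same additive offset in every channel,
# so the shape of the pattern across channels does not depend on level.
offsets = [l - q for l, q in zip(log_compress(loud), log_compress(quiet))]
print([round(o, 6) for o in offsets])    # [1.0, 1.0, 1.0]
```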

The low-pass filtering simulates the progressive loss of phase-locking as frequency increases above 1200 Hz. The default version of hcl applies a four-stage low-pass filter, which completely removes phase-locking by about 5 kHz.
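The loss of phase-locking can be illustrated with a cascade of four identical one-pole low-pass filters. The one-pole design, the 1200 Hz cutoff (borrowed from the frequency quoted above) and the 48 kHz sample rate are assumptions for this sketch, not the documented parameters of the hcl filter. `phase_locking_index` is a hypothetical measure: the remaining AC swing of a filtered, half-wave-rectified tone divided by its mean level.

```python
import math


def lowpass_cascade(x, fs, fc=1200.0, stages=4):
    """Cascade of `stages` identical one-pole low-pass filters (cutoff fc in Hz)."""
    a = 1.0 - math.exp(-2.0 * math.pi * fc / fs)   # one-pole smoothing coefficient
    y = list(x)
    for _ in range(stages):
        state = 0.0
        for i, v in enumerate(y):
            state += a * (v - state)
            y[i] = state
    return y


def phase_locking_index(freq, fs=48000, dur=0.05):
    """AC/DC ratio of a filtered, half-wave-rectified tone (steady-state half only)."""
    n = int(fs * dur)
    hwr = [max(math.sin(2 * math.pi * freq * i / fs), 0.0) for i in range(n)]
    out = lowpass_cascade(hwr, fs)[n // 2:]        # skip the onset transient
    ripple = (max(out) - min(out)) / 2             # remaining fine-structure swing
    mean = sum(out) / len(out)                     # DC level (average activity)
    return ripple / mean


for f in (200, 1200, 5000):
    print(f, round(phase_locking_index(f), 3))
```

The index falls steeply between 200 Hz and 5 kHz, while each one-pole stage has unity gain at DC, so the mean level passes through and only the fine-structure timing is removed.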

The hcl module is widely used because it is simple, but it lacks adaptation and so is not physiologically realistic. A more sophisticated solution is provided by the 2dat module (two-dimensional adaptive thresholding; Patterson & Holdsworth, 1996). It applies global compression like hcl, but then uses adaptation, in both time and frequency, to restore local contrast around the larger features. When compressive auditory filters such as the dcGC are used, there is no need for compression in the NAP stage; therefore the default NAP for the dcgc filter option is hl.
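The difference between hl and hcl is whether the compression step is present. The sketch below implements both for a single channel, following the three operations described above; `nap_channel`, the square-root compression, the four one-pole low-pass stages and the 1200 Hz cutoff are assumptions for illustration, not AIM2006's code. Raising the input by 20 dB raises the mean hl output by 20 dB but the mean hcl output by only 10 dB, which is why hl is the natural partner for a filterbank that already compresses.

```python
import math


def nap_channel(bmm, fs, module="hl", fc=1200.0, stages=4):
    """Hypothetical hl/hcl NAP for one filterbank channel (a list of samples)."""
    if module not in ("hl", "hcl"):
        raise ValueError(f"this sketch only covers hl and hcl, not {module!r}")
    x = [max(v, 0.0) for v in bmm]              # half-wave rectification
    if module == "hcl":
        x = [math.sqrt(v) for v in x]           # square-root compression
    a = 1.0 - math.exp(-2.0 * math.pi * fc / fs)
    for _ in range(stages):                     # cascaded one-pole low-pass
        state = 0.0
        for i, v in enumerate(x):
            state += a * (v - state)
            x[i] = state
    return x


def level_gain_db(module, fs=16000, boost=10.0):
    """dB change in mean NAP output when the input amplitude is scaled by boost."""
    tone = [math.sin(2 * math.pi * 300 * i / fs) for i in range(fs // 10)]
    base = nap_channel(tone, fs, module)
    louder = nap_channel([boost * v for v in tone], fs, module)
    return 20 * math.log10(sum(louder) / sum(base))


print(round(level_gain_db("hl"), 3))    # 20.0: hl is linear in level
print(round(level_gain_db("hcl"), 3))   # 10.0: hcl halves the dB slope
```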

Figure 8a shows the NAPs for the four example vowels with the dcgc/hl combination of options. The tonotopic profile on the right of each NAP shows the activity summed across time within the window; this representation is often referred to as an excitation pattern (Glasberg & Moore, 1990). The formants can be seen as local maxima in this profile. A comparison of the NAPs in Figure 8a with the BMMs in Figure 7a shows that the representation of the higher formants is actually sharper in the NAP than in the BMM.

Figure 8b shows the NAPs for the four example vowels with the gt/hcl combination of options. A comparison of the NAPs in Figure 8b with the BMMs in Figure 7b shows that compression reduces the contrast of the representation around the formants.
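An excitation pattern of the kind shown in the right-hand panels can be sketched as a per-channel time average of the NAP, with formant candidates taken as local maxima across channels; summing instead of averaging only rescales the profile. The toy NAP below is made-up data, not AIM2006 output, and the helper names are hypothetical.

```python
def excitation_pattern(nap):
    """Average each channel of a NAP (a list of per-channel sample lists) over time."""
    return [sum(ch) / len(ch) for ch in nap]


def local_maxima(profile):
    """Indices of channels larger than both neighbours (candidate formants)."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i - 1] < profile[i] > profile[i + 1]]


# Toy NAP: 10 channels x 100 samples, each channel pulsing every 4th sample.
levels = [0.1, 0.2, 0.5, 0.9, 0.4, 0.3, 0.35, 0.6, 0.3, 0.1]
toy_nap = [[lvl * (2.0 if t % 4 == 0 else 1.0) for t in range(100)] for lvl in levels]

profile = excitation_pattern(toy_nap)
print(local_maxima(profile))   # [3, 7]
```

Real profiles are noisier than this toy example, so a practical peak picker would usually smooth the profile or require a minimum prominence before calling a maximum a formant.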
