<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/css" href="http://www.acousticscale.org/wiki/skins/common/feed.css?164"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
	<channel>
		<title>CNBH Acoustic Scale Wiki - Recent changes [en]</title>
		<link>http://www.acousticscale.org/wiki/index.php/Special:RecentChanges</link>
		<description>Track the most recent changes to the wiki in this feed.</description>
		<language>en</language>
		<generator>MediaWiki 1.13.4</generator>
		<lastBuildDate>Wed, 08 Sep 2010 20:09:30 GMT</lastBuildDate>
		<item>
			<title>Auditory-Based Processing of Communication Sounds/Compressive Auditory Filtering</title>
			<link>http://www.acousticscale.org/wiki/index.php?title=Auditory-Based_Processing_of_Communication_Sounds/Compressive_Auditory_Filtering&amp;diff=5223&amp;oldid=prev</link>
			<description>&lt;p&gt;&lt;span class=&quot;autocomment&quot;&gt;Results&lt;/span&gt;&lt;/p&gt;

			&lt;table style=&quot;background-color: white; color:black;&quot;&gt;
			&lt;col class='diff-marker' /&gt;
			&lt;col class='diff-content' /&gt;
			&lt;col class='diff-marker' /&gt;
			&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;←Older revision&lt;/td&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 17:25, 8 September 2010&lt;/td&gt;
			&lt;/tr&gt;
		&lt;tr&gt;&lt;td colspan='4' align='center' class='diff-multi'&gt;(4 intermediate revisions not shown.)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 10:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 10:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;'Zooming in' to look at another level of detail, in Chapter 3 the process of strobed temporal integration was investigated and placed on firmer theoretical ground. In doing this, it was hypothesised that the auditory images generated using strobed temporal integration should be more robust to noise than features generated from more simple, purely spectral models. This hypothesis was investigated in Chapter 4 by adapting the auditory features developed in Chapter 2.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;'Zooming in' to look at another level of detail, in Chapter 3 the process of strobed temporal integration was investigated and placed on firmer theoretical ground. In doing this, it was hypothesised that the auditory images generated using strobed temporal integration should be more robust to noise than features generated from more simple, purely spectral models. This hypothesis was investigated in Chapter 4 by adapting the auditory features developed in Chapter 2.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Having modelled &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;an &lt;/del&gt;observed large-scale behaviour of the auditory system, and then developed a particular model of the post-cochlear neural processing, &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;in &lt;/del&gt;this chapter &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;I &lt;/del&gt;'&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;zoom &lt;/del&gt;in' again to look in &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;a &lt;/del&gt;finer &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;level of &lt;/del&gt;detail at the behaviour of the cochlea, and in particular its response at very short timescales. The cochlea is perhaps one of the more well-understood components of the auditory system, and there is a wealth of data on the spectral shape of the human auditory filter, and the fine-timing properties of the mammalian cochlea.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Having modelled &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;and &lt;/ins&gt;observed &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;the &lt;/ins&gt;large-scale behaviour of the auditory system, and then developed a particular model of the post-cochlear neural processing, this chapter '&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;zooms &lt;/ins&gt;in' again to look in finer detail at the behaviour of the cochlea, and in particular its response at very short timescales. 
The cochlea is perhaps one of the more well-understood components of the auditory system, and there is a wealth of data on the spectral shape of the human auditory filter, and the fine-timing properties of the mammalian cochlea.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;The dynamic range of audio signals is &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;many &lt;/del&gt;orders of magnitude larger than the dynamic range available to encode those signals in the auditory nerve. This means that the auditory system has to perform some sort of compression on the incoming signal in order to represent it effectively with a neural code. It is perhaps surprising to find that the auditory system &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;has &lt;/del&gt;this compression within the auditory filter itself&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;, using &lt;/del&gt;mechanical feedback from the outer hair cells (OHCs) to dynamically modify the motion of the basilar membrane, and so the signal encoded by the inner hair cells. One important advantage of this approach is that it makes it possible to perform the dynamic range compression with an extremely fast time-constant. The auditory filter is able to compress the peaks of the waveform within a single cycle, and leave the zero-crossings unchanged. &amp;nbsp;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;The dynamic range of audio signals is orders of magnitude larger than the dynamic range available to encode those signals in the auditory nerve. This means that the auditory system has to perform some sort of compression on the incoming signal in order to represent it effectively with a neural code. 
It is perhaps surprising to find that the auditory system &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;performs &lt;/ins&gt;this compression within the auditory filter itself&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;; it uses &lt;/ins&gt;mechanical feedback from the outer hair cells (OHCs) to dynamically modify the motion of the basilar membrane, and so the signal encoded by the inner hair cells. One important advantage of this approach is that it makes it possible to perform the dynamic range compression with an extremely fast time-constant. The auditory filter is able to compress the peaks of the waveform within a single cycle, and leave the zero-crossings &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;effectively &lt;/ins&gt;unchanged. &amp;nbsp;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Models of auditory filtering attempt to describe mathematically the processing performed by the cochlea on an incoming sound, which ultimately leads to a neural response. They are a mathematical abstraction of the response of the complex physiological systems in the cochlea to stimulation by an incoming pressure wave. There exist a number of excellent descriptions of various parts of the history of these models, for example &amp;lt;citet r=&amp;quot;lyon:1996&amp;quot;/&amp;gt; and &amp;lt;citet r=&amp;quot;patterson:2003&amp;quot;/&amp;gt;. The introduction to this chapter briefly covers the major points of the various models, and introduces a set of increasingly more complex criteria that an auditory model must fulfil in order to accurately model the human auditory system.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Models of auditory filtering attempt to describe mathematically the processing performed by the cochlea on an incoming sound, which ultimately leads to a neural response. They are a mathematical abstraction of the response of the complex physiological systems in the cochlea to stimulation by an incoming pressure wave. There exist a number of excellent descriptions of various parts of the history of these models, for example &amp;lt;citet r=&amp;quot;lyon:1996&amp;quot;/&amp;gt; and &amp;lt;citet r=&amp;quot;patterson:2003&amp;quot;/&amp;gt;. The introduction to this chapter briefly covers the major points of the various models, and introduces a set of increasingly more complex criteria that an auditory model must fulfil in order to accurately model the human auditory system.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 188:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 188:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;===== Comparison to human pitch perception =====&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;===== Comparison to human pitch perception =====&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;In order to bear any relevance to human pitch perception, it is also necessary that this &lt;/del&gt;normalised pitch strength &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;measure does a reasonable job of modelling human perception&lt;/del&gt;. &amp;lt;citet r='Patterson:1996a'/&amp;gt; performed a series of experiments on the human perception of IRN by comparing the pitch strength of IRN stimuli and tonal stimuli masked with noise &amp;lt;citep r='Yost:1996, Yost:1998, Patterson:2000, Handel:2000'/&amp;gt;. Subjects compared IRN with different numbers of iterations to a tonal stimulus (256-iteration IRN) masked with noise. Subjects were asked to select the stimulus with the stronger pitch strength as the SNR of the noise-masked tonal stimulus was changed. The pitch strength measure described above was used to model the data of &amp;lt;citet r='Patterson:1996a'/&amp;gt;. The original results from the perceptual &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;experiments &lt;/del&gt;are plotted in &amp;lt;figureRef name='pat_yost_predictions '/&amp;gt;, and the results using the current measure &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;and &lt;/del&gt;plotted in &amp;lt;figureRef name='irn_ps_model_results'/&amp;gt;. [[Image:Pat Yost Predictions.png|thumb|400px|right|&amp;lt;figure name='pat_yost_predictions'/&amp;gt; --- Pitch strength predictions for the perception of IRN in noise, from &amp;lt;citet r='patterson:1996a'/&amp;gt;.]] [[Image:Pzfc ps 800Hz plot prop.eps|thumb|400px|right|&amp;lt;figure name='irn_ps_model_results'/&amp;gt; --- Predicted pitch strengths from the normalised pitch strength measure used in the experiments in this section. 
The results follow the same pattern as those reported in &amp;lt;citet r='patterson:1996a'/&amp;gt;, but the measure is slightly noisier. ]]&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;. &lt;/del&gt;The &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;pitch strength &lt;/del&gt;predictions made by the normalized pitch strength measure have the same form as the perceptual results reported in &amp;lt;citet r='patterson:1996a'/&amp;gt;. The curves predicted by this model are a little noisier than those from the model of &amp;lt;citet r='patterson:1996a'/&amp;gt;, but the model has the advantage of being normalised &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;so that &lt;/del&gt;it &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;is easy &lt;/del&gt;to compare &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;the &lt;/del&gt;results across &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;different filterbank, which may have differing &lt;/del&gt;output levels.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;The &lt;/ins&gt;normalised pitch&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;-&lt;/ins&gt;strength &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;measured for IRN stimuli was compared with the pitch strength measured for IRN in perceptual experiments&lt;/ins&gt;. &amp;lt;citet r='Patterson:1996a'/&amp;gt; performed a series of experiments on the human perception of IRN by comparing the pitch strength of IRN stimuli and tonal stimuli masked with noise &amp;lt;citep r='Yost:1996, Yost:1998, Patterson:2000, Handel:2000'/&amp;gt;. Subjects compared IRN with different numbers of iterations to a tonal stimulus (256-iteration IRN) masked with noise. 
Subjects were asked to select the stimulus with the stronger pitch strength as the SNR of the noise-masked tonal stimulus was changed. The pitch strength measure described above was used to model the data of &amp;lt;citet r='Patterson:1996a'/&amp;gt;. The original results from the perceptual &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;experiment &lt;/ins&gt;are plotted in &amp;lt;figureRef name='pat_yost_predictions '/&amp;gt;, and the results using the current measure &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;are &lt;/ins&gt;plotted in &amp;lt;figureRef name='irn_ps_model_results'/&amp;gt;. [[Image:Pat Yost Predictions.png|thumb|400px|right|&amp;lt;figure name='pat_yost_predictions'/&amp;gt; --- Pitch strength predictions for the perception of IRN in noise, from &amp;lt;citet r='patterson:1996a'/&amp;gt;.]] [[Image:Pzfc ps 800Hz plot prop.eps|thumb|400px|right|&amp;lt;figure name='irn_ps_model_results'/&amp;gt; --- Predicted pitch strengths from the normalised pitch strength measure used in the experiments in this section. The results follow the same pattern as those reported in &amp;lt;citet r='patterson:1996a'/&amp;gt;, but the measure is slightly noisier. ]] The predictions made by the normalised pitch strength measure have the same form as the perceptual results reported in &amp;lt;citet r='patterson:1996a'/&amp;gt;. The curves predicted by this model are a little noisier than those from the model of &amp;lt;citet r='patterson:1996a'/&amp;gt;, but the model has the advantage of being normalised&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;, making &lt;/ins&gt;it &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;easier &lt;/ins&gt;to compare results across &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;filterbanks when their &lt;/ins&gt;output levels &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;differ&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;====Experiments====&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;====Experiments====&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 219:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 219:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|}&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|}&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;This suggests &lt;/del&gt;that the processing performed by the dcGC is fundamentally different to that performed by the PZFC for these stimuli. Since IRN is inherently noisy, I now move to using an alternative stimulus: a harmonic complex. This stimulus is used to test the effect of modifying various parameters related to the automatic gain control in the PZFC.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;The results suggest &lt;/ins&gt;that the processing performed by the dcGC is fundamentally different to that performed by the PZFC for these stimuli. Since IRN is inherently noisy, I now move to using an alternative stimulus: a harmonic complex. This stimulus is used to test the effect of modifying various parameters related to the automatic gain control in the PZFC.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;===== Harmonic complexes =====&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;===== Harmonic complexes =====&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</description>
			<pubDate>Wed, 08 Sep 2010 17:25:51 GMT</pubDate>			<dc:creator>Rdp1</dc:creator>			<comments>http://www.acousticscale.org/wiki/index.php/Talk:Auditory-Based_Processing_of_Communication_Sounds/Compressive_Auditory_Filtering</comments>		</item>
		<item>
			<title>Auditory-Based Processing of Communication Sounds/Scale-shift Invariant Auditory Features</title>
			<link>http://www.acousticscale.org/wiki/index.php?title=Auditory-Based_Processing_of_Communication_Sounds/Scale-shift_Invariant_Auditory_Features&amp;diff=5218&amp;oldid=prev</link>
			<description>&lt;p&gt;&lt;span class=&quot;autocomment&quot;&gt;Results:&amp;#32;&lt;/span&gt; &lt;/p&gt;

			&lt;table style=&quot;background-color: white; color:black;&quot;&gt;
			&lt;col class='diff-marker' /&gt;
			&lt;col class='diff-content' /&gt;
			&lt;col class='diff-marker' /&gt;
			&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;←Older revision&lt;/td&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 07:08, 7 September 2010&lt;/td&gt;
			&lt;/tr&gt;
		&lt;tr&gt;&lt;td colspan='4' align='center' class='diff-multi'&gt;(One intermediate revision not shown.)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 100:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 100:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Performance of the HMM-based recogniser on the two sets of features was markedly different, leading to overall differences in recognition rate which were in general much larger than those due to the changes in the HMM parameters. Optimum performance, of 72.3% syllable accuracy on the MFCC features, was achieved with an HMM with 2 emitting states and with 4 components in the output distribution. This performance was achieved after 15 iterations of the training algorithm.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Performance of the HMM-based recogniser on the two sets of features was markedly different, leading to overall differences in recognition rate which were in general much larger than those due to the changes in the HMM parameters. Optimum performance, of 72.3% syllable accuracy on the MFCC features, was achieved with an HMM with 2 emitting states and with 4 components in the output distribution. This performance was achieved after 15 iterations of the training algorithm.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;For the AIM features, the optimum recognition performance was 93.2% across all syllables using an HMM with 2 emitting states, 6 output distribution components and after 10 training iterations.&amp;nbsp; However, &amp;lt;figureRef name='mfcc_vs_nap_performance'/&amp;gt; shows the overall syllable recognition rate for the MFCCs and AIM features respectively as a function of the various HMM parameters. For the MFCCs, the performance for all HMM parameters is between 64% and 72.3%. For the same set of parameters with the AIM features, performance is between 78% and 93.2%. Performance is consistently better for the AIM features across all tested HMM configurations. (In addition to the HMM configurations shown in this graph, odd-numbered values of the HMM parameters were also tried but, for simplicity of representation, they are not plotted here. These results cluster in the same way.) [[Image:Nap vs mfcc performance.pdf|thumb|300px|right|&amp;lt;figure name='mfcc_vs_nap_performance'/&amp;gt; --- Performance of HMMs with varying parameters, trained using AIM features (black) and MFCCs (red). The vertical axis shows overall recognition performance. The horizontal axis shows the number of HMM training iterations. Solid lines denote HMMs with 2 emitting states, dashed lines 4 emitting states and dotted lines 8 emitting states. 
Stars show performance with 2 Gaussian components in the output distribution, circles are for &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;four &lt;/del&gt;components and plus symbols show performance with 6 components).]]&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;For the AIM features, the optimum recognition performance was 93.2% across all syllables using an HMM with 2 emitting states, 6 output distribution components and after 10 training iterations.&amp;nbsp; However, &amp;lt;figureRef name='mfcc_vs_nap_performance'/&amp;gt; shows the overall syllable recognition rate for the MFCCs and AIM features respectively as a function of the various HMM parameters. For the MFCCs, the performance for all HMM parameters is between 64% and 72.3%. For the same set of parameters with the AIM features, performance is between 78% and 93.2%. Performance is consistently better for the AIM features across all tested HMM configurations. (In addition to the HMM configurations shown in this graph, odd-numbered values of the HMM parameters were also tried but, for simplicity of representation, they are not plotted here. These results cluster in the same way.) [[Image:Nap vs mfcc performance.pdf|thumb|300px|right|&amp;lt;figure name='mfcc_vs_nap_performance'/&amp;gt; --- Performance of HMMs with varying parameters, trained using AIM features (black) and MFCCs (red). The vertical axis shows overall recognition performance. The horizontal axis shows the number of HMM training iterations. Solid lines denote HMMs with 2 emitting states, dashed lines 4 emitting states and dotted lines 8 emitting states. Stars show performance with 2 Gaussian components in the output distribution, circles are for &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;4 &lt;/ins&gt;components and plus symbols show performance with 6 components).]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;nbsp; &amp;nbsp;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;nbsp; &amp;nbsp;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;For the AIM features, 92.8% accuracy was achieved with the 2 emitting state, 4 component HMM model after 15 training iterations. Since this performance is almost at the ceiling for the AIM features, and the same HMM parameters lead to optimal performance for the MFCC features, all further comparisons are made using these parameters. This allows for a direct comparison of the features using an otherwise identical recognition system.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;For the AIM features, 92.8% accuracy was achieved with the 2 emitting state, 4 component HMM model after 15 training iterations. Since this performance is almost at the ceiling for the AIM features, and the same HMM parameters lead to optimal performance for the MFCC features, all further comparisons are made using these parameters. This allows for a direct comparison of the features using an otherwise identical recognition system.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 110:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 110:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;unlike the MFCC recogniser, performance along most of the spokes is near ceiling after optimisation. The worst performance, for the speaker with the longest VTL, was 66.5%, which compares with 3.8% in the MFCC case. There is a drop in performance at the extremes of VTL, although the drop is small in comparison to that seen in the MFCC case.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;unlike the MFCC recogniser, performance along most of the spokes is near ceiling after optimisation. The worst performance, for the speaker with the longest VTL, was 66.5%, which compares with 3.8% in the MFCC case. There is a drop in performance at the extremes of VTL, although the drop is small in comparison to that seen in the MFCC case.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;del style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;del style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;del style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;==== Comparison with standard VTLN ====&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;==== Comparison with standard VTLN ====&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 154:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 151:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;==== Conclusions ====&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;==== Conclusions ====&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;The scale&lt;/del&gt;-shift invariant Gaussian features derived from AIM spectral profiles proved &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;extremely &lt;/del&gt;effective for a small-scale size-invariant syllable recognition task. This result demonstrates that size-invariant features are useful in allowing recognition systems to generalise over a range of speakers. The &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;fact that the AIM &lt;/del&gt;features consistently outperform &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;the &lt;/del&gt;MFCC features &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;on &lt;/del&gt;the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;syllable &lt;/del&gt;recognition &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;task shows &lt;/del&gt;that the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;exact &lt;/del&gt;parameters of the HMM &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;have &lt;/del&gt;only a &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;secondary &lt;/del&gt;effect &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;to &lt;/del&gt;the feature &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;change&lt;/del&gt;.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Scale&lt;/ins&gt;-shift invariant Gaussian features derived from AIM spectral profiles &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;have &lt;/ins&gt;proved effective for a small-scale size-invariant syllable recognition task. 
This result demonstrates that size-invariant features are useful in allowing recognition systems to generalise over a range of speakers. The &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;scale-shift invariant &lt;/ins&gt;features &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;derived from AIM &lt;/ins&gt;consistently outperform &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;standard &lt;/ins&gt;MFCC features&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;. This ability to outperform the MFCCs appears to be an intrinsic property of the features themselves, rather than an effect of &lt;/ins&gt;the recognition &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;system being used, evidenced by the observation &lt;/ins&gt;that &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;changing &lt;/ins&gt;the parameters of the HMM &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;recognition system has &lt;/ins&gt;only a &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;minor &lt;/ins&gt;effect &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;compared with that of changing &lt;/ins&gt;the feature &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;itself.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;However, when optimal vocal tract length normalisation is performed on the MFCCs, their performance is extremely good, exceeding that of the scale-shift invariant features. Such performance could be achieved in practice if the recognition system were able to identify the correct scaling factor for every speaker. With current speech recognition systems, performing vocal tract length normalisation involves pre-calculating features for a range of scalings of the input sound &amp;lt;citep r=&amp;quot;welling:2002&amp;quot;/&amp;gt;, which adds complexity to the recognition system. As they stand, the scale-shift invariant AIM features are approximately 30 times slower to compute than MFCC features with vocal tract length normalisation. It is not clear how many discrete vocal tract warping values would have to be used to generate the full range of inputs for a vocal tract length normalising speech recognition system, but it is likely to be of the same order of magnitude, so for short segments of speech, where it is not possible to TODO (put this back-of-the-envelope calculation on firmer ground, and put it in an earlier section)&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;del style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;It is interesting to note that the performance of the HMM-based recogniser with the MFCCs is considerably improved when the system is trained on a wide range of speakers - suggesting that the utility of VTL-invariant features may be limited to the case where there is only data from a single training speaker. &lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;del style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;When the frequency axis is warped optimally, it is possible to achieve performance which exceeds that of the AIM features when using the MFCC features. However, such performance could only be achieved in practice if the recognition system was able to correctly identify the correct scaling factor for every speaker. With current systems this would involve pre-calculating features for a range of scalings of the input sound &amp;lt;citep r=&amp;quot;welling:2002&amp;quot;/&amp;gt;. The&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;There is still some sensitivity to change in VTL in the scale-shift invariant feature vectors. Since it affects only the extreme VTL conditions, it seems likely that it is due to edge effects at the Gaussian fitting stage. That is, when a formant occurs near the edge of the spectrum, the tail of the Gaussian used to fit the formant prevents it from shifting sufficiently to centre the Gaussian on the formant. If this proves to be the reason, it suggests that performance is not limited by the underlying auditory representation but rather by a limitation in the feature extraction process. &amp;nbsp;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;There is still some sensitivity to change in VTL in the scale-shift invariant feature vectors. Since it affects only the extreme VTL conditions, it seems likely that it is due to edge effects at the Gaussian fitting stage. That is, when a formant occurs near the edge of the spectrum, the tail of the Gaussian used to fit the formant prevents it from shifting sufficiently to centre the Gaussian on the formant. If this proves to be the reason, it suggests that performance is not limited by the underlying auditory representation but rather by a limitation in the feature extraction process. &amp;nbsp;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;It is interesting to note that the performance of the HMM-based recogniser with the MFCCs is considerably improved when the system is trained on a wide range of speakers - suggesting that the utility of VTL-invariant features may be limited to the case where there is only data from a single training speaker. &lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Based on this small-scale set of recognition experiments, there are many possibilities for future work. With the AIM-C software that I developed and the processing pipeline that Jess Monaghan and I worked on, it is now possible to easily create features from large databases of sounds. For these recognition experiments, the Amazon EC2 cloud computing service was used to generate the features quickly, as it is simple to parallelise the task over many compute cores.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Based on this small-scale set of recognition experiments, there are many possibilities for future work. With the AIM-C software that I developed and the processing pipeline that Jess Monaghan and I worked on, it is now possible to easily create features from large databases of sounds. For these recognition experiments, the Amazon EC2 cloud computing service was used to generate the features quickly, as it is simple to parallelise the task over many compute cores.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;del style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;The simplest addition to the system would be to test the Gaussian features with other filterbanks, including compressive filters. I have carried out some preliminary studies in this area, and with some modification to the Gaussian fitting procedure, similar features can be calculated using the dcGC and PZFC filterbanks. Beyond the simple database of scaled syllables, there is potential to extend the use of AIM Gaussian features to more general speech recognition tasks, such as the TIMIT database.&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;The features used in the study described above are computed simply from the output of the NAP stage of AIM. In the following chapter, the process of strobed temporal integration is investigated, and in Chapter 4, the feature generation system is extended to create features from &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;the &lt;/del&gt;spectral &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;profile &lt;/del&gt;of the AIM stabilised auditory image&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;. By using the SAI, the features generated are found to be more robust to interfering noise&lt;/del&gt;.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;The features used in the study described above are computed simply from the output of the NAP stage of AIM. In the following chapter, the process of strobed temporal integration is investigated, and in Chapter 4, the feature generation system is extended to create features from &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;various &lt;/ins&gt;spectral &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;profiles &lt;/ins&gt;of the AIM stabilised auditory image.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;=== Notes ===&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;=== Notes ===&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</description>
			<pubDate>Tue, 07 Sep 2010 07:08:50 GMT</pubDate>			<dc:creator>Tcw24</dc:creator>			<comments>http://www.acousticscale.org/wiki/index.php/Talk:Auditory-Based_Processing_of_Communication_Sounds/Scale-shift_Invariant_Auditory_Features</comments>		</item>
		<item>
			<title>Auditory-Based Processing of Communication Sounds/Strobes and Stabilised Auditory Images</title>
			<link>http://www.acousticscale.org/wiki/index.php?title=Auditory-Based_Processing_of_Communication_Sounds/Strobes_and_Stabilised_Auditory_Images&amp;diff=5216&amp;oldid=prev</link>
			<description>&lt;p&gt;&lt;/p&gt;

			&lt;table style=&quot;background-color: white; color:black;&quot;&gt;
			&lt;col class='diff-marker' /&gt;
			&lt;col class='diff-content' /&gt;
			&lt;col class='diff-marker' /&gt;
			&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;←Older revision&lt;/td&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 06:42, 7 September 2010&lt;/td&gt;
			&lt;/tr&gt;
		&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 5:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 5:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;{{Underconstruction}}&amp;lt;/noinclude&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;{{Underconstruction}}&amp;lt;/noinclude&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;The last chapter introduced a feature representation based on the smoothed output from a simulation of the cochlea and a simple hair-cell model. The signal was temporally averaged over a short window by means of a low-pass filter. Strobed temporal integration, leading to a stabilised auditory image (SAI), is an alternative, more complex system for processing the signal leaving the cochlea. A set of strobe points is identified in each channel of the filterbank output, and these strobe points act as triggers for a temporal integration process in which shifted copies of the signal are overlaid on one another. &lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;Since strobe points tend to occur at or near the pulses in a pulse-resonance sound, representing a signal containing a pulse-resonance sound as an SAI will tend to accentuate the periodic pulse-resonance signal relative to any background noise. Noise-robustness is an extremely useful property in any machine hearing system, and it would be beneficial if it were possible to make use of some of the inherent noise-robustness in the SAI. In order to do this, it is first necessary to understand and optimise the process by which strobe points are identified in a signal. &lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;In this chapter, various existing systems for performing strobed temporal integration are assessed and compared, and a new model is proposed based on a simple criterion for optimal strobe generation. The goal is to create a good stabilised auditory image which can be used as the basis for a noise-robust machine hearing system.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;=== Strobe-finding in AIM ===&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;=== Strobe-finding in AIM ===&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 295:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 301:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;| || 0.94 || 0.90 || 1.01 || 0.91&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;| || 0.94 || 0.90 || 1.01 || 0.91&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;|-&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;! ''lyon''&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;! ''lyon''&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;| 0.0% || 0.0% || 0.1% || 0.1%&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;| 0.0% || 0.0% || 0.1% || 0.1%&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;| || 4.23 || 4.23 || 4.23 || 4.23&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;| || 4.23 || 4.23 || 4.23 || 4.23&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;|-&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|}&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|}&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 342:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 350:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;| || 0.94 || 0.92 || 1.01 || 0.91&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;| || 0.94 || 0.92 || 1.01 || 0.91&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;|-&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;! ''lyon''&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;! ''lyon''&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;| 5.5% || 5.1% || 5.4% || 5.5%&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;| 5.5% || 5.1% || 5.4% || 5.5%&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;| || 3.99 || 3.99 || 3.99 || 3.99&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;| || 3.99 || 3.99 || 3.99 || 3.99&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;|-&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|}&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|}&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 389:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 399:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;| || 1.24 || 0.91 || 1.10 || 0.94&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;| || 1.24 || 0.91 || 1.10 || 0.94&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;|-&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;! ''lyon''&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;! ''lyon''&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;| 0.0% || 0.0% || 0.1% || 0.1%&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;| 0.0% || 0.0% || 0.1% || 0.1%&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;| || 4.23 || 4.23 || 4.23 ||4.23&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;| || 4.23 || 4.23 || 4.23 ||4.23&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;|-&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|}&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|}&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 460:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 472:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;An alternative strobe detection system was also introduced that compares the NAP to the impulse response of the filterbank at every time step. While this system is extremely effective at 'back-projecting' to find the original strobe time accurately, it is not at all computationally efficient and so cannot be used in a complete machine hearing system. Further work would also be required in tuning the parameters of this system in order to make it work correctly with the PZFC filterbank.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;An alternative strobe detection system was also introduced that compares the NAP to the impulse response of the filterbank at every time step. While this system is extremely effective at 'back-projecting' to find the original strobe time accurately, it is not at all computationally efficient and so cannot be used in a complete machine hearing system. Further work would also be required in tuning the parameters of this system in order to make it work correctly with the PZFC filterbank.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;Having investigated the properties of strobed temporal integration in the auditory image model, in &lt;/del&gt;the next chapter, the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;original syllable recognition system described &lt;/del&gt;in Chapter 2 &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;is extended to make use of features &lt;/del&gt;computed from the stabilized auditory image&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;. The hypothesis is that&lt;/del&gt;, &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;due &lt;/del&gt;to the strobed temporal integration process, these features will be more noise-robust than those generated from the NAP alone. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt; &lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;In &lt;/ins&gt;the next chapter, the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;features developed &lt;/ins&gt;in Chapter 2 &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;are &lt;/ins&gt;computed from the stabilized auditory image, &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;rather than from a simple filterbank output. Due &lt;/ins&gt;to the strobed temporal integration process, &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;we hypothesise &lt;/ins&gt;these features will be more noise-robust than those generated from the NAP alone. 
&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;The syllable recognition system developed in Chapter 2 provides an excellent test-bed for the alternative features.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;noinclude&amp;gt;&amp;lt;bibliography t=&amp;quot;== Bibliography ==&amp;quot;/&amp;gt;&amp;lt;/noinclude&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;noinclude&amp;gt;&amp;lt;bibliography t=&amp;quot;== Bibliography ==&amp;quot;/&amp;gt;&amp;lt;/noinclude&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</description>
			<pubDate>Tue, 07 Sep 2010 06:42:36 GMT</pubDate>
			<dc:creator>Tcw24</dc:creator>
			<comments>http://www.acousticscale.org/wiki/index.php/Talk:Auditory-Based_Processing_of_Communication_Sounds/Strobes_and_Stabilised_Auditory_Images</comments>
		</item>
		<item>
			<title>Auditory-Based Processing of Communication Sounds/Compressive Auditory Filtering</title>
			<link>http://www.acousticscale.org/wiki/index.php?title=Auditory-Based_Processing_of_Communication_Sounds/Compressive_Auditory_Filtering&amp;diff=5215&amp;oldid=prev</link>
			<description>&lt;p&gt;&lt;span class=&quot;autocomment&quot;&gt;Comparison to human pitch perception:&amp;#32;&lt;/span&gt; &lt;/p&gt;

			&lt;table style=&quot;background-color: white; color:black;&quot;&gt;
			&lt;col class='diff-marker' /&gt;
			&lt;col class='diff-content' /&gt;
			&lt;col class='diff-marker' /&gt;
			&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;←Older revision&lt;/td&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 06:12, 7 September 2010&lt;/td&gt;
			&lt;/tr&gt;
		&lt;tr&gt;&lt;td colspan='4' align='center' class='diff-multi'&gt;(One intermediate revision not shown.)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 10:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 10:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;'Zooming in' to look at another level of detail, in Chapter 3 the process of strobed temporal integration was investigated and placed on firmer theoretical ground. In doing this, it was hypothesised that the auditory images generated using strobed temporal integration should be more robust to noise than features generated from more simple, purely spectral models. This hypothesis was investigated in Chapter 4 by adapting the auditory features developed in Chapter 2.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;'Zooming in' to look at another level of detail, in Chapter 3 the process of strobed temporal integration was investigated and placed on firmer theoretical ground. In doing this, it was hypothesised that the auditory images generated using strobed temporal integration should be more robust to noise than features generated from more simple, purely spectral models. This hypothesis was investigated in Chapter 4 by adapting the auditory features developed in Chapter 2.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Having modelled an observed large-scale behaviour of the auditory system, and then developed a particular model of the post-cochlear neural processing, in this chapter I 'zoom in' again to look in a finer level of detail at the behaviour of the cochlea, and in particular its response at very short timescales. The cochlea is perhaps one of the more well-understood components of the auditory system, and there is a wealth of data on the spectral shape of the human auditory filter, and the fine-timing properties of the mammalian cochlea. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;If we are to accurately model the human auditory system, it is clear that&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Having modelled an observed large-scale behaviour of the auditory system, and then developed a particular model of the post-cochlear neural processing, in this chapter I 'zoom in' again to look in a finer level of detail at the behaviour of the cochlea, and in particular its response at very short timescales. The cochlea is perhaps one of the more well-understood components of the auditory system, and there is a wealth of data on the spectral shape of the human auditory filter, and the fine-timing properties of the mammalian cochlea.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;The dynamic range of audio signals is many orders of magnitude larger than the dynamic range available to encode those signals in the auditory nerve &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;(TODO: good reference for this - Moore's book, perhaps)&lt;/del&gt;. This means that the auditory system has to perform some sort of compression on the incoming signal in order to represent it effectively with a neural code. It is perhaps surprising to find that the auditory system has this compression within the auditory filter itself, using mechanical feedback from the outer hair cells (OHCs) to dynamically modify the motion of the basilar membrane, and so the signal encoded by the inner hair cells &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;(TODO: citation)&lt;/del&gt;. One important advantage of this approach is that it makes it possible to perform the dynamic range compression with an extremely fast time-constant. The auditory filter is able to compress the peaks of the waveform within a single cycle, and leave the zero-crossings unchanged &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;(TODO: citation)&lt;/del&gt;. &amp;nbsp;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;The dynamic range of audio signals is many orders of magnitude larger than the dynamic range available to encode those signals in the auditory nerve. This means that the auditory system has to perform some sort of compression on the incoming signal in order to represent it effectively with a neural code. 
It is perhaps surprising to find that the auditory system has this compression within the auditory filter itself, using mechanical feedback from the outer hair cells (OHCs) to dynamically modify the motion of the basilar membrane, and so the signal encoded by the inner hair cells. One important advantage of this approach is that it makes it possible to perform the dynamic range compression with an extremely fast time-constant. The auditory filter is able to compress the peaks of the waveform within a single cycle, and leave the zero-crossings unchanged. &amp;nbsp;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Models of auditory filtering attempt to describe mathematically the processing performed by the cochlea on an incoming sound, which ultimately leads to a neural response. They are a mathematical abstraction of the response of the complex physiological systems in the cochlea to stimulation by an incoming pressure wave. There exist a number of excellent descriptions of various parts of the history of these models, for example &amp;lt;citet r=&amp;quot;lyon:1996&amp;quot;/&amp;gt; and &amp;lt;citet r=&amp;quot;patterson:2003&amp;quot;/&amp;gt;. The introduction to this chapter briefly covers the major points of the various models, and introduces a set of increasingly more complex criteria that an auditory model must fulfil in order to accurately model the human auditory system.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Models of auditory filtering attempt to describe mathematically the processing performed by the cochlea on an incoming sound, which ultimately leads to a neural response. They are a mathematical abstraction of the response of the complex physiological systems in the cochlea to stimulation by an incoming pressure wave. There exist a number of excellent descriptions of various parts of the history of these models, for example &amp;lt;citet r=&amp;quot;lyon:1996&amp;quot;/&amp;gt; and &amp;lt;citet r=&amp;quot;patterson:2003&amp;quot;/&amp;gt;. The introduction to this chapter briefly covers the major points of the various models, and introduces a set of increasingly more complex criteria that an auditory model must fulfil in order to accurately model the human auditory system.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 18:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 18:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;An important feature of the more recent models of the auditory filter is their ability to deal with dynamic compression performed by the cochlea.&amp;nbsp; In this chapter, two recent models of the auditory filter that perform dynamic compression are discussed and analysed. The models are the dynamic, compressive gammachirp (dcGC) &amp;lt;citepL r=&amp;quot;irino:2006,irino:2007&amp;quot;/&amp;gt; and the pole-zero filter cascade (PZFC) &amp;lt;citep r=&amp;quot;lyon:2010&amp;quot;/&amp;gt;. The two filter models are compared in their response to&amp;nbsp; a number of test stimuli to assess the response of the dynamic, time-varying compression that they both implement.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;An important feature of the more recent models of the auditory filter is their ability to deal with dynamic compression performed by the cochlea.&amp;nbsp; In this chapter, two recent models of the auditory filter that perform dynamic compression are discussed and analysed. The models are the dynamic, compressive gammachirp (dcGC) &amp;lt;citepL r=&amp;quot;irino:2006,irino:2007&amp;quot;/&amp;gt; and the pole-zero filter cascade (PZFC) &amp;lt;citep r=&amp;quot;lyon:2010&amp;quot;/&amp;gt;. The two filter models are compared in their response to&amp;nbsp; a number of test stimuli to assess the response of the dynamic, time-varying compression that they both implement.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;The studies presented in this chapter provide some evidence that dynamic, within-cycle, compression is a feature of auditory processing which is important for correctly modelling human perception of certain stimuli. The stimuli used in this chapter are iterated rippled noise (IRN) and high-pass filtered harmonic complexes in which the fundamental and lower harmonics of the stimulus are not present. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;Both of these &lt;/del&gt;stimuli &amp;nbsp;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;The studies presented in this chapter provide some evidence that dynamic, within-cycle, compression is a feature of auditory processing which is important for correctly modelling human perception of certain stimuli. The stimuli used in this chapter are iterated rippled noise (IRN) and high-pass filtered harmonic complexes in which the fundamental and lower harmonics of the stimulus are not present. &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;These &lt;/ins&gt;stimuli &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;illustrate well the ability of the auditory system to process temporal regularity in a signal despite the lack of a strong fundamental harmonic, and thus provide a good test for temporal models of audition.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;The study in this &lt;/del&gt;chapter does not directly address the problem of whether compressive filtering is a crucial element for a good machine-hearing system, but rather looks at the application of compressive filtering to a particular class of stimuli. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;The detailed nature &lt;/del&gt;of &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;the &lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;This &lt;/ins&gt;chapter does not directly address the problem of whether compressive filtering is a crucial element for a good machine-hearing system, but rather looks at the application of compressive filtering to a particular class of stimuli&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;, the correct representation of which is required in a system which accurately models human auditory perception&lt;/ins&gt;. &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;This is an incremental step towards understanding exactly which aspects &lt;/ins&gt;of &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;human auditory processing are necessary for effective machine hearing.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;=== A short history of models of auditory filtering ===&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;=== A short history of models of auditory filtering ===&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 188:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 188:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;===== Comparison to human pitch perception =====&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;===== Comparison to human pitch perception =====&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;In order to bear any relevance to human pitch perception, it is also necessary that this normalised pitch strength measure does a reasonable job of modelling human perception. &amp;lt;citet r='Patterson:1996a'/&amp;gt; performed a series of experiments on the human perception of IRN by comparing the pitch strength of IRN stimuli and tonal stimuli masked with noise&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;. Subjects compared&amp;nbsp; TODO &lt;/del&gt;&amp;lt;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;citet &lt;/del&gt;r='Yost:1996, Yost:1998, Patterson:2000, Handel:2000'/&amp;gt;. The pitch strength measure described above was used to model the data of &amp;lt;citet r='Patterson:1996a'/&amp;gt;. The original results from the perceptual experiments are plotted in &amp;lt;figureRef name='pat_yost_predictions '/&amp;gt;, and the results using the current measure and plotted in &amp;lt;figureRef name='irn_ps_model_results'/&amp;gt;. [[Image:Pat Yost Predictions.png|thumb|400px|right|&amp;lt;figure name='pat_yost_predictions'/&amp;gt; --- Pitch strength predictions for the perception of IRN in noise, from &amp;lt;citet r='patterson:1996a'/&amp;gt;.]] [[Image:Pzfc ps 800Hz plot prop.eps|thumb|400px|right|&amp;lt;figure name='irn_ps_model_results'/&amp;gt; --- Predicted pitch strengths from the normalised pitch strength measure used in the experiments in this section. The results follow the same pattern as those reported in &amp;lt;citet r='patterson:1996a'/&amp;gt;, but the measure is slightly noisier. ]]. The pitch strength predictions made by the normalized pitch strength measure have the same form as the perceptual results reported in &amp;lt;citet r='patterson:1996a'/. 
The curves predicted by this model are a little noisier than those from the model of &amp;lt;citet r='patterson:1996a'/&amp;gt;, but the model has the advantage of being normalised so that it is easy to compare the results across different filterbank, which may have differing output levels.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;In order to bear any relevance to human pitch perception, it is also necessary that this normalised pitch strength measure does a reasonable job of modelling human perception. &amp;lt;citet r='Patterson:1996a'/&amp;gt; performed a series of experiments on the human perception of IRN by comparing the pitch strength of IRN stimuli and tonal stimuli masked with noise &amp;lt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;citep &lt;/ins&gt;r='Yost:1996, Yost:1998, Patterson:2000, Handel:2000'/&amp;gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;. Subjects compared IRN with different numbers of iterations to a tonal stimulus (256-iteration IRN) masked with noise. Subjects were asked to select the stimulus with the stronger pitch strength as the SNR of the noise-masked tonal stimulus was changed&lt;/ins&gt;. The pitch strength measure described above was used to model the data of &amp;lt;citet r='Patterson:1996a'/&amp;gt;. The original results from the perceptual experiments are plotted in &amp;lt;figureRef name='pat_yost_predictions '/&amp;gt;, and the results using the current measure and plotted in &amp;lt;figureRef name='irn_ps_model_results'/&amp;gt;. 
[[Image:Pat Yost Predictions.png|thumb|400px|right|&amp;lt;figure name='pat_yost_predictions'/&amp;gt; --- Pitch strength predictions for the perception of IRN in noise, from &amp;lt;citet r='patterson:1996a'/&amp;gt;.]] [[Image:Pzfc ps 800Hz plot prop.eps|thumb|400px|right|&amp;lt;figure name='irn_ps_model_results'/&amp;gt; --- Predicted pitch strengths from the normalised pitch strength measure used in the experiments in this section. The results follow the same pattern as those reported in &amp;lt;citet r='patterson:1996a'/&amp;gt;, but the measure is slightly noisier. ]]. The pitch strength predictions made by the normalized pitch strength measure have the same form as the perceptual results reported in &amp;lt;citet r='patterson:1996a'/&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;gt;&lt;/ins&gt;. The curves predicted by this model are a little noisier than those from the model of &amp;lt;citet r='patterson:1996a'/&amp;gt;, but the model has the advantage of being normalised so that it is easy to compare the results across different filterbank, which may have differing output levels.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;====Experiments====&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;====Experiments====&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 316:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 316:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;One interesting approach to this problem would be to modify the AGC of the PZFC so that its architecture was more like that of the dcGC. The AGC activity could be shifted so that the state of an AGC stage which takes input from one frequency is used to affect the PZFC filter stage at a lower frequency. This would align the PZFC AGC architecture more closely with models of&amp;nbsp; compression in the cochlea. &amp;nbsp;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;One interesting approach to this problem would be to modify the AGC of the PZFC so that its architecture was more like that of the dcGC. The AGC activity could be shifted so that the state of an AGC stage which takes input from one frequency is used to affect the PZFC filter stage at a lower frequency. This would align the PZFC AGC architecture more closely with models of&amp;nbsp; compression in the cochlea. &amp;nbsp;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;The experiments presented in this chapter do not, on their own, justify the use of a compressive filterbank in a machine hearing system. However, they provide &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;some &lt;/del&gt;insight into the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;behaviour &lt;/del&gt;of compressive filterbanks in &amp;nbsp;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;The experiments presented in this chapter do not, on their own, justify the use of a compressive filterbank in a machine hearing system. However, they provide &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;an &lt;/ins&gt;insight into the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;utility &lt;/ins&gt;of compressive filterbanks in &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;the processing of stimuli in which the temporal fine structure is important.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;A potential continuation of this work would be to use the compressive filterbanks described above in combination with the features generated from the Gaussian mixture model used in the earlier chapters of this thesis.&amp;nbsp; Preliminary experiments to this end, which directly swapped the gammatone filterbank for the PZFC filterbank, led to recognition results which were significantly worse than the results gained with the gammatone filterbank&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;. (TODO: find the ball-park percentages for this and put them in here)&lt;/del&gt;. However the exact parameters of the Gaussian fitting procedure were tuned to the output of a simple gammatone filterbank and these parameters were not modified in the initial experiments. The tuning of these parameters for use with the&amp;nbsp; PZFC and dcGC filterbanks, ideally using a complete search of the parameter space for the Gaussian fitting system, is another potential direction for future work. &amp;nbsp;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;A potential continuation of this work would be to use the compressive filterbanks described above in combination with the features generated from the Gaussian mixture model used in the earlier chapters of this thesis.&amp;nbsp; Preliminary experiments to this end, which directly swapped the gammatone filterbank for the PZFC filterbank, led to recognition results which were significantly worse than the results gained with the gammatone filterbank. However the exact parameters of the Gaussian fitting procedure were tuned to the output of a simple gammatone filterbank and these parameters were not modified in the initial experiments. 
The tuning of these parameters for use with the&amp;nbsp; PZFC and dcGC filterbanks, ideally using a complete search of the parameter space for the Gaussian fitting system, is another potential direction for future work. &amp;nbsp;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;While the modification of the PZFC AGC circuits, and the evaluation of compressive filterbanks in a speech recognition task are both fruitful potential avenues of further study, at this stage, the opportunity presented itself to work with auditory features at a much larger scale. The research team of Dick Lyon at Google had been investigating the use of MFCC features in a large-scale sound effects recognition task, making use of the PAMIR machine learning system which is optimized for use on large datasets. The team was working to extend the model to work with features generated from a version of the auditory image; I joined the team for an internship, working with them on the evaluation of the auditory features within the sound effects ranking task.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;While the modification of the PZFC AGC circuits, and the evaluation of compressive filterbanks in a speech recognition task are both fruitful potential avenues of further study, at this stage, the opportunity presented itself to work with auditory features at a much larger scale. The research team of Dick Lyon at Google had been investigating the use of MFCC features in a large-scale sound effects recognition task, making use of the PAMIR machine learning system which is optimized for use on large datasets. The team was working to extend the model to work with features generated from a version of the auditory image; I joined the team for an internship, working with them on the evaluation of the auditory features within the sound effects ranking task.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;!-- diff generator: internal 2010-09-08 20:09:31 --&gt;
&lt;/table&gt;</description>
			<pubDate>Tue, 07 Sep 2010 06:12:00 GMT</pubDate>			<dc:creator>Tcw24</dc:creator>			<comments>http://www.acousticscale.org/wiki/index.php/Talk:Auditory-Based_Processing_of_Communication_Sounds/Compressive_Auditory_Filtering</comments>		</item>
		<item>
			<title>Auditory-Based Processing of Communication Sounds/Scale-shift Invariant Auditory Features</title>
			<link>http://www.acousticscale.org/wiki/index.php?title=Auditory-Based_Processing_of_Communication_Sounds/Scale-shift_Invariant_Auditory_Features&amp;diff=5213&amp;oldid=prev</link>
			<description>&lt;p&gt;&lt;span class=&quot;autocomment&quot;&gt;Spectral resolution:&amp;#32;&lt;/span&gt; &lt;/p&gt;

			&lt;table style=&quot;background-color: white; color:black;&quot;&gt;
			&lt;col class='diff-marker' /&gt;
			&lt;col class='diff-content' /&gt;
			&lt;col class='diff-marker' /&gt;
			&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;←Older revision&lt;/td&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 02:42, 7 September 2010&lt;/td&gt;
			&lt;/tr&gt;
		&lt;tr&gt;&lt;td colspan='4' align='center' class='diff-multi'&gt;(One intermediate revision not shown.)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 100:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 100:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Performance of the HMM-based recogniser on the two sets of features was markedly different, leading to overall differences in recognition rate which were in general much larger than those due to the changes in the HMM parameters. Optimum performance, of 72.3% syllable accuracy on the MFCC features, was achieved with an HMM with 2 emitting states and with 4 components in the output distribution. This performance was achieved after 15 iterations of the training algorithm.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Performance of the HMM-based recogniser on the two sets of features was markedly different, leading to overall differences in recognition rate which were in general much larger than those due to the changes in the HMM parameters. Optimum performance, of 72.3% syllable accuracy on the MFCC features, was achieved with an HMM with 2 emitting states and with 4 components in the output distribution. This performance was achieved after 15 iterations of the training algorithm.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;For the AIM features, the optimum recognition performance was 93.2% across all syllables using an HMM with 2 emitting states, 6 output distribution components and after 10 training iterations.&amp;nbsp; However, &amp;lt;figureRef name='&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;mfcc_performance&lt;/del&gt;'/&amp;gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;and &amp;lt;figureRef name='aimfeat_performance'/&amp;gt; show &lt;/del&gt;the overall syllable recognition rate for the MFCCs and AIM features respectively as a function of the various HMM parameters. For the MFCCs, the performance for all HMM parameters &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;(apart from the 1-emitting-state HMM with a single Gaussian in the output distribution) clusters &lt;/del&gt;between 64% and 72.3%. For the same set of parameters with the AIM features, performance &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;clusters &lt;/del&gt;between 78% and 93.2% &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;accuracy&lt;/del&gt;. Performance is consistently better for the AIM features&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;, &lt;/del&gt;across all tested HMM configurations. 
&amp;nbsp;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;For the AIM features, the optimum recognition performance was 93.2% across all syllables using an HMM with 2 emitting states, 6 output distribution components and after 10 training iterations.&amp;nbsp; However, &amp;lt;figureRef name='&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;mfcc_vs_nap_performance&lt;/ins&gt;'/&amp;gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;shows &lt;/ins&gt;the overall syllable recognition rate for the MFCCs and AIM features respectively as a function of the various HMM parameters. For the MFCCs, the performance for all HMM parameters &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;is &lt;/ins&gt;between 64% and 72.3%. For the same set of parameters with the AIM features, performance &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;is &lt;/ins&gt;between 78% and 93.2%. Performance is consistently better for the AIM features across all tested HMM configurations. &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;(In addition to the HMM configurations shown in this graph, odd-numbered values of the HMM parameters were also tried but, for simplicity of representation, they are not plotted here. These results cluster in the same way.) [[Image:Nap vs mfcc performance.pdf|thumb|300px|right|&amp;lt;figure name='mfcc_vs_nap_performance'/&amp;gt; --- Performance of HMMs with varying parameters, trained using AIM features (black) and MFCCs (red). The vertical axis shows overall recognition performance. The horizontal axis shows the number of HMM training iterations. Solid lines denote HMMs with 2 emitting states, dashed lines 4 emitting states and dotted lines 8 emitting states. 
Stars show performance with 2 Gaussian components in the output distribution, circles show performance with 4 components and plus symbols show performance with 6 components.]]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt; &lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;For the AIM features, 92.8% accuracy was achieved with the 2&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;-&lt;/del&gt;emitting&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;-&lt;/del&gt;state, 4&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;-&lt;/del&gt;component HMM model after 15 training iterations. Since this performance is almost at the ceiling for the AIM features, and the same &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;model provides &lt;/del&gt;optimal performance for the MFCC features, all further comparisons are made using these &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;HMM &lt;/del&gt;parameters. This allows for a &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;more &lt;/del&gt;direct comparison of the features.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;For the AIM features, 92.8% accuracy was achieved with the 2 emitting state, 4 component HMM model after 15 training iterations. Since this performance is almost at the ceiling for the AIM features, and the same &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;HMM parameters lead to &lt;/ins&gt;optimal performance for the MFCC features, all further comparisons are made using these parameters. This allows for a direct comparison of the features &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;using an otherwise identical recognition system&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[Image:MFCC Results 2.png|thumb|300px|right|&amp;lt;figure name='mfcc_results_2'/&amp;gt; --- Recognition results for the MFCC features for a 2-emitting-state HMM with 4 output distribution components after 15 training iterations.]] &amp;nbsp;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[Image:MFCC Results 2.png|thumb|300px|right|&amp;lt;figure name='mfcc_results_2'/&amp;gt; --- Recognition results for the MFCC features for a 2-emitting-state HMM with 4 output distribution components after 15 training iterations.]] &amp;nbsp;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;With these parameters, recognition performance with the MFCC feature vectors was (TODO:check numbers here) ??% correct on the training voices, as shown in &amp;lt;figureRef name='mfcc_results_2'/&amp;gt;. Performance held up well along the spokes where VTL does not vary much from that of the reference speaker. This subset of the results illustrates the standard finding that MFCCs are robust to changes in GPR, primarily because the process of extracting MFCCs eliminates most of the GPR information from the features. As VTL varies further from the training values, performance degrades rapidly, particularly on the spokes with large VTL change, where recognition falls close to a minimum of ??% for the extreme VTL values. This provides a practical demonstration of the known lack of robustness to changes in VTL associated with the lack of scale-shift covariance in MFCCs.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;With these parameters, recognition performance with the MFCC feature vectors was (TODO:check numbers here) ??% correct on the training voices, as shown in &amp;lt;figureRef name='mfcc_results_2'/&amp;gt;. Performance held up well along the spokes where VTL does not vary much from that of the reference speaker. This subset of the results illustrates the standard finding that MFCCs are robust to changes in GPR, primarily because the process of extracting MFCCs eliminates most of the GPR information from the features. As VTL varies further from the training values, performance degrades rapidly, particularly on the spokes with large VTL change, where recognition falls close to a minimum of ??% for the extreme VTL values. 
This provides a practical demonstration of the known lack of robustness to changes in VTL associated with the lack of scale-shift covariance in MFCCs.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Recognition performance with the auditory feature vectors was &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;??&lt;/del&gt;% &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;correct &lt;/del&gt;on the training voices. As with the MFCCs, performance remained high along the spokes associated with major changes in GPR. However, performance along all of the spokes also remains near ceiling, and only drops off significantly at the extremes of VTL. Performance at the longest VTL was &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;??&lt;/del&gt;% and performance at the shortest was &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;??&lt;/del&gt;%.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Recognition performance with the auditory feature vectors was &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;100&lt;/ins&gt;% on the training voices. As with the MFCCs, performance remained high along the spokes associated with major changes in GPR. However, performance along all of the spokes also remains near ceiling, and only drops off significantly at the extremes of VTL. Performance at the longest VTL was &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;68.1&lt;/ins&gt;% and performance at the shortest was &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;73.0&lt;/ins&gt;%.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;===== Spectral resolution =====&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;TODO (A further experiment was carried out using MFCCs produced from a 200 channel mel filterbank to check that &lt;/del&gt;the performance of the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;recogniser was not being limited by a lack of spectral resolution&lt;/del&gt;. The performance &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;using these MFCCs was 67.7% &lt;/del&gt;for the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;initial topology &lt;/del&gt;with &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;three emitting states and 73&lt;/del&gt;.3% &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;using &lt;/del&gt;the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;best topology from &lt;/del&gt;the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;previous experiments&lt;/del&gt;, &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;indicating &lt;/del&gt;that &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;26-channel resolution was not a serious limitation&lt;/del&gt;.&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;)&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Unlike &lt;/ins&gt;the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;MFCC recogniser, &lt;/ins&gt;performance &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;along most &lt;/ins&gt;of the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;spokes is near ceiling after optimisation&lt;/ins&gt;. 
The &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;worst &lt;/ins&gt;performance&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;, &lt;/ins&gt;for the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;speaker &lt;/ins&gt;with &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;the longest VTL, was 66&lt;/ins&gt;.&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;5%, which compares with &lt;/ins&gt;3&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;.8&lt;/ins&gt;% &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;in &lt;/ins&gt;the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;MFCC case. There is a drop in performance at &lt;/ins&gt;the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;extremes of VTL&lt;/ins&gt;, &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;although the drop is small in comparison to &lt;/ins&gt;that &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;seen in the MFCC case&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;del style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;===== Scale-shift invariant features =====&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;del style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;As with the MFCC features, a `baseline' recogniser was first used. This was based on an HMM with a topology with three emitting states and a single Gaussian output distribution for each state. Recognition performance using this HMM topology was 84.6% over all speakers. Once again, the HMM topology was varied to find the best performance. This was found when using an HMM with two emitting states and a mixture of four Gaussians in the output distribution. After optimisation of the topology and nine iterations of the training algorithm, performance rose to 90.7%. This level of performance is well above the 73.5% achieved after similar optimisation with the MFCC feature vectors. &lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;del style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;[[Image:AIM-Features Results.png|thumb|300px|right|&amp;lt;figure name='gmm_results'/&amp;gt; --- Recognition results for the AIM GMM features using a gammatone filterbank.]]&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;del style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;Performance obtained using this topology for the individual speakers across the GPR-VTL plane, is shown in &amp;lt;figureRef name='gmm_results'/&amp;gt;. As with the MFCC recogniser, performance is best when only GPR varies. However, unlike the MFCC recogniser, performance along most of the spokes is near ceiling after optimisation. The worst performance, for the speaker with the longest VTL, was 66.5%, which compares with 3.8% in the MFCC case. There is a drop in performance at the extremes of VTL, although the drop is small in comparison to that seen in the MFCC case.&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;==== Comparison with standard VTLN ====&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;==== Comparison with standard VTLN ====&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;!-- diff generator: internal 2010-09-08 20:09:31 --&gt;
&lt;/table&gt;</description>
			<pubDate>Tue, 07 Sep 2010 02:42:16 GMT</pubDate>			<dc:creator>Tcw24</dc:creator>			<comments>http://www.acousticscale.org/wiki/index.php/Talk:Auditory-Based_Processing_of_Communication_Sounds/Scale-shift_Invariant_Auditory_Features</comments>		</item>
		<item>
			<title>Image:Nap vs mfcc performance.pdf</title>
			<link>http://www.acousticscale.org/wiki/index.php?title=Image:Nap_vs_mfcc_performance.pdf&amp;diff=0&amp;oldid=prev</link>
			<description>&lt;p&gt;uploaded &quot;[[&lt;a href=&quot;/wiki/index.php/Image:Nap_vs_mfcc_performance.pdf&quot; title=&quot;Image:Nap vs mfcc performance.pdf&quot;&gt;Image:Nap vs mfcc performance.pdf&lt;/a&gt;]]&quot; Performance of HMMs with varying parameters, trained using AIM features (black) and MFCCs (red). The vertical axis shows overall recognition performance. The horizontal axis shows the number of HMM training iterations. Solid lines denote HMMs with 2 emitt&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&lt;/div&gt;</description>
			<pubDate>Tue, 07 Sep 2010 01:29:08 GMT</pubDate>			<dc:creator>Tcw24</dc:creator>			<comments>http://www.acousticscale.org/wiki/index.php/Image_talk:Nap_vs_mfcc_performance.pdf</comments>		</item>
		<item>
			<title>Auditory-Based Processing of Communication Sounds/Compressive Auditory Filtering</title>
			<link>http://www.acousticscale.org/wiki/index.php?title=Auditory-Based_Processing_of_Communication_Sounds/Compressive_Auditory_Filtering&amp;diff=5210&amp;oldid=prev</link>
			<description>&lt;p&gt;&lt;span class=&quot;autocomment&quot;&gt;Comparison to human pitch perception:&amp;#32;&lt;/span&gt; &lt;/p&gt;
&lt;a href=&quot;http://www.acousticscale.org/wiki/index.php?title=Auditory-Based_Processing_of_Communication_Sounds/Compressive_Auditory_Filtering&amp;amp;diff=5210&amp;amp;oldid=5203&quot;&gt;Show changes&lt;/a&gt;</description>
			<pubDate>Tue, 07 Sep 2010 00:35:09 GMT</pubDate>			<dc:creator>Tcw24</dc:creator>			<comments>http://www.acousticscale.org/wiki/index.php/Talk:Auditory-Based_Processing_of_Communication_Sounds/Compressive_Auditory_Filtering</comments>		</item>
		<item>
			<title>Image:Pat Yost Human Results.png</title>
			<link>http://www.acousticscale.org/wiki/index.php?title=Image:Pat_Yost_Human_Results.png&amp;diff=0&amp;oldid=prev</link>
			<description>&lt;p&gt;uploaded &quot;[[&lt;a href=&quot;/wiki/index.php/Image:Pat_Yost_Human_Results.png&quot; title=&quot;Image:Pat Yost Human Results.png&quot;&gt;Image:Pat Yost Human Results.png&lt;/a&gt;]]&quot; Psychometric functions for the perception of the pitch strength of a tonal stimulus in noise compared with IRN with various parameters. Figure 1 from &amp;lt;citet r='patterson:1996a'/&amp;gt;.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&lt;/div&gt;</description>
			<pubDate>Sat, 04 Sep 2010 23:56:57 GMT</pubDate>			<dc:creator>Tcw24</dc:creator>			<comments>http://www.acousticscale.org/wiki/index.php/Image_talk:Pat_Yost_Human_Results.png</comments>		</item>
		<item>
			<title>Image:Pat Yost Predictions.png</title>
			<link>http://www.acousticscale.org/wiki/index.php?title=Image:Pat_Yost_Predictions.png&amp;diff=5204&amp;oldid=prev</link>
			<description>&lt;p&gt;&lt;/p&gt;

			&lt;table style=&quot;background-color: white; color:black;&quot;&gt;
			&lt;col class='diff-marker' /&gt;
			&lt;col class='diff-content' /&gt;
			&lt;col class='diff-marker' /&gt;
			&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;←Older revision&lt;/td&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 23:55, 4 September 2010&lt;/td&gt;
			&lt;/tr&gt;
		&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;Psychometric &lt;/del&gt;functions for pitch strength of IRN. Figure &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;1 &lt;/del&gt;from Patterson, Handel, Yost and Datta, 1996.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Predicted psychometric &lt;/ins&gt;functions for pitch strength of &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;an &lt;/ins&gt;IRN &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;stimulus compared with a tonal stimulus masked with noise&lt;/ins&gt;. Figure &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;5 &lt;/ins&gt;from Patterson, Handel, Yost and Datta, 1996.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;!-- diff generator: internal 2010-09-08 20:09:31 --&gt;
&lt;/table&gt;</description>
			<pubDate>Sat, 04 Sep 2010 23:55:49 GMT</pubDate>			<dc:creator>Tcw24</dc:creator>			<comments>http://www.acousticscale.org/wiki/index.php/Image_talk:Pat_Yost_Predictions.png</comments>		</item>
		<item>
			<title>Auditory-Based Processing of Communication Sounds/Compressive Auditory Filtering</title>
			<link>http://www.acousticscale.org/wiki/index.php?title=Auditory-Based_Processing_of_Communication_Sounds/Compressive_Auditory_Filtering&amp;diff=5203&amp;oldid=prev</link>
			<description>&lt;p&gt;&lt;span class=&quot;autocomment&quot;&gt;Comparison to human pitch perception:&amp;#32;&lt;/span&gt; &lt;/p&gt;

			&lt;table style=&quot;background-color: white; color:black;&quot;&gt;
			&lt;col class='diff-marker' /&gt;
			&lt;col class='diff-content' /&gt;
			&lt;col class='diff-marker' /&gt;
			&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;←Older revision&lt;/td&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 23:43, 4 September 2010&lt;/td&gt;
			&lt;/tr&gt;
		&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 176:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 176:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;===== Comparison to human pitch perception =====&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;===== Comparison to human pitch perception =====&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;-&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;In order to bear any relevance to human pitch perception, it is also necessary that this normalised pitch strength measure does a good job of modelling human perception. &amp;lt;citet r='Patterson:1996a'/&amp;gt; performed a series of experiments on the human perception of IRN by comparing the pitch strength of IRN stimuli and tonal stimuli masked with noise. TODO: add more description of the experiments here see &amp;lt;citet r='Yost:1996, Yost:1998, Patterson:2000, Handel:2000'/&amp;gt;. The pitch strength measure described above was used to model the data of &amp;lt;citet r='Patterson:1996a'/&amp;gt;. The original results are plotted in &amp;lt;figureRef name='pat_yost_predictions '/&amp;gt;, and the results using the current measure and plotted in &amp;lt;figureRef name=' '/&amp;gt;. [[Image:Pat Yost Predictions.png|thumb|400px|right|&amp;lt;figure name='pat_yost_predictions'/&amp;gt; --- Pitch strength predictions for the perception of IRN in noise, from &amp;lt;citet r='patterson:1996a'/&amp;gt;.]]&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;In order to bear any relevance to human pitch perception, it is also necessary that this normalised pitch strength measure does a good job of modelling human perception. &amp;lt;citet r='Patterson:1996a'/&amp;gt; performed a series of experiments on the human perception of IRN by comparing the pitch strength of IRN stimuli and tonal stimuli masked with noise. TODO: add more description of the experiments here see &amp;lt;citet r='Yost:1996, Yost:1998, Patterson:2000, Handel:2000'/&amp;gt;. The pitch strength measure described above was used to model the data of &amp;lt;citet r='Patterson:1996a'/&amp;gt;. 
The original results are plotted in &amp;lt;figureRef name='pat_yost_predictions '/&amp;gt;, and the results using the current measure are plotted in &amp;lt;figureRef name='&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;irn_ps_model_results&lt;/ins&gt;'/&amp;gt;. [[Image:Pat Yost Predictions.png|thumb|400px|right|&amp;lt;figure name='pat_yost_predictions'/&amp;gt; --- Pitch strength predictions for the perception of IRN in noise, from &amp;lt;citet r='patterson:1996a'/&amp;gt;.]] &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[Image:Pzfc ps 800Hz plot prop.eps|thumb|400px|right|&amp;lt;figure name='irn_ps_model_results'/&amp;gt; --- Predicted pitch strengths from the normalised pitch strength measure used in the experiments in this section. The results follow the same pattern as those reported in &amp;lt;citet r='patterson:1996a'/&amp;gt;, but the measure is slightly noisier.]] The pitch strength predictions made by the normalised pitch strength measure have the same form as the perceptual results reported in &amp;lt;citet r='patterson:1996a'/&amp;gt;. The curves predicted by this model are a little noisier than those from the model of &amp;lt;citet r='patterson:1996a'/&amp;gt;, but the model has the advantage of being normalised, so its results are easy to compare across different filterbanks, which may have differing output levels.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;====Experiments====&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;====Experiments====&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;!-- diff generator: internal 2010-09-08 20:09:31 --&gt;
&lt;/table&gt;</description>
			<pubDate>Sat, 04 Sep 2010 23:43:14 GMT</pubDate>			<dc:creator>Tcw24</dc:creator>			<comments>http://www.acousticscale.org/wiki/index.php/Talk:Auditory-Based_Processing_of_Communication_Sounds/Compressive_Auditory_Filtering</comments>		</item>
	</channel>
</rss>