Seminar on audio information processing

Lecturer: Prof. Dr.-Ing. Bernhard U. Seeber
Assistants: Payman Azaripasand, M.Sc.; Norbert Bischof, M.Sc.; and AIP staff
Cycle: Summer semester
Target group: Elective module (Wahlmodul), advanced seminar (Hauptseminar), Master EI, EI7764
Scope: 2 SWS (semester hours per week)
Exam: Graded presentation, discussion contribution and written summary (homework)
Time & Location: Wednesday, 14:00 - 16:00, N0116
Start: 18.10.2023 (topic selection)


Varying topics drawn from current research in audio information processing, e.g. signal processing of music and speech, psychoacoustics, auditory models, and room acoustics.
Students prepare a talk with slides on a selected topic, present it, and practice answering questions as they would at a conference. The aim of the seminar is not only to educate students in audio information processing, but particularly to teach presentation and literature search skills. Material is provided as a starting point for the literature search. Participants learn about current topics in audio information processing and practice reading scientific publications in English. Additionally, students write a report on the topic, which is submitted for grading along with a commented version of the presentation slides.

Presentations in English

Some presentations will preferably (but not compulsorily) be delivered in English. Students are not graded on their spoken English, but on how well they understand the topic and how well the presentation is structured. Please contact the individual advisor to discuss further details.

Notes on the procedure

It is recommended to meet with the respective supervisor early on and regularly thereafter in order to receive feedback for a successful and appropriate presentation.

Topics for the summer semester 2020

Advisor: Ľuboš Hládek, PhD

When listening in real-life situations, the listener or the source is constantly moving. Whether and how movement affects speech perception remains an open scientific debate. Speech becomes more audible if the interfering signal is spatially separated from the target, an effect known as spatial release from masking (SRM). This benefit, however, varies with head orientation relative to the target and the interferer. Research has focused on understanding whether people are motivated to turn their heads because doing so provides acoustic benefits for speech understanding. Understanding these strategies might be important for people with hearing impairment, such as cochlear implant users or hearing aid wearers.


Brimijoin, W. O., McShefferty, D., & Akeroyd, M. A. (2012). Undirected head movements of listeners with asymmetrical hearing impairment during a speech-in-noise task. Hearing Research, 283(1–2), 162–168.

Grange, J.A., Culling, J.F., Bardsley, B., Mackinney, L.I., Hughes, S.E., Backhouse, S.S. (2018). Turn an Ear to Hear: How Hearing-Impaired Listeners Can Exploit Head Orientation to Enhance Their Speech Intelligibility in Noisy Social Settings. Trends in Hearing, 22, 1-13. doi: 10.1177/2331216518802701.

Shen, Y., Folkerts, M.L., Richards, V.M. (2017). Head movements while recognizing speech arriving from behind. The Journal of the Acoustical Society of America 141(2), EL108-EL114. doi: 10.1121/1.4976111.

Frissen, I., Scherzer, J., Yao, H.-Y. (2019). The Impact of Speech Irrelevant Head Movements on Speech Intelligibility in Multi-Talker Environments. Acta Acustica United with Acustica, 105(6), 1286–1290. doi: 10.3813/AAA.919408.

Advisor: Ľuboš Hládek, PhD

Self-motion provides essential cues for orienting behavior, such as sound localization and speech perception, in everyday acoustic scenes. For instance, self-motion helps to organize sounds in the median plane because of the pinna filtering effects. This has been studied in the so-called front-back illusion paradigm, in which frequency-specific sounds can restrict the ability to resolve front-back signals. This can lead to the illusory percept of a sound in front of a person even though the sound source is behind. However, this happens only if the sound moves with the head in a specific way.


Wallach, H. (1940). The role of head movements and vestibular and visual cues in sound localization. Journal of Experimental Psychology, 27, 339-368.

Wightman, F.L., Kistler, D.J. (1999). Resolution of front-back ambiguity in spatial hearing by listener and source movement. The Journal of the Acoustical Society of America, 105(5), 2841-2853. doi: 10.1121/1.426899

Brimijoin, W. O., Akeroyd, M. A. (2012). The role of head movements and signal spectrum in an auditory front/back illusion. I-Perception, 3(3), 179–181.

Brimijoin, W. O. (2018). Angle-Dependent Distortions in the Perceptual Topology of Acoustic Space. Trends in Hearing, 22, 1–11. doi: 10.1177/2331216518775568.

Advisor: Ľuboš Hládek, PhD

Audio-visual spatial integration is a perceptual mechanism by which a sound and a visual stimulus at two different locations are perceived as one object. The perception of such stimuli depends on their spatial and temporal proximity and on other perceptual parameters such as the salience of the individual components of the audio-visual stimulus. One way of understanding this phenomenon is to assume that the brain combines these stimuli optimally, but only under certain conditions. Causal inference modeling is a framework inspired by Bayesian statistics that takes into account the perceptual noise of the underlying stimuli (or their salience), which inherently determines whether the two components of the audio-visual (AV) complex are perceived as one event or two. The presentation and the report will be in English.
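
As a starting point for the topic above, the core idea of causal inference can be sketched in a few lines of Python. The sketch below is written in the spirit of the Gaussian formulation in Körding et al. (2007) and is not taken from any of the listed papers' code. It assumes Gaussian sensory noise and a Gaussian prior over source position; the function name and all parameter values (`sigma_v`, `sigma_a`, `sigma_p`, `mu_p`, `p_common`) are hypothetical and chosen only for illustration.

```python
import math


def posterior_common_cause(x_v, x_a, sigma_v=2.0, sigma_a=8.0,
                           sigma_p=20.0, mu_p=0.0, p_common=0.5):
    """Posterior probability that a visual and an auditory measurement share one cause.

    x_v, x_a: noisy internal position estimates (e.g. degrees azimuth).
    sigma_v, sigma_a: standard deviations of the sensory noise (illustrative values).
    sigma_p, mu_p: width and centre of a Gaussian prior over source position.
    p_common: prior probability of a common cause.
    """
    var_v, var_a, var_p = sigma_v ** 2, sigma_a ** 2, sigma_p ** 2

    # Likelihood of both measurements under one shared source (C = 1),
    # with the unknown source position integrated out.
    denom_1 = var_v * var_a + var_v * var_p + var_a * var_p
    quad_1 = ((x_v - x_a) ** 2 * var_p
              + (x_v - mu_p) ** 2 * var_a
              + (x_a - mu_p) ** 2 * var_v) / denom_1
    like_1 = math.exp(-0.5 * quad_1) / (2.0 * math.pi * math.sqrt(denom_1))

    # Likelihood under two independent sources (C = 2).
    quad_2 = ((x_v - mu_p) ** 2 / (var_v + var_p)
              + (x_a - mu_p) ** 2 / (var_a + var_p))
    like_2 = math.exp(-0.5 * quad_2) / (
        2.0 * math.pi * math.sqrt((var_v + var_p) * (var_a + var_p)))

    # Bayes' rule over the two causal structures.
    return like_1 * p_common / (like_1 * p_common + like_2 * (1.0 - p_common))


if __name__ == "__main__":
    # Larger audio-visual disparity should make a common cause less plausible.
    for disparity in (0.0, 10.0, 20.0, 40.0):
        p = posterior_common_cause(0.0, disparity)
        print(f"disparity {disparity:5.1f} deg -> P(common cause) = {p:.3f}")
```

With these assumed values, the posterior probability of a common cause falls as the spatial disparity between the two measurements grows, which is the qualitative behaviour described above: noisier or more discrepant cues are less likely to be fused into one event.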


Körding, K. P., Beierholm, U. R., Ma, W. J., Quartz, S. R., Tenenbaum, J. B., & Shams, L. (2007). Causal inference in multisensory perception. PLoS One, 2(9), e943.

Wozny, D. R., & Shams, L. (2011). Computational characterization of visually induced auditory spatial adaptation. Frontiers in Integrative Neuroscience, 5(November), 75.

Odegaard, B., Wozny, D. R., & Shams, L. (2016). The effects of selective and divided attention on sensory precision and integration. Neuroscience Letters, 614, 24–28.

Mendonça, C., Mandelli, P., & Pulkki, V. (2016). Modeling the perception of audiovisual distance: Bayesian causal inference and other models. PLoS ONE, 11(12), 1–18.

Advisor: Norbert Kolotzek, M.Sc.

Understanding speech is one of the most important parts of communication. Speech understanding can be negatively affected by loud and noisy environments with several noise sources. The literature offers a few models that try to predict speech understanding in specific hearing situations for both normal-hearing and hearing-impaired listeners.

The aim of this presentation is to give an overview of physiologically motivated models of speech perception and to compare the model predictions with measured results from hearing experiments for different hearing situations.


Hauth, C.F., Brand, T. (2018). Modeling Sluggishness in Binaural Unmasking of Speech for Maskers With Time-Varying Interaural Phase Differences. Trends in Hearing, 22, 1-10.

Lavandier, M. and Culling, J. F. (2010). Prediction of binaural speech intelligibility against noise in rooms. J. Acoust. Soc. Am., 127, 387-399.

Lavandier, M., Jelfs, S., Culling, J.F., Watkins, A.J., Raimond, A.P., and Makin, S.J. (2012) Binaural prediction of speech intelligibility in reverberant rooms with multiple noise sources. J. Acoust. Soc. Am., 131, 218-231.

Rhebergen, K.S., Lyzenga, J., Dreschler, W.A., and Festen, J.M. (2010). Modelling speech intelligibility in quiet and noise in listeners with normal and impaired hearing. J. Acoust. Soc. Am., 127, 1570-1583.

Advisor: Norbert Kolotzek, M.Sc.

In a dynamic environment, it is harder to focus on a moving sound source or to understand a moving speaker than in a static situation. The speed of a moving source and its angular displacement from the midline are among the parameters that affect speech recognition. The listening situation also matters for speech understanding: sometimes the speech signal is the signal of interest, and sometimes an additional speech signal distracts from the target signal. The main goal of this presentation is to examine whether motion of the target or of the distractor impairs speech recognition, or whether it does not matter which of the signals is dynamic. To this end, the presentation should review the literature and compare both situations, dynamic target and dynamic masker.


Chandler, D. W., and Grantham, D. W. (1992). Minimum audible movement angle in the horizontal plane as a function of stimulus frequency and bandwidth, source azimuth, and velocity. The Journal of the Acoustical Society of America 91, 1624-1636.

Hauth, C. F., and Brand, T. (2018). Modeling Sluggishness in Binaural Unmasking of Speech for Maskers With Time-Varying Interaural Phase Differences. Trends in Hearing 22, 1-10.

Culling, J. F., and Mansell, E. R. (2013). Speech intelligibility among modulated and spatially distributed noise sources. The Journal of the Acoustical Society of America 133, 2254-2261.

Davis, T. J., Grantham, D. W., and Gifford, R. H. (2016). Effect of motion on speech recognition. Hearing Research 337, 80-88.

Advisor: Norbert Kolotzek, M.Sc.

Auditory localization by human listeners is remarkably precise, especially in the azimuthal plane. This has been shown in particular for static sound sources in a free-field environment. Humans are also able to localize and follow the trajectory of a moving sound source, but less precisely than for static sound sources, e.g. localized positions on a moving trajectory are shifted in the direction of motion (Getzmann & Lewald, 2007). The goal of the presentation is to give an overview of the localization differences between static and moving sound sources. Additionally, the different explanations proposed in the literature for the observed shifts should be summarized, with a focus on binaural mechanisms of the auditory system.


Getzmann, S., and Lewald, J. (2007). Localization of moving sound. Perception & Psychophysics 69, 1022-1034.

Perrott, D. R., and Musicant, A. D. (1977). Minimum auditory movement angle: Binaural localization of moving sound sources. The Journal of the Acoustical Society of America 62, 1463-1466.

Rosenblum, L. D., Carello, C., and Pastore, R. E. (1987). Relative effectiveness of three stimulus variables for locating a moving sound source. Perception 16, 175-186.

Saberi, K., and Perrott, D. R. (1990). Minimum audible movement angles as a function of sound source trajectory. The Journal of the Acoustical Society of America 88, 2639-2644.

Grantham, D. W. (1986). Detection and discrimination of simulated motion of auditory targets in the horizontal plane. The Journal of the Acoustical Society of America 79, 1939-1949.

Roman, N., and Wang, D. L. (2008). Binaural Tracking of Multiple Moving Sources. IEEE Transactions on Audio, Speech, and Language Processing 16, 728-739.

Advisor: Dipl.-Ing. Matthieu Kuntz

Sound quality can easily be estimated by an individual. However, it is more difficult to define models and measures that quantify the quality of a sound source, since they need to take the perception of sounds into account. Some monaural measures that can be translated into a sound quality measurement (e.g. roughness, loudness in different frequency bands) have already been defined and are used fairly commonly. This presentation should focus on binaural models for estimating sound quality, giving an overview of the different models and sound quality definitions and comparing them critically.


Pulkki, V., Karjalainen, M., & Huopaniemi, J. (1999). Analyzing Virtual Sound Source Attributes Using a Binaural Auditory Model. AES: Journal of the Audio Engineering Society, 47 (4), 203–217.

Schoeffler, M., & Herre, J. (2014). Towards a listener model for predicting the overall listening experience. ACM International Conference Proceeding Series, 2014

van Dorp Schuitman, J., de Vries, D., & Lindau, A. (2013). Deriving content-specific measures of room acoustic perception using a binaural, nonlinear auditory model. The Journal of the Acoustical Society of America, 133 (3), 1572–1585.

Osses Vecchi, A., Kohlrausch, A., Lachenmayr, W., & Mommertz, E. (2017). Predicting the perceived reverberation in different room acoustic environments using a binaural auditory model. The Journal of the Acoustical Society of America, 141 (4), EL381–EL387.

Advisor: Dipl.-Ing. Matthieu Kuntz

In recent years, virtual acoustic environments, both for research and entertainment purposes, have become quite common. Conventional sound quality measures such as roughness and coloration are not enough to characterize the quality of spatial sound reproduction. Other measures, such as envelopment, become important for evaluating different reproduction systems. This presentation should focus on the evaluation of spatial reproduction systems and on the definition of sound quality measures to assess them.


George, S., Zielinski, S., Rumsey, F., Jackson, P., Conetta, R., Dewhirst, M., Meares, D., & Bech, S. (2010). Development and validation of an unintrusive model for predicting the sensation of envelopment arising from surround sound recordings. AES: Journal of the Audio Engineering Society, 58 (12), 1013–1031.

Rumsey, F., Zielinski, S., Jackson, P. J. B., Dewhirst, M., Conetta, R., George, S., Bech, S., & Meares, D. (2008). QESTRAL (Part 1): Quality Evaluation of Spatial Transmission and Reproduction using an Artificial Listener. In AES 125th Convention.

Rumsey, F., Zielinski, S., Kassier, R., & Bech, S. (2005). On the relative importance of spatial and timbral fidelities in judgments of degraded multichannel audio quality. The Journal of the Acoustical Society of America, 118 (2), 968–976.

Advisor: Han Li, B.E.

Speech perception in the presence of a competing voice is one of the most challenging listening tasks, often referred to as the “cocktail party” problem. It relies on our ability to organize incoming sounds into coherent perceptual streams. Auditory stream segregation has been extensively investigated for simple tones, but this topic focuses on segregation of realistic speech. Differences in fundamental frequency (F0) are known to be a strong cue for auditory stream segregation. Speech consists of both voiced sounds (e.g. vowels) and unvoiced sounds (e.g. fricative consonants). The topic of interest is the contribution of F0 to the perceptual separation of competing vowels, consonant-vowel tokens, and sentences. The conclusions are drawn mainly from a series of psychoacoustic experiments reported in the literature.


Rossi-Katz, J. A., & Arehart, K. H. (2005). Effects of cochlear hearing loss on perceptual grouping cues in competing-vowel perception. The Journal of the Acoustical Society of America, 118 (4), 2588–2598.

Summers, R. J., Bailey, P. J., & Roberts, B. (2010). Effects of differences in fundamental frequency on across-formant grouping in speech perception. The Journal of the Acoustical Society of America, 128 (6), 3667–3677.

David, M., Lavandier, M., Grimault, N., & Oxenham, A. J. (2017). Sequential stream segregation of voiced and unvoiced speech sounds based on fundamental frequency. Hearing Research, 344, 235–243.

Advisor: Ramona Beinstingel, M.Sc.

With an active noise control (ANC) headrest, a local zone of quiet can be created around the head of a vehicle occupant. Within this zone, disturbing noise is reduced. The system's noise reduction capability is impaired by the occupant's head movements: the more pronounced the head movements, the less stable the noise cancellation. The goal is to maintain the noise reduction without restricting the occupant's range of movement. For this reason, head-tracking systems, among other approaches, are used to compensate for head movement. The work should give an overview of different head-tracking approaches in combination with ANC. It should address the influence of the respective methods and point out their advantages and disadvantages.


Han, R., Wu, M., Gong, C., Jia, S., Han, T., Sun, H., Yang, J. (2019). “Combination of Robust Algorithm and Head-Tracking for a Feedforward Active Headrest.” Applied Sciences 9:9, 1760.

Jung, W., Elliott, S. J., and Cheer, J. (2017). “Combining the remote microphone technique with head-tracking for local active sound control,” Journal of the Acoustical Society of America 142(1), 298–307.

Elliott, S. J., Jung, W., and Cheer, J. (2018). Head tracking extends local active control of broadband sound to higher frequencies. Scientific Reports 8:1, 5403.

Buck, J., Jukkert, S., Sachau, D. (2018). Performance evaluation of an active headrest considering non-stationary broadband disturbances and head movement. Journal of the Acoustical Society of America 143:5, 2571-2579.

Advisor: Ramona Beinstingel, M.Sc.

Hearing and understanding speech is an important part of everyday communication. When studying listening situations, listening effort can be used as an additional evaluation measure alongside speech intelligibility. It has been shown that listeners can find it difficult to follow a talker despite very high speech intelligibility. As a result, interest in measuring listening effort as a useful metric for evaluating realistic listening scenarios has grown in recent years. This work should concentrate on measurement methods for determining listening effort. The presentation should give an overview of different methods, which should be compared with one another and critically assessed.


Rennies, J., Kidd, G. Jr. (2018). “Benefit of binaural listening as revealed by speech intelligibility and listening effort,” The Journal of the Acoustical Society of America 144:4, 2147-2159.

Klink, K., Schulte, M., and Meis, M. (2012a). “Measuring listening effort in the field of audiology—A literature review of methods (part 1),” Z.Audiol. 51, 60–67.

Krueger, M., Schulte, M., Brand, T., and Holube, I. (2017). “Development of an adaptive scaling method for subjective listening effort,” The Journal of the Acoustical Society of America 141, 4680–4693.

Picou, E. M., Ricketts, T. A. (2018) “The relationship between speech recognition, behavioural listening effort, and subjective ratings”, International Journal of Audiology, 57:6, 457-467.


Registration for the seminar is done via TUMOnline.