Wednesday, October 20
 

1:30pm EDT

Best Paper: Perceptual Evaluation of Interior Panning Algorithms Using Static Auditory Events
Interior panning algorithms enable content authors to position auditory events not only at the periphery of the loudspeaker configuration, but also within the internal space between the listeners and the loudspeakers. In this study such algorithms are rigorously evaluated, comparing rendered static auditory events at various locations against true physical loudspeaker references. Various algorithmic approaches are subjectively assessed in terms of Overall, Timbral, and Spatial Quality for three different stimuli, at five different positions and three radii. Results show that, for static positions, standard Vector Base Amplitude Panning performs as well as, or better than, all other interior panning algorithms tested here. Timbral Quality is maintained throughout all distances. Ratings for Spatial Quality vary, with some algorithms performing significantly worse at closer distances. Ratings for Overall Quality reduce moderately with respect to reduced reproduction radius and are predominantly influenced by Timbral Quality.
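
As a reference point for the algorithms compared above, here is a minimal sketch of 2-D Vector Base Amplitude Panning, the baseline that the study found hard to beat. The loudspeaker pair angles and source angle are illustrative assumptions; interior panning methods extend this peripheral panning to positions inside the array.

```python
# Minimal 2-D VBAP sketch: factor the source direction onto a loudspeaker-pair
# base and power-normalize the resulting gains.
import numpy as np

def vbap_pair_gains(source_deg, spk_deg=(-30.0, 30.0)):
    """Amplitude gains for a source panned between a loudspeaker pair."""
    # Unit vectors of the two loudspeakers form the base matrix L (columns).
    L = np.array([[np.cos(np.radians(a)), np.sin(np.radians(a))]
                  for a in spk_deg]).T
    p = np.array([np.cos(np.radians(source_deg)),
                  np.sin(np.radians(source_deg))])
    g = np.linalg.solve(L, p)        # factor p onto the loudspeaker base
    return g / np.linalg.norm(g)     # power-normalize the gains

print(vbap_pair_gains(10.0))         # source 10 degrees toward the right speaker
```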

Speakers

Thomas Robotham
International Audio Laboratories Erlangen

Andreas Silzle
International Audio Laboratories Erlangen

Anamaria Nastasa
Aalto University

Alan Pawlak
PhD Candidate, University of Huddersfield
Alan Pawlak is a final-year PhD candidate at the Applied Psychoacoustics Laboratory (APL) of the University of Huddersfield, specialising in spatial audio and binaural rendering. During his four-year Music Technology and Audio Systems program, Alan completed a year in industry as...

Juergen Herre
Chief Executive Scientist, International Audio Laboratories Erlangen
Prof. Dr.-Ing. Herre is a fellow member of the Audio Engineering Society (AES), co-chair of the AES Technical Committee on Coding of Audio Signals and chair of the AES Technical Council. In 1989 he joined the Fraunhofer Institute for Integrated Circuits (IIS) in Erlangen, Germany...


Wednesday October 20, 2021 1:30pm - 2:00pm EDT
Stream A
 
Saturday, October 23
 

1:00pm EDT

Best Student Paper: InSE-NET: A Perceptually Coded Audio Quality Model based on CNN
Automatic coded audio quality assessment is an important task whose progress is hampered by the scarcity of human annotations, poor generalization to unseen codecs, bitrates, and content types, and a lack of flexibility in existing approaches. One typical human-perception-related metric, ViSQOL v3 (ViV3), has been shown to correlate highly with quality scores rated by humans. In this study, we take steps toward predicting coded audio quality by exclusively utilizing programmatically generated data informed by expert domain knowledge. We propose a neural network, InSE-NET, with a backbone of Inception and Squeeze-and-Excitation modules, to assess the perceived quality of coded audio at a 48 kHz sample rate. We demonstrate that synthetic data augmentation is capable of enhancing the prediction. Our proposed method is intrusive, i.e., it requires Gammatone spectrograms of unencoded reference signals. Besides comparable performance to ViV3, our approach provides more robust predictions at higher bitrates.
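
The abstract names Inception and Squeeze-and-Excitation (SE) modules as the backbone. Below is a minimal PyTorch sketch of a generic SE block, not the authors' exact architecture; the channel count and reduction ratio are assumptions.

```python
# Generic Squeeze-and-Excitation block: global-pool ("squeeze") each channel,
# then learn per-channel gates ("excitation") that re-weight the feature maps.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global context
        self.fc = nn.Sequential(                     # excitation: channel gates
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # re-weight feature maps

x = torch.randn(1, 32, 64, 64)   # e.g. Gammatone-spectrogram feature maps
print(SEBlock(32)(x).shape)      # torch.Size([1, 32, 64, 64])
```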

Speakers

Guanxin Jiang
Dolby Germany GmbH

Arijit Biswas
Dolby Germany GmbH

Christian Bergler
Pattern Recognition Lab, Friedrich-Alexander University Erlangen-Nuremberg

Andreas Maier
Pattern Recognition Lab, Friedrich-Alexander University Erlangen-Nuremberg


Saturday October 23, 2021 1:00pm - 1:30pm EDT
Stream A
 
Thursday, October 28
 

9:00pm EDT

3D Impulse Response Convolution with Multichannel Direct Sound: Assessing Perceptual Equivalency between Room- and Source-Impression for Music Production
A method for representing the three-dimensional radiation patterns of instruments/performers within artificial reverberation using multichannel direct sound files convolved with channel-based spatial room impulse responses (SRIRs) is presented. Two reverb conditions are studied in a controlled listening test: a) all SRIR channel positions are convolved with a single monophonic direct sound file, and b) each SRIR channel position is convolved with a unique direct sound file taken from a microphone array surrounding the performer. Participants were asked to adjust the level of each reverberation condition (relative to a fixed direct sound stream) to three perceptual thresholds relating to source- and room-impression. Results of separate three-way within-subject ANOVAs and post-hoc analysis show significant interactions between instrument / room type, and instrument / reverb condition on each of the three thresholds. Most notably, reverb condition b) required less level than condition a) to yield perceptual equivalency between source- and room-impression, suggesting that the inclusion of multichannel direct sound in SRIR convolution may increase the salience of room impression in the immersive reproduction of acoustic music.
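
A minimal sketch of the two reverb conditions compared above, with random placeholder arrays standing in for the measured SRIRs and direct-sound captures:

```python
# Condition (a): one mono direct sound into every SRIR channel.
# Condition (b): each SRIR channel convolved with its own direct-sound capture,
# preserving the performer's radiation pattern. Shapes are assumptions.
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
n_ch, n_dir, n_rir = 8, 48000, 96000
direct_mono = rng.standard_normal(n_dir)            # single mono direct sound
direct_multi = rng.standard_normal((n_ch, n_dir))   # one capture per channel
srir = rng.standard_normal((n_ch, n_rir)) * 0.01    # channel-based SRIR set

wet_a = np.stack([fftconvolve(direct_mono, srir[ch]) for ch in range(n_ch)])
wet_b = np.stack([fftconvolve(direct_multi[ch], srir[ch]) for ch in range(n_ch)])
print(wet_a.shape, wet_b.shape)
```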

Speakers

Jack Kelly
McGill University
Jack Kelly is a Ph.D. candidate at the Schulich School of Music, McGill University. His thesis research centers on the influence of spatial room impulse response convolution technologies (channel-based and HOA arrays) on the sensation of physical presence in immersive music production. He...

Richard King
McGill University
Richard King is an Educator, Researcher, and a Grammy Award winning recording engineer. Richard has garnered Grammy Awards in various fields including Best Engineered Album in both the Classical and Non-Classical categories. Richard is an Associate Professor at the Schulich School...

Wieslaw Woszczyk
McGill University


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

A Neural Beamforming Front-end for Distributed Microphone Arrays
Robust real-time audio signal enhancement increasingly relies on multichannel microphone arrays for signal acquisition. Sophisticated beamforming algorithms have been developed to maximize the benefit of multiple microphones. With the recent success of deep learning models created for audio signal processing, the task of Neural Beamforming remains an open research topic. This paper presents a Neural Beamformer architecture capable of performing spatial beamforming with microphones randomly distributed over very large areas, even in negative signal-to-noise ratio environments with multiple noise sources and reverberation. The proposed method combines adaptive, nonlinear filtering and the computation of spatial relations with state-of-the-art mask estimation networks. The resulting End-to-End network architecture is fully differentiable and provides excellent signal separation performance. Combining a small number of principal building blocks, the method is capable of low-latency, domain-specific signal enhancement even in challenging environments.
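
The mask-estimation stage that such architectures build on can be illustrated with a classical time-frequency-mask sketch. Here the mask is an oracle computed from a known target, where the paper uses a neural estimate, and all signals are synthetic placeholders.

```python
# Oracle ratio-mask enhancement on a toy 4-microphone mixture: average the
# channels (a zero-delay "delay-and-sum"), then gate each STFT bin by how much
# target energy it contains.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
rng = np.random.default_rng(1)
clean = rng.standard_normal(fs)                # placeholder target signal
mics = np.stack([clean + 0.5 * rng.standard_normal(fs) for _ in range(4)])

f, t, X = stft(mics, fs=fs, nperseg=512)       # per-microphone STFTs
_, _, S = stft(clean, fs=fs, nperseg=512)
Y = X.mean(axis=0)                             # simple delay-and-sum (zero delays)
mask = np.clip(np.abs(S) / (np.abs(Y) + 1e-8), 0.0, 1.0)   # oracle ratio mask
enhanced = istft(mask * Y, fs=fs, nperseg=512)[1]
print(enhanced.shape)
```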

Speakers

Jonathan Ziegler
Stuttgart Media University

Leon Schröder
Stuttgart Media University

Andreas Koch
HdM Stuttgart

Andreas Schilling
Eberhard Karls University Tuebingen


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

AI 3D immersive audio codec based on content-adaptive dynamic down-mixing and up-mixing framework
Recently, more and more people prefer to consume media content via over-the-top (OTT) platforms, such as YouTube and Netflix, rather than conventional broadcasting. To deliver an immersive audio experience to them more effectively, we propose a unified framework for an AI-based 3D immersive audio codec. Within this framework, a content-adaptive dynamic down-mixing and up-mixing scheme is newly proposed to preserve the original immersiveness even in the down-mixed audio, while enabling precise reproduction of the original 3D audio from the down-mix. The experimental results show that the proposed framework renders improved down-mixed audio compared to the conventional method and successfully reproduces the original 3D audio.

Speakers

Woo Hyun Nam
Principal Engineer, Samsung Research, Samsung Electronics
Woo Hyun Nam received the Ph.D. degree in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea in 2013. Since 2013, he has been with the Samsung Research, Samsung Electronics, where he is a Principal Engineer and is currently leading...

Tammy Lee
Samsung Research, Samsung Electronics

Sang Chul Ko
Samsung Research, Samsung Electronics

Yoonjae Son
Samsung Research, Samsung Electronics

Hyun Kwon Chung
Samsung Research, Samsung Electronics

Kyung-Rae Kim
Samsung Research, Samsung Electronics

Jungkyu Kim
Samsung Research, Samsung Electronics

Sunghee Hwang
Samsung Research, Samsung Electronics

Kyunggeun Lee
Samsung Research, Samsung Electronics


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Analysis of a Unique Pingable Circuit: The Gamelan Resonator
This paper offers a study of the circuits developed by artist Paul DeMarinis for the touring version of his work Pygmy Gamelan. Each of the six copies of the original circuit, developed June-July 1973, produces a carefully tuned and unique five-tone scale. These scales are obtained with five resonator circuits, which pitch the pings produced by a crude antenna feeding clocked bit-shift registers. While this resonator circuit may seem related to the classic Bridged-T and Twin-T designs common in analog drum machines, DeMarinis' work actually presents a unique and previously undocumented variation on those canonical circuits. We present an analysis of his third-order resonator (which we name the Gamelan Resonator), deriving its transfer function, time-domain response, poles, and zeros. This model enables us to do two things: first, based on recordings of one of the copies, we can deduce which standard resistor and capacitor values DeMarinis is likely to have used in that specific copy, since DeMarinis' schematic purposefully omits these details to reflect their variability. Second, we can better understand what makes this filter unique. We conclude by outlining future projects which build on the present findings for technical development.
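
The "pinging" mechanism can be illustrated with a generic two-pole resonator excited by an impulse; this is a minimal sketch, not a model of DeMarinis' third-order circuit, and all values are assumptions.

```python
# Ping a tuned filter with a short pulse so it rings at its pole frequency,
# producing an exponentially decaying sinusoid.
import numpy as np
from scipy.signal import lfilter

fs = 48000
f0, r = 440.0, 0.999                   # ring frequency and pole radius (decay)
w0 = 2 * np.pi * f0 / fs
b = [np.sin(w0) * (1 - r)]             # rough scaling for a unit-level ping
a = [1.0, -2 * r * np.cos(w0), r * r]  # complex-conjugate pole pair

ping = np.zeros(fs // 2)
ping[0] = 1.0                          # the "ping": a single impulse
tone = lfilter(b, a, ping)             # decaying sinusoid at ~440 Hz
print(tone[:5])
```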

Speakers

Ezra J. Teboul
Paris
Historian of electronic music technology, its users, and its makers. CHSTM sound and technology group co-convener: https://www.chstm.org/content/sound-and-technology

Kurt James Werner
Research Engineer, iZotope, Inc.


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Application of AI techniques for nonlinear control of loudspeakers
To obtain high loudness with good bass extension while keeping distortion low, and to ensure mechanical protection, one needs to control the motion of the loudspeaker diaphragm accurately. Existing solutions for nonlinear control of loudspeakers are complex and difficult to implement and tune. They are limited in accuracy by insufficient physical models that do not completely capture the complexity of the loudspeaker. Furthermore, the physical model parameters are difficult to estimate.
We present here a novel approach that uses a neural network to map the diaphragm displacement directly to the input voltage, allowing us to invert the loudspeaker. This technique allows us to control and linearize the loudspeaker without theoretical assumptions and with better accuracy than a model-based approach. It is also simpler to implement.
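
A minimal sketch of the inversion idea on a toy nonlinear plant follows; the plant, network size, and windowing are assumptions, not the authors' trained model.

```python
# Train a small network that maps a window of observed "displacement" back to
# the drive voltage that produced it, i.e. learn an inverse of the plant.
import torch
import torch.nn as nn

def toy_loudspeaker(v):                      # placeholder nonlinear plant
    return torch.tanh(1.5 * v) - 0.1 * v ** 3

net = nn.Sequential(nn.Linear(32, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(200):
    v = torch.randn(256, 32)                 # random voltage excitation windows
    x = toy_loudspeaker(v)                   # resulting "displacement" windows
    v_hat = net(x)                           # predict the window's last voltage
    loss = nn.functional.mse_loss(v_hat, v[:, -1:])
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```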

Speakers

Yuan Li
Senior Engineer, Samsung Research America

Pascal Brunet
Dir. Research, Samsung Research America
Pascal Brunet obtained his Bachelor's in Sound Engineering from Ecole Louis Lumiere, Paris, in 1981, his Master's in Electrical Engineering from CNAM, Paris, in 1989 and a PhD degree in EE from Northeastern University, Boston, in 2014. His thesis was on nonlinear modeling of loudspeakers...

Glenn Kubota
Samsung Research America

Aaquila Mariajohn
Samsung Research America


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Audio-Source Rendering on Flat-Panel Loudspeakers with Non-Uniform Boundary Conditions
Devices from smartphones to televisions are beginning to employ dual purpose displays, where the display serves as both a video screen and a loudspeaker. In this paper we demonstrate a method to generate localized sound-radiating regions on a flat-panel display. An array of force actuators affixed to the back of the panel is driven by appropriately filtered audio signals so the total response of the panel due to the actuator array approximates a target spatial acceleration profile. The response of the panel to each actuator individually is initially measured via a laser vibrometer, and the required actuator filters for each source position are determined by an optimization procedure that minimizes the mean squared error between the reconstructed and targeted acceleration profiles. Since the single-actuator panel responses are determined empirically, the method does not require analytical or numerical models of the system’s modal response, and thus is well-suited to panels having the complex boundary conditions typical of television screens, mobile devices, and tablets. The method is demonstrated on two panels with differing boundary conditions. When integrated with display technology, the localized audio source rendering method may transform traditional displays into multimodal audio-visual interfaces by colocating localized audio sources and objects in the video stream.
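
A minimal sketch of the per-frequency least-squares step described above, with random placeholders standing in for the vibrometer-measured actuator responses:

```python
# At each frequency, choose actuator filter weights so the superposed panel
# response best matches a target spatial acceleration profile (least squares).
import numpy as np

rng = np.random.default_rng(2)
n_points, n_act, n_freqs = 200, 8, 64
# H[k]: measured acceleration at n_points scan points per actuator, frequency k.
H = (rng.standard_normal((n_freqs, n_points, n_act))
     + 1j * rng.standard_normal((n_freqs, n_points, n_act)))
target = np.zeros(n_points, complex)
target[80:100] = 1.0                          # desired localized radiating region

# Per-frequency least-squares filters minimizing |H w - target|^2.
W = np.stack([np.linalg.lstsq(H[k], target, rcond=None)[0]
              for k in range(n_freqs)])
print(W.shape)                                # (n_freqs, n_act) actuator filters
```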

Speakers

Michael Heilemann
Assistant Professor, University of Rochester

Tre DiPassio
PhD Student, University of Rochester
Hello! My name is Tre, and I am in my final semester as a PhD student studying musical acoustics and signal processing under the supervision of Dr. Mark Bocko and Dr. Michael Heilemann. The research lab I am a part of has been developing an emerging type of speaker, called a flat...

Mark Bocko
University of Rochester


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Automatic Loudspeaker Room Equalization Based On Sound Field Estimation with Artificial Intelligence Models
In-room loudspeaker equalization requires a significant number of microphone positions in order to characterize the sound field in the room, which can be a cumbersome task for the user. This paper proposes the use of artificial intelligence to automatically estimate and equalize the in-room response without user interaction. To learn the relationship between the loudspeaker near-field response and the total sound power, or the energy average over the listening area, a neural network was trained using room measurement data. Loudspeaker near-field SPL at discrete frequencies was the input to the neural network. The approach has been tested on a subwoofer, a full-range loudspeaker, and a TV. Results showed that the in-room sound field can be estimated within 1-2 dB average standard deviation.

Speakers

Adrian Celestinos
Samsung Research America

Yuan Li
Senior Engineer, Samsung Research America

Victor Manuel Chin Lopez
Samsung Research Tijuana


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Binaural Audio Externalization Processing
Headphone or earbud listening scenarios span from the home or office to mobile and automotive environments, with audio source content formats including two-channel stereo, multi-channel surround, immersive or object-based material. Post-processing methods have been developed with the intent of restoring, during headphone playback, the spatial audio cues experienced in natural or loudspeaker listening, remediating known effects of headphone-mediated audio reproduction: the perceived localization of sounds in or near the head, accompanied by timbre or balance distortions and spatial image blurring or warping. The intended benefits include alleviating listening fatigue and cognitive load. In this E-Brief presentation, we review previously reported binaural audio post-processing methods and consider a strategy emphasizing minimal signal modification, applicable to enhancing conventionally produced stereo recordings.

This is a work-in-progress report on an investigation that we plan to report on in a future paper. The slides and audio demonstrations are posted at izotope.com/tech/aes_extern.

Speakers

Jean-Marc Jot
Founder and Principal, Virtuel Works LLC
Spatial audio and music technology expert and innovator. Virtuel Works provides audio technology strategy, IP creation and licensing services to help accelerate the development of audio and music spatial computing technology and interoperability solutions.

Alexey Lukin
Principal DSP Engineer, iZotope Inc
Alexey specializes in audio signal processing, with particular interest in similarities with image processing in spectral analysis, noise reduction, and multiresolution filter banks. He earned his M.S. (2003) and Ph.D. (2006) in computer science from Lomonosov Moscow State University...

Kurt James Werner
Research Engineer, iZotope, Inc.

Evan Allen
iZotope, Inc.


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Bit Rate Requirements for an Audio Codec for Stereo, Surround and Immersive Formats
This paper describes a comprehensive study on the sound quality of the Opus codec for stereo, surround and immersive audio formats for music and cinematic content. We conducted three listening tests on Opus encoded stereo, 5.1 and 7.1.4 test samples taken from music, cinematic and EBU files encoded at bit rates of 32, 48 and 64 kbps per channel. Preliminary results indicate that a bit rate of 64 kbps per channel or higher is required for stereo, but 48 kbps per channel may be sufficient for surround and immersive audio formats.

Speakers

Sunil G. Bharitkar
Samsung Research America

Allan Devantier
Samsung Research America

Carlos Tejeda-Ocampo
Samsung Research Tijuana

Carren Zhongran Wang
Samsung Research America

Will Saba
Samsung Research America


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Comparison of different techniques for recording and postproduction using main-microphone arrays for binaural reproduction
We present a subjective evaluation of various 3D main-microphone techniques for three-dimensional binaural music production. Forty-seven subjects participated in the survey, listening on headphones. Results suggest that ESMA-3D, followed by a Decca tree with height, worked best among the included 3D arrays. However, the dummy head and a stereo AB microphone pair performed as well as, if not better than, any of the arrays. Though not implemented for this study, our workflow allows the inclusion of individualized HRTFs and head tracking; their impact will be considered in a future study.

Speakers

Josua Dillier
Zürcher Hochschule der Künste ZHdK
Josua Dillier is a young audio engineer and producer living in Zurich, Switzerland. His work ranges from CD and video production to live mixing. He is specialized in the recording of acoustic instruments. Before his studies as a Tonmeister at the University of the Arts Zurich he studied...

Hanna Järveläinen
Zürcher Hochschule der Künste ZHdK


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Deconvolution of Room Impulse Responses from Simultaneous Excitation of Loudspeakers
Traditional room equalization involves exciting one loudspeaker at a time and deconvolving the loudspeaker-room response from the recording. As the number of loudspeakers and positions increases, the time required to measure loudspeaker-room responses increases. In this paper, we present a technique to deconvolve impulse responses after exciting all loudspeakers at the same time. The stimuli are shifted relative to a base stimulus and are optionally pre-processed with arbitrary filters to create specific-sounding signals. The stimulus shift ensures capture of the low-frequency reverberation tail after deconvolution. Various deconvolution techniques, including correlation-based and adaptive-filter-based approaches, are presented. The performance is characterized in terms of plots and objective metrics using responses from the Multichannel Acoustic Reverberation Dataset (MARDY).
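
A minimal sketch of the shifted-stimuli idea: each loudspeaker plays a circularly shifted copy of a base stimulus, and a regularized correlation-style deconvolution places each impulse response in its own shift slot. Noise stands in for the actual stimuli, and all lengths are assumptions.

```python
# Simultaneous excitation with circular shifts, then one deconvolution that
# separates the per-loudspeaker impulse responses by their shift offsets.
import numpy as np

rng = np.random.default_rng(3)
N, n_spk, shift = 2 ** 15, 3, 2 ** 13        # shift exceeds expected IR length
base = rng.standard_normal(N)                # base stimulus (noise here)
irs = [np.pad(rng.standard_normal(512) * np.exp(-np.arange(512) / 100),
              (0, N - 512)) for _ in range(n_spk)]

mic = np.zeros(N)
for k, ir in enumerate(irs):                 # all loudspeakers excited at once
    stim = np.roll(base, k * shift)
    mic += np.real(np.fft.ifft(np.fft.fft(stim) * np.fft.fft(ir)))

# Regularized inverse filtering (correlation-style deconvolution).
B = np.fft.fft(base)
dec = np.real(np.fft.ifft(np.fft.fft(mic) * np.conj(B)
                          / (np.abs(B) ** 2 + 1e-12)))
# Each loudspeaker's IR now sits in its own shift slot of `dec`.
print([np.argmax(np.abs(dec[k * shift:(k + 1) * shift])) for k in range(n_spk)])
```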

Speakers

Sunil G. Bharitkar
Samsung Research America


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Defining reverberation plugin structure: A comparative exploration of system design and expert knowledge in an audio education context
Reverberation plugin designs differ significantly between manufacturers. The use of abstract terminology, individually stylised interfaces, and each manufacturer's preferred lexicon increases complexity and decreases skill transference for novice users. Two studies were undertaken to explore the degree of complexity within the reverberation domain. In study one, the lexical and functional aspects of 46 reverberation plugins were examined through in-vivo coding of manufacturer documentation. From this, parameter labels were identified and inducted into nine higher-level categories based on function. In study two, a free elicitation task was undertaken by seven experienced reverberation plugin users. This study identified the most salient parameters within their underlying knowledge structures, allowing the overlap between system and user to be viewed. The results from both studies establish the lexicon used within existing reverberation plugins, and the breadth of parameters discovered suggests that recognising and understanding parameters across designs may be challenging for novice users. The findings also provide an overview of the reverberation domain whilst highlighting the core parameters identified by expert users. These data could potentially act as the basis for a novice training system.

Speakers

Kevin Garland
PhD Researcher, TUS
Kevin Garland is a Postgraduate PhD Researcher at the Technological University of the Shannon: Midlands Midwest (TUS), Ireland. His primary research interests include human-computer interaction, user-centered design, and audio technology. Current research lies in user modelling and...


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Effect of Flicker Noise on Audio Signal Reproduction
The audibility of multiplicative flicker noise superimposed on a signal by audio equipment was considered. Variable resistors used as volume controls generate flicker noise, which acts multiplicatively on the signal flowing through them. Flicker noise measurements were made for several variable resistors. In addition, a listening test was conducted to investigate the magnitude at which multiplicative flicker noise on a signal becomes perceptible. It was concluded that untrained listeners could rarely discern the multiplicative effect of volume-control flicker noise.
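
A minimal sketch of applying multiplicative 1/f (flicker) noise to a test tone; the modulation depth here is an assumption, where the paper measures it on real variable resistors.

```python
# Shape white noise to a 1/f spectrum, then use it as a slow multiplicative
# gain fluctuation on a 1 kHz tone, as with a noisy volume potentiometer.
import numpy as np

fs, n = 48000, 48000
rng = np.random.default_rng(4)

spec = np.fft.rfft(rng.standard_normal(n))
freqs = np.fft.rfftfreq(n, 1 / fs)
spec[1:] /= np.sqrt(freqs[1:])               # 1/f power => 1/sqrt(f) amplitude
flicker = np.fft.irfft(spec, n)
flicker /= np.max(np.abs(flicker))

signal = np.sin(2 * np.pi * 1000 * np.arange(n) / fs)   # 1 kHz test tone
noisy = signal * (1.0 + 0.001 * flicker)     # multiplicative, ~0.1% depth
print(np.max(np.abs(noisy - signal)))
```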

Speakers

Akihiko Yoneya
Nagoya Institute of Technology


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Effects of Near-field Sources on Ambisonics Recording and Playback
Ambisonic recording with spherical microphone arrays (SMAs) is based on a far-field assumption which determines how microphone signals are encoded into Ambisonic signals. In the presence of a near-field source, low-frequency distance-dependent boosts arise in SMAs, similar in nature to proximity effects in far-field equalized directional microphones. In this study, the effects of near-field sources on Ambisonic signals are modelled analytically, their interaction with regularization stages is observed, and they are then traced further across two basic Ambisonic processing operations: virtual microphones and binaural decoding.

Speakers

Raimundo Gonzalez
Post-Doctoral Researcher, Aalto University

Archontis Politis
Audio & Speech Processing Group, Tampere University of Technology

Tapio Lokki
Department of Signal Processing and Acoustics, Aalto University


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Evaluating the Relationship Between Kurtosis Loss and Spectral Insertion Loss for Musicians' Hearing Protection Devices
Hearing protection devices (HPDs) are essential for musicians during loud performances to avoid hearing damage, but the standard Noise Reduction Rating (NRR) performance metric for HPDs says little about their behavior in a musical setting. One analysis tool used to evaluate HPDs in the noise exposure research community is kurtosis measured in the ear and the reduction of noise kurtosis through an HPD. A musical signal, especially live music, will often have a high crest factor and kurtosis, so evaluating kurtosis loss will be important for an objective evaluation of musicians' HPDs. In this paper, a background on kurtosis and filters affecting kurtosis is given, and a setup for generating high-kurtosis signals and measuring in-ear kurtosis loss through an HPD is described. Measurement results on a variety of musicians' HPDs show that 83% of the devices measured strongly reduce kurtosis, and that kurtosis loss is likely an independent performance metric because it is not correlated with the mean or standard deviation of the spectral insertion loss.
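
A minimal sketch of the kurtosis-loss idea: excess kurtosis of a high-crest-factor signal before and after a stand-in "HPD", here simulated as a simple low-pass filter rather than a measured device:

```python
# Build an impulsive (high-kurtosis) signal, pass it through a filter standing
# in for an HPD's insertion loss, and compare excess kurtosis values.
import numpy as np
from scipy.signal import butter, lfilter
from scipy.stats import kurtosis

fs, n = 48000, 48000
rng = np.random.default_rng(5)
x = rng.standard_normal(n)
x[rng.integers(0, n, 50)] += 40.0            # sparse impulses -> high kurtosis

b, a = butter(4, 2000 / (fs / 2))            # stand-in for an HPD
y = lfilter(b, a, x)

k_in, k_out = kurtosis(x), kurtosis(y)       # Fisher (excess) kurtosis
print(f"kurtosis in={k_in:.1f} out={k_out:.1f} loss={k_in - k_out:.1f}")
```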

Speakers

David Anderson
Applied Research Associates

Theodore Argo
Applied Research Associates


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Forensic Handling of User Generated Audio Recordings
User generated recordings (UGRs) are common in audio forensic examination. The prevalence of handheld private recording devices, stationary doorbell cameras, law enforcement body cameras, and other systems capable of creating UGRs at public incidents is only expected to increase with the development of new and less expensive recording technology. It is increasingly likely that an audio forensic examiner will have to deal with an ad hoc collection of unsynchronized UGRs from mobile and stationary audio recording devices. The examiner’s tasks will include proper time synchronization, deducing microphone positions, and reducing the presence of competing sound sources and noise. We propose a standard forensic methodology for handling UGRs, including best practices for assessing authenticity and timeline synchronization.
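
One of the listed tasks, timeline synchronization, is commonly estimated by cross-correlation; a minimal sketch with synthetic recordings follows (the offset, noise level, and sample rate are assumptions):

```python
# The lag that maximizes the cross-correlation of two recordings of a shared
# sound event estimates the offset between their start times.
import numpy as np

fs = 8000
rng = np.random.default_rng(6)
event = rng.standard_normal(fs // 4)                     # shared sound event
rec_a = np.concatenate([np.zeros(1000), event, np.zeros(4000)])
rec_b = np.concatenate([np.zeros(3500), event, np.zeros(1500)])
rec_a += 0.05 * rng.standard_normal(rec_a.size)          # device noise
rec_b += 0.05 * rng.standard_normal(rec_b.size)

xc = np.correlate(rec_b, rec_a, mode="full")
lag = int(np.argmax(xc)) - (rec_a.size - 1)              # samples rec_b trails rec_a
print(f"estimated offset: {lag} samples ({lag / fs * 1000:.1f} ms)")  # ~2500
```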

Speakers

Rob Maher
Professor, Montana State University
Audio digital signal processing, audio forensics, music analysis and synthesis.

Benjamin Miller
Montana State University

Fraser Robertson
Montana State University


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Gunshot Detection Systems: Methods, Challenges, and Can they be Trusted?
Many communities experiencing increased gun violence are turning to acoustic gunshot detection systems (GSDS) with the hope that their deployment will provide increased 24/7 monitoring and the potential for more rapid response by law enforcement to the scene. In addition to real-time monitoring, data collected by gunshot detection systems have been used alongside witness testimonies in criminal prosecutions. Because of their potential benefit, it is appropriate to ask: How effective are GSDS in lab/controlled settings vs. deployed real-world city scenarios? How reliable are the outputs produced by GSDS? What is the system performance trade-off between gunshot detection and source localization of the gunshot? Should they be used only for early alerts, or can they be relied upon in courtroom settings? Are resources spent on GSDS operational costs well utilized, or could these resources be better invested to improve community safety? This study does not attempt to address all of these questions, including the social and economic aspects of GSDS, but provides a reflective survey of the hardware and algorithmic operation of the technology to better understand its potential as well as its limitations. Specifically, challenges are discussed regarding environmental and other mismatch conditions, with emphasis on the validation procedures used and their expected reliability. Many concepts discussed in this paper are general and will likely apply to any gunshot detection technology. For this study, we refer to the ShotSpotter system to provide specific examples of system infrastructure and validation procedures.

Speakers

John Hansen
Center for Robust Speech Systems; The University of Texas at Dallas

Hynek Boril
University of Wisconsin - Platteville


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Implementing and Evaluating a Higher-order Ambisonic Sound System in a Multi-purpose Facility: A Lab Report
Although Ambisonic sound reproduction has an extensive history, it has found more widespread use in the past decade due to advances in computer hardware that enable real-time encoding and decoding of Ambisonic sound fields, the availability of user-friendly software that facilitates the rendering of such sound fields, and recent developments in immersive media technologies, such as AR and VR systems, that prompt new research into spatial audio. In this paper, we discuss the design, implementation, and evaluation of a third-order Ambisonic system in an academic facility that is built to serve a range of functions including instruction, research, and artistic performances. Due to the multi-purpose nature of this space, there are numerous limitations to consider when designing an Ambisonic sound system that can operate efficiently without interfering with the variety of activities regularly carried out in it. We discuss our approach to working around such limitations and evaluating the resulting system. To that end, we present a user study conducted to assess the performance of this system in terms of perceived spatial accuracy. Based on the growing number of such facilities around the world, we believe that the design and evaluation methods presented here can be of use in the implementation of spatial audio systems in similar multi-purpose environments.

Speakers

Anıl Çamcı
Associate Professor of Performing Arts Technology, University of Michigan

Sam Smith
University of Michigan

Seth Helman
University of Michigan


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Informed postprocessing for auditory roughness removal for low-bitrate audio coders
In perceptual audio coding at very low bitrates, modulation artifacts can be introduced onto tonal signal components, which are often perceived as auditory roughness. These artifacts may occur, for instance, due to quantization errors, or may be added when using audio bandwidth extension, which sometimes causes an irregular harmonic structure at the borders of replicated bands. In particular, the roughness artifacts due to quantization errors are difficult to mitigate without investing considerably more bits in the encoding of tonal components. We propose a novel technique to remove these roughness artifacts at the decoder side, controlled by a small amount of guidance information transmitted by the encoder.

Speakers

Steven Van De Par
Carl von Ossietzky University, Department of Medical Physics and Acoustics

Sascha Disch
Fraunhofer IIS, Erlangen

Andreas Niedermeier
Fraunhofer IIS, Erlangen

Bernd Edler
Audiolabs Erlangen


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Interactive Application to Control and Rapid-prototype in a Collaborative Immersive Environment
Human-scale immersive environments offer rich, often interactive, experiences and their potential has been demonstrated across areas of research, teaching, and art. The variety of these spaces and their bespoke configurations leads to a requirement for content highly-tailored to individual environments and/or interfaces requiring complicated installations. These introduce hurdles which burden users with tedious and difficult learning curves, leaving less time for project development and rapid prototyping. This project demonstrates an interactive application to control and rapid-prototype within the CRAIVE-Lab at Rensselaer. Application Programming Interfaces (APIs) render complex functions of the immersive environment, such as audio spatialization, accessible via the Internet. A front-end interface configured to communicate with these APIs gives users simple and intuitive control over these functions from their personal devices (e.g. laptops, smartphones). While bespoke systems will often require bespoke solutions, this interface allows users to create content on day one, from their own devices, without set up, content-tailoring, or training. Three examples utilizing some or all of these functions are discussed.

Speakers

Jonas Braasch
Professor, Rensselaer Polytechnic Institute

Samuel Chabot
Rensselaer Polytechnic Institute


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Material models in loudspeakers using frictional elements
The compliance of a moving coil loudspeaker is known to depend on the level of the input signal. This effect is visible as a drop in resonance frequency. A nonlinear frictional element with hysteresis, and thus a level dependent compliance and damping, is added to the standard lumped parameter model. A comparison of simulation results and measurements reveals that the frictional model is able to explain the nonlinear behavior seen in the measurements.
The paper presents a scheme for fitting the model parameters to measured data. Results suggest that strong interaction between the frictional elements and the linear parameters complicates this fitting; strategies for solving this problem are presented and discussed.

Speakers

Rasmus Bølge Sørensen
Technical University of Denmark

Finn Agerkvist
Technical University of Denmark


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Mayflower & The Seven Seas: Sonification of The Ocean
Created in conjunction with the Marine Institute at the University of Plymouth, this project set out to use data transmitted by the on-board sensors of the Mayflower Autonomous Ship (MAS) to manipulate specially created pieces of music based on sea shanties and folk ballads. Technical issues and COVID delays forced a late change, and the project switched to using data from the university's weather stations. This paper illustrates how the music was produced and recorded, and how the software was configured to make the musical pieces vary in real time according to the changing sea conditions, so that the public can view the current conditions and listen to the music evolve in real time.

Speakers

Eduardo Reck Miranda
University Of Plymouth

Clive Mead
University Of Plymouth

Dieter Hearle
University Of Plymouth


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Objective-oriented method for uniformation of various directivity representations
Over recent years, numerous attempts have been made to provide efficient methods of directivity representation, regarding either sound sources or head-related transfer functions. Because of the wide variety of programming tools and scripts used by different researchers, the resulting representations are inconvenient to reproduce and compare with each other, hampering the development of the subject. Within this paper, an objective-oriented method is proposed to deal with this issue. The suggested approach is based on defining classes for different directivity models that share some general properties of directivity functions, allowing for easy comparison between different representations. A basic Matlab toolbox utilizing this method is presented alongside exemplary implementations of directivity models based on spherical and hyperspherical harmonics.
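
A minimal sketch of the class-based idea in Python (the paper's toolbox is in Matlab); the class names, shared interface, and first-order model below are illustrative assumptions.

```python
# Directivity representations as classes sharing one interface, so different
# models can be evaluated and compared at the same directions.
import numpy as np

class Directivity:
    """Common interface: magnitude at (azimuth, elevation) in radians."""
    def evaluate(self, az: float, el: float) -> float:
        raise NotImplementedError

class Omnidirectional(Directivity):
    def evaluate(self, az, el):
        return 1.0

class FirstOrderSH(Directivity):
    """Real first-order spherical-harmonic mix (W, Y, Z, X coefficients)."""
    def __init__(self, coeffs):
        self.c = np.asarray(coeffs, float)
    def evaluate(self, az, el):
        basis = np.array([1.0, np.cos(el) * np.sin(az),
                          np.sin(el), np.cos(el) * np.cos(az)])
        return float(self.c @ basis)

# Any two representations can now be compared on a shared grid of directions.
cardioid = FirstOrderSH([0.5, 0.0, 0.0, 0.5])
for az in (0.0, np.pi / 2, np.pi):
    print(f"az={az:.2f}: {cardioid.evaluate(az, 0.0):+.2f}")
```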

Speakers

Adam Szwajcowski
AGH University of Science and Technology


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

On the comparison of flown and ground-stacked subwoofer configurations regarding noise pollution
In addition to audience experience and hearing health concerns, noise pollution issues are increasingly considered in large-scale sound reinforcement for outdoor events. Among other factors, subwoofer positioning relative to the main system influences sound pressure levels at large distances, which may be considered noise pollution.
In this paper, free-field simulations are first performed, showing that subwoofer positioning affects rear and side rejection but has a limited impact on noise levels in front of the system. Then, the impact of wind on sound propagation at low frequencies is investigated. Simulation results show that wind affects ground-stacked subwoofers more than flown subwoofers, leading to higher sound levels downwind in the case of ground-stacked subwoofers.

Speakers

Etienne Corteel
Director of Education & Scientific Outreach, global, L-Acoustics
Governing the scientific outreach strategy, Etienne and his team are the interface between L-Acoustics and the scientific and education communities. Their mission is to develop and maintain an education program tailor-made for the professional sound industry. Etienne also contributes...

Thomas Mouterde
Field application research engineer, L-Acoustics
Thomas Mouterde is a field application research engineer at L-Acoustics, a French manufacturer of loudspeakers, amplifiers, and signal processing devices. He is a member of the “Education and Scientific Outreach” department that aims at developing the education program of the...


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Overview and Comparison of Acoustical Characteristics in Three Historically Significant Nashville Recording Studios
Several key studios in Nashville, TN served as the focus for the creation of the recorded music experience known as “the Nashville Sound.” Recordings were notable for their songwriting style, musical arrangement, and the nature of the technical processes employed, including the specific recording spaces themselves. Three historically significant studios were selected as representative of this era. This study reviewed the historical background of the studios and investigated whether there may be similarities in these studios’ acoustical properties that resulted in a particular recording approach within these environments. Standard acoustic measurements were obtained and analysed in each of these three recording spaces.
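
One of the standard acoustic measurements involved, reverberation time, can be estimated from an impulse response via Schroeder backward integration; a minimal sketch on a synthetic impulse response follows (the paper does not specify its analysis code).

```python
# Schroeder backward integration of an IR's energy, then a linear fit over the
# -5 to -25 dB range (a T20 estimate extrapolated to 60 dB of decay).
import numpy as np

fs = 48000
rng = np.random.default_rng(7)
t = np.arange(fs) / fs
ir = rng.standard_normal(fs) * np.exp(-t * (6.91 / 0.8))   # ~0.8 s decay

edc = np.cumsum(ir[::-1] ** 2)[::-1]                # energy decay curve
edc_db = 10 * np.log10(edc / edc[0])

i5 = np.argmax(edc_db <= -5.0)                      # fit region: -5 .. -25 dB
i25 = np.argmax(edc_db <= -25.0)
slope, _ = np.polyfit(t[i5:i25], edc_db[i5:i25], 1) # dB per second
print(f"T20-based RT60 ~= {-60.0 / slope:.2f} s")
```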

Speakers

Doyuen Ko
Associate Professor, Belmont University
Dr. Doyuen Ko is an Associate Professor of Audio Engineering Technology at Belmont University in Nashville, Tennessee. He received his Ph.D. and Master of Music from the Sound Recording Department at McGill University, Canada. Before studying at McGill, he worked as a sound designer...

Jim Kaiser
Belmont University
Jim Kaiser is an Instructor of Audio Engineering Technology at Belmont University in Nashville, TN. He serves on the AES Technical Council, the Recording Academy Producers & Engineers Wing, and the Nashville Engineer Relief Fund Board. Jim is a Past President of the International...

Wesley Bulla
Belmont University


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Parametric Array Using Amplitude Modulated Pulse Trains: Experimental Evaluation of Beamforming and Single Sideband Modulation
We present a parametric array system realized with a microcontroller and MOSFET drivers. Pulse train signals with a fundamental frequency of 40 kHz are generated by the microcontroller. The pulse trains are amplitude modulated by exploiting the switching mechanism of the MOSFETs. The higher-order harmonics are attenuated by the band-pass characteristic of the ultrasonic transducers, so that only the carrier frequency and the sideband components are emitted. The sound beam can be steered by applying phase shifts to the pulse signals, which can be implemented with relatively inexpensive hardware. A new single sideband modulation is also introduced, in which the upper sidebands of two double sideband modulated signals are acoustically cancelled. The proposed approaches for beamforming and single sideband modulation are evaluated by anechoic measurements.
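
A minimal sketch of the signal generation: a square 40 kHz pulse train amplitude-modulated by an audio envelope, with per-element phase shifts for beam steering. The element count, pitch, and steering angle are assumptions, and the MOSFET switching stage is idealized as a sign function.

```python
# Amplitude-modulated pulse trains with per-element carrier phase offsets,
# the phased-array recipe for steering the audible beam.
import numpy as np

fs, fc = 400000, 40000                       # sample rate, carrier frequency
t = np.arange(fs // 10) / fs                 # 100 ms
audio = 0.5 * (1 + np.sin(2 * np.pi * 400 * t))   # modulating envelope, 0..1

def pulse_train(phase_rad):
    # Square carrier with a phase offset, as produced by shifted switch edges.
    return (np.sign(np.sin(2 * np.pi * fc * t + phase_rad)) + 1) / 2

n_elems, spacing, c = 8, 0.009, 343.0        # 9 mm pitch, speed of sound
steer = np.radians(15.0)                     # steer the beam 15 degrees
delays = np.arange(n_elems) * spacing * np.sin(steer) / c
channels = np.stack([audio * pulse_train(-2 * np.pi * fc * d) for d in delays])
print(channels.shape)                        # one drive signal per element
```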

Speakers

Nara Hahn
Institute of Communications Engineering, University of Rostock

Jens Ahrens
Division of Applied Acoustics, Chalmers University of Technology

Carl Andersson
Division of Applied Acoustics, Chalmers University of Technology


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Phoneme Mappings for Online Vocal Percussion Transcription
Vocal Percussion Transcription (VPT) aims at detecting vocal percussion sound events in a beatboxing performance and classifying them into the correct drum instrument class (kick, snare, or hi-hat). To do this in an online (real-time) setting, however, algorithms are forced to classify these events within just a few milliseconds after they are detected. The purpose of this study was to investigate which phoneme-to-instrument mappings are the most robust for online transcription purposes. We used three different evaluation criteria to base our decision upon: frequency of use of phonemes among different performers, spectral similarity to reference drum sounds, and classification separability. With these criteria applied, the recommended mappings would potentially feel natural for performers to articulate while enabling the classification algorithms to achieve the best performance possible. Given the final results, we provided a detailed discussion on which phonemes to choose given different contexts and applications.

Speakers

Alejandro Luezas
Roli / Queen Mary University of London

Charalampos Saitis
Queen Mary University of London

Mark Sandler
Queen Mary University of London


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Response clustering in loudspeaker radiation balloons
The measurement of the radiation balloon of a loudspeaker involves the acquisition of 2664 responses at 5º resolution in the theta and phi angles, each response with magnitude and phase at a number of frequencies that depends on the measurement's spectral resolution. This large amount of information often means that analysis is limited to certain frequencies and certain planes (horizontal and vertical polar plots or isobars). To help investigate radiation balloons, unsupervised machine-learning data analysis tools have been applied to automatically group the loudspeaker responses that constitute a full balloon measurement according to their similarity, in order to extract meaningful patterns. Similar algorithms have also been applied to reduce the number of frequencies involved while keeping the same radiation information.
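
A minimal sketch of the unsupervised grouping step using k-means; the balloon data here is synthetic, standing in for the measured 5-degree-grid responses, and the choice of two clusters is an assumption.

```python
# Cluster magnitude responses so that directions with similar radiation
# behaviour fall into the same group.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(8)
n_dirs, n_freqs = 2664, 100                  # 5-degree theta/phi grid size
on_axis = rng.standard_normal((1, n_freqs))
off_axis = on_axis - np.linspace(0, 12, n_freqs)   # HF roll-off off axis
balloon = np.vstack([
    on_axis + 0.5 * rng.standard_normal((n_dirs // 2, n_freqs)),
    off_axis + 0.5 * rng.standard_normal((n_dirs - n_dirs // 2, n_freqs)),
])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(balloon)
print(np.bincount(labels))                   # directions grouped by similarity
```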

Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Spatial auditory masking between real sound signals and virtual sound images
In an augmented reality (AR) environment, audio signals from the real world and the virtual world are simultaneously presented to the listener. It is desirable that virtual sound content and real sound sources do not interfere with each other. To make this possible, we have examined spatial auditory masking between maskers and maskees, where maskers are real sound signals emitted from loudspeakers and maskees are virtual sound images, generated using head-related transfer functions (HRTFs) and emitted from headphones. Open-ear headphones, which allow listeners to hear the audio content together with environmental sound, were used for the experiment. The results are very similar to those of previous experiments [1, 2] in which both masker and maskee were real signals emitted from loudspeakers. That is, for a given masker location, masking threshold levels as a function of maskee location are symmetric with respect to the frontal plane of the subject. The masking threshold level is, however, lower than in the previous experiments, perhaps because of the limitations of sound image localization with HRTFs. The results indicate that spatial auditory masking occurs with virtually localized sound images in the same way as with real sound signals.

Speakers

Masayuki Nishiguchi
Professor, Akita Prefectural University
Masayuki Nishiguchi received his B.E., M.S., and Ph.D. degrees from Tokyo Institute of Technology, University of California Santa Barbara, and Tokyo Institute of Technology, in 1981, 1989, and 2006 respectively. He was with Sony Corporation from 1981 to 2015, where he was involved...

Soma Ishihara
Akita Prefectural University

Kanji Watanabe
Akita Prefectural University

Koji Abe
Akita Prefectural University

Shouichi Takane
Akita Prefectural University


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Synthesizing Reverberation Impulse Responses from Audio Signals: Auto-Reverberation and Interactive Environments
A method for creating reverberation impulse responses from a variety of audio source materials forms the basis of a family of novel reverberation effects. In auto-reverberation, segments of audio are selected and processed to form an evolving sequence of reverberation impulse responses that are applied to the original source material—that is, the audio is reverberating itself. In cross-reverberation, impulse responses derived from one audio track are applied to another audio track. The reverberation impulse responses are formed by summing randomly selected segments of the source audio, and imposing reverberation characteristics, including reverberation time and wet equalization. By controlling the number and timing of the selected source audio segments, the method produces an array of impulse responses that represent a trajectory through the source material. In so doing, the evolving impulse responses will have the character of room reverberation while also expressing the changing timbre and dynamics of the source audio. Processing architectures are described, and off-line and real-time virtual acoustic sound examples derived from the music of Bach and Dick Dale are presented.
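
A minimal sketch of the impulse-response construction described above: randomly selected source segments are summed and an exponential decay is imposed for the target reverberation time (segment counts and reverberation time are assumptions):

```python
# Build a reverberation IR from the source audio itself: sum random segments,
# then impose a -60 dB exponential decay at t = rt60.
import numpy as np

fs, rt60 = 48000, 2.0
rng = np.random.default_rng(9)
source = rng.standard_normal(fs * 30)        # placeholder for the audio track

ir_len = int(fs * rt60)
ir = np.zeros(ir_len)
for _ in range(64):                          # sum randomly chosen segments
    seg_len = rng.integers(fs // 10, fs // 2)
    start = rng.integers(0, source.size - seg_len)
    offset = rng.integers(0, ir_len - seg_len)
    ir[offset:offset + seg_len] += source[start:start + seg_len]

t = np.arange(ir_len) / fs
ir *= 10.0 ** (-3.0 * t / rt60)              # -60 dB at t = rt60
ir /= np.max(np.abs(ir))
print(ir.shape)
```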

Speakers

Eoin Callery
Irish World Academy of Music and Dance, University of Limerick
Eoin Callery is an Irish artist and researcher who develops electroacoustic systems relating to chamber music, performance space augmentation, and sound installation. This often involves exploring acoustic phenomena, especially feedback and virtual acoustics, in live situations...

Jonathan Abel
CCRMA, Stanford University

Kyle Spratt
Applied Research Laboratories, The University of Texas at Austin


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Teaching Modular Synth & Sound Design Online During COVID-19: Maximizing Learning Outcomes Through Open-source Software and Student-centered Pedagogy
This study introduces an inclusive and innovative online teaching pedagogy for sound design and modular synthesis using open-source software, designed to achieve student-centered learning outcomes during the COVID-19 pandemic. The pedagogy proved effective after the course was offered, human subject research was conducted, and class evaluation data were analyzed. The teaching strategies include comprehensive analysis of sound synthesis theory using sample patches, an introduction to basic electronics, collaborative learning, hands-on lab experiments, student presentations, and alternative reading assignments in the form of educational videos. Online teaching software solutions were implemented to track student engagement. From a transformative perspective, the authors aim to cultivate student-centered learning, inclusive education, and equal opportunity in higher education in an online classroom setting. The goal is to achieve the same level of engagement as in-person classes, inspire a diverse student body, offer ample technical and mental support, and open up the possibility of learning sound design on Eurorack modular synthesizers without investing in expensive hardware. Students’ assignments, midterms, and final projects demonstrated their thorough understanding of the course material, strong motivation, and vibrant creativity. Human subject research was conducted during the course to improve the students’ learning experience and further shape the pedagogy. Three surveys and one-on-one interviews were given to a class of 25 students. The qualitative and quantitative data indicate the satisfaction and effectiveness of this student-centered learning pedagogy. Promoting social interaction and student well-being while teaching challenging topics during challenging times was also achieved.

Speakers

Jiayue Cecilia Wu
Assistant Professor, Graduate Program Director (MSRA), University of Colorado Denver
Originally from Beijing, Dr. Jiayue Cecilia Wu (AKA: 武小慈) is a scholar, composer, audio engineer, and multimedia technologist. Her work focuses on how technology can augment the healing power of music. She earned her Bachelor of Science degree in Design and Engineering in 2000...

Ashell Fox
University of Colorado Denver


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

The effect of user's hands on mobile device frequency response, Part 1
First results of a study of the effects of the user’s hands on the frequency response and channel balance of a mobile phone hands-free loudspeaker are presented. The results show that the response variation caused by the user’s hands is large (up to a 10 dB boost in narrow ranges) and highly user dependent, although general trends can be observed. The variation between users is strong especially above 5 kHz. The acoustical causes of the observed response shape are studied using a FEM model, indicating that the shape of the palm in particular explains the observed features of the frequency responses. The results lead to the conclusion that more realistic measurement methods need to be developed if a more natural tonal balance is to be achieved in handheld devices.

Speakers

Juha Backman
AAC Technologies

Lauri Veko
AAC Technologies Solutions Finland Oy

Yuheng Jiang
AAC Technologies Holdings Inc


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

The influence of stage acoustics on the singers' performance and perception: A pilot study
It is known that musicians tend to adjust their performance to the acoustical properties of the hall as they perceive them. In a large reverberant hall, for example, they may play staccato notes even shorter than they would in a less reverberant hall to make the music more clearly understandable to the audience. In this study, four singers were invited to sing two (slow and fast) pieces of music in three venues whose reverberation times were 0.3, 1.8, and 3.4 seconds. The singers were surveyed with questions regarding the tempo, intonation, resonance, and diction of their performance in each venue. The singing voice was also recorded with a headset microphone and analyzed to relate audio features to the characteristics of the venues. The results showed that the singers’ perception of vocal resonance was significantly related to the venue (p=0.024), and so were the average sound level and the dynamic range of the sound level (p=0.040 for both dependent variables), which could partly be explained in relation to the reverberation time.

Speakers

Kajornsak Kittimathaveenan
Institute of Music, Science and Engineering, King Mongkut's Institute of Technology Ladkrabang

Munhum Park
Institute of Music, Science and Engineering, King Mongkut's Institute of Technology Ladkrabang


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Tools For Visual Thinking: Teaching Electronic Music
Teaching the history and compositional techniques of electronic music can be challenging because there are few practical resources available for developing course curricula, and current music styles are constantly changing. Here we explain the benefits of a few assignments that help students connect the analysis of classic Electronic Dance Music (EDM) songs with creating their own compositions that “nail the style.” Creating timeline analyses of classic EDM songs forms a visual representation of how the elements of an arrangement develop. Students later use these timeline analyses as visual blueprints for EDM song arrangements that they compose. Critical listening plays a vital role in creating these detailed timeline analyses, which encourage self-discovery of each element’s musical characteristics. This work positively influences the composer’s ability to “nail the style.” Pedagogical experiences based on self-discovery offer greater permanence through structured learning.

Speakers

Graham Spice
Associate Professor of Music Production and Recording Technology, Shenandoah University


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand

9:00pm EDT

Transducer design considerations for slim TV applications
With the development of new space-efficient display technologies in recent years has come a trend toward decreasing the overall thickness of consumer electronics such as televisions. This slim form factor creates a challenge for the design of integrated audio systems because it severely limits the physical performance possibilities of any acoustic transducer. Modern DSP and amplifier technologies have been able to utilize the transducer up to its performance limits and have thus maintained audio quality; however, this will not be enough if further thickness reduction is desired. This paper discusses the physical limits of current designs and suggests a new layout of a moving coil transducer for ultra-slim applications.

Speakers

Felix C. Kochendörfer
Samsung Research America
Felix Kochendörfer was born in 1985 in Weimar, Germany. He received a M.Sc. Degree in Acoustics and Signal Processing from Aalborg University, Denmark in 2010 and a Diploma Degree in Electrical Engineering from Dresden University of Technology in 2011. After a short time at Klippel...


Thursday October 28, 2021 9:00pm - Friday December 3, 2021 6:00pm EST
On-Demand
 