In this tutorial, we will show some basic building blocks of deep learning, particularly for audio, from the perspective of signal processing. The idea is to show the similarities between familiar signal processing structures and deep learning architectures. For that, we use examples in Python and PyTorch.
We start with best practices for deep learning, then explore convolutional neural networks as filter banks (analysis and synthesis) and autoencoders as filter-bank-based audio coders, and finally discuss recurrent neural networks as infinite impulse response (IIR) filters. Throughout, we use audio examples and Python/PyTorch code examples.
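As a minimal sketch of the filter bank view (our own illustration, not code from the tutorial materials), a strided Conv1d layer can act as an analysis filter bank, with each output channel being one subband and the stride being the downsampling factor; a ConvTranspose1d with the same kernel size and stride then acts as the matching synthesis filter bank:

```python
import torch
import torch.nn as nn

# Assumed example values: N subbands, filters of length L,
# critically sampled (stride equal to the number of subbands).
N = 4   # number of subbands / filters
L = 8   # filter length

# Analysis filter bank: one learned filter per output channel,
# downsampling by the stride.
analysis = nn.Conv1d(1, N, kernel_size=L, stride=N, bias=False)

# Synthesis filter bank: transposed convolution upsamples the
# subband signals and sums the filtered contributions.
synthesis = nn.ConvTranspose1d(N, 1, kernel_size=L, stride=N, bias=False)

x = torch.randn(1, 1, 64)      # (batch, channels, time): a toy signal
subbands = analysis(x)         # (1, N, frames): one channel per subband
x_hat = synthesis(subbands)    # back to (1, 1, time)
print(subbands.shape, x_hat.shape)
```

With random (untrained) filters this is of course not a perfect-reconstruction filter bank; training the two layers jointly, as in the autoencoder part of the tutorial, is what optimizes the filters.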
Content:
- Best practices for machine learning in audio
- Specific properties of audio signals and typical features
- Convolutional layers as filter banks
- Autoencoders as filter banks with optimization
- Variational autoencoders as audio coders with quantization
- Recurrent neural networks as infinite impulse response (IIR) filters
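To illustrate the last point (a sketch with our own example coefficients, not the tutorial's code): a linear recurrent state update is exactly the difference equation of a first-order IIR filter, y[n] = a·y[n-1] + b·x[n], which we can verify against `scipy.signal.lfilter`:

```python
import numpy as np
from scipy.signal import lfilter

# Assumed example coefficients for a first-order IIR filter.
a, b = 0.5, 1.0
x = np.random.randn(16)

# "RNN"-style recursion: the hidden state h is the filter's memory.
h = 0.0
y_rnn = []
for xn in x:
    h = a * h + b * xn   # linear recurrent update = IIR difference equation
    y_rnn.append(h)
y_rnn = np.array(y_rnn)

# The same filter via scipy: y[n] - a*y[n-1] = b*x[n]
y_iir = lfilter([b], [1.0, -a], x)

print(np.allclose(y_rnn, y_iir))  # True
```

A standard RNN adds a nonlinearity around this update; dropping the nonlinearity is what makes the correspondence to IIR filtering exact.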
The Jupyter notebook file for the tutorial slides can be found at
github.com/TUIlmenauAMS/AES_Tutorial_2021.