Exploring Audio Datasets with Python

Image datasets can be explored easily. Even if we have hundreds of images, we can scroll through directories and glance at the data. This way, we can quickly notice interesting properties: colours, locations, time of day. However, if we used the same strategy for audio datasets, we would not get far. Instead, we would have to listen to or skip through a single file at a time. For a large number of samples, this approach is prohibitive.

A solution is to visualize the audio files. Rather than listening to the samples sequentially, we plot different characteristics. The most basic plot is the waveform, which is shown in the above feature image.
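As a minimal sketch of a waveform plot, the snippet below synthesizes a decaying tone with NumPy instead of loading a real file and plots amplitude over time with matplotlib. The signal, the 44.1 kHz sampling rate, and the function names are assumptions made for illustration; they are not taken from the script discussed in this article.

```python
import numpy as np


def synth_decaying_tone(duration_s=5.0, sr=44_100, freq_hz=440.0, decay=1.5):
    """Return (time axis in seconds, samples) for a sine tone that fades out."""
    t = np.arange(int(duration_s * sr)) / sr
    samples = np.sin(2 * np.pi * freq_hz * t) * np.exp(-decay * t)
    return t, samples


def plot_waveform(t, samples, title="Waveform"):
    """Plot amplitude over time and return the matplotlib figure."""
    # Imported lazily so the numeric helper above works without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(8, 2))
    ax.plot(t, samples, linewidth=0.5)
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Amplitude")
    ax.set_title(title)
    fig.tight_layout()
    return fig


# Synthetic stand-in for one audio sample (assumption, not real data).
t, samples = synth_decaying_tone()
# To draw it: fig = plot_waveform(t, samples); fig.savefig("waveform.png")
```

Because the tone decays, most of the visible amplitude sits at the start of the clip, which is the kind of pattern a waveform plot reveals at a glance.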

We can see that the samples in the first row all show similar patterns. In the second row, the sound seems to be concentrated at the beginning. Switching to a spectrogram representation, we can examine the samples further:

A visualization of the spectrogram for several audio samples.

On the y-axis, we see the frequency, and on the x-axis, we see the time. The colour carries the remaining information: the brighter an area, the more energy it contains. For our second row, the energy seems to be concentrated in the first two to three seconds.
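To make the three axes concrete, here is a rough sketch of how such a spectrogram can be computed with plain NumPy: a short-time Fourier transform whose magnitudes are converted to decibels. The FFT size of 1024, the hop of 512 samples, the Hann window, and the 1 kHz test tone are assumptions chosen for illustration, not the settings used for the figures above.

```python
import numpy as np


def spectrogram_db(samples, sr, n_fft=1024, hop=512):
    """Short-time Fourier transform magnitude in decibels.

    Returns (frequencies in Hz, frame times in s, dB matrix [frequency, time]).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [samples[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T  # rows: frequency bins
    db = 20 * np.log10(magnitude + 1e-10)  # small offset avoids log(0)
    freqs = np.fft.rfftfreq(n_fft, d=1 / sr)
    times = (np.arange(n_frames) * hop + n_fft / 2) / sr  # frame centres
    return freqs, times, db


# Synthetic 1 kHz tone as a stand-in for a real recording (assumption).
sr = 22_050
t = np.arange(2 * sr) / sr
tone = np.sin(2 * np.pi * 1000.0 * t)

freqs, times, db = spectrogram_db(tone, sr)
peak_hz = freqs[db.mean(axis=1).argmax()]  # frequency bin with the most energy
```

Passing `times`, `freqs`, and `db` to matplotlib's `pcolormesh` gives the picture described above: time on the x-axis, frequency on the y-axis, and energy as colour.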

Then, in the next step, we take one sample and visualize several transformations at the same time, as shown below:

A combined view of different visualization techniques.

Doing all this manually over and over is tedious. But thanks to Python and Streamlit, we can develop a simple GUI that is rendered in the browser. You can try it live here.

The GUI for exploring audio datasets. You can try it live here and find the code on GitHub here.

This script lets us select a visualization type on the sidebar and then visualizes a few samples per class on the main screen. For the audio samples shown above, I have used a subset of the popular ESC-50 dataset. The main logic of the script, which was greatly improved by Akhil Vasvani, is shown below:

We import all necessary packages and the Python scripts that are responsible for creating the different kinds of visualizations. From line 11 on, we construct the sidebar that you see above. In this sidebar, we create a radio button with seven different visualization options and add a checkbox that optionally shows the file names alongside the plots. This option is helpful if we spot a sample of particular interest and want to examine it further. From line 24 on, we check which radio option is selected and create the corresponding visualizations.
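The sidebar-then-dispatch structure described here can be sketched as follows. This is not the original script: the renderer functions, the two option names, and `build_app` are hypothetical placeholders, and the real script offers seven options. The Streamlit module is passed in as `st` so that the logic stays easy to test; only `st.sidebar.title`, `st.sidebar.radio`, `st.sidebar.checkbox`, `st.subheader`, `st.caption`, and `st.write` are used.

```python
# Placeholder renderers: each would draw one plot type for one file.
# Names and signatures are hypothetical, not those of the original script.
def render_waveform(st, path):
    st.write(f"Waveform of {path}")


def render_spectrogram(st, path):
    st.write(f"Spectrogram of {path}")


# Maps the label shown in the sidebar to the function that draws it.
RENDERERS = {
    "Waveform": render_waveform,
    "Spectrogram": render_spectrogram,
}


def build_app(st, samples_by_label):
    """Draw the sidebar controls, then one section per class label."""
    st.sidebar.title("Audio explorer")
    choice = st.sidebar.radio("Visualization", list(RENDERERS))
    show_names = st.sidebar.checkbox("Show file names", value=False)

    for label, paths in samples_by_label.items():
        st.subheader(label)
        for path in paths:
            if show_names:
                st.caption(path)
            RENDERERS[choice](st, path)
    return choice


# In a file started with `streamlit run app.py`, one would write:
#   import streamlit as st
#   build_app(st, {"dog": ["dog_1.wav"], "rain": ["rain_1.wav"]})
```

Keeping the option-to-function mapping in one dictionary means a new visualization only needs one new entry, which avoids a growing chain of `if` statements.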

If you want to use the script for your own audio datasets, you have to adapt the get_dir_overview() function in the file audio_loading_utils. It creates a label->samples mapping, which is used later on to select the files to visualize.
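What such a label->samples mapping could look like is sketched below. The sketch assumes a hypothetical layout with one sub-directory per label (for example `data/dog/a.wav`); the actual get_dir_overview() may work differently, and `build_label_mapping` is an illustrative name, not a function from the repository.

```python
from collections import defaultdict
from pathlib import Path


def build_label_mapping(root, extensions=(".wav",)):
    """Map each label (sub-directory name) to a sorted list of audio files.

    Assumes the layout <root>/<label>/<file>; adapt this for other datasets.
    """
    mapping = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        # Keep only regular files with an accepted audio extension.
        if path.is_file() and path.suffix.lower() in extensions:
            mapping[path.parent.name].append(str(path))
    return dict(mapping)
```

For a dataset that encodes the label in the file name rather than in the directory, only the expression that derives the dictionary key would need to change.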

All in all, the script is kept relatively simple, with many repetitive functions. If you want to add your own visualizations, you can find the complete code in this GitHub repository.

Now that we have a script and a neat GUI to explore an audio dataset, what can we use it for?

  • If we split our audio into segments, we can determine a suitable segment length: it has to be long enough to capture the relevant characteristics.
  • If we work with spectrograms, we can evaluate the scaling (linear, log, Mel) or the number of bins (for the Mel).
  • Further, if we work with MFCCs, we can examine the influence of the number of coefficients.
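To try out different segment lengths from the first point, a small helper along the following lines can cut a clip into fixed-length pieces that are then visualized one by one. This is a sketch under assumptions: the function name, the optional overlap via `hop_s`, and dropping the incomplete tail are illustrative choices, not part of the original script.

```python
import numpy as np


def split_into_segments(samples, sr, segment_s, hop_s=None):
    """Cut a 1-D signal into fixed-length segments of `segment_s` seconds.

    `hop_s` sets the spacing between segment starts (defaults to no overlap).
    An incomplete segment at the end of the signal is dropped.
    """
    seg_len = int(round(segment_s * sr))
    hop_len = seg_len if hop_s is None else int(round(hop_s * sr))
    starts = range(0, len(samples) - seg_len + 1, hop_len)
    if len(starts) == 0:
        # Signal is shorter than one segment.
        return np.empty((0, seg_len), dtype=samples.dtype)
    return np.stack([samples[s : s + seg_len] for s in starts])


# A 5-second dummy clip at 8 kHz (assumption), cut in two different ways.
sr = 8_000
clip = np.random.default_rng(0).standard_normal(5 * sr)
one_second = split_into_segments(clip, sr, segment_s=1.0)
overlapping = split_into_segments(clip, sr, segment_s=2.0, hop_s=1.0)
```

Plotting a few rows of the returned array for several values of `segment_s` shows quickly whether a segment still contains the characteristic part of the sound.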

Last but not least, we could expand the script by:

  • Adding more visualizations (have a look at a selection of possible features here)
  • Making parameters selectable (FFT size, etc.)
  • Visualizing augmentations (see here and here for a starting point if you are interested)
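As an illustration of why a selectable FFT size is useful, the sketch below computes the trade-off it controls: a larger FFT gives narrower frequency bins but a longer analysis window, so time resolution gets coarser. The helper name and the 44.1 kHz sampling rate are assumptions for this example.

```python
def stft_resolution(sr, n_fft):
    """Return (frequency bin width in Hz, window length in ms) for one FFT size."""
    return sr / n_fft, 1000.0 * n_fft / sr


# Compare a few FFT sizes at an assumed sampling rate of 44.1 kHz.
for n_fft in (256, 1024, 4096):
    bin_hz, window_ms = stft_resolution(44_100, n_fft)
    print(f"n_fft={n_fft:5d}  bin width={bin_hz:7.2f} Hz  window={window_ms:6.2f} ms")
```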