The purpose of this tutorial is to compare two audio classification methods by running two jobs in parallel. We will use the Weights & Biases tool to see which model is better in terms of accuracy, resource consumption, and training time.
Use case
The use case is the Spoken Digit Database, available on Kaggle.
The database contains recordings of the digits zero to nine spoken by different people, for a total of 1,700 audio files.
Database License: Attribution 4.0 International (CC BY 4.0)
AI Models
To build these sound classifiers, we will use two methods.
Audio classification based on audio feature
The first method involves creating an Artificial Intelligence (AI) model to classify audio files based on the different features of sounds.
To do this, some data processing is required upstream. Each sound is transformed into 26 parameters calculated by Librosa, which together form one row of a csv file.
An Artificial Neural Network (ANN) is then built and trained for 100 epochs. It takes the 26 parameters calculated by Librosa as input and returns a probability for each class as output.
Image classification based on spectrograms
The second method is to create an image classification model using the spectrograms of each sound.
The data must be processed beforehand. From each sound, a spectrogram (an image) is generated using the Python module Librosa. A Convolutional Neural Network (CNN) is then built and trained for 100 epochs.
It takes as input the spectrograms, resized to a fixed size, and returns as output a probability for each class.
Comparison tool
Two Artificial Intelligence models of different natures are trained to perform the same task: to classify audio recordings of people speaking numbers from zero to nine.
To compare them, the Weights and Biases tool is used. It makes it easy to track and record the performance of deep learning models.
With Weights & Biases, it is possible to build better models faster through experiment tracking, dataset versioning, and model management.
In our case, we will be able to track the evolution of different models based on the values of accuracies and losses. The tool also offers us the possibility to visualize the training times and the consumption of resources (GPU).
To know more about Weights & Biases, please refer to the documentation.
The basic principles of using Weights & Biases with AI Notebooks can be found here.
Requirements
- Access to the OVHcloud Control Panel
- An AI Training project created inside a Public Cloud project in your OVHcloud account
- A user for AI Training
- Docker installed on your local computer
- Make sure you have a Docker Hub account
- Some knowledge about building an image and writing a Dockerfile
- A Weights & Biases account; you can create one on their website (free for individuals)
Instructions
You will follow different steps to process your data and train your two models.
- A more detailed walkthrough of the data processing can be found in this notebook about the classification of marine mammal sounds.
- A direct link to the full Python files can be found here.
The tutorial is as follows:
Here we will mainly discuss how to write the data processing and model training code, the requirements.txt and packages.txt files, and the Dockerfile. If you want to see the whole code, please refer to the GitHub repository.
Clone the GitHub repository
The first thing to do is to clone the GitHub repository.
You can then place yourself in the dedicated directory.
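For example (the repository URL and directory name come from the links above; the placeholders below are not real values):

```bash
git clone <github-repository-url>
cd <path-to-the-tutorial-directory>
```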
Uploading your dataset on Public Cloud Storage
First, download the data from Kaggle.
It is a zip file (audio_files.zip)! We are going to push the unzipped version of it (audio_files) into an object container named spoken-digit.
If you want to upload it from the OVHcloud Control Panel, go to the Object Storage section and create a new object container by clicking Object Storage > Create an object container.
In the OVHcloud Control Panel, you can upload files but not folders. For instance, you can upload a .zip file to optimize the bandwidth, then unzip it later when accessing it through JupyterLab. You can also use the OVHcloud AI CLI to upload files and folders (which is also more stable than uploading through your browser).
If you want to upload it with the CLI, just follow this guide. You have to choose the region, the name of your container, and the path where your data is located, then use the following command:
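A possible form of the command, as a sketch (the subcommand may vary with your ovhai CLI version, and GRA is only an example region):

```bash
# Upload the unzipped audio_files folder to the spoken-digit container (assumed syntax)
ovhai data upload GRA spoken-digit audio_files
```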
You should have:
Write the data processing Python files
For the data processing part, we distinguish two Python files.
Audio to csv file with feature extraction
The first Python file is called data-processing-audio-files-csv.py. It transforms all the sounds into Librosa parameters and gathers them into a csv file.
Refer to the comments of the code for more information.
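As an illustration, the feature extraction can look like the sketch below. The exact set of 26 parameters is defined in the repository script; the chosen features, the n_mfcc value, and the output path here are assumptions.

```python
import librosa
import numpy as np
import pandas as pd

def extract_features(path):
    # Load one audio file and compute a few Librosa descriptors (assumed feature set)
    y, sr = librosa.load(path, sr=None)
    mfccs = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=23), axis=1)
    zcr = np.mean(librosa.feature.zero_crossing_rate(y))
    centroid = np.mean(librosa.feature.spectral_centroid(y=y, sr=sr))
    rolloff = np.mean(librosa.feature.spectral_rolloff(y=y, sr=sr))
    # 23 MFCC means + 3 spectral statistics = 26 values per sound in this sketch
    return np.hstack([mfccs, zcr, centroid, rolloff])

# One row per sound, then everything is written to the csv file (assumed path)
# rows = [extract_features(f) for f in audio_paths]
# pd.DataFrame(rows).to_csv("/workspace/data/csv_files/data_3_sec.csv", index=False)
```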
The head of the csv file:
Audio to spectrogram with image generation
The second Python file is called data-processing-audio-files-spectrograms.py. It generates a spectrogram (an image) corresponding to each sound.
Refer to the comments of the code for more information.
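As an illustration, generating one spectrogram can look like this sketch (the mel scale, figure settings, and output path are assumptions; the repository script handles the whole dataset):

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

def save_spectrogram(audio_path, image_path):
    # Compute a (mel) spectrogram and save it as an image without axes
    y, sr = librosa.load(audio_path, sr=None)
    S = librosa.feature.melspectrogram(y=y, sr=sr)
    S_db = librosa.power_to_db(S, ref=np.max)
    fig, ax = plt.subplots()
    librosa.display.specshow(S_db, sr=sr, ax=ax)
    ax.set_axis_off()
    fig.savefig(image_path, bbox_inches="tight", pad_inches=0)
    plt.close(fig)
```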
A sample spectrogram:
Once the processing of the data is complete, the AI models must be built.
Write the model training Python files
For the model training part, we distinguish two Python files:
train-classification-audio_files_csv.py
train-image-classification-audio-files-spectrograms.py
NOTE: About the WANDB API KEY: Please make sure to replace MY_WANDB_API_KEY with yours in the two training Python files.
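For instance, the key can be passed to the wandb module as in this sketch (the project name is an assumption; the repository scripts may configure the run differently):

```python
import wandb

# Authenticate with your own API key (replace MY_WANDB_API_KEY) and start a run
wandb.login(key="MY_WANDB_API_KEY")
wandb.init(project="audio-classification-models")
```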
ANN for audio classification based on sound features
An Artificial Neural Network is built to classify audio files based on their features.
It takes as input the 26 Librosa parameters previously normalized.
The model returns as output a score between 0 and 1 for each class through a softmax activation function. The class with the highest score is likely to be the one corresponding to the spoken digit.
Refer to the comments of the code for more information.
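A minimal sketch of such a network, assuming Keras and hidden layer sizes that may differ from the repository script:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(26,)),                 # 26 normalized Librosa parameters
    tf.keras.layers.Dense(256, activation="relu"),      # hidden layer sizes are assumptions
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),    # one score per spoken digit
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```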
CNN for image classification based on spectrograms
A Convolutional Neural Network is constructed to classify images that are spectrograms.
The advantage of using CNNs is their ability to develop an internal representation of a two-dimensional image. This allows the model to learn position and scale in the data, which is important when working with images.
It takes as input the spectrograms previously processed by the Keras data generator for image classification.
As previously, the model returns as output a score between 0 and 1 for each class through a softmax activation function.
Refer to the comments of the code for more information.
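A minimal sketch of such a network, assuming Keras and an input image size that may differ from the repository script:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 128, 3)),         # spectrogram images (assumed size)
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),    # one score per spoken digit
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```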
NOTE: To be able to look at and compare the performance of our two models, the metrics observed must be the same.
The accuracy metric measures the proportion of samples that the model classifies correctly.
The loss is measured with sparse_categorical_crossentropy or categorical_crossentropy.
Write the requirements.txt and packages.txt files
The requirements.txt file lists all the Python modules needed to make our application work.
The packages.txt file lists the system packages required to install and use the Librosa module and its dependencies.
These files will be used when writing the Dockerfile.
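As an illustration, the two files could contain something like the following (the exact module and package lists come from the GitHub repository; the entries below are assumptions):

```text
# requirements.txt (assumed content)
tensorflow
librosa
pandas
scikit-learn
matplotlib
wandb

# packages.txt (assumed content: system packages needed by Librosa)
libsndfile1
ffmpeg
```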
To prevent runtime errors during dataset generation
Please update the following scripts with the appropriate os.makedirs(..., exist_ok=True) lines:
data-processing-audio-files-csv.py
At the top of the createDataframe() function, add:
os.makedirs("/workspace/data/csv_files", exist_ok=True)
This ensures that the directory for the output CSV (/workspace/data/csv_files/) exists before the script tries to write the data_3_sec.csv file.
data-processing-audio-files-spectrograms.py
Inside the createSpectrograms() function, replace the line os.mkdir(spectrogram_path / fold) with:
os.makedirs(spectrogram_path / fold, exist_ok=True)
This creates both the parent /workspace/data/spectrograms/ folder and the digit subfolder as needed, and avoids errors if the folder already exists.
Write the Dockerfile for the application
Your Dockerfile should start with the FROM instruction indicating the parent image to use. In our case, we choose to start from a python:3.9 image:
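```dockerfile
# Parent image, as described above
FROM python:3.9
```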
Create the home directory and add your files to it:
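A possible version of this step (the /workspace path is an assumption; the repository's Dockerfile may use a different directory):

```dockerfile
# Create the working directory and copy the tutorial files into it
WORKDIR /workspace
ADD . /workspace
```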
Install the system packages listed in the packages.txt file using an apt-get install ... command:
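A possible form of this instruction (reading the package names from packages.txt this way is an assumption about how the repository's Dockerfile does it):

```dockerfile
# Install the system packages listed in packages.txt
RUN apt-get update && apt-get install -y $(cat packages.txt)
```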
Install the Python modules listed in the requirements.txt file using a pip install ... command:
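For example:

```dockerfile
# Install the Python modules listed in requirements.txt
RUN pip install -r requirements.txt
```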
Give correct access rights to the OVHcloud user (42420:42420):
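A possible form of this instruction (assuming the /workspace directory used above):

```dockerfile
# Give the OVHcloud user (42420:42420) ownership of the workspace
RUN chown -R 42420:42420 /workspace
```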
NOTE: Don't forget the --user=42420:42420 argument if you want to simulate the same behavior that will occur on AI Training jobs. It executes the Docker container as the specific OVHcloud user (user 42420:42420).
NOTE: Here we don't specify a command (CMD) to be run by default since we will do it directly in the AI Training job.
Build the Docker image from the Dockerfile
Launch the following command from the Dockerfile directory to build your application image:
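A sketch of the two commands discussed below (replace <your-docker-id> with yours; the exact invocation may differ):

```bash
# Build with your system's default architecture
docker build . -t <your-docker-id>/audio-classification-models:latest

# Build explicitly for the linux/amd64 architecture (requires buildx)
docker buildx build --platform linux/amd64 . -t <your-docker-id>/audio-classification-models:latest
```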
NOTE: Remember to replace <your-docker-id> with yours.
- The first command builds the image using your system's default architecture. This may work if your machine already uses the linux/amd64 architecture, which is required to run containers with our AI products. However, on systems with a different architecture (e.g. ARM64 on Apple Silicon), the resulting image will not be compatible and cannot be deployed.
- The second command explicitly targets the linux/amd64 architecture to ensure compatibility with our AI services. This requires buildx, which is not installed by default. If you haven't used buildx before, you can install it by running: docker buildx install
The dot argument . indicates that your build context (place of the Dockerfile and other needed files) is the current directory.
The -t argument allows you to choose the identifier to give to your image. Usually, image identifiers are composed of a name and a version tag <name>:<version>. For this example, we chose audio-classification-models:latest.
Push the image into your Docker Hub
NOTE: To know more about the Docker Hub, click here.
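A typical sequence, assuming the image tag chosen above:

```bash
# Log in to your Docker Hub account, then push the image
docker login
docker push <your-docker-id>/audio-classification-models:latest
```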
Launch the jobs
Here we will use the ovhai CLI. If you wish to do this from the OVHcloud Control Panel, refer to this documentation.
Jobs are launched in two stages. First, the data processing jobs are launched. Once they are Done, the training jobs can be executed.
To find out more about how jobs work and their status, check this documentation.
Data processing
- Audio to csv file with feature extraction:
To run this job, you need to plug in a volume containing your sounds. Once the job is in Done status, your csv file will be synchronized to your Object Storage.
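A possible command, as a sketch (the container name spoken-digit, the GRA region, and the script invocation are assumptions based on this tutorial; adapt them to your setup):

```bash
ovhai job run <your-docker-id>/audio-classification-models:latest \
    --cpu 12 \
    --volume spoken-digit@GRA/:/workspace/data:RW:cache \
    -- bash -c "python data-processing-audio-files-csv.py"
```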
--volume <my-data>@<region>/:/workspace/data:RW:cache is the volume attached for storing data. This volume is read/write (RW) because the csv file will be created and saved.
- Audio to spectrogram with image generation:
To run this job, you need to plug in a volume containing your sounds. Once the job is in Done status, the spectrograms will be synchronized to your Object Storage.
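Likewise, a possible command for this job (same assumptions as above):

```bash
ovhai job run <your-docker-id>/audio-classification-models:latest \
    --cpu 12 \
    --volume spoken-digit@GRA/:/workspace/data:RW:cache \
    -- bash -c "python data-processing-audio-files-spectrograms.py"
```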
--volume <my-data>@<region>/:/workspace/data:RW:cache is the volume attached for storing data. This volume is read/write (RW) because the spectrograms will be created and saved.
Here, the Python modules and dependencies are not suitable for use with GPUs.
However, these steps take time, so we use as many CPUs as possible (12).
At the end of the data processing, your Object Storage container should be as follows:
To get the status of your jobs, run the following command:
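For example:

```bash
# List your jobs and their current status
ovhai job list
```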
Once your data has been pre-processed and both jobs are in Done status, you will be able to start your two training jobs.
Models training
- ANN for audio classification based on audio features:
To run this job, you need to plug in the volume containing your data. The csv file generated during data processing will be read from your Object Storage.
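A possible command, as a sketch (the GPU count, container name, and region are assumptions; adapt them to your setup):

```bash
ovhai job run <your-docker-id>/audio-classification-models:latest \
    --gpu 1 \
    --volume spoken-digit@GRA/:/workspace/data:RO:cache \
    -- bash -c "python train-classification-audio_files_csv.py"
```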
--volume <my-data>@<region>/:/workspace/data:RO:cache is the volume attached for storing data. This volume is read-only (RO) because the csv file will only be read.
- CNN for image classification based on spectrograms:
To run this job, you need to plug in the volume containing your data. The spectrograms generated during data processing will be read from your Object Storage.
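Likewise, a possible command for this job (same assumptions as above):

```bash
ovhai job run <your-docker-id>/audio-classification-models:latest \
    --gpu 1 \
    --volume spoken-digit@GRA/:/workspace/data:RO:cache \
    -- bash -c "python train-image-classification-audio-files-spectrograms.py"
```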
--volume <my-data>@<region>/:/workspace/data:RO:cache is the volume attached for storing data. This volume is read-only (RO) because the spectrograms will only be read.
Consider adding the --unsecure-http attribute if you want your application to be reachable without any authentication.
You can now compare your models with Weights & Biases.
Compare with Weights & Biases
You will be able to check your model's training once your jobs are in running status. Run the following command:
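For example:

```bash
# Check that both training jobs are in RUNNING status
ovhai job list
```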
Once the jobs are in running status, you can check the logs to obtain the Weights & Biases link. Run the command:
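For example (replace <job-id> with the identifier returned when you launched the job):

```bash
ovhai job logs <job-id>
```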
Now, you can access the Weights & Biases panel. You will be able to check the accuracy and the loss values for the training and the validation sets.
- Training data:
Accuracy:
Loss:
- Validation data:
Accuracy:
Loss:
You can then observe which model is better in terms of speed, accuracy, or resource consumption...
In this case, we see that the model classifying the spectrograms is better in terms of accuracy and loss on the validation set.
However, it takes longer to train and consumes more computing resources.
Go further
- Do you want to know how to build and use a custom Docker image with AI Training? Here it is.
For more information and tutorials, please see our other AI & Machine Learning support guides or explore the guides for other OVHcloud products and services.
If you need training or technical assistance to implement our solutions, contact your sales representative or click on this link to get a quote and ask our Professional Services experts for a custom analysis of your project.