The purpose of this tutorial is to compare two audio classification methods by running two jobs in parallel. We will use the Weights & Biases tool to see which model is better in terms of accuracy, resource consumption, and training time.
Use case
The use case is the Spoken Digit Database, available on Kaggle.
The database contains recordings of the digits zero to nine spoken by different people, for a total of 1,700 audio files.
Database License: Attribution 4.0 International (CC BY 4.0)
AI Models
To build these sound classifiers, we will use two methods.
Audio classification based on audio feature
The first method involves creating an Artificial Intelligence (AI) model to classify audio files based on the different features of sounds.
To do this, some data processing is required upstream. Each sound is transformed into 26 parameters calculated by Librosa, which together form one row of a csv file.
An Artificial Neural Network (ANN) is then built and trained for 100 epochs. It takes the 26 parameters calculated by Librosa as input and returns a probability for each class as output.
Image classification based on spectrograms
The second method is to create an image classification model using the spectrograms of each sound.
The data must be processed beforehand. From each sound, a spectrogram (an image) is generated using the Python module Librosa. A Convolutional Neural Network (CNN) is then built and trained for 100 epochs.
It takes as input the spectrograms, resized to a fixed size, and returns as output a probability for each class.
Comparison tool
Two Artificial Intelligence models of different natures are trained to perform the same task: to classify audio recordings of people speaking numbers from zero to nine.
To compare them, the Weights and Biases tool is used. It makes it easy to track and record the performance of deep learning models.
With Weights & Biases, it is possible to build better models faster through experiment tracking, dataset versioning, and model management.
In our case, we will be able to track the evolution of different models based on the values of accuracies and losses. The tool also offers us the possibility to visualize the training times and the consumption of resources (GPU).
To know more about Weights & Biases, please refer to the documentation.
The basic principles of using Weights & Biases with AI Notebooks can be found here.
Requirements
- Access to the OVHcloud Control Panel
- An AI Training project created inside a Public Cloud project in your OVHcloud account
- A user for AI Training
- Docker installed on your local computer
- Make sure you have a Docker Hub account
- Some knowledge about building an image and writing a Dockerfile
- A Weights & Biases account; you can create one on their website (free for individuals)
Instructions
You will follow different steps to process your data and train your two models.
- A more detailed walkthrough of the data processing can be found in this notebook about the classification of marine mammal sounds.
- A direct link to the full Python files can be found here.
The tutorial is as follows:
Here we will mainly discuss how to write the data processing and model training code, the requirements.txt and packages.txt files, and the Dockerfile. If you want to see the whole code, please refer to the GitHub repository.
Clone the GitHub repository
The first thing to do is to clone the GitHub repository.
You can then place yourself in the dedicated directory.
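For example (the repository URL and directory name come from the links above; the placeholders below are not real values):

```bash
git clone <github-repository-url>
cd <path-to-the-tutorial-directory>
```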
Uploading your dataset on Public Cloud Storage
First, download the data from Kaggle.
It is a zip file (audio_files.zip)! We are going to push the unzipped version of it (audio_files) into an object container named spoken-digit.
If you want to upload it from the OVHcloud Control Panel, go to the Object Storage section and create a new object container by clicking Object Storage > Create an object container.
In the OVHcloud Control Panel, you can upload files but not folders. For instance, you can upload a .zip file to optimize the bandwidth, then unzip it later when accessing it through JupyterLab. You can also use the OVHcloud AI CLI to upload files and folders (which is also more stable than uploading through your browser).
If you want to upload it with the CLI, just follow this guide. You have to choose the region, the name of your container, and the path where your data is located, then use the following command:
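A possible form of the command, as a sketch (the subcommand may vary with your ovhai CLI version, and GRA is only an example region):

```bash
# Upload the unzipped audio_files folder to the spoken-digit container (assumed syntax)
ovhai data upload GRA spoken-digit audio_files
```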
You should have:
Write the data processing Python files
For the data processing part, we distinguish two Python files.
Audio to csv file with feature extraction
The first Python file is called data-processing-audio-files-csv.py. It transforms all the sounds into Librosa parameters and gathers them into a csv file.
Refer to the comments of the code for more information.
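As an illustration, the feature extraction can look like the sketch below. The exact set of 26 parameters is defined in the repository script; the chosen features, the n_mfcc value, and the output path here are assumptions.

```python
import librosa
import numpy as np
import pandas as pd

def extract_features(path):
    # Load one audio file and compute a few Librosa descriptors (assumed feature set)
    y, sr = librosa.load(path, sr=None)
    mfccs = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=23), axis=1)
    zcr = np.mean(librosa.feature.zero_crossing_rate(y))
    centroid = np.mean(librosa.feature.spectral_centroid(y=y, sr=sr))
    rolloff = np.mean(librosa.feature.spectral_rolloff(y=y, sr=sr))
    # 23 MFCC means + 3 spectral statistics = 26 values per sound in this sketch
    return np.hstack([mfccs, zcr, centroid, rolloff])

# One row per sound, then everything is written to the csv file (assumed path)
# rows = [extract_features(f) for f in audio_paths]
# pd.DataFrame(rows).to_csv("/workspace/data/csv_files/data_3_sec.csv", index=False)
```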
The head of the csv file:
Audio to spectrogram with image generation
The second Python file is called data-processing-audio-files-spectrograms.py. It generates a spectrogram (an image) corresponding to each sound.
Refer to the comments of the code for more information.
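As an illustration, generating one spectrogram can look like this sketch (the mel scale, figure settings, and output path are assumptions; the repository script handles the whole dataset):

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

def save_spectrogram(audio_path, image_path):
    # Compute a (mel) spectrogram and save it as an image without axes
    y, sr = librosa.load(audio_path, sr=None)
    S = librosa.feature.melspectrogram(y=y, sr=sr)
    S_db = librosa.power_to_db(S, ref=np.max)
    fig, ax = plt.subplots()
    librosa.display.specshow(S_db, sr=sr, ax=ax)
    ax.set_axis_off()
    fig.savefig(image_path, bbox_inches="tight", pad_inches=0)
    plt.close(fig)
```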
A sample spectrogram:
Once the processing of the data is complete, the AI models must be built.
Write the model training Python files
For the model training part, we distinguish two Python files:
train-classification-audio_files_csv.py
train-image-classification-audio-files-spectrograms.py
NOTE: About the WANDB API KEY: Please make sure to replace MY_WANDB_API_KEY with yours in the two training Python files.
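For instance, the key can be passed to the wandb module as in this sketch (the project name is an assumption; the repository scripts may configure the run differently):

```python
import wandb

# Authenticate with your own API key (replace MY_WANDB_API_KEY) and start a run
wandb.login(key="MY_WANDB_API_KEY")
wandb.init(project="audio-classification-models")
```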
ANN for audio classification based on sound features
An Artificial Neural Network is built to classify audio files based on their features.
It takes as input the 26 Librosa parameters previously normalized.
The model returns as output a score between 0 and 1 for each class through a softmax activation function. The class with the highest score is likely to be the one corresponding to the spoken digit.
Refer to the comments of the code for more information.
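A minimal sketch of such a network, assuming Keras and hidden layer sizes that may differ from the repository script:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(26,)),                 # 26 normalized Librosa parameters
    tf.keras.layers.Dense(256, activation="relu"),      # hidden layer sizes are assumptions
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),    # one score per spoken digit
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```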
CNN for image classification based on spectrograms
A Convolutional Neural Network is constructed to classify images that are spectrograms.
The advantage of using CNNs is their ability to develop an internal representation of a two-dimensional image. This allows the model to learn position and scale in the data, which is important when working with images.
It takes as input the spectrograms previously processed by the Keras data generator for image classification.
As previously, the model returns as output a score between 0 and 1 for each class through a softmax activation function.
Refer to the comments of the code for more information.
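A minimal sketch of such a network, assuming Keras and an input image size that may differ from the repository script:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 128, 3)),         # spectrogram images (assumed size)
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),    # one score per spoken digit
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```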
NOTE: To be able to look at and compare the performance of our two models, the metrics observed must be the same.
The accuracy metric measures the proportion of samples that the model classifies correctly.
The loss is measured with sparse_categorical_crossentropy or categorical_crossentropy.
Write the requirements.txt and packages.txt files
The requirements.txt file lists all the Python modules needed to make our application work.
The packages.txt file lists the system packages required to install and use the Librosa module and its dependencies.
These files will be used when writing the Dockerfile.
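As an illustration, the two files could contain something like the following (the exact module and package lists come from the GitHub repository; the entries below are assumptions):

```text
# requirements.txt (assumed content)
tensorflow
librosa
pandas
scikit-learn
matplotlib
wandb

# packages.txt (assumed content: system packages needed by Librosa)
libsndfile1
ffmpeg
```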
To prevent runtime errors during dataset generation
Please update the following scripts with the appropriate os.makedirs(..., exist_ok=True) lines:
data-processing-audio-files-csv.py
At the top of the createDataframe() function, add:
os.makedirs("/workspace/data/csv_files", exist_ok=True)
This ensures that the directory for the output CSV (/workspace/data/csv_files/) exists before the script tries to write the data_3_sec.csv file.
data-processing-audio-files-spectrograms.py
Inside the createSpectrograms() function, replace the line os.mkdir(spectrogram_path / fold) with:
os.makedirs(spectrogram_path / fold, exist_ok=True)
This creates both the parent /workspace/data/spectrograms/ folder and the digit subfolder as needed, and avoids errors if the folder already exists.
Write the Dockerfile for the application
Your Dockerfile should start with the FROM instruction indicating the parent image to use. In our case, we choose to start from a python:3.9 image:
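```dockerfile
# Parent image, as described above
FROM python:3.9
```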
Create the home directory and add your files to it:
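A possible version of this step (the /workspace path is an assumption; the repository's Dockerfile may use a different directory):

```dockerfile
# Create the working directory and copy the tutorial files into it
WORKDIR /workspace
ADD . /workspace
```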
Install the system packages listed in the packages.txt file using an apt-get install ... command:
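A possible form of this instruction (reading the package names from packages.txt this way is an assumption about how the repository's Dockerfile does it):

```dockerfile
# Install the system packages listed in packages.txt
RUN apt-get update && apt-get install -y $(cat packages.txt)
```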
Install the Python modules listed in the requirements.txt file using a pip install ... command:
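For example:

```dockerfile
# Install the Python modules listed in requirements.txt
RUN pip install -r requirements.txt
```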
Give correct access rights to the OVHcloud user (42420:42420):
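A possible form of this instruction (assuming the /workspace directory used above):

```dockerfile
# Give the OVHcloud user (42420:42420) ownership of the workspace
RUN chown -R 42420:42420 /workspace
```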
NOTE: Don't forget the --user=42420:42420 argument if you want to simulate the same behavior that will occur on AI Training jobs. It executes the Docker container as the specific OVHcloud user (user 42420:42420).
NOTE: Here we don't specify a command (CMD) to be run by default since we will do it directly in the AI Training job.
Build the Docker image from the Dockerfile
Launch the following command from the Dockerfile directory to build your application image:
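A sketch of the two commands discussed below (replace <your-docker-id> with yours; the exact invocation may differ):

```bash
# Build with your system's default architecture
docker build . -t <your-docker-id>/audio-classification-models:latest

# Build explicitly for the linux/amd64 architecture (requires buildx)
docker buildx build --platform linux/amd64 . -t <your-docker-id>/audio-classification-models:latest
```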
NOTE: Remember to replace <your-docker-id> with yours.
- The first command builds the image using your system's default architecture. This may work if your machine already uses the linux/amd64 architecture, which is required to run containers with our AI products. However, on systems with a different architecture (e.g. ARM64 on Apple Silicon), the resulting image will not be compatible and cannot be deployed.
- The second command explicitly targets the linux/amd64 architecture to ensure compatibility with our AI services. This requires buildx, which is not installed by default. If you haven't used buildx before, you can install it by running: docker buildx install
The dot argument . indicates that your build context (place of the Dockerfile and other needed files) is the current directory.
The -t argument allows you to choose the identifier to give to your image. Usually, image identifiers are composed of a name and a version tag <name>:<version>. For this example, we chose audio-classification-models:latest.
Push the image into your Docker Hub
NOTE: To know more about the Docker Hub, click here.
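A typical sequence, assuming the image tag chosen above:

```bash
# Log in to your Docker Hub account, then push the image
docker login
docker push <your-docker-id>/audio-classification-models:latest
```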
Launch the jobs
Here we will use the ovhai CLI. If you wish to do this from the OVHcloud Control Panel, refer to this documentation.
Jobs are launched in two stages. First, the data processing jobs are launched. Once they are Done, the training jobs can be executed.
To find out more about how jobs work and their status, check this documentation.
Data processing
- Audio to csv file with feature extraction:
To run this job, you need to plug in a volume containing your sounds. Once the job is in Done status, your csv file will be synchronized to your Object Storage.
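A possible command, as a sketch (the container name spoken-digit, the GRA region, and the script invocation are assumptions based on this tutorial; adapt them to your setup):

```bash
ovhai job run <your-docker-id>/audio-classification-models:latest \
    --cpu 12 \
    --volume spoken-digit@GRA/:/workspace/data:RW:cache \
    -- bash -c "python data-processing-audio-files-csv.py"
```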
--volume <my-data>@<region>/:/workspace/data:RW:cache is the volume attached for storing data. This volume is read/write (RW) because the csv file will be created and saved.
- Audio to spectrogram with image generation:
To run this job, you need to plug in a volume containing your sounds. Once the job is in Done status, the spectrograms will be synchronized to your Object Storage.
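Likewise, a possible command for this job (same assumptions as above):

```bash
ovhai job run <your-docker-id>/audio-classification-models:latest \
    --cpu 12 \
    --volume spoken-digit@GRA/:/workspace/data:RW:cache \
    -- bash -c "python data-processing-audio-files-spectrograms.py"
```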
--volume <my-data>@<region>/:/workspace/data:RW:cache is the volume attached for storing data. This volume is read/write (RW) because the spectrograms will be created and saved.
Here, the Python modules and dependencies are not suitable for use with GPUs.
However, these steps take time, so we use as many CPUs as possible (12).
At the end of the data processing, your Object Storage container should be as follows:
To get the status of your jobs, run the following command:
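For example:

```bash
# List your jobs and their current status
ovhai job list
```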
Once your data has been pre-processed and both jobs are in Done status, you will be able to start your two training jobs.
Models training
- ANN for audio classification based on audio features:
To run this job, you need to plug in the volume containing your data. The csv file generated during data processing will be read from your Object Storage.
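A possible command, as a sketch (the GPU count, container name, and region are assumptions; adapt them to your setup):

```bash
ovhai job run <your-docker-id>/audio-classification-models:latest \
    --gpu 1 \
    --volume spoken-digit@GRA/:/workspace/data:RO:cache \
    -- bash -c "python train-classification-audio_files_csv.py"
```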
--volume <my-data>@<region>/:/workspace/data:RO:cache is the volume attached for storing data. This volume is read-only (RO) because the csv file will only be read.
- CNN for image classification based on spectrograms:
To run this job, you need to plug in the volume containing your data. The spectrograms generated during data processing will be read from your Object Storage.
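Likewise, a possible command for this job (same assumptions as above):

```bash
ovhai job run <your-docker-id>/audio-classification-models:latest \
    --gpu 1 \
    --volume spoken-digit@GRA/:/workspace/data:RO:cache \
    -- bash -c "python train-image-classification-audio-files-spectrograms.py"
```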
--volume <my-data>@<region>/:/workspace/data:RO:cache is the volume attached for storing data. This volume is read-only (RO) because the spectrograms will only be read.
Consider adding the --unsecure-http attribute if you want your application to be reachable without any authentication.
You can now compare your models with Weights & Biases.
Compare with Weights & Biases
You will be able to check your model's training once your jobs are in running status. Run the following command:
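For example:

```bash
# Check that both training jobs are in RUNNING status
ovhai job list
```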
Once the jobs are in running status, you can check the logs to obtain the Weights & Biases link. Run the command:
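For example (replace <job-id> with the identifier returned when you launched the job):

```bash
ovhai job logs <job-id>
```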
Now, you can access the Weights & Biases panel. You will be able to check the accuracy and the loss values for the training and the validation sets.
- Training data:
Accuracy:
Loss:
- Validation data:
Accuracy:
Loss:
You can then observe which model is better in terms of speed, accuracy, or resource consumption...
In this case, we see that the model classifying the spectrograms is better in terms of accuracy and loss on the validation set.
However, it takes longer to train and consumes more computing resources.
Go further
- Do you want to know how to build and use a custom Docker image with AI Training? Here it is.
For more information and tutorials, please see our other AI & Machine Learning support guides or explore the guides for other OVHcloud products and services.
If you need training or technical assistance to implement our solutions, contact your sales representative or click on this link to get a quote and ask our Professional Services experts for a custom analysis of your project.