Learn how to initialize AI Training and submit jobs through the OVHcloud Control Panel.
Requirements
- A Public Cloud project
- (Optional) Data containers, if you want to attach data to your job in Step 2. See our Create data container guide.
- Access to the OVHcloud Control Panel
Instructions
Step 1 - Going to the AI Training menu
Log in to the OVHcloud Control Panel and navigate to the AI Training section via the Public Cloud menu.
Step 2 - Starting a job submission
You don't need a user to launch a new job from the OVHcloud Control Panel, but you will need one later if you want to use the CLI or access the URLs of your jobs. Instructions for creating new users are available in the dedicated guide.
From the AI Training home page, click Create my first job to create your first job.
If you already have a job, you can start a new one by clicking the + Launch a job button.
Job name
First, choose a name for your AI Training job to make it easier to manage all your jobs, or keep the automatically generated name if it meets your needs.
Location
Each job is executed in an OVHcloud region. Each region has its own AI Training cluster, and capabilities may vary from one region to another. For more information, see the AI Training capabilities.
Resources
In this step, you can select the number of GPUs or CPUs (not both) you need for your training workload.
The maximum number of GPUs or CPUs you can select for your job depends on the region. If you choose GPUs, a fixed number of CPUs is allocated per GPU. Similarly, a fixed amount of memory is allocated per CPU. For more information, see the AI Training capabilities.
Once the amount of resources is set, you can see a preview of the billing rate.
Docker image
A job is essentially a Docker container run within the OVHcloud infrastructure. You need to provide the Docker image to execute. You can choose from the following options:
- Preset images: OVHcloud provides a set of images to make it easier to submit your first jobs. These images are essentially a JupyterLab environment bundled with a deep learning framework such as TensorFlow or MXNet.
- Your own image: if the preset images do not cover your needs, you can specify your own image. You can use any image that is accessible from AI Training. This includes public images (e.g., from Docker Hub), images in the shared registry, or images in a private registry you have added. For more information, see how to add a private registry.
Once you have chosen your image, click the add + button.
Confidentiality
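Choose Restricted or Public access for your job. With Restricted access, only authenticated users can reach your job's URLs. With Public access, anyone who has the URL can reach them.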
Advanced configuration
Commands
The Docker image you chose above includes an entrypoint for your container. You can override this entrypoint by specifying your own command. Once you have entered the command, click the add + button.
Volumes
You can attach data objects to your job, either as input for your training workload or as output for your results (e.g., model weights).
Before attaching a data object, you need to create one. A data object cannot be attached to a running job.
To attach a data object, select it from the list on the left. The mount path of each data object inside the job's Docker container is shown in parentheses next to it. To customize this mount path, you need to use the ovhai CLI; see the guide on installing the OVHcloud AI CLI.
NOTE: To attach a data object, you must click the add + button after filling in the fields.
SSH Public Keys
You can optionally attach a public SSH key to your training job.
Review and launch your AI Training job
The final step shows an overview of the job you configured, before you submit it. It also shows the equivalent command for the ovhai CLI.
The AI Training service is primarily designed to be used through the ovhai CLI. The OVHcloud Control Panel offers only a subset of its features and is meant to help you get started before moving to the CLI. See how to install the OVHcloud AI CLI.
Finally, click Order now to submit your job to the cluster.
NOTE: A job runs indefinitely until it completes or you stop it manually.
Step 3 - Consulting your job
Once the job is submitted, you are redirected to the jobs list page.
From this list, you can open your job details either by clicking its ID or by clicking the more options ... button and selecting Manage.
The details include several components:
- Access: Provides the URL to access any service exposed by your job on port 8080. The URL has the format https://<JOB-ID>.job.<REGION>.ai.cloud.ovh.us/. If the service is not exposed on port 8080, it is still accessible by specifying the port in the URL as follows: https://<JOB-ID>-<PORT>.job.<REGION>.ai.cloud.ovh.us/. You can check the list of available ports in the capabilities. In this panel, you can also view, add, and delete labels for your job.
- Lifecycle: A timeline of job statuses and operating time.
- Resources: A summary of the resources consumed by the job.
- Support & Billing: Available actions.
- Configuration: Your job ID and the Delete job button.
- CLI: The CLI commands to relaunch that job.
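The two URL patterns in the Access panel can be captured in a small helper, for example when scripting access to a job's service. This is a minimal sketch that only encodes the formats quoted above: `job_access_url` is a hypothetical helper, and the job ID and region values are placeholders. It does not check whether a port is in the list of available ports; see the capabilities for that.

```python
from typing import Optional

# Port served by the short Access URL form, as described in the Access panel.
DEFAULT_PORT = 8080


def job_access_url(job_id: str, region: str, port: Optional[int] = None) -> str:
    """Build an AI Training job access URL (hypothetical helper).

    Encodes the two formats from the Access panel:
      https://<JOB-ID>.job.<REGION>.ai.cloud.ovh.us/          -> port 8080
      https://<JOB-ID>-<PORT>.job.<REGION>.ai.cloud.ovh.us/   -> any other port
    Whether <PORT> is actually open for jobs is NOT checked here.
    """
    if port is not None and not 1 <= port <= 65535:
        raise ValueError(f"invalid TCP port: {port}")
    if port is None or port == DEFAULT_PORT:
        # Service on the default port: no port suffix on the job ID.
        host = f"{job_id}.job.{region}"
    else:
        # Service on another port: append "-<PORT>" to the job ID.
        host = f"{job_id}-{port}.job.{region}"
    return f"https://{host}.ai.cloud.ovh.us/"


if __name__ == "__main__":
    # Placeholder values, not a real job ID or region name.
    print(job_access_url("0a1b2c3d", "example-region"))
    print(job_access_url("0a1b2c3d", "example-region", 8888))
```

Passing 8080 explicitly returns the same short form as omitting the port, since that is the only format the guide documents for the default port.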
Step 4 - Canceling your job
If you no longer need your job, if your model converged earlier than expected, or if you simply want to interrupt it, you can stop it from the jobs list.
In the jobs list, display the available actions at the far right of the job's entry, then click Stop. Alternatively, you can stop the job from the list of actions on the job details page.
Go further
For more information and tutorials, please see our other AI & Machine Learning support guides or explore the guides for other OVHcloud products and services.
If you need training or technical assistance to implement our solutions, contact your sales representative or request a quote from our Professional Services experts for a custom analysis of your project.