Learn about the lifecycle of an AI Deploy app
and associated billing.
The OVHcloud AI Deploy service makes it easy to deploy AI models and applications. You can deploy Docker images on HTTP endpoints backed by CPU or GPU resources, without having to install or operate the underlying infrastructure.
AI Deploy is covered by OVHcloud Public Cloud Special Conditions.
Introduction
AI Deploy is linked to a Public Cloud project. The whole project is billed at the end of the month, pay-as-you-go. This means you only pay for what you consume, based on the compute resources you use (CPUs and GPUs) and their running time. At this time, we do not support "pay-per-call" pricing.
AI Deploy apps lifecycle
OVHcloud AI Deploy allows the deployment of Docker images; each deployment is called an app. During its lifetime, an app goes through the following statuses:

- QUEUED: the app deployment request is waiting to be processed. First arrived, first deployed.
- INITIALIZING: the app is starting and, if any, remote data is synchronized from the Object Storage. Please see our Data - Concepts and best practices documentation to learn more about data synchronization.
- SCALING: the system first allocates the necessary compute resources (CPU/GPU) for the app, then pulls the specified Docker image. This status is also entered when the number of app replicas is increased or decreased.
- RUNNING: at least one replica of the app is available and accessible via its endpoint. When the app scales up to create new replicas, the status transitions back to SCALING. However, there is no service interruption, and the original replica(s) remain accessible during this time.
- STOPPING: the app is stopping and your compute resources are freed. Ephemeral data is deleted. If any, remote data is synchronized back to the Object Storage.
- STOPPED: the app ended normally. You can restart it whenever you want, or delete it. Its endpoint remains the same.
- FAILED: the app ended in error, e.g. the Docker image is invalid (unreachable, built for linux/arm, etc.).
- ERROR: the app ended due to a backend error (an issue on the OVHcloud side). You may reach out to our support.
- DELETING: the app is being removed. Once deleted, it no longer exists and will no longer be listed.
- DELETED: the app is fully deleted.
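As an illustration, the lifecycle above can be modeled as a small state machine. The status names come from this guide; the transition map below is a simplified sketch for reasoning about valid status sequences, not an official OVHcloud API:

```python
# Illustrative model of the AI Deploy app lifecycle described above.
# The transition map is a simplified sketch, not an official API contract.
ALLOWED_TRANSITIONS = {
    "QUEUED": {"INITIALIZING"},
    "INITIALIZING": {"SCALING", "FAILED", "ERROR"},
    "SCALING": {"RUNNING", "FAILED", "ERROR"},
    "RUNNING": {"SCALING", "STOPPING", "FAILED", "ERROR"},
    "STOPPING": {"STOPPED"},
    "STOPPED": {"INITIALIZING", "DELETING"},  # restart or delete
    "FAILED": {"DELETING"},
    "ERROR": {"DELETING"},
    "DELETING": {"DELETED"},
    "DELETED": set(),
}

def is_valid_path(statuses):
    """Check that a sequence of observed statuses follows the lifecycle sketch."""
    return all(b in ALLOWED_TRANSITIONS[a] for a, b in zip(statuses, statuses[1:]))

# A typical run: deploy, serve traffic, scale up, stop, then delete.
path = ["QUEUED", "INITIALIZING", "SCALING", "RUNNING",
        "SCALING", "RUNNING", "STOPPING", "STOPPED", "DELETING", "DELETED"]
print(is_valid_path(path))  # True
```

Note for instance that RUNNING can loop back to SCALING (replica changes) without passing through STOPPING, matching the description above.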
Billing principles
AI Deploy apps are a pay-per-use solution, with billing based on the consumption of compute resources (CPUs or GPUs). You select the type and amount of resources you want to work with, and are charged only for the resources consumed during the SCALING and RUNNING phases of your app replicas.
Pay-per-call pricing is not currently available.
Included resources in AI Deploy are:
- AI Deploy managed service (zero infrastructure to manage).
- Dedicated CPU/GPU compute resources (based on the selected amount).
- Ephemeral storage when the app is running (storage space related to compute resources sizing).
- Ingress/Egress network traffic.
- Monitoring tool and live metrics (Grafana).
Optional resources not included with AI Deploy are:
- Remote storage space, based on OVHcloud Object Storage pricing.
- Egress traffic for this optional Object Storage, when data leaves the OVHcloud network.
- Private Docker registry, if any.
Here is a detailed timeline that illustrates every step that is billed or not during the AI Deploy workflow:
Compute resources details
During the app creation, you can select compute resources, known as CPUs or GPUs. Their official pricing is available in the OVHcloud Control Panel or on the OVHcloud Public Cloud website.
Compute rates are displayed per hour for readability, but the billing granularity remains per minute.
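Per-minute granularity means, for example, that a replica running for 90 minutes is billed 1.5 hours, not 2. A minimal sketch of this conversion (rates are the illustrative figures used later in this guide):

```python
# Billing is displayed per hour but metered per minute (illustrative sketch).
def compute_cost(hourly_rate, minutes, resources=1, replicas=1):
    """Cost of running `resources` CPUs/GPUs on `replicas` replicas for `minutes`."""
    return hourly_rate / 60 * minutes * resources * replicas

# 90 minutes on 1 x NVIDIA L4 GPU at $0.91/hour -> billed 1.5 hours:
print(round(compute_cost(0.91, 90), 4))  # 1.365
```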
Once you select the compute resources, you can specify the scaling strategy:
- Static scaling: you can specify a fixed number of replicas, starting at one. Please note that with a single replica, you will not benefit from high availability.
- Auto-scaling: you can specify a minimum and maximum number of replicas and a metric that will act as a trigger for scaling up or down (CPU or RAM usage). Each replica will benefit from the compute resource selected before.
Storage details
Ephemeral local storage
Each compute resource (CPU or GPU) comes with local storage that can be considered ephemeral, since this storage space is not preserved when you stop or delete an AI Deploy app.
The sizing depends on the selected amount of compute resources; check the details on the OVHcloud Public Cloud website.
This storage space can be used by your Docker image for local operations.
Remote Object Storage
When working with remote data, you pay separately for the storage of this data. The pricing of Object Storage is separate from the app pricing.
Pricing examples
For these examples, we will take a pricing of $0.91 / hour per GPU NVIDIA L4 and $0.04 / hour per CPU. Pricing may vary, please refer to the official pricing page.
Example 1: a GPU app for 10 hours, then deleted
We deploy one AI Deploy app with 2 x GPUs and keep it running for 10 hours before we delete it.
We receive thousands of calls: they are all included, since there is no pay-per-call pricing; you only pay for the running compute.
- compute resources per replica: 2 x GPU NVIDIA L4 ($0.91 / hour)
- scaling: fixed
- replicas: 1 only
- amount of calls: unlimited
- duration: 10 hours, then deleted
Price calculation for compute: 10 (hours) x 2 (GPU) x 1 (replica) x $0.91 (price / GPU) = $18.20, billed at the end of the month.
Example 2: multiple AI Deploy apps for 5 hours, then deleted
We start 15 x AI Deploy apps in parallel, each with one vCPU.
We receive thousands of calls: they are all included, since there is no pay-per-call pricing; you only pay for the running compute.
- compute resources per app with fixed scaling: 1 x vCPU ($0.04 /hour /CPU)
- scaling: fixed
- replica: 1 only
- amount of calls: unlimited
- duration: 5 hours, then deleted
Price calculation for compute: 15 (app) x 5 (hours) x 1 (CPU) x $0.04 (price / CPU) = $3.00, billed at the end of the month.
Example 3: GPUs and autoscaling
We start 1 x AI Deploy app with autoscaling configured to one replica minimum and three replicas maximum.
We receive thousands of calls: they are all included, since there is no pay-per-call pricing; you only pay for the running compute.
- compute resources per replica: 1 x GPU ($0.91 /hour /GPU)
- scaling: auto-scaling, from 1 to 3 replicas
- amount of calls: unlimited
- duration: 5 hours with one replica running, then a peak with 1 hour at three replicas, then stopped and deleted.
Price calculation for compute varies over time due to auto-scaling:
- 5 hours at one replica: 1 (app) x 5 (hours) x 1 (replica) x 1 (GPU) x $0.91 (price / GPU) = $4.55
- 1 hour at three replicas: 1 (app) x 1 (hour) x 3 (replicas) x 1 (GPU) x $0.91 (price / GPU) = $2.73

Total: $4.55 + $2.73 = $7.28, billed at the end of the month.
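The three examples above all follow the same per-resource formula, which can be checked with a short sketch (the rates are the illustrative prices used in this guide, not official pricing):

```python
# Illustrative rates from the examples above (check the official pricing page).
GPU_L4_HOURLY = 0.91
CPU_HOURLY = 0.04

def cost(hours, hourly_rate, resources=1, replicas=1, apps=1):
    """Compute cost: apps x hours x replicas x resources x hourly rate."""
    return apps * hours * replicas * resources * hourly_rate

# Example 1: one app, 2 x GPU, 1 replica, 10 hours
example1 = cost(10, GPU_L4_HOURLY, resources=2)                          # $18.20
# Example 2: 15 apps, 1 vCPU each, 5 hours
example2 = cost(5, CPU_HOURLY, apps=15)                                  # $3.00
# Example 3: 5 hours at 1 replica, then 1 hour at 3 replicas (auto-scaling)
example3 = cost(5, GPU_L4_HOURLY) + cost(1, GPU_L4_HOURLY, replicas=3)   # $7.28
print(round(example1, 2), round(example2, 2), round(example3, 2))
```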
Go further
For more information and tutorials, please see our other AI & Machine Learning support guides or explore the guides for other OVHcloud products and services.
If you need training or technical assistance to implement our solutions, contact your sales representative or click on this link to get a quote and ask our Professional Services experts for a custom analysis of your project.