Learn about the lifecycle of an AI Deploy app
and associated billing.
The OVHcloud AI Deploy service makes it easy to deploy AI models and applications. You can deploy Docker images on HTTP endpoints backed by CPU or GPU resources, without having to install or operate the underlying infrastructure.
AI Deploy is covered by OVHcloud Public Cloud Special Conditions.
Introduction
AI Deploy is linked to a Public Cloud project. The whole project is billed at the end of the month, pay-as-you-go. This means you only pay for what you consume, based on the compute resources you use (CPUs and GPUs) and their running time. At this time, we do not support "pay-per-call" pricing.
AI Deploy apps lifecycle
OVHcloud AI Deploy allows the deployment of Docker images; each deployment is called an app. During its lifetime, an app goes through the following statuses:

- QUEUED: the app deployment request is waiting to be processed. First arrived, first deployed.
- INITIALIZING: the app is starting and, if any, remote data is synchronized from the Object Storage. Please see our Data - Concepts and best practices documentation to learn more about data synchronization.
- SCALING: the system first allocates the necessary compute resources (CPU/GPU) for the app, then pulls the specified Docker image. This status is also entered when the number of app replicas is increased or decreased.
- RUNNING: at least one replica of the app is available and accessible via its endpoint. When the app scales up to create new replicas, the status transitions back to SCALING. However, there is no service interruption, and the original replica(s) remain accessible during this time.
- STOPPING: the app is stopping and your compute resources are freed. Ephemeral data is deleted. If any, remote data is synchronized back to the Object Storage.
- STOPPED: the app ended normally. You can restart it whenever you want, or delete it. Its endpoint remains the same.
- FAILED: the app ended in error, e.g. the Docker image is invalid (unreachable, built for linux/arm, etc.).
- ERROR: the app ended due to a backend error (an issue on the OVHcloud side). You may reach out to our support.
- DELETING: the app is being removed. Once deleted, it no longer exists and will no longer be listed.
- DELETED: the app is fully deleted.
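As an illustration, the lifecycle above can be modeled as a small state machine. The status names come from this guide; the transition map below is a simplified sketch for reasoning about valid status sequences, not an official OVHcloud API:

```python
# Illustrative model of the AI Deploy app lifecycle described above.
# The transition map is a simplified sketch, not an official API contract.
ALLOWED_TRANSITIONS = {
    "QUEUED": {"INITIALIZING"},
    "INITIALIZING": {"SCALING", "FAILED", "ERROR"},
    "SCALING": {"RUNNING", "FAILED", "ERROR"},
    "RUNNING": {"SCALING", "STOPPING", "FAILED", "ERROR"},
    "STOPPING": {"STOPPED"},
    "STOPPED": {"INITIALIZING", "DELETING"},  # restart or delete
    "FAILED": {"DELETING"},
    "ERROR": {"DELETING"},
    "DELETING": {"DELETED"},
    "DELETED": set(),
}

def is_valid_path(statuses):
    """Check that a sequence of observed statuses follows the lifecycle sketch."""
    return all(b in ALLOWED_TRANSITIONS[a] for a, b in zip(statuses, statuses[1:]))

# A typical run: deploy, serve traffic, scale up, stop, then delete.
path = ["QUEUED", "INITIALIZING", "SCALING", "RUNNING",
        "SCALING", "RUNNING", "STOPPING", "STOPPED", "DELETING", "DELETED"]
print(is_valid_path(path))  # True
```

Note for instance that RUNNING can loop back to SCALING (replica changes) without passing through STOPPING, matching the description above.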
Billing principles
AI Deploy apps are a pay-per-use solution, with billing based on the consumption of compute resources (CPUs or GPUs). You select the type and amount of resources you want to work with, and are charged only for the resources consumed during the SCALING and RUNNING phases of your app replicas.
Pay-per-call pricing is not currently available.
Included resources in AI Deploy are:
- AI Deploy managed service (zero infrastructure to manage).
- Dedicated CPU/GPU compute resources (based on the selected amount).
- Ephemeral storage when the app is running (storage space related to compute resources sizing).
- Ingress/Egress network traffic.
- Monitoring tool and live metrics (Grafana).
Optional resources not included with AI Deploy are:
- Remote storage space, based on OVHcloud Object Storage pricing.
- Egress traffic for this optional Object Storage, when data leaves the OVHcloud network.
- Private Docker registry, if any.
Here is a detailed timeline that illustrates every step that is billed or not during the AI Deploy workflow:
Compute resources details
During the app creation, you can select compute resources, known as CPUs or GPUs. Their official pricing is available in the OVHcloud Control Panel or on the OVHcloud Public Cloud website.
Compute rates are displayed per hour for readability, but the billing granularity remains per minute.
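Per-minute granularity means, for example, that a replica running for 90 minutes is billed 1.5 hours, not 2. A minimal sketch of this conversion (rates are the illustrative figures used later in this guide):

```python
# Billing is displayed per hour but metered per minute (illustrative sketch).
def compute_cost(hourly_rate, minutes, resources=1, replicas=1):
    """Cost of running `resources` CPUs/GPUs on `replicas` replicas for `minutes`."""
    return hourly_rate / 60 * minutes * resources * replicas

# 90 minutes on 1 x NVIDIA L4 GPU at $0.91/hour -> billed 1.5 hours:
print(round(compute_cost(0.91, 90), 4))  # 1.365
```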
Once you select the compute resources, you can specify the scaling strategy:
- Static scaling: you can specify a fixed number of replicas, starting at one. Please note that with a single replica, you will not benefit from high availability.
- Auto-scaling: you can specify a minimum and maximum number of replicas and a metric that will act as a trigger for scaling up or down (CPU or RAM usage). Each replica will benefit from the compute resource selected before.
Storage details
Ephemeral local storage
Each compute resource (CPU or GPU) comes with local storage that can be considered ephemeral, since this storage space is not preserved when you stop or delete an AI Deploy app.
The sizing depends on the selected amount of compute resources; check the details on the OVHcloud Public Cloud website.
This storage space can be used by your Docker image for local operations.
Remote Object Storage
When working with remote data, you pay separately for the storage of this data. The pricing of Object Storage is separate from the app pricing.
Pricing examples
For these examples, we will take a pricing of $0.91 / hour per GPU NVIDIA L4 and $0.04 / hour per CPU. Pricing may vary, please refer to the official pricing page.
Example 1: a GPU app for 10 hours, then deleted
We deploy one AI Deploy app with 2 x GPUs and keep it running for 10 hours before we delete it.
We receive thousands of calls: they are all included, since there is no pay-per-call pricing; you only pay for the running compute.
- compute resources per replica: 2 x GPU NVIDIA L4 ($0.91 / hour)
- scaling: fixed
- replicas: 1 only
- amount of calls: unlimited
- duration: 10 hours, then deleted
Price calculation for compute: 10 (hours) x 2 (GPU) x 1 (replica) x $0.91 (price / GPU) = $18.20, billed at the end of the month.
Example 2: multiple AI Deploy apps for 5 hours, then deleted
We start 15 x AI Deploy apps in parallel, each with one vCPU.
We receive thousands of calls: they are all included, since there is no pay-per-call pricing; you only pay for the running compute.
- compute resources per app with fixed scaling: 1 x vCPU ($0.04 /hour /CPU)
- scaling: fixed
- replica: 1 only
- amount of calls: unlimited
- duration: 5 hours, then deleted
Price calculation for compute: 15 (app) x 5 (hours) x 1 (CPU) x $0.04 (price / CPU) = $3.00, billed at the end of the month.
Example 3: GPUs and autoscaling
We start 1 x AI Deploy app with autoscaling configured to one replica minimum and three replicas maximum.
We receive thousands of calls: they are all included, since there is no pay-per-call pricing; you only pay for the running compute.
- compute resources per replica: 1 x GPU ($0.91 /hour /GPU)
- scaling: auto-scaling, from 1 to 3 replicas
- amount of calls: unlimited
- duration: 5 hours with one replica running, then a peak with 1 hour at three replicas, then stopped and deleted.
Price calculation for compute varies over time due to auto-scaling:
- 5 hours at one replica: 1 (app) x 5 (hours) x 1 (replica) x 1 (GPU) x $0.91 (price / GPU) = $4.55
- 1 hour at three replicas: 1 (app) x 1 (hour) x 3 (replicas) x 1 (GPU) x $0.91 (price / GPU) = $2.73

Total: $4.55 + $2.73 = $7.28, billed at the end of the month.
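The three examples above all follow the same per-resource formula, which can be checked with a short sketch (the rates are the illustrative prices used in this guide, not official pricing):

```python
# Illustrative rates from the examples above (check the official pricing page).
GPU_L4_HOURLY = 0.91
CPU_HOURLY = 0.04

def cost(hours, hourly_rate, resources=1, replicas=1, apps=1):
    """Compute cost: apps x hours x replicas x resources x hourly rate."""
    return apps * hours * replicas * resources * hourly_rate

# Example 1: one app, 2 x GPU, 1 replica, 10 hours
example1 = cost(10, GPU_L4_HOURLY, resources=2)                          # $18.20
# Example 2: 15 apps, 1 vCPU each, 5 hours
example2 = cost(5, CPU_HOURLY, apps=15)                                  # $3.00
# Example 3: 5 hours at 1 replica, then 1 hour at 3 replicas (auto-scaling)
example3 = cost(5, GPU_L4_HOURLY) + cost(1, GPU_L4_HOURLY, replicas=3)   # $7.28
print(round(example1, 2), round(example2, 2), round(example3, 2))
```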
Go further
For more information and tutorials, please see our other AI & Machine Learning support guides or explore the guides for other OVHcloud products and services.
If you need training or technical assistance to implement our solutions, contact your sales representative or click on this link to get a quote and ask our Professional Services experts for a custom analysis of your project.