Learn how we handle the resilience of the OKMS infrastructure used for OVHcloud KMS (Key Management Service) and Secret Manager.
Instructions
The OKMS architecture has three main objectives:
- Confidentiality: Ensuring that no one except you can access your keys.
- Availability: Offering a high level of resilience and, therefore, high availability.
- Integrity: Making sure that keys cannot be lost or altered.
Access Management
Access to the keys is controlled by the OVHcloud IAM. Only the users allowed by an IAM policy can manage the keys or use them to encrypt or sign data.
Even OVHcloud employees cannot access your keys.
OKMS architecture
Each OKMS region is fully independent from the others and uses dedicated hosts.
In a 1-AZ (single Availability Zone) region, the architecture is based on two zones located in distinct buildings within one or more datacenters of the same region. The servers are spread across these two zones.
To increase resilience in 1-AZ regions, a database replica server is deployed in a distinct nearby region. Replication to the remote region may take a few seconds longer than replication to the main region.
OKMS components location
Each OKMS Region consists of several hosts in a single OVHcloud Region.
These hosts are partitioned into two different zones so that any single hardware failure is as unlikely as possible to take out both zones at once.
Data resilience
- DB Replication
The OKMS will not return a success status for write operations (e.g., creation or import of key material) unless the data has been successfully replicated to at least two database hosts (the primary and the synchronous replica). This is to ensure that if one of the database hosts is lost, no data will be lost.
An auto-failover mechanism is also in place to automatically reassign the database hosts' roles in case the current primary or synchronous replica becomes unavailable. This means that if any of the three database hosts becomes unavailable, there will be no service interruption, except during the short failover phase (approximately one minute).
However, if two zones or two database hosts become unavailable simultaneously, the OKMS will switch to read-only mode, and write operations will fail (creation of new keys, secrets management, metadata updates, etc.). Existing keys will still be available to perform any cryptographic operations, and existing secrets will remain readable.
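The write-acknowledgement and read-only behaviour described above can be sketched with a small in-memory model. This is an illustration only, not the OKMS implementation: `DbHost`, `ToyCluster` and `ReadOnlyError` are hypothetical names, failover is reduced to "use whichever hosts are still available", and every available host receives the write immediately, whereas the real system distinguishes synchronous and slower remote replication.

```python
from dataclasses import dataclass, field


@dataclass
class DbHost:
    """Hypothetical stand-in for one database host."""
    name: str
    available: bool = True
    data: dict = field(default_factory=dict)


class ReadOnlyError(Exception):
    """Raised when a write cannot reach enough database hosts."""


class ToyCluster:
    # A write is acknowledged only once two hosts hold it
    # (primary + synchronous replica, as stated in the text).
    MIN_COPIES = 2

    def __init__(self, hosts):
        self.hosts = list(hosts)

    def _available(self):
        return [h for h in self.hosts if h.available]

    def write(self, key_id, material):
        targets = self._available()
        if len(targets) < self.MIN_COPIES:
            # Fewer than two hosts left: behave as read-only.
            raise ReadOnlyError("fewer than two database hosts available")
        # Simplification: all available hosts get the data at once.
        for host in targets:
            host.data[key_id] = material
        return "success"

    def read(self, key_id):
        # Reads keep working as long as one host with the data is up.
        for host in self._available():
            if key_id in host.data:
                return host.data[key_id]
        raise KeyError(key_id)
```

With three hosts, marking one as unavailable still allows `write` to succeed. Marking a second one as unavailable makes `write` raise `ReadOnlyError`, while `read` continues to return previously stored entries.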
- DB Backups
Incremental backups are taken at intervals of no more than 5 minutes, and a full backup is taken daily. Each backup is stored in two different regions.
These backups are kept for 30 days.
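As a rough aid for reasoning about these figures, the arithmetic can be written down as follows. This is only a sketch based on the numbers stated above. The constant and function names are hypothetical, and the text does not describe any restore interface, so nothing here reflects an actual OKMS API.

```python
from datetime import datetime, timedelta, timezone

# Figures taken from the text above.
INCREMENTAL_INTERVAL_MAX = timedelta(minutes=5)  # longest gap between incremental backups
RETENTION = timedelta(days=30)                   # how long backups are kept


def backup_is_retained(taken_at, now):
    """Return True if a backup taken at `taken_at` is still within retention at `now`."""
    age = now - taken_at
    return timedelta(0) <= age <= RETENTION


def oldest_retained_backup_time(now):
    """Earliest backup timestamp that is still within the retention window."""
    return now - RETENTION


now = datetime(2024, 3, 31, tzinfo=timezone.utc)
recent = datetime(2024, 3, 2, tzinfo=timezone.utc)     # 29 days old
expired = datetime(2024, 2, 28, tzinfo=timezone.utc)   # 32 days old
```

Here `backup_is_retained(recent, now)` is true and `backup_is_retained(expired, now)` is false.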
Data security
All customer data is always stored encrypted in the databases, and the databases themselves are encrypted.
Backup location
The backup location depends on the location of the OKMS.
- US-EAST-VA
  - OKMS Backup Region: US-WEST-OR
- US-WEST-OR
  - OKMS Backup Region: US-EAST-VA
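For tooling that needs to know where backups for a given OKMS are stored, the list above can be captured as a simple lookup. This is a minimal sketch. `BACKUP_REGION` and `backup_region_for` are hypothetical names, and only the two regions listed above are included.

```python
# OKMS region -> backup region, as listed above.
BACKUP_REGION = {
    "US-EAST-VA": "US-WEST-OR",
    "US-WEST-OR": "US-EAST-VA",
}


def backup_region_for(okms_region):
    """Return the backup region for an OKMS region (case-insensitive)."""
    try:
        return BACKUP_REGION[okms_region.upper()]
    except KeyError:
        # Regions not listed in this guide are rejected rather than guessed.
        raise ValueError(f"no backup region listed for {okms_region!r}") from None
```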
Disaster scenarios
What happens if one host in a zone is lost?
Keys remain available, and traffic is redirected to the other zone. Requests in flight can time out or return errors, depending on which host is affected.
What happens if a zone is lost?
Keys remain available, and traffic is redirected to the other zone. Requests in flight can time out or return errors.
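Because requests in flight can time out or fail during a host or zone loss, client code may want to retry transient failures. The following is a generic sketch of retry with exponential backoff and jitter, not an OVHcloud SDK feature. `call_with_retry` and its parameters are hypothetical, and `operation` stands for whatever call your application makes to the OKMS.

```python
import random
import time


def call_with_retry(operation, attempts=4, base_delay=0.2, max_delay=5.0,
                    retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call `operation()` and retry on transient errors.

    The wait before each retry is a random value between 0 and an
    exponentially growing cap (base_delay * 2**attempt, limited to max_delay).
    The last failure is re-raised once all attempts are used.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
```

Retries are safest for operations that can be repeated without side effects, such as encrypting or reading a secret. Retrying a creation request after a timeout could, under this assumption-laden sketch, result in the resource being created twice, so such calls deserve extra care.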
What happens if a region is lost?
Keys created in the last few seconds before the loss may be lost, and the OKMS becomes unavailable. The database replica will then be used to rebuild the OKMS and retrieve the stored keys.
Go further
For more information and tutorials, please see our other Manage & Operate support guides or explore the guides for other OVHcloud products and services.