Learn about the OVHcloud intervention process.
What causes an intervention?
OVHcloud Monitoring lets you monitor the status of your machine from the OVHcloud Control Panel. An intervention is needed when a server stops responding to ICMP (ping) requests. A technical intervention may also be needed if a customer performs an IPMI task and our system detects that the task has failed, or when a reboot stalls and leaves the server down.
Depending on your monitoring settings, an intervention can be automatically triggered, or you can submit a ticket requesting an intervention.
The available options are:
- Disabled: Your server is not monitored. It can still receive ICMP (Internet Control Message Protocol) requests, but monitoring ignores a missing response, and you will not receive any alerts from OVHcloud if the connection to the public network is lost.
- Enabled with proactive intervention: Your server is monitored (ICMP ping test). Alerts are sent to the email address linked to your technical contact's ID. You have authorized OVHcloud to resolve issues on your machine when an ICMP ping unavailable alert is raised. A support ticket is created automatically so that you can follow the intervention.
- Enabled without proactive intervention: Your server is monitored (ICMP ping test). Alerts are sent to the email address linked to your technical contact's ID. Because proactive intervention is not enabled, OVHcloud will not resolve issues on your machine when an ICMP ping unavailable alert is raised.
NOTE: Regardless of the monitoring option you select, certain failed operations may automatically trigger intervention from the data center team. These operations include, but are not limited to:
- Failed OS installations using OVHcloud templates (including diagnostics and potential hardware replacement, e.g., faulty disks),
- Failure to boot into rescue mode via the OVHcloud Control Panel, and
- Failed reboot attempts initiated from the OVHcloud Control Panel.
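To check from your own side whether a server answers ICMP requests, a small script can help. The following is a minimal sketch, not OVHcloud's monitoring implementation. It assumes a Linux machine with the iputils `ping` command installed; the function names and the `192.0.2.10` address are hypothetical placeholders.

```python
import subprocess
from typing import List


def build_ping_command(host: str, count: int = 3, timeout_s: int = 2) -> List[str]:
    """Build a Linux (iputils) ping command.

    -c sets the number of echo requests, -W the time to wait for a reply in seconds.
    """
    return ["ping", "-c", str(count), "-W", str(timeout_s), host]


def responds_to_ping(host: str, run=subprocess.run) -> bool:
    """Return True if the host answered at least one ICMP echo request.

    iputils ping exits with status 0 when at least one reply was received.
    The `run` parameter exists only so the check can be tested without a network.
    """
    result = run(
        build_ping_command(host),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0
```

Calling `responds_to_ping("192.0.2.10")` with your server's real IP address in place of the placeholder returns `False` when no reply arrives, which is the same kind of condition that raises an ICMP ping unavailable alert.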
What does the DC (data center) team do?
The data center team manages the underlying hardware and infrastructure. For example, if you provide hardware logs that show the issue, the team can replace hardware, flash the BIOS, check cooling, and configure IPMI.
Why does the service boot to rescue mode?
If the server is down due to operating system configuration issues, the data center team verifies functionality and boots the server into rescue mode. Rescue mode is a Debian-based Linux environment that is useful for troubleshooting because it rules out problems in the installed operating system that can cause a server to be down.
If the data center team boots the service in rescue mode, they typically leave an intervention note, emailed to the customer, that says: "Software configuration needs to be corrected by the customer."
If the OS is inaccessible and corrupted, the customer can troubleshoot it in rescue mode (via IPMI or SSH), then mount the affected partition to access their data and continue troubleshooting from there.
The customer can also back up their data and reinstall if needed.
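When working in rescue mode, the first step is usually to find which device holds the filesystem you want to mount. The sketch below is one possible approach, not an official OVHcloud procedure. It assumes the output of `lsblk -J -o PATH,TYPE,FSTYPE,SIZE` (util-linux) is passed in as a string; the function names and the `/mnt` mount point are illustrative choices.

```python
import json
from typing import Dict, List

# Filesystem types that are not mounted directly (assumption: typical Linux layouts).
SKIPPED_FSTYPES = {"swap", "linux_raid_member", "LVM2_member"}


def mountable_devices(lsblk_json: str) -> List[Dict]:
    """Return the devices from `lsblk -J` output that carry a mountable filesystem.

    A software RAID or LVM device can appear under several parent partitions,
    so entries are de-duplicated by device path.
    """
    found: Dict[str, Dict] = {}

    def walk(nodes):
        for node in nodes:
            fstype = node.get("fstype")
            if fstype and fstype not in SKIPPED_FSTYPES:
                found.setdefault(node["path"], node)
            walk(node.get("children") or [])

    walk(json.loads(lsblk_json)["blockdevices"])
    return list(found.values())


def mount_command(device_path: str, mount_point: str = "/mnt") -> List[str]:
    """Build the mount command for a device; it is returned, not executed."""
    return ["mount", device_path, mount_point]
```

The command list is only built, never run, so you can review it before mounting anything. Once the partition is mounted, the data can be inspected or copied elsewhere before any reinstallation.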
What does OVHcloud manage?
We manage the hardware and infrastructure on the OVHcloud side only, not OS-related issues. Please see our OVHcloud US Statement of Support for more information.
What do customers need to do?
Preventative measures
You can set up automatic alerts when your server requires intervention by following the instructions in this guide.
During the intervention process
Once the data center team leaves the "Software configuration needs to be corrected by the customer" message, the customer is responsible for troubleshooting the issue. See "Why does the service boot to rescue mode?" above for more information.
After the intervention process
The steps required after the intervention process are outlined in our Finalize a Maintenance Intervention on a Dedicated Server guide.
How do I request an intervention?
Customers can request an intervention by opening a customer support ticket. The ticket must include hardware error logs and a timeframe during which the intervention can be scheduled. It is best to collect the hardware error logs from rescue mode.
The disk replacement process includes:
- providing the serial number of the drive(s) to be replaced,
- backing up all data on your side,
- giving a timeframe in which the intervention can be scheduled, and
- providing the hardware error logs of the disks by running a SMART test (or any equivalent error logs).
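The serial number and SMART health status requested above can be gathered in one pass. The sketch below assumes smartmontools 7.0 or later, whose `smartctl` supports `--json` output, and that it runs with root privileges (for example in rescue mode). The function names are hypothetical, and the summary is meant to accompany the full SMART log in your ticket, not replace it.

```python
import json
import subprocess
from typing import Dict, List


def smartctl_command(device: str) -> List[str]:
    """Build a smartctl call that prints all SMART information as JSON."""
    return ["smartctl", "--json", "-a", device]


def summarize_smart_report(report: Dict) -> Dict:
    """Extract the fields most useful for a disk replacement ticket."""
    return {
        "device": report.get("device", {}).get("name"),
        "model": report.get("model_name"),
        "serial_number": report.get("serial_number"),
        "health_passed": report.get("smart_status", {}).get("passed"),
    }


def collect_smart_summary(device: str, run=subprocess.run) -> Dict:
    """Run smartctl on a device and return the summary.

    smartctl's exit status is a bitmask that can be nonzero for a failing disk
    even when the report was produced, so the status is not treated as an error.
    The `run` parameter exists only so the function can be tested without a disk.
    """
    result = run(smartctl_command(device), capture_output=True, text=True, check=False)
    return summarize_smart_report(json.loads(result.stdout))
```

For example, `collect_smart_summary("/dev/sda")` returns a dictionary whose `serial_number` identifies the drive to be replaced and whose `health_passed` value reflects the overall SMART health assessment.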
Below is some documentation regarding SMART tests that you can refer to: