Learn about Logs Data Platform, what it does, and the role of each component.
Introduction
Welcome to Logs Data Platform
Logs Data Platform is OVHcloud's log management platform. It ingests the logs generated by your infrastructure and applications, stores them, displays them in real-time dashboards, and lets you run complex queries against them.
Logs Data Platform can also be used as a powerful indexing platform for any kind of document. However, this distinct use case is covered in another guide.
To operate Logs Data Platform, OVHcloud leverages open-source software such as OpenSearch, Logstash, and Flowgger, which brings you a rich feature set and guarantees interoperability with most of the log tooling on the market.
The goal of this documentation is:
- to give an overview of the architecture of Logs Data Platform.
- to introduce the core concepts and key vocabulary.
- to describe how Logs Data Platform ingests, stores, and exposes your logs.
After reading this documentation, you can read the Quick Start documentation to configure your account and send your first logs to Logs Data Platform.
The Lifecycle of Logs
The lifecycle of logs can be split into four phases:
- Generation: Logs are generated by applications, by the systems they run on, or even by your cloud services. In most cases, they are stored locally in files. You may decide to push some or all of them to Logs Data Platform, either using an appropriate SDK directly in your code or a log forwarder such as Rsyslog, Logstash, etc. A log forwarder is software that gathers logs from different sources (local files, but potentially also remote systems), optionally transforms them, and forwards them to a remote system or stores them locally (a minimal forwarding sketch follows this list).
- Ingestion: Logs are received by the ingestion agents of Logs Data Platform, called inputs. They check the validity of the source and the formatting of the logs (following these guidelines), and may add, change, or remove fields in your logs before forwarding them to the storage.
- Storage: After ingestion, your logs are stored in Logs Data Platform. They can be stored in two non-exclusive forms: as indexed data that is exposed via APIs and other tools, or as archived data.
- Consumption: There are many ways to use your logs. On Logs Data Platform, users can display logs in real time, build dashboards, craft complex queries and launch them (via UI or API), and even define alarms.
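To make the generation phase concrete, below is a minimal sketch of what a log forwarder does, written in Python: it tails a local file and forwards each new line over TCP to a remote collector. The file path, hostname, and port are hypothetical, and real forwarders such as Rsyslog or Logstash add batching, retries, and format conversion on top of this basic loop.

```python
import socket
import time

# Hypothetical values: the local file to tail and the remote collector to feed.
LOG_FILE = "/var/log/my-app.log"
COLLECTOR = ("collector.example.com", 6514)

with socket.create_connection(COLLECTOR) as sock, open(LOG_FILE, "r") as f:
    f.seek(0, 2)  # start at the end of the file, like `tail -f`
    while True:
        line = f.readline()
        if not line:
            time.sleep(0.5)  # wait for the application to write more
            continue
        sock.sendall(line.encode("utf-8"))  # forward the raw line as-is
```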
Key concepts of Logs Data Platform
Before giving more details on how Logs Data Platform handles each of these phases, you will need to understand how tenancy is handled and a few other key concepts.
- OVHcloud Account: The OVHcloud account is the highest tenancy level; it is not specific to Logs Data Platform.
- Logs Data Platform Account: A Logs Data Platform account is the highest tenancy level specific to Logs Data Platform. Every Logs Data Platform account is associated with an OVHcloud account, like any other OVHcloud service.
For the remainder of this guide and others, and unless specified otherwise, the word Account will refer to a Logs Data Platform account. Different accounts associated with the same OVHcloud account are treated exactly as if they were associated with different OVHcloud accounts. It is at the account level that you will manage groups of users, permissions, and subscriptions to options (such as dashboards, dedicated inputs, etc.), and create streams (see next point).
A Logs Data Platform account has a unique identifier of the form ldp-[a-z]{2}-[0-9]{5} (for example, ldp-xy-98765) and is associated with a unique username of the form logs-[a-z]{2}-[0-9]{5} (for example, logs-ab-12345). Be careful: the character sequences are not the same for the account identifier and the username.
This username is associated with a password that you will have to configure via the OVHcloud Control Panel or the API before using your cluster. The corresponding user is the administrator of the cluster. The usage of Logs Data Platform's RBAC model is described in this guide.
- Stream: A Logs Data Platform Stream is a logical partition of logs that you create and that you will use when ingesting, storing, visualizing, or querying your logs.
It is at the stream granularity that you will configure many things such as retention duration, archival policies, access rights, or even activate the live WebSocket option.
A stream is associated with a unique token that you will use when you push your logs.
There is no limit on how many logs a Stream can store. Queries and dashboards can be built across multiple streams by using Graylog (see below) or an Alias (again, see below).
- Index: A Logs Data Platform Index (plural: Indices) is, simply put, an OpenSearch index.
While Logs Data Platform handles the management of OpenSearch indices transparently for you when handling logs, having your own index is useful when you want to interact directly with OpenSearch.
Potential use cases for having your own Index range from indexing things other than logs to enriching your logs at ingestion with data held in a database format. The usage of Indices won't be covered further in this guide for the sake of clarity and brevity.
In this guide and others, the terms Index/Indices and indexing/indexed are not interchangeable. Unless specified otherwise, the term Index explicitly refers to the OpenSearch index described above, while indexing refers to the sorting of logs and documents based on their fields and text. All logs are indexed in Logs Data Platform, whether or not you deploy an optional Index.
- Alias: A Logs Data Platform Alias is a virtual OpenSearch index that is mapped to a combination of actual Indices or Streams. It allows compatibility with software that integrates with OpenSearch and thus requires an index in its configuration, such as ElastAlert, OpenSearch Dashboards, or Grafana, without requiring you to manage your own Index.
Ingestion
The first question you will ask yourself is: "How do I push my logs to Logs Data Platform?"
To do that, you will have to configure your SDK or logs-collecting software to forward your logs to one of our ingestion agents, called inputs. There are two types of inputs that you can use in Logs Data Platform.
Mutualized inputs
By default, Logs Data Platform exposes inputs that can ingest your logs in different formats (GELF, LTSV, RFC 5424, Cap'n Proto, and Beats). To use them, configure your SDK or software to target the endpoint assigned to your Logs Data Platform account, on the port that matches the log format you use and the transport you choose (UDP, TCP, or TCP/TLS, the latter being encrypted on the network), and add to your logs a custom field containing the token of the Stream you want to push to.
Our inputs will match the token with the target stream, verify the validity of some fields, and convert your logs to the GELF format before storing them in the platform.
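As an illustration, here is a minimal sketch in Python that pushes a single GELF-formatted log over plain TCP. The endpoint, port, and token are placeholders to replace with the values assigned to your account; the `_X-OVH-TOKEN` field name follows OVHcloud's GELF examples (custom GELF fields are prefixed with an underscore), and the TCP/TLS port would additionally require wrapping the socket with TLS.

```python
import json
import socket

# Placeholders: use the endpoint, port, and stream token from your own account.
LDP_HOST = "gra1.logs.ovh.com"   # assumption: your assigned cluster endpoint
LDP_PORT = 2202                  # assumption: the GELF/TCP port may differ
STREAM_TOKEN = "00000000-0000-0000-0000-000000000000"  # placeholder token

# A minimal GELF 1.1 message; the stream token travels in a custom field.
message = {
    "version": "1.1",
    "host": "my-app-server",
    "short_message": "User login succeeded",
    "_X-OVH-TOKEN": STREAM_TOKEN,
}

# GELF over TCP delimits each JSON document with a trailing NUL byte.
payload = json.dumps(message).encode("utf-8") + b"\x00"

with socket.create_connection((LDP_HOST, LDP_PORT)) as sock:
    sock.sendall(payload)
```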
Dedicated inputs
If your use case requires it, you also have the option to deploy managed dedicated inputs. There might be several reasons for that:
- For security reasons, you want more control over which IPs can push logs to your platform.
- You want to customize your input to transform the logs with your own rules before they are pushed to the platform.
The dedicated inputs have the following properties (a push example follows the list):
- You can choose whether to run Logstash or Flowgger, depending on your needs and which you are more familiar with.
- They are automatically configured to push the logs they receive to a chosen stream and automatically set the OVHcloud token field for you.
- You can choose how many instances of the input you deploy (autoscaling coming soon).
- You can choose which port they listen on and whitelist the IPs that are allowed to push logs.
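Because a dedicated input is bound to a stream and sets the token for you, pushing logs to it can be as simple as the following sketch; the hostname, port, and line format are hypothetical and depend on how you configured the input.

```python
import socket

# Hypothetical address of your dedicated input: the hostname and the
# listening port are chosen when you configure the input.
INPUT_HOST = "my-dedicated-input.example.com"
INPUT_PORT = 6514

# No token field is needed: the dedicated input is bound to a stream
# and adds the OVHcloud token field on your behalf.
line = b"2024-05-01T12:00:00Z my-app INFO user login succeeded\n"

with socket.create_connection((INPUT_HOST, INPUT_PORT)) as sock:
    sock.sendall(line)
```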
OpenSearch API
If you'd rather directly use the OpenSearch API to send your logs to Logs Data Platform, you can also do that by following this guide.
Note that you do not need an Index or an Alias to use the OpenSearch API this way; you just have to follow the guide to configure your software correctly.
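As a sketch of what such a configuration boils down to, the snippet below indexes one document over HTTP with basic authentication; the endpoint, index name, and credentials are placeholders, and the linked guide gives the exact values to use for your account.

```python
import requests

# Placeholders: the linked guide gives the real endpoint, index name,
# and credentials for your account.
ENDPOINT = "https://ldp-xy-98765.logs.ovh.com:9200"  # assumption
USER = "logs-ab-12345"          # your Logs Data Platform username
PASSWORD = "change-me"          # password set in the Control Panel
INDEX = "ldp-logs-placeholder"  # assumption: actual name is account-specific

doc = {
    "message": "user login succeeded",
    "service": "auth-api",
}

resp = requests.post(f"{ENDPOINT}/{INDEX}/_doc", json=doc,
                     auth=(USER, PASSWORD), timeout=10)
resp.raise_for_status()
print(resp.json())  # OpenSearch acknowledges with the document metadata
```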
Storage
There are two non-exclusive ways to store your logs in Logs Data Platform: indexed and archived. The storage is configured per stream.
Indexed storage
Indexed storage is the "natural" way to store your logs. Choose it if you want to be able to query your logs or build dashboards; it is typically used for operational logs that you need to alert on or access easily. When configuring a stream to index your logs, you can set the following additional parameters:
- Retention: You can choose to keep your logs indexed for 14 days, 1 month, 3 months, or 1 year. After this retention period, they are automatically discarded from the indexed storage. Be careful when configuring the retention period: it cannot be changed (neither increased nor decreased) once set.
- Limit: If you want to make sure you don't exceed a certain storage volume (and thus bill), you can set a limit on how much log data (in GB) is indexed in a stream. If you do, the stream will stop ingesting logs once the limit is reached. When logs are naturally discarded at the end of the retention period, or when you change the configuration of your stream, it accepts incoming logs again. You will be notified when thresholds of 80%, 90%, and 100% are reached, so you have time to react if you have set the limit too low.
- WebSocket activation: You can choose whether or not to expose your real-time logs through a WebSocket. The usage of this WebSocket is covered later in this guide.
Archived storage
Archived storage allows you to store your logs for a very long time in a cost-effective manner, typically for auditing or legal reasons. However, the low cost comes with a trade-off: you will not be able to query or visualize archived logs. They are stored as compressed archives that you can download. You can configure the following options when activating archival for a stream:
- Compression algorithm: You can choose which compression algorithm is used and thus under which format the archive will be made available to you.
- Retention: You can choose how long we keep the archives at your disposal: 1, 2, 5, or 10 years.
- Archival backend: You can choose which backend we use to archive the logs. OVHcloud Object Storage is more expensive but lets you download your archives whenever you want, whereas OVHcloud Cloud Archive is the cheapest option but requires you to wait a few hours between requesting an archive and being able to download it.
- Encryption: You can choose whether or not to encrypt the archives and, if so, with which key.
Scalability and immutability
Logs Data Platform handles all the scaling for you, so there is virtually no limit on how many logs you can store in a stream. Furthermore, logs stored in streams cannot be individually deleted or tampered with; the only way to remove them is to delete the entire stream. Once a log is indexed in a stream, it stays the same and remains queryable for the whole configured retention period.
Query and visualization
Now that you have seen how your logs are ingested and stored, let us look at how you can use them.
Graylog
Logs Data Platform comes with a managed Graylog platform that you can access with the credentials from your Logs Data Platform account. If you are not familiar with it, Graylog is a web-based UI that allows you to query your logs and build dashboards to have a graphical representation of your logs. The Graylog API is also exposed.
OpenSearch API
Many tools interact directly with the OpenSearch API, which is available on port 9200 of the cluster you are assigned to. Since most OpenSearch API calls take an index as a parameter, you must use an Alias that matches a set of Streams and Indices from your Logs Data Platform account, or an Index if you have subscribed to one.
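For example, a standard OpenSearch search call works unchanged against an Alias; in the sketch below, the cluster hostname, Alias name, and credentials are placeholders for the values of your own account.

```python
import requests

# Placeholders: your assigned cluster and an Alias created on your account.
ENDPOINT = "https://gra1.logs.ovh.com:9200"  # assumption
ALIAS = "my-alias"                           # hypothetical Alias name
USER = "logs-ab-12345"
PASSWORD = "change-me"

# A standard OpenSearch query: the Alias plays the role of the index.
query = {
    "size": 5,
    "query": {"match": {"message": "error"}},
}

resp = requests.post(f"{ENDPOINT}/{ALIAS}/_search", json=query,
                     auth=(USER, PASSWORD), timeout=10)
resp.raise_for_status()
for hit in resp.json()["hits"]["hits"]:
    print(hit["_source"])
```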
OpenSearch Dashboards (Kibana alternative)
If Graylog doesn't meet your needs but you don't want to manage your own software, an optional dedicated instance of OpenSearch Dashboards can be deployed on demand and managed by OVHcloud. OpenSearch Dashboards is a fork of the well-known Kibana project, designed to integrate with OpenSearch rather than Elasticsearch.
Because OpenSearch Dashboards interacts directly with the OpenSearch API, you will have to configure it to access your Indices and/or Streams, either directly with an Index or via an Alias, depending on what you want to do.
Managed Grafana
While Logs Data Platform doesn't offer a managed Grafana inside the platform, OVHcloud Public Cloud has a managed Grafana offer that is well suited to integrating directly with the OpenSearch API exposed by Logs Data Platform. It is not pre-configured to work with your platform out of the box, but the configuration is straightforward if you follow this guide.
WebSocket & LDP-tail
If you have activated the WebSocket exposition for a Stream, you can connect directly to a WebSocket to view the logs arriving on your platform in real time. In addition, we have built a small and efficient tool called LDP-tail that helps you get the most out of this feature.
LDP-tail is a CLI tool developed by OVHcloud that connects to the WebSocket corresponding to a stream and comes with advanced formatting and filtering capabilities, helping you make better use of the WebSocket feature. You can discover how to best use it in this guide.
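If you prefer to consume the WebSocket yourself rather than through LDP-tail, a raw client is only a few lines; the sketch below uses the third-party websockets package, and the URL is hypothetical since the real address for a stream is only provided once the option is activated.

```python
import asyncio

import websockets  # third-party package: pip install websockets

# Hypothetical URL: the actual WebSocket address for a stream is shown
# once the WebSocket option is activated on it.
WS_URL = "wss://example.logs.ovh.com/tail/?tk=PLACEHOLDER-TOKEN"

async def tail() -> None:
    async with websockets.connect(WS_URL) as ws:
        async for frame in ws:
            print(frame)  # each frame carries one log event

asyncio.run(tail())
```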
Alerting
By following this guide, you can configure alerts on your streams that warn you when conditions you define on the volume of matching logs are met. When triggered, these alerts send an email to the configured address.
Summary
To help you visualize the information in this guide, the diagram below sums up what each component of Logs Data Platform brings you:
Go Further
After reading this documentation, you should be familiar with most concepts used in Logs Data Platform. When you feel ready to work with Logs Data Platform, jump to the Quick Start guide to configure your account, create a first stream, send your first logs to Logs Data Platform, and watch them appear in Graylog!
For more information and tutorials, please see our other Logs Data Platform support guides or explore the guides for other OVHcloud products and services.