
Overview

Qovery provides integrated observability to help you monitor the health, performance, and behavior of your services directly within the Qovery Console. Your observability data stays within your infrastructure with zero maintenance required.
Qovery Observe is currently available for AWS, GCP, and Scaleway clusters; Azure support is coming soon. It supports Applications and Containers; support for Jobs and Managed Databases is coming soon.
Qovery Observe is not yet self-service. Contact Qovery via Slack or email to get access.

Features

Service Health

Real-time service health and performance tracking

Metrics

CPU, memory, network, request latency, and error rates

Logs

12-week log retention with automatic error detection

Events

Qovery and Kubernetes events (deployments, scaling, failures)

Key Benefits

  • Data stays in your infrastructure: All observability data remains within your cloud
  • Zero maintenance: No configuration or management required
  • Correlated data: Metrics and logs automatically linked for faster troubleshooting

Architecture

Qovery’s observability combines open-source tools to monitor your Kubernetes infrastructure:

Data Collection

Metrics

Prometheus + Thanos collect and store metrics (CPU, memory, network)

Logs

Loki + Promtail collect and store container logs

Events

Qovery Event Logger captures Kubernetes events

Data Retention

  • Prometheus: 7-day local retention
  • Thanos: Raw metrics (15 days), 5-minute resolution (30 days), 1-hour resolution (30 days)
  • Loki: 12-week log retention
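
To make the retention windows concrete, the following is a minimal sketch that reports which store could still hold data of a given age. The retention values are copied from the list above; the `RETENTION` dictionary, its key names, and the `stores_covering` helper are hypothetical and only serve the illustration. Treating the boundary day as still covered is also an assumption.

```python
from datetime import timedelta

# Retention windows copied from the list above.
# The key names are illustrative, not Qovery identifiers.
RETENTION = {
    "prometheus_local": timedelta(days=7),
    "thanos_raw": timedelta(days=15),
    "thanos_5m": timedelta(days=30),
    "thanos_1h": timedelta(days=30),
    "loki_logs": timedelta(weeks=12),
}


def stores_covering(age: timedelta) -> list[str]:
    """Return the stores whose retention window still includes data of this age.

    Assumption: data exactly as old as the window is still counted as retained.
    """
    return [name for name, window in RETENTION.items() if age <= window]


if __name__ == "__main__":
    for days in (3, 10, 20, 60, 90):
        print(f"{days} days old:", stores_covering(timedelta(days=days)))
```

Under these numbers, metrics older than 15 days would only be available at the downsampled 5-minute or 1-hour resolution, and nothing older than 12 weeks would remain.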

Key Features

  • Per-cluster isolation: Data protection and performance optimization
  • Automatic error detection: Custom metrics track error logs for alerting
  • High availability: Prometheus runs with 2 replicas; Thanos auto-scales between 2 and 5 replicas

Architecture Diagram

[Figure: Qovery Observability Architecture]

Monitoring

Access the Monitoring tab at the service level to view real-time and historical application data.

Service Health

Monitor your service health with:
  • Event tracking: Qovery events (deployments, failures) and Kubernetes events (autoscaler triggers, OOMKilled pods, health check issues)
  • Error logging: Automatically counts error-level logs with direct navigation to errors
  • HTTP error metrics: Aggregated 499 and 5xx error rates by endpoint and status code
  • Request latency: P99 tail latency visualization (expandable to P90 and P50)
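
As an illustration of what "aggregated 499 and 5xx error rates by endpoint and status code" can mean, here is a minimal sketch. It is not Qovery's implementation: the `(endpoint, status)` input shape and the function names are assumptions, and the rate is defined here as the share of an endpoint's requests that returned a given error status.

```python
from collections import Counter


def is_error(status: int) -> bool:
    """Treat 499 and any 5xx status as an error, matching the list above."""
    return status == 499 or 500 <= status <= 599


def error_rates(requests):
    """Aggregate error rates by (endpoint, status).

    `requests` is an iterable of (endpoint, status) pairs (hypothetical shape).
    Each rate is the error count divided by all requests to that endpoint.
    """
    totals = Counter()
    errors = Counter()
    for endpoint, status in requests:
        totals[endpoint] += 1
        if is_error(status):
            errors[(endpoint, status)] += 1
    return {key: count / totals[key[0]] for key, count in errors.items()}


if __name__ == "__main__":
    sample = [
        ("/api/users", 200),
        ("/api/users", 500),
        ("/api/users", 499),
        ("/api/users", 200),
        ("/health", 200),
    ]
    print(error_rates(sample))
```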

Resource Monitoring

Track per-pod resources:
  • CPU usage: Against configured requests and limits
  • Memory usage: Against configured requests and limits
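
Reading usage "against configured requests and limits" comes down to two ratios per pod. The sketch below shows that arithmetic; the `PodResource` class and the sample values are hypothetical, and units only need to be consistent within one instance (for example CPU cores, or memory in MiB).

```python
from dataclasses import dataclass


@dataclass
class PodResource:
    """Usage of one resource for one pod (illustrative structure)."""

    usage: float
    request: float
    limit: float

    def of_request(self) -> float:
        """Usage as a fraction of the configured request."""
        return self.usage / self.request

    def of_limit(self) -> float:
        """Usage as a fraction of the configured limit."""
        return self.usage / self.limit


if __name__ == "__main__":
    # Made-up values: CPU in cores, memory in MiB.
    cpu = PodResource(usage=0.35, request=0.5, limit=1.0)
    memory = PodResource(usage=384, request=256, limit=512)
    print(f"CPU: {cpu.of_request():.0%} of request, {cpu.of_limit():.0%} of limit")
    print(f"Memory: {memory.of_request():.0%} of request, {memory.of_limit():.0%} of limit")
```

In this made-up example, memory usage sits above the request but below the limit, which is the kind of gap the two reference lines on a usage chart make visible.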

Network Metrics

Monitor network-level data:
  • Request status by path and error code
  • Request duration (P50, P95, P99 percentiles)
  • Request size statistics
For services with public ports, these metrics reflect ingress traffic; for all other services, they reflect internal cluster traffic. On Scaleway clusters, internal traffic monitoring is currently unavailable when no public port is exposed.
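
For readers less familiar with latency percentiles, the sketch below computes P50, P95, and P99 from raw request durations using the nearest-rank method. This is one common definition, chosen here for simplicity; it is an assumption and not necessarily how the Console derives its charts. The sample durations are invented.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile.

    Returns the smallest sample such that at least p percent of all samples
    are less than or equal to it.
    """
    if not values:
        raise ValueError("percentile() needs at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Invented request durations in milliseconds.
    durations_ms = [12, 15, 11, 240, 14, 13, 18, 16, 17, 900]
    for p in (50, 95, 99):
        print(f"P{p}: {percentile(durations_ms, p)} ms")
```

With only ten samples, P95 and P99 both land on the slowest request, which shows why tail percentiles are sensitive to a few outliers while P50 is not.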

Controls

  • Live update toggle: Continuous chart refresh
  • Custom time frames: Select data display ranges

Logs

Access logs via the Logs tab or the Monitoring tab.

Log Features

Qovery collects and stores logs using Loki + Promtail with:
  • 12 weeks of retention when observability is enabled
  • 24 hours of retention without observability
  • Automatic error detection: Error-level logs are counted and highlighted
  • Log enrichment: Service ID, environment ID, and pod information
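
The combination of error detection and log enrichment can be pictured as counting error-level entries grouped by the enrichment labels. The sketch below is purely illustrative: the entry layout and the field names `level`, `service_id`, and `pod` are assumptions, not Qovery's actual log schema.

```python
from collections import Counter


def count_errors(entries):
    """Count error-level log entries per (service_id, pod).

    Each entry is a dict with hypothetical keys: "level", "service_id", "pod".
    """
    counts = Counter()
    for entry in entries:
        if entry["level"].lower() == "error":
            counts[(entry["service_id"], entry["pod"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        {"level": "info", "service_id": "svc-a", "pod": "pod-1", "message": "started"},
        {"level": "ERROR", "service_id": "svc-a", "pod": "pod-1", "message": "db timeout"},
        {"level": "error", "service_id": "svc-a", "pod": "pod-2", "message": "db timeout"},
        {"level": "error", "service_id": "svc-a", "pod": "pod-1", "message": "retry failed"},
    ]
    print(count_errors(sample))
```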

Filtering Capabilities

Keyword Search

Locate specific messages within log entries

Time Range

Isolate logs around deployments or incidents

Log Level

Filter by severity (error, info, debug)
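
The three filters above combine naturally: an entry is shown only if it passes every filter that is set. The following sketch expresses that logic over in-memory entries. The `filter_logs` function, the entry fields, and the sample data are hypothetical; the Console applies its filters through the UI, not through this code.

```python
from datetime import datetime


def filter_logs(entries, keyword=None, start=None, end=None, level=None):
    """Keep entries that satisfy every filter that was provided.

    Keyword and level matching are case-insensitive; the time range is inclusive.
    """
    selected = []
    for entry in entries:
        if keyword is not None and keyword.lower() not in entry["message"].lower():
            continue
        if start is not None and entry["timestamp"] < start:
            continue
        if end is not None and entry["timestamp"] > end:
            continue
        if level is not None and entry["level"].lower() != level.lower():
            continue
        selected.append(entry)
    return selected


# Invented sample entries for illustration.
LOGS = [
    {"timestamp": datetime(2024, 1, 1, 12, 0), "level": "info", "message": "Deployment started"},
    {"timestamp": datetime(2024, 1, 1, 12, 5), "level": "error", "message": "Upstream timeout on /checkout"},
    {"timestamp": datetime(2024, 1, 1, 13, 0), "level": "debug", "message": "Cache warm-up finished"},
]

if __name__ == "__main__":
    # Errors mentioning "timeout" in the half hour after the deployment.
    window = filter_logs(
        LOGS,
        keyword="timeout",
        start=datetime(2024, 1, 1, 12, 0),
        end=datetime(2024, 1, 1, 12, 30),
        level="error",
    )
    for entry in window:
        print(entry["timestamp"], entry["level"], entry["message"])
```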

Next Steps