Overview
Qovery provides integrated observability to help you monitor the health, performance, and behavior of your services directly within the Qovery Console. Your observability data stays within your infrastructure with zero maintenance required.
Qovery Observe is currently available for AWS, GCP, and Scaleway clusters; Azure support is coming soon. It supports Applications and Containers; support for Jobs and Managed Databases is coming soon.
Qovery Observe is not yet self-service. Contact Qovery via Slack or email to get access.
Features
- Service Health: Real-time service health and performance tracking
- Metrics: CPU, memory, network, request latency, and error rates
- Logs: 12-week log retention with automatic error detection
- Events: Qovery and Kubernetes events (deployments, scaling, failures)

Key Benefits
- Data stays in your infrastructure: All observability data remains within your cloud
- Zero maintenance: No configuration or management required
- Correlated data: Metrics and logs automatically linked for faster troubleshooting
Architecture
Qovery’s observability combines open-source tools to monitor your Kubernetes infrastructure.
Data Collection
- Metrics: Prometheus + Thanos collect and store metrics (CPU, memory, network)
- Logs: Loki + Promtail collect and store container logs
- Events: Qovery Event Logger captures Kubernetes events
Data Retention
- Prometheus: 7-day local retention
- Thanos: Raw metrics (15 days), 5-minute resolution (30 days), 1-hour resolution (30 days)
- Loki: 12-week log retention
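The Thanos retention tiers above determine which resolutions are still available for data of a given age. The following is a minimal sketch of that lookup, not Qovery's configuration format: the dictionary layout, the `available_resolutions` name, and the inclusive boundary are assumptions made for illustration.

```python
from datetime import timedelta

# Retention windows copied from the list above; the layout is illustrative only.
THANOS_RETENTION = {
    "raw": timedelta(days=15),
    "5m": timedelta(days=30),
    "1h": timedelta(days=30),
}

def available_resolutions(age: timedelta) -> list[str]:
    """Return the Thanos resolutions that still hold data of the given age.

    Assumes a sample is kept up to and including the last day of its window.
    """
    return [name for name, keep in THANOS_RETENTION.items() if age <= keep]

print(available_resolutions(timedelta(days=3)))   # ['raw', '5m', '1h']
print(available_resolutions(timedelta(days=20)))  # ['5m', '1h']
print(available_resolutions(timedelta(days=45)))  # []
```

In practice this means charts covering the last two weeks can use raw samples, while older ranges fall back to downsampled data.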
Key Features
- Per-cluster isolation: Data protection and performance optimization
- Automatic error detection: Custom metrics track error logs for alerting
- High availability: Prometheus runs with 2 replicas; Thanos auto-scales 2-5 replicas
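To illustrate the idea behind automatic error detection, the sketch below counts error-level log records per service, which is the kind of value a custom metric could expose for alerting. It is a conceptual example only: the record fields (`service_id`, `level`, `message`) and the `count_error_logs` function are hypothetical, and Qovery's actual mechanism is not described here.

```python
from collections import Counter

def count_error_logs(records):
    """Count error-level log records per service.

    Each record is a dict with hypothetical keys "service_id" and "level".
    The level comparison is case-insensitive.
    """
    counts = Counter()
    for record in records:
        if record.get("level", "").lower() == "error":
            counts[record["service_id"]] += 1
    return counts

logs = [
    {"service_id": "api", "level": "info", "message": "started"},
    {"service_id": "api", "level": "ERROR", "message": "db timeout"},
    {"service_id": "worker", "level": "error", "message": "job failed"},
    {"service_id": "api", "level": "error", "message": "db timeout"},
]
error_counts = count_error_logs(logs)
print(error_counts["api"], error_counts["worker"])  # 2 1
```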
Architecture Diagram


Monitoring
Access the Monitoring tab at the service level to view real-time and historical application data.
Service Health
Monitor your service health with:
- Event tracking: Qovery events (deployments, failures) and Kubernetes events (autoscaler triggers, OOMKilled pods, health check issues)
- Error logging: Automatically counts error-level logs with direct navigation to errors
- HTTP error metrics: Aggregated 499 and 5xx error rates by endpoint and status code
- Request latency: P99 tail latency visualization (expandable to P90 and P50)
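The HTTP error metric above aggregates 499 and 5xx responses by endpoint and status code. A minimal sketch of that aggregation follows, assuming requests arrive as `(path, status)` pairs and that the rate is computed against all requests to the same path. Both the input shape and the function names are assumptions for illustration, not Qovery's implementation.

```python
from collections import Counter

def is_tracked_error(status: int) -> bool:
    """True for the statuses named above: 499 and the 5xx range."""
    return status == 499 or 500 <= status <= 599

def error_rates(requests):
    """Return {(path, status): share of that path's requests with that error}."""
    totals = Counter()
    errors = Counter()
    for path, status in requests:
        totals[path] += 1
        if is_tracked_error(status):
            errors[(path, status)] += 1
    return {key: count / totals[key[0]] for key, count in errors.items()}

sample = [
    ("/login", 200), ("/login", 500), ("/login", 499), ("/login", 200),
    ("/cart", 503), ("/cart", 404),
]
rates = error_rates(sample)
print(rates[("/login", 500)])  # 0.25
print(rates[("/cart", 503)])   # 0.5
```

Note that the 404 on `/cart` counts toward the total but is not reported as an error, since only 499 and 5xx responses are tracked.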

Resource Monitoring
Track per-pod resources:
- CPU usage: Against configured requests and limits
- Memory usage: Against configured requests and limits
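Reading usage against both values matters because they mean different things in Kubernetes: the request is what the scheduler reserves for the pod, while the limit is the cap (CPU is throttled at the limit, and a container exceeding its memory limit is OOMKilled). The sketch below expresses usage as a fraction of each; the `usage_ratios` helper and the sample numbers are hypothetical.

```python
def usage_ratios(usage: float, request: float, limit: float) -> dict[str, float]:
    """Express current usage as a fraction of the configured request and limit."""
    if request <= 0 or limit <= 0:
        raise ValueError("request and limit must be positive")
    return {"of_request": usage / request, "of_limit": usage / limit}

# Hypothetical pod: 350 millicores used, 250m requested, 500m limit.
cpu = usage_ratios(usage=350, request=250, limit=500)
print(cpu)  # {'of_request': 1.4, 'of_limit': 0.7}
```

Here the pod uses more CPU than it requested but still has headroom before the limit, which may suggest raising the request rather than the limit.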

Network Metrics
Monitor network-level data:
- Request status by path and error code
- Request duration (P50, P95, P99 percentiles)
- Request size statistics
For services with public ports, these metrics represent ingress traffic; for all other services, they represent internal cluster traffic. On Scaleway clusters, internal traffic monitoring is not currently available for services with no public port exposed.
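To make the request duration percentiles concrete, here is a small sketch using the nearest-rank method on a list of durations. This is one common definition of a percentile and is shown only as an illustration. Prometheus-based systems typically estimate percentiles from histogram buckets instead, so dashboard values will not match an exact calculation like this one. The `percentile` function and the sample data are assumptions.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical request durations in milliseconds, with two slow outliers.
durations_ms = [12, 15, 11, 14, 250, 13, 16, 18, 17, 900]
print(percentile(durations_ms, 50))  # 15
print(percentile(durations_ms, 95))  # 900
print(percentile(durations_ms, 99))  # 900
```

The median stays at 15 ms while the tail percentiles surface the 900 ms outlier, which is why tail latency is usually the more useful signal when troubleshooting.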

Controls
- Live update toggle: Continuous chart refresh
- Custom time frames: Select data display ranges
Logs
Access logs via the Logs tab or the Monitoring tab.
Log Features
Qovery collects and stores logs using Loki + Promtail with:
- 12-week retention when observability is enabled
- 24-hour retention without observability
- Automatic error detection: Error-level logs are counted and highlighted
- Log enrichment: Service ID, environment ID, and pod information
Filtering Capabilities
- Keyword Search: Locate specific messages within log entries
- Time Range: Isolate logs around deployments or incidents
- Log Level: Filter by severity (error, info, debug)
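The three filters can be combined, for example to find error logs that mention a keyword in the minutes after a deployment. The sketch below applies the same logic to in-memory records. It does not call the Qovery Console or Loki, and the record fields and the `filter_logs` function are hypothetical.

```python
from datetime import datetime

def filter_logs(records, keyword=None, start=None, end=None, level=None):
    """Keep records matching every filter that is set (None means no filter).

    Keyword matching is a case-insensitive substring search on the message;
    the time range is inclusive at both ends.
    """
    result = []
    for record in records:
        if keyword is not None and keyword.lower() not in record["message"].lower():
            continue
        if start is not None and record["timestamp"] < start:
            continue
        if end is not None and record["timestamp"] > end:
            continue
        if level is not None and record["level"] != level:
            continue
        result.append(record)
    return result

records = [
    {"timestamp": datetime(2024, 5, 1, 10, 0), "level": "info", "message": "Deploy started"},
    {"timestamp": datetime(2024, 5, 1, 10, 2), "level": "error", "message": "Database timeout"},
    {"timestamp": datetime(2024, 5, 1, 11, 30), "level": "error", "message": "Database timeout"},
]
matches = filter_logs(
    records,
    keyword="timeout",
    start=datetime(2024, 5, 1, 10, 0),
    end=datetime(2024, 5, 1, 10, 15),
    level="error",
)
print(len(matches))  # 1
```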