Add OpenTelemetry instrumentation with distributed tracing and metrics:
- Structured JSON logging with trace context correlation
- Auto-instrumentation for FastAPI, asyncpg, httpx, redis
- OTLP exporter for traces and Prometheus metrics endpoint

Implement Celery worker and notification task system:
- Celery app with Redis/SQS broker support and configurable queues
- Notification tasks for incident fan-out, webhooks, and escalations
- Pluggable TaskQueue abstraction with in-memory driver for testing

Add Grafana observability stack (Loki, Tempo, Prometheus, Grafana):
- OpenTelemetry Collector for receiving OTLP traces and logs
- Tempo for distributed tracing backend
- Loki for log aggregation with Promtail DaemonSet
- Prometheus for metrics scraping with RBAC configuration
- Grafana with pre-provisioned datasources and API overview dashboard
- Helm templates for all observability components

Enhance application infrastructure:
- Global exception handlers with structured ErrorResponse schema
- Request logging middleware with timing metrics
- Health check updated to verify task queue connectivity
- Non-root user in Dockerfile for security
- Init containers in Helm deployments for dependency ordering
- Production Helm values with autoscaling and retention policies
49 lines
1.0 KiB
YAML
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    uid: prometheus
    url: http://prometheus:9090
    access: proxy
    isDefault: false
    jsonData:
      httpMethod: POST
      exemplarTraceIdDestinations:
        - name: trace_id
          datasourceUid: tempo

  - name: Tempo
    type: tempo
    uid: tempo
    url: http://tempo:3200
    access: proxy
    isDefault: false
    jsonData:
      tracesToLogsV2:
        datasourceUid: loki
        spanStartTimeShift: '-1h'
        spanEndTimeShift: '1h'
        filterByTraceID: true
        filterBySpanID: true
      tracesToMetrics:
        datasourceUid: prometheus
      nodeGraph:
        enabled: true
      lokiSearch:
        datasourceUid: loki

  - name: Loki
    type: loki
    uid: loki
    url: http://loki:3100
    access: proxy
    isDefault: true
    jsonData:
      derivedFields:
        - datasourceUid: tempo
          matcherRegex: '"trace_id":"([a-f0-9]+)"'
          name: TraceID
          url: '$${__value.raw}'
          urlDisplayLabel: 'View Trace'
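
The Loki `derivedFields` entry above only produces a "View Trace" link when a log line contains `"trace_id":"<hex>"` with no whitespace after the colon. The following is a minimal sketch, not taken from the repository, that checks a sample log line against the same pattern. The record contents and the trace ID value are illustrative assumptions; how the application's logger actually serializes JSON is not shown in this file.

```python
import json
import re

# Same pattern as the matcherRegex in the Loki datasource above.
MATCHER = re.compile(r'"trace_id":"([a-f0-9]+)"')

# Hypothetical log record; every field except trace_id is illustrative.
record = {
    "level": "info",
    "message": "request completed",
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
}

# Compact separators emit "trace_id":"..." with no space after the colon.
compact_line = json.dumps(record, separators=(",", ":"))

# Python's default separators emit "trace_id": "..." (note the space).
default_line = json.dumps(record)

compact_match = MATCHER.search(compact_line)
default_match = MATCHER.search(default_line)

print("compact:", compact_match.group(1) if compact_match else None)
print("default:", default_match.group(1) if default_match else None)
```

If the logger turns out to use the default separators, one option is to relax the pattern to `"trace_id":\s*"([a-f0-9]+)"`; the other is to emit compact JSON. Either choice is a suggestion under the assumptions stated above.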