# Monitoring & Observability

The middle layer ships with comprehensive monitoring:

## 1. Structured Logging (Winston)

**Configuration:**
- Log level: `LOG_LEVEL` (default: `info`)
- Format: JSON in production, colorized in development
- Output: console and files (`logs/error.log`, `logs/combined.log`)

**Usage:**
```typescript
import { logger, logQuery, logError } from './monitoring/logger.js';

logger.info('Info message', { context: 'data' });
logQuery('GetProducts', { limit: 10 }, 45);
logError(error, { operation: 'getProducts' });
```

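`LOG_LEVEL` acts as a threshold: Winston gives every level a numeric priority and drops messages that are less severe than the configured level. The rule can be sketched as follows, assuming the logger keeps Winston's default npm levels. The helper `shouldLog` is hypothetical and not part of the middle layer.

```typescript
// Winston's default npm levels: a lower number means higher severity.
const NPM_LEVELS = {
  error: 0,
  warn: 1,
  info: 2,
  http: 3,
  verbose: 4,
  debug: 5,
  silly: 6,
} as const;

type Level = keyof typeof NPM_LEVELS;

// Hypothetical helper: true if a message at `messageLevel` would be
// emitted by a logger configured with `configuredLevel`.
function shouldLog(configuredLevel: Level, messageLevel: Level): boolean {
  return NPM_LEVELS[messageLevel] <= NPM_LEVELS[configuredLevel];
}

console.log(shouldLog('info', 'error')); // true
console.log(shouldLog('info', 'debug')); // false
```

With the default `LOG_LEVEL=info`, `debug` output is therefore suppressed while `warn` and `error` still reach the console and the log files.
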
## 2. Prometheus Metrics

**Endpoints:**
- `GET http://localhost:9090/metrics` - Prometheus metrics
- `GET http://localhost:9090/health` - health check

**Available metrics:**

### Query Metrics
- `graphql_queries_total` - number of queries (labels: `operation`, `status`)
- `graphql_query_duration_seconds` - query duration (histogram)
- `graphql_query_complexity` - query complexity (gauge)

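Because `graphql_query_duration_seconds` is a histogram, Prometheus does not store it as a single value but as cumulative bucket counters plus a running sum and count. The sketch below illustrates that bookkeeping in plain TypeScript. The class name `SimpleHistogram` and the bucket boundaries are illustrative assumptions, not the middle layer's actual implementation.

```typescript
// Illustrative histogram with cumulative buckets, as in the Prometheus data model.
class SimpleHistogram {
  private readonly buckets: { le: number; count: number }[];
  private sum = 0;
  private count = 0;

  // `upperBounds` are bucket limits in seconds (assumed values in the usage below).
  constructor(upperBounds: number[]) {
    this.buckets = [...upperBounds]
      .sort((a, b) => a - b)
      .map((le) => ({ le, count: 0 }));
  }

  observe(seconds: number): void {
    // Cumulative: every bucket whose upper bound covers the value is incremented.
    for (const bucket of this.buckets) {
      if (seconds <= bucket.le) bucket.count += 1;
    }
    this.sum += seconds;
    this.count += 1;
  }

  // Mean duration: sum divided by count, guarded against an empty histogram.
  mean(): number {
    return this.count === 0 ? 0 : this.sum / this.count;
  }

  // Bucket counts including the implicit +Inf bucket, which equals the total count.
  snapshot(): { le: string; count: number }[] {
    const finite = this.buckets.map((b) => ({ le: String(b.le), count: b.count }));
    return [...finite, { le: '+Inf', count: this.count }];
  }
}

const durations = new SimpleHistogram([0.1, 0.5, 1]);
durations.observe(0.25);
durations.observe(2);
console.log(durations.snapshot(), durations.mean());
```

This is also why an average duration in PromQL is derived from the `_sum` and `_count` series rather than from the bare metric name.
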
### Cache Metrics
- `cache_hits_total` - cache hits (label: `cache_type`)
- `cache_misses_total` - cache misses (label: `cache_type`)

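As a rough illustration of where these two counters would be incremented, the sketch below wraps an in-memory `Map` and counts hits and misses per lookup. `CountingCache` is a hypothetical name, and the plain number fields stand in for the real Prometheus counters; the middle layer may record these metrics differently.

```typescript
// Hypothetical wrapper that counts hits and misses for one cache type.
class CountingCache<V> {
  private readonly store = new Map<string, V>();
  readonly cacheType: string;
  hits = 0;
  misses = 0;

  constructor(cacheType: string) {
    this.cacheType = cacheType;
  }

  set(key: string, value: V): void {
    this.store.set(key, value);
  }

  get(key: string): V | undefined {
    if (this.store.has(key)) {
      this.hits += 1; // corresponds to cache_hits_total{cache_type="..."}
      return this.store.get(key);
    }
    this.misses += 1; // corresponds to cache_misses_total{cache_type="..."}
    return undefined;
  }

  // hits / (hits + misses), guarded against division by zero.
  hitRatio(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}

const productCache = new CountingCache<number>('product');
productCache.set('a', 1);
productCache.get('a'); // hit
productCache.get('b'); // miss
console.log(productCache.cacheType, productCache.hitRatio()); // product 0.5
```
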
### DataService Metrics
- `dataservice_calls_total` - DataService calls (labels: `method`, `status`)
- `dataservice_duration_seconds` - DataService call duration (histogram)

### Error Metrics
- `errors_total` - number of errors (labels: `type`, `operation`)

**Example Prometheus queries:**
```promql
# Query Rate
rate(graphql_queries_total[5m])

# Error Rate
rate(errors_total[5m])

# Cache Hit Ratio
rate(cache_hits_total[5m]) / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m]))
```

## 3. Distributed Tracing

**Features:**
- Automatic trace ID generation per request
- Span tracking for nested operations
- Duration measurement for performance analysis

**Trace IDs are automatically included in logs and metrics.**

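The three features can be pictured with a small sketch: a root span receives a fresh trace ID, child spans inherit it and point to their parent, and ending a span records its duration. Everything here is an assumption for illustration, including the names `startSpan` and `endSpan`, the `Span` shape, and the ID lengths (32 and 16 hex characters, borrowed from the W3C Trace Context convention). The middle layer's actual tracing API is not shown in this document.

```typescript
// Generates a lowercase hex string of the given length.
// Math.random is enough for a sketch; prefer a cryptographic source in production.
function randomHex(length: number): string {
  let out = '';
  for (let i = 0; i < length; i++) {
    out += Math.floor(Math.random() * 16).toString(16);
  }
  return out;
}

interface Span {
  traceId: string;
  spanId: string;
  parentSpanId: string | undefined;
  name: string;
  startMs: number;
  durationMs: number | undefined;
}

// Starts a root span (new trace ID) or a child span (inherits the parent's trace ID).
function startSpan(name: string, parent?: Span): Span {
  return {
    traceId: parent ? parent.traceId : randomHex(32),
    spanId: randomHex(16),
    parentSpanId: parent ? parent.spanId : undefined,
    name,
    startMs: Date.now(),
    durationMs: undefined,
  };
}

// Records the elapsed time in milliseconds and returns the same span.
function endSpan(span: Span): Span {
  span.durationMs = Date.now() - span.startMs;
  return span;
}

const requestSpan = startSpan('POST /graphql');
const resolverSpan = startSpan('getProducts', requestSpan);
endSpan(resolverSpan);
endSpan(requestSpan);
console.log(resolverSpan.traceId === requestSpan.traceId); // true
```
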
## Environment Variables

```bash
# Logging
LOG_LEVEL=info  # debug, info, warn, error

# Metrics
METRICS_PORT=9090  # Port for the metrics endpoint

# Query Complexity
MAX_QUERY_COMPLEXITY=1000  # Maximum query complexity
```

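One way to read these variables defensively is sketched below. The function `loadMonitoringConfig` and the `MonitoringConfig` shape are hypothetical. Only the `info` default for `LOG_LEVEL` is stated in this document; the fallbacks for the port and the complexity limit simply mirror the example values above and are assumptions.

```typescript
type LogLevel = 'debug' | 'info' | 'warn' | 'error';

interface MonitoringConfig {
  logLevel: LogLevel;
  metricsPort: number;
  maxQueryComplexity: number;
}

const LOG_LEVELS: readonly LogLevel[] = ['debug', 'info', 'warn', 'error'];

// Parses a positive integer, falling back to `fallback` when unset or invalid.
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? '', 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Builds the monitoring configuration from an environment-like object.
function loadMonitoringConfig(
  env: Record<string, string | undefined>,
): MonitoringConfig {
  // Unknown or missing levels fall back to the documented default `info`.
  const logLevel = LOG_LEVELS.find((l) => l === env['LOG_LEVEL']) ?? 'info';
  return {
    logLevel,
    metricsPort: parsePositiveInt(env['METRICS_PORT'], 9090), // assumed fallback
    maxQueryComplexity: parsePositiveInt(env['MAX_QUERY_COMPLEXITY'], 1000), // assumed fallback
  };
}

console.log(loadMonitoringConfig({ LOG_LEVEL: 'debug', METRICS_PORT: '9100' }));
```

In a Node.js process the function would typically be called as `loadMonitoringConfig(process.env)`.
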
## Grafana Integration

**Prometheus scrape config:**
```yaml
scrape_configs:
  - job_name: 'graphql-middlelayer'
    static_configs:
      - targets: ['localhost:9090']
```

**Grafana dashboard:**
- Import the metrics into Grafana (add Prometheus as a data source)
- Create dashboards for:
  - Query performance
  - Cache hit rates
  - Error rates
  - Request throughput

## Example Dashboard Queries

```promql
# Requests per second
sum(rate(graphql_queries_total[1m])) by (operation)

# Average query duration (a histogram exposes _sum and _count series)
sum(rate(graphql_query_duration_seconds_sum[5m])) by (operation)
  / sum(rate(graphql_query_duration_seconds_count[5m])) by (operation)

# Cache hit rate
sum(rate(cache_hits_total[5m])) / (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m])))

# Error rate
sum(rate(errors_total[5m])) by (type, operation)
```