Files
sell/middlelayer/monitoring/README.md

113 lines
2.7 KiB
Markdown

# Monitoring & Observability
Der Middlelayer ist mit umfassendem Monitoring ausgestattet:
## 1. Structured Logging (Winston)
**Konfiguration:**
- Log-Level: `LOG_LEVEL` (default: `info`)
- Format: JSON in Production, farbig in Development
- Output: Console + Files (`logs/error.log`, `logs/combined.log`)
**Verwendung:**
```typescript
import { logger, logQuery, logError } from './monitoring/logger.js';
logger.info('Info message', { context: 'data' });
logQuery('GetProducts', { limit: 10 }, 45);
logError(error, { operation: 'getProducts' });
```
## 2. Prometheus Metrics
**Endpoints:**
- `GET http://localhost:9090/metrics` - Prometheus Metrics
- `GET http://localhost:9090/health` - Health Check
**Verfügbare Metriken:**
### Query Metrics
- `graphql_queries_total` - Anzahl der Queries (Labels: `operation`, `status`)
- `graphql_query_duration_seconds` - Query-Dauer (Histogram)
- `graphql_query_complexity` - Query-Komplexität (Gauge)
### Cache Metrics
- `cache_hits_total` - Cache Hits (Label: `cache_type`)
- `cache_misses_total` - Cache Misses (Label: `cache_type`)
### DataService Metrics
- `dataservice_calls_total` - DataService Aufrufe (Labels: `method`, `status`)
- `dataservice_duration_seconds` - DataService Dauer (Histogram)
### Error Metrics
- `errors_total` - Anzahl der Fehler (Labels: `type`, `operation`)
**Beispiel Prometheus Query:**
```promql
# Query Rate
rate(graphql_queries_total[5m])
# Error Rate
rate(errors_total[5m])
# Cache Hit Ratio
rate(cache_hits_total[5m]) / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m]))
```
## 3. Distributed Tracing
**Features:**
- Automatische Trace-ID-Generierung pro Request
- Span-Tracking für verschachtelte Operationen
- Dauer-Messung für Performance-Analyse
**Trace-IDs werden automatisch in Logs und Metrics eingebunden.**
## Environment Variables
```bash
# Logging
LOG_LEVEL=info # debug, info, warn, error
# Metrics
METRICS_PORT=9090 # Port für Metrics-Endpoint
# Query Complexity
MAX_QUERY_COMPLEXITY=1000 # Max. Query-Komplexität
```
## Integration mit Grafana
**Prometheus Scrape Config:**
```yaml
scrape_configs:
- job_name: 'graphql-middlelayer'
static_configs:
- targets: ['localhost:9090']
```
**Grafana Dashboard:**
- Importiere die Metriken in Grafana
- Erstelle Dashboards für:
- Query Performance
- Cache Hit Rates
- Error Rates
- Request Throughput
## Beispiel-Dashboard Queries
```promql
# Requests pro Sekunde
sum(rate(graphql_queries_total[1m])) by (operation)
# Durchschnittliche Query-Dauer
avg(graphql_query_duration_seconds) by (operation)
# Cache Hit Rate
sum(rate(cache_hits_total[5m])) / (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m])))
# Error Rate
sum(rate(errors_total[5m])) by (type, operation)
```