sell/middlelayer/monitoring/README.md

# Monitoring & Observability

Der Middlelayer ist mit umfassendem Monitoring ausgestattet:

## 1. Structured Logging (Winston)

**Konfiguration:**
- Log-Level: `LOG_LEVEL` (default: `info`)
- Format: JSON in Production, farbig in Development
- Output: Console + Files (`logs/error.log`, `logs/combined.log`)

**Verwendung:**
```typescript
import { logger, logQuery, logError } from './monitoring/logger.js';

logger.info('Info message', { context: 'data' });
logQuery('GetProducts', { limit: 10 }, 45);
logError(error, { operation: 'getProducts' });
```

## 2. Prometheus Metrics

**Endpoints:**
- `GET http://localhost:9090/metrics` - Prometheus Metrics
- `GET http://localhost:9090/health` - Health Check

**Verfügbare Metriken:**

### Query Metrics
- `graphql_queries_total` - Anzahl der Queries (Labels: `operation`, `status`)
- `graphql_query_duration_seconds` - Query-Dauer (Histogram)
- `graphql_query_complexity` - Query-Komplexität (Gauge)

### Cache Metrics
- `cache_hits_total` - Cache Hits (Label: `cache_type`)
- `cache_misses_total` - Cache Misses (Label: `cache_type`)

### DataService Metrics
- `dataservice_calls_total` - DataService Aufrufe (Labels: `method`, `status`)
- `dataservice_duration_seconds` - DataService Dauer (Histogram)

### Error Metrics
- `errors_total` - Anzahl der Fehler (Labels: `type`, `operation`)

**Beispiel Prometheus Query:**
```promql
# Query Rate
rate(graphql_queries_total[5m])

# Error Rate
rate(errors_total[5m])

# Cache Hit Ratio
rate(cache_hits_total[5m]) / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m]))
```

## 3. Distributed Tracing

**Features:**
- Automatische Trace-ID-Generierung pro Request
- Span-Tracking für verschachtelte Operationen
- Dauer-Messung für Performance-Analyse

**Trace-IDs werden automatisch in Logs und Metrics eingebunden.**

## Environment Variables

```bash
# Logging
LOG_LEVEL=info              # debug, info, warn, error

# Metrics
METRICS_PORT=9090           # Port für Metrics-Endpoint

# Query Complexity
MAX_QUERY_COMPLEXITY=1000   # Max. Query-Komplexität
```

## Integration mit Grafana

**Prometheus Scrape Config:**
```yaml
scrape_configs:
  - job_name: 'graphql-middlelayer'
    static_configs:
      - targets: ['localhost:9090']
```

**Grafana Dashboard:**
- Importiere die Metriken in Grafana
- Erstelle Dashboards für:
  - Query Performance
  - Cache Hit Rates
  - Error Rates
  - Request Throughput

## Beispiel-Dashboard Queries

```promql
# Requests pro Sekunde
sum(rate(graphql_queries_total[1m])) by (operation)

# Durchschnittliche Query-Dauer
avg(graphql_query_duration_seconds) by (operation)

# Cache Hit Rate
sum(rate(cache_hits_total[5m])) / (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m])))

# Error Rate
sum(rate(errors_total[5m])) by (type, operation)
```