Monitoring & Observability
Essential metrics and logging requirements
Essential Metrics
Key Metrics
- Success rate per endpoint (target: >99.9%)
- P95 response time alerts (threshold: 400ms)
- Error rate by error code
- Retry and idempotent request rates
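As a rough illustration of the first two metrics, the sketch below computes a per-endpoint success rate and a nearest-rank P95 from recorded requests. The record shape `(endpoint, status_code, duration_ms)`, the function names, and the rule that only 2xx responses count as successes are all assumptions made for this example, not part of the integration spec.

```python
from collections import defaultdict


def p95(samples_ms):
    """Nearest-rank 95th percentile of a non-empty list of durations."""
    if not samples_ms:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(samples_ms)
    # Integer form of ceil(0.95 * n); avoids floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def endpoint_stats(records):
    """Summarise (endpoint, status_code, duration_ms) tuples per endpoint."""
    grouped = defaultdict(list)
    for endpoint, status_code, duration_ms in records:
        grouped[endpoint].append((status_code, duration_ms))

    stats = {}
    for endpoint, rows in grouped.items():
        # Assumption: only 2xx responses count as successes.
        successes = sum(1 for status_code, _ in rows if 200 <= status_code < 300)
        stats[endpoint] = {
            "success_rate": successes / len(rows),
            "p95_ms": p95([duration_ms for _, duration_ms in rows]),
        }
    return stats
```

In production these numbers would more likely come from a metrics backend than from in-process lists; the point here is only to show what is being measured against the 99.9% and 400ms targets.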
Balance Metrics
- Balance discrepancy alerts between your system and game state
- Daily reconciliation reports
- Suspicious activity patterns
- Large transaction alerts
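A daily reconciliation pass could be sketched as follows. It assumes both sides can be exported as a mapping from player ID to a `Decimal` balance; the function name, the mapping shape, and the `tolerance` parameter are hypothetical.

```python
from decimal import Decimal


def find_discrepancies(local_balances, game_balances, tolerance=Decimal("0")):
    """Return {player_id: (local, remote)} for every balance that disagrees.

    A player missing on either side is reported with None for that side.
    """
    discrepancies = {}
    for player_id in sorted(set(local_balances) | set(game_balances)):
        local = local_balances.get(player_id)
        remote = game_balances.get(player_id)
        if local is None or remote is None or abs(local - remote) > tolerance:
            discrepancies[player_id] = (local, remote)
    return discrepancies
```

Using `Decimal` rather than floats keeps currency comparisons exact, so a zero tolerance is meaningful.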
Security Metrics
- Signature verification failure rates
- Invalid token attempts
- Unusual request patterns
- Geographic anomalies
Logging Requirements
Request Logging
- Log all webhook requests with timestamps
- Include request method, path, and response code
- Record response times for each request
- Log request body (sanitize sensitive data)
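One possible way to meet these request-logging points is to mask sensitive fields before the body is serialized into the log line. The key names in `SENSITIVE_KEYS`, the `[REDACTED]` marker, and the field names of the log entry are placeholders chosen for this sketch.

```python
import json

# Assumption: placeholder key names; substitute the fields your payloads carry.
SENSITIVE_KEYS = frozenset({"token", "signature", "password"})


def sanitize(value, sensitive_keys=SENSITIVE_KEYS):
    """Return a copy of a JSON-like value with sensitive fields masked."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key.lower() in sensitive_keys
            else sanitize(item, sensitive_keys)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [sanitize(item, sensitive_keys) for item in value]
    return value


def request_log_entry(timestamp, method, path, status_code, duration_ms, body):
    """Build one JSON log line for a webhook request."""
    return json.dumps(
        {
            "timestamp": timestamp,
            "method": method,
            "path": path,
            "status_code": status_code,
            "duration_ms": duration_ms,
            "body": sanitize(body),
        },
        sort_keys=True,
    )
```

`sanitize` returns a new structure and leaves the original body untouched, so the handler can keep using the real values after logging.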
Correlation IDs
- Include correlation IDs (tx_id, action_id) in all logs
- Use structured logging for easy searching
- Maintain audit trail for compliance
- Enable distributed tracing
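A minimal sketch of structured logs that carry the correlation IDs, using only Python's standard `logging` module. The JSON field names, the `JsonFormatter` and `build_logger` helpers, and the logger name are illustrative assumptions; only `tx_id` and `action_id` come from the guidelines above.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object that includes correlation IDs."""

    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            # Set via logger.info(..., extra={...}); None when not supplied.
            "tx_id": getattr(record, "tx_id", None),
            "action_id": getattr(record, "action_id", None),
        })


def build_logger(stream, name="webhook"):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def demo():
    """Log one event to an in-memory stream and return the parsed entry."""
    buffer = io.StringIO()
    logger = build_logger(buffer)
    logger.info("bet processed", extra={"tx_id": "tx-123", "action_id": "act-9"})
    return json.loads(buffer.getvalue())
```

Because every line is a JSON object with fixed keys, a log search for a single `tx_id` returns the full history of that transaction, which is what makes the audit trail practical.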
Security Logging
- Log signature verification results (but not the signatures)
- Record authentication attempts
- Track rate limiting triggers
- Monitor for unusual patterns
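The first point, logging the verification result without the signature itself, might look like the sketch below. The signing scheme (hex-encoded HMAC-SHA256 over the raw request body with a shared secret) is an assumption for illustration only; use whatever scheme the integration actually specifies. The function and logger names are hypothetical.

```python
import hashlib
import hmac
import logging

logger = logging.getLogger("webhook.security")


def verify_and_log(secret: bytes, body: bytes, signature_hex: str, tx_id: str) -> bool:
    """Verify a signature and log only the outcome, never the signature."""
    # Assumption: hex-encoded HMAC-SHA256 of the raw body.
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison; bytes on both sides accepts any input string.
    valid = hmac.compare_digest(expected.encode("utf-8"), signature_hex.encode("utf-8"))
    level = logging.INFO if valid else logging.WARNING
    logger.log(level, "signature_verification tx_id=%s valid=%s", tx_id, valid)
    return valid
```

Failed verifications are logged at WARNING so they can feed the signature verification failure metric without exposing secret material.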
Alerting Strategy
Critical Alerts
- Response time > 500ms
- Error rate > 1%
- Balance discrepancies detected
- Signature verification failures
Warning Alerts
- Response time > 200ms (P95)
- Increased retry rates
- Database connection pool exhaustion
- Memory/CPU usage spikes
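The numeric thresholds above can be expressed as a small rule function. This sketch covers only the rules that have explicit thresholds or counts; the function name and inputs are hypothetical, and because the guidelines do not say which statistic the 500ms critical threshold applies to, the sketch applies it to the same P95 value as the warning rule.

```python
def evaluate_alerts(p95_ms, error_rate, balance_discrepancies, signature_failures):
    """Return a list of (severity, message) tuples for the current window.

    error_rate is a fraction (0.01 means 1%); the last two arguments are counts.
    """
    alerts = []
    # Assumption: both response-time thresholds are checked against P95.
    if p95_ms > 500:
        alerts.append(("critical", "response time above 500 ms"))
    elif p95_ms > 200:
        alerts.append(("warning", "P95 response time above 200 ms"))
    if error_rate > 0.01:
        alerts.append(("critical", "error rate above 1%"))
    if balance_discrepancies > 0:
        alerts.append(("critical", "balance discrepancies detected"))
    if signature_failures > 0:
        alerts.append(("critical", "signature verification failures"))
    return alerts
```

For example, `evaluate_alerts(250, 0.001, 0, 0)` yields a single warning, while any balance discrepancy escalates straight to critical regardless of latency.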
Dashboard Components
- Real-time request volume
- Response time graphs
- Error rate trends
- Player balance distribution
- Transaction volume by type