Monitoring & Metrics Guide¶
Comprehensive guide for monitoring Morphium applications with focus on DriverStats and performance metrics.
DriverStats - Core Monitoring Foundation¶
DriverStats are essential for monitoring driver performance, especially connection pool health and issue detection. These metrics provide real-time insights into Morphium's internal state.
Accessing DriverStats¶
// Get current driver statistics
Map<DriverStatsKey, Double> stats = morphium.getDriver().getDriverStats();
// All metrics are cumulative counters or current values
Double connectionsInUse = stats.get(DriverStatsKey.CONNECTIONS_IN_USE);
Double connectionsInPool = stats.get(DriverStatsKey.CONNECTIONS_IN_POOL);
Double errors = stats.get(DriverStatsKey.ERRORS);
Complete DriverStats Reference¶
Connection Pool Metrics¶
Metric | Type | Description | Healthy Range |
---|---|---|---|
CONNECTIONS_IN_USE |
Current | Active connections currently borrowed | < 80% of max pool size |
CONNECTIONS_IN_POOL |
Current | Available connections in pool | > 10% of max pool size |
CONNECTIONS_OPENED |
Counter | Total connections opened since start | Monotonic increasing |
CONNECTIONS_CLOSED |
Counter | Total connections closed since start | Should be < OPENED |
CONNECTIONS_RELEASED |
Counter | Connections returned to pool | Should match borrowing patterns |
Performance Metrics¶
Metric | Type | Description | Monitoring Goal |
---|---|---|---|
THREADS_WAITING_FOR_CONNECTION |
Current | Threads waiting for available connection | Should be 0 |
ERRORS |
Counter | Total driver errors encountered | Low and stable |
NETWORK_ERRORS |
Counter | Network-related errors | Temporary spikes only |
Example Monitoring Implementation¶
@Component
@Scheduled(fixedDelay = 60000) // Every minute
public class MorphiumMonitor {
private final Morphium morphium;
private final MeterRegistry meterRegistry; // Micrometer for metrics export
public void collectDriverStats() {
Map<DriverStatsKey, Double> stats = morphium.getDriver().getDriverStats();
// Export all stats to monitoring system
stats.forEach((key, value) -> {
Gauge.builder("morphium.driver." + key.name().toLowerCase())
.register(meterRegistry, () -> value);
});
// Calculate derived metrics
double connectionUtilization = calculateConnectionUtilization(stats);
double errorRate = calculateErrorRate(stats);
// Export derived metrics
Gauge.builder("morphium.connection.utilization")
.register(meterRegistry, () -> connectionUtilization);
Gauge.builder("morphium.connection.error_rate")
.register(meterRegistry, () -> errorRate);
}
private double calculateConnectionUtilization(Map<DriverStatsKey, Double> stats) {
Double inUse = stats.get(DriverStatsKey.CONNECTIONS_IN_USE);
Double inPool = stats.get(DriverStatsKey.CONNECTIONS_IN_POOL);
if (inPool == null || inPool == 0) return 0.0;
return inUse / inPool;
}
private double calculateErrorRate(Map<DriverStatsKey, Double> stats) {
Double errors = stats.get(DriverStatsKey.ERRORS);
Double opened = stats.get(DriverStatsKey.CONNECTIONS_OPENED);
if (opened == null || opened == 0) return 0.0;
return errors / opened;
}
}
Connection Pool Health Monitoring¶
Critical Health Indicators¶
1. Pool Utilization
// Monitor connection pool utilization
public void checkPoolHealth() {
Map<DriverStatsKey, Double> stats = morphium.getDriver().getDriverStats();
double utilization = stats.get(DriverStatsKey.CONNECTIONS_IN_USE) /
stats.get(DriverStatsKey.CONNECTIONS_IN_POOL);
if (utilization > 0.8) {
// CRITICAL: High utilization - may need pool scaling
logger.warn("High connection pool utilization: {}%", utilization * 100);
alertManager.send(Alert.HIGH_POOL_UTILIZATION, utilization);
} else if (utilization > 0.6) {
// WARNING: Moderate utilization - monitor closely
logger.info("Moderate connection pool utilization: {}%", utilization * 100);
}
}
2. Thread Starvation Detection
public void checkThreadStarvation() {
Map<DriverStatsKey, Double> stats = morphium.getDriver().getDriverStats();
Double waitingThreads = stats.get(DriverStatsKey.THREADS_WAITING_FOR_CONNECTION);
if (waitingThreads > 0) {
// CRITICAL: Threads waiting for connections
logger.error("Thread starvation detected: {} threads waiting", waitingThreads);
alertManager.send(Alert.THREAD_STARVATION, waitingThreads);
// Additional diagnostics
double utilization = stats.get(DriverStatsKey.CONNECTIONS_IN_USE) /
stats.get(DriverStatsKey.CONNECTIONS_IN_POOL);
logger.error("Current pool utilization: {}%", utilization * 100);
}
}
3. Connection Churn Analysis
public void analyzeConnectionChurn() {
Map<DriverStatsKey, Double> stats = morphium.getDriver().getDriverStats();
Double opened = stats.get(DriverStatsKey.CONNECTIONS_OPENED);
Double closed = stats.get(DriverStatsKey.CONNECTIONS_CLOSED);
// High churn might indicate configuration issues
double churnRate = closed / opened;
if (churnRate > 0.5) {
logger.warn("High connection churn rate: {}% of connections are closed",
churnRate * 100);
// May indicate:
// - MaxConnectionLifetime too short
// - MaxConnectionIdleTime too short
// - Network instability
}
}
Real-Time Monitoring Dashboard¶
Grafana Dashboard Configuration¶
DriverStats Panel Queries (Prometheus):
# Connection pool utilization
morphium_driver_connections_in_use / morphium_driver_connections_in_pool * 100
# Error rate
rate(morphium_driver_errors[5m])
# Threads waiting for connections
morphium_driver_threads_waiting_for_connection
# Connection opening rate
rate(morphium_driver_connections_opened[5m])
# Connection closing rate
rate(morphium_driver_connections_closed[5m])
Alert Rules Configuration¶
# alerting-rules.yml
groups:
- name: morphium.rules
rules:
- alert: MorphiumHighConnectionUtilization
expr: morphium_connection_utilization > 0.8
for: 2m
labels:
severity: warning
annotations:
summary: "Morphium connection pool utilization high"
description: "Connection pool is {{ $value }}% utilized"
- alert: MorphiumThreadStarvation
expr: morphium_driver_threads_waiting_for_connection > 0
for: 30s
labels:
severity: critical
annotations:
summary: "Morphium threads waiting for connections"
description: "{{ $value }} threads are waiting for connections"
- alert: MorphiumHighErrorRate
expr: rate(morphium_driver_errors[5m]) > 0.1
for: 1m
labels:
severity: warning
annotations:
summary: "High Morphium driver error rate"
description: "Error rate is {{ $value }} errors/second"
Application-Level Metrics¶
Query Performance Monitoring¶
@Component
public class QueryMetricsCollector {
private final MeterRegistry meterRegistry;
private final Timer queryTimer;
public QueryMetricsCollector(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
this.queryTimer = Timer.builder("morphium.query.duration")
.description("Query execution time")
.register(meterRegistry);
}
// Wrap queries with timing
public <T> List<T> timedQuery(Query<T> query) {
return queryTimer.recordCallable(() -> query.asList());
}
// Monitor different query types
public <T> List<T> timedQueryWithTags(Query<T> query, String collection, String operation) {
return Timer.builder("morphium.query.duration")
.tag("collection", collection)
.tag("operation", operation)
.register(meterRegistry)
.recordCallable(() -> query.asList());
}
}
Cache Performance Monitoring¶
@Component
public class CacheMetricsCollector {
public void collectCacheMetrics() {
// Note: Cache metrics depend on cache implementation
// This is conceptual - actual implementation may vary
MorphiumCache cache = morphium.getCache();
if (cache instanceof MorphiumCacheImpl) {
MorphiumCacheImpl cacheImpl = (MorphiumCacheImpl) cache;
// Collect cache statistics
long hits = cacheImpl.getHits();
long misses = cacheImpl.getMisses();
long evictions = cacheImpl.getEvictions();
double hitRatio = (double) hits / (hits + misses);
Gauge.builder("morphium.cache.hit_ratio")
.register(meterRegistry, () -> hitRatio);
Counter.builder("morphium.cache.hits")
.register(meterRegistry)
.increment(hits);
Counter.builder("morphium.cache.misses")
.register(meterRegistry)
.increment(misses);
}
}
}
Messaging System Monitoring¶
Message Queue Metrics¶
@Component
public class MessagingMetricsCollector {
private final Messaging messaging;
@Scheduled(fixedDelay = 30000)
public void collectMessagingMetrics() {
// Monitor message processing
// Note: Exact metrics depend on messaging implementation
// Queue depth monitoring
for (String topic : getActiveTopics()) {
long queueDepth = getQueueDepth(topic);
Gauge.builder("morphium.messaging.queue_depth")
.tag("topic", topic)
.register(meterRegistry, () -> queueDepth);
if (queueDepth > 1000) {
logger.warn("High queue depth for topic {}: {}", topic, queueDepth);
}
}
}
// Message processing rate tracking
public void trackMessageProcessed(String topic, boolean success, long processingTime) {
Counter.builder("morphium.messaging.messages_processed")
.tag("topic", topic)
.tag("status", success ? "success" : "error")
.register(meterRegistry)
.increment();
Timer.builder("morphium.messaging.processing_duration")
.tag("topic", topic)
.register(meterRegistry)
.record(processingTime, TimeUnit.MILLISECONDS);
}
}
Health Checks and Diagnostics¶
Comprehensive Health Check¶
@Component
public class MorphiumHealthIndicator implements HealthIndicator {
private final Morphium morphium;
@Override
public Health health() {
Health.Builder builder = new Health.Builder();
try {
// Test basic connectivity
long start = System.currentTimeMillis();
morphium.createQueryFor(User.class).limit(1).asList();
long responseTime = System.currentTimeMillis() - start;
// Get driver statistics for detailed health info
Map<DriverStatsKey, Double> stats = morphium.getDriver().getDriverStats();
// Analyze connection pool health
double utilization = stats.get(DriverStatsKey.CONNECTIONS_IN_USE) /
stats.get(DriverStatsKey.CONNECTIONS_IN_POOL);
double waitingThreads = stats.get(DriverStatsKey.THREADS_WAITING_FOR_CONNECTION);
// Determine overall health
if (utilization < 0.8 && waitingThreads == 0 && responseTime < 1000) {
builder.status(Status.UP);
} else if (utilization < 0.9 && waitingThreads == 0 && responseTime < 3000) {
builder.status("DEGRADED");
} else {
builder.status(Status.DOWN);
}
// Add detailed metrics
builder.withDetail("database", Map.of(
"responseTime", responseTime + "ms",
"status", responseTime < 3000 ? "UP" : "SLOW"
));
builder.withDetail("connectionPool", Map.of(
"utilization", String.format("%.1f%%", utilization * 100),
"connectionsInUse", stats.get(DriverStatsKey.CONNECTIONS_IN_USE).intValue(),
"connectionsInPool", stats.get(DriverStatsKey.CONNECTIONS_IN_POOL).intValue(),
"threadsWaiting", waitingThreads.intValue(),
"totalErrors", stats.get(DriverStatsKey.ERRORS).intValue()
));
return builder.build();
} catch (Exception e) {
return builder.status(Status.DOWN)
.withException(e)
.build();
}
}
}
Diagnostic Information Collection¶
@RestController
@RequestMapping("/admin/morphium")
public class MorphiumDiagnosticsController {
@GetMapping("/stats")
public ResponseEntity<Map<String, Object>> getDriverStats() {
Map<DriverStatsKey, Double> rawStats = morphium.getDriver().getDriverStats();
Map<String, Object> response = new HashMap<>();
rawStats.forEach((key, value) ->
response.put(key.name().toLowerCase(), value));
// Add derived metrics
response.put("connection_utilization",
rawStats.get(DriverStatsKey.CONNECTIONS_IN_USE) /
rawStats.get(DriverStatsKey.CONNECTIONS_IN_POOL));
return ResponseEntity.ok(response);
}
@GetMapping("/config")
public ResponseEntity<Map<String, Object>> getConfiguration() {
MorphiumConfig config = morphium.getConfig();
Map<String, Object> response = new HashMap<>();
response.put("maxConnectionsPerHost", config.connectionSettings().getMaxConnectionsPerHost());
response.put("minConnectionsPerHost", config.connectionSettings().getMinConnectionsPerHost());
response.put("maxWaitTime", config.connectionSettings().getMaxWaitTime());
response.put("database", config.connectionSettings().getDatabase());
response.put("driverName", config.driverSettings().getDriverName());
return ResponseEntity.ok(response);
}
@PostMapping("/connection-pool/analyze")
public ResponseEntity<Map<String, Object>> analyzeConnectionPool() {
Map<DriverStatsKey, Double> stats = morphium.getDriver().getDriverStats();
Map<String, Object> analysis = new HashMap<>();
// Connection pool analysis
double utilization = stats.get(DriverStatsKey.CONNECTIONS_IN_USE) /
stats.get(DriverStatsKey.CONNECTIONS_IN_POOL);
analysis.put("utilization", utilization);
// Health assessment
List<String> issues = new ArrayList<>();
List<String> recommendations = new ArrayList<>();
if (utilization > 0.8) {
issues.add("High connection pool utilization (" + (utilization * 100) + "%)");
recommendations.add("Consider increasing maxConnectionsPerHost");
}
if (stats.get(DriverStatsKey.THREADS_WAITING_FOR_CONNECTION) > 0) {
issues.add("Threads waiting for connections");
recommendations.add("Increase connection pool size or optimize query performance");
}
double errorRate = stats.get(DriverStatsKey.ERRORS) /
stats.get(DriverStatsKey.CONNECTIONS_OPENED);
if (errorRate > 0.05) {
issues.add("High error rate (" + (errorRate * 100) + "%)");
recommendations.add("Check network stability and MongoDB health");
}
analysis.put("issues", issues);
analysis.put("recommendations", recommendations);
analysis.put("healthScore", calculateHealthScore(stats));
return ResponseEntity.ok(analysis);
}
private int calculateHealthScore(Map<DriverStatsKey, Double> stats) {
double utilization = stats.get(DriverStatsKey.CONNECTIONS_IN_USE) /
stats.get(DriverStatsKey.CONNECTIONS_IN_POOL);
double waitingThreads = stats.get(DriverStatsKey.THREADS_WAITING_FOR_CONNECTION);
double errorRate = stats.get(DriverStatsKey.ERRORS) /
stats.get(DriverStatsKey.CONNECTIONS_OPENED);
int score = 100;
if (utilization > 0.9) score -= 30;
else if (utilization > 0.8) score -= 15;
else if (utilization > 0.6) score -= 5;
if (waitingThreads > 0) score -= 40;
if (errorRate > 0.1) score -= 25;
else if (errorRate > 0.05) score -= 10;
return Math.max(0, score);
}
}
Monitoring Best Practices¶
1. Baseline Establishment¶
- Monitor DriverStats for 1-2 weeks to establish baseline performance
- Document normal operating ranges for each metric
- Set alert thresholds based on observed patterns
2. Proactive Monitoring¶
- Connection utilization > 60%: Monitor closely
- Connection utilization > 80%: Plan capacity increase
- Threads waiting > 0: Immediate investigation required
- Error rate > 5%: Check MongoDB and network health
3. Regular Health Checks¶
- Automated health checks every minute
- Deep diagnostic analysis every hour
- Weekly performance trend analysis
4. Incident Response¶
- Level 1: Connection utilization > 90% - Scale immediately
- Level 2: Threads waiting for connections - Emergency response
- Level 3: Error rate > 10% - Full system investigation
This monitoring guide provides comprehensive coverage of Morphium's DriverStats and ensures optimal connection pool performance and early issue detection.