Log Anomaly Detection: Z-Score vs Machine Learning Compared

Why Anomaly Detection Matters

Static threshold alerts are a major source of alert fatigue. A CPU spike at 3 AM during a scheduled backup job triggers a page even though the system is healthy, because the threshold knows nothing about the backup. Anomaly detection adapts to your system's actual behavior patterns rather than firing on fixed numbers. ObservabilityOS uses a hybrid approach: statistical Z-score analysis for real-time detection, and ML models for pattern recognition over longer time windows.

Z-Score: Fast and Interpretable

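The Z-score measures how many standard deviations a data point lies from the rolling mean: z = (x − μ) / σ, where μ and σ are the mean and standard deviation of the recent window. An absolute Z-score above 3 flags an anomaly; for normally distributed data, only about 0.3% of points fall that far from the mean. Z-scores are lightweight and explainable, and they work well for metrics whose distribution is approximately normal, such as error rates, latency, and request throughput. Heavily skewed metrics may need a transform, such as a logarithm, before scoring. The key advantage is speed and interpretability: you can show engineers exactly why an alert fired.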

typescript

// Returns how many standard deviations `value` lies from the mean of `window`.
function calculateZScore(value: number, window: number[]): number {
  if (window.length < 30) return 0; // insufficient data: report no deviation
  const mean = window.reduce((a, b) => a + b, 0) / window.length;
  // Population variance of the window
  const variance = window.reduce((s, v) => s + (v - mean) ** 2, 0) / window.length;
  const stdDev = Math.sqrt(variance);
  return stdDev === 0 ? 0 : (value - mean) / stdDev; // flat window: avoid dividing by zero
}
// |Z| > 3.0 → anomaly | |Z| > 4.0 → critical
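
To show how the scoring function might sit inside a streaming check, here is a minimal sketch of a rolling-window detector. The class name `RollingZScoreDetector`, the `Severity` type, and the default window sizes are illustrative assumptions, not ObservabilityOS APIs; the 3.0 and 4.0 cutoffs come from the comment above and are applied to the absolute Z-score so that sudden drops are caught as well as spikes.

```typescript
// Hypothetical names throughout; this is a sketch, not a product API.
type Severity = "normal" | "anomaly" | "critical";

class RollingZScoreDetector {
  private readonly window: number[] = [];
  private readonly maxSize: number; // samples kept in the rolling window
  private readonly minSize: number; // below this, no verdict is issued

  constructor(maxSize = 60, minSize = 30) {
    this.maxSize = maxSize;
    this.minSize = minSize;
  }

  // Scores `value` against the samples seen so far, then adds it to the window.
  observe(value: number): { z: number; severity: Severity } {
    const z = this.score(value);
    this.window.push(value);
    if (this.window.length > this.maxSize) this.window.shift();

    const magnitude = Math.abs(z);
    const severity: Severity =
      magnitude > 4 ? "critical" : magnitude > 3 ? "anomaly" : "normal";
    return { z, severity };
  }

  private score(value: number): number {
    const n = this.window.length;
    if (n < this.minSize) return 0; // still warming up
    const mean = this.window.reduce((a, b) => a + b, 0) / n;
    const variance = this.window.reduce((s, v) => s + (v - mean) ** 2, 0) / n;
    const stdDev = Math.sqrt(variance);
    return stdDev === 0 ? 0 : (value - mean) / stdDev;
  }
}

// Usage: warm up on 30 steady samples (mean 100, std dev 1), then score a spike.
const detector = new RollingZScoreDetector();
for (let i = 0; i < 30; i++) detector.observe(i % 2 === 0 ? 99 : 101);
console.log(detector.observe(105)); // z = 5, severity "critical"
```

One design choice worth noting: this sketch adds every sample to the window, including anomalous ones, so a sustained incident gradually becomes the new baseline. Whether that is desirable depends on the metric.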

ML Approaches: Pattern Recognition at Scale

Machine learning models excel at detecting subtle patterns that Z-scores miss: gradual drift, seasonal correlations across multiple services, and multi-dimensional anomalies where no single metric crosses a threshold but the combination of several is unprecedented. However, ML models require training data and more compute, and they are harder to debug: you can't explain a neural network's output in one sentence.
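
The multi-dimensional case can be illustrated without a trained model. The sketch below uses the Mahalanobis distance, a classical statistical measure rather than an ML technique, as a simple stand-in: it scores a point against the joint distribution of two metrics, so it can flag a combination that is unusual even when each metric looks ordinary on its own. The function name `mahalanobis2D`, the metric pairing, and the sample data are all hypothetical.

```typescript
// Hypothetical helper for illustration; a stand-in for a multi-dimensional model.
type Point = [number, number]; // e.g. [requestsPerSecond, cpuPercent]

// Distance of `p` from the centre of `history`, scaled by the joint covariance.
function mahalanobis2D(history: Point[], p: Point): number {
  const n = history.length;
  const mx = history.reduce((s, h) => s + h[0], 0) / n;
  const my = history.reduce((s, h) => s + h[1], 0) / n;

  // Entries of the 2x2 population covariance matrix
  let sxx = 0, syy = 0, sxy = 0;
  for (const [x, y] of history) {
    sxx += (x - mx) ** 2;
    syy += (y - my) ** 2;
    sxy += (x - mx) * (y - my);
  }
  sxx /= n; syy /= n; sxy /= n;

  const det = sxx * syy - sxy * sxy;
  if (det <= 1e-12) return 0; // degenerate covariance: no usable joint baseline

  const dx = p[0] - mx;
  const dy = p[1] - my;
  // d^2 = v^T * inverse(covariance) * v, written out for the 2x2 case
  const d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det;
  return Math.sqrt(Math.max(0, d2));
}

// Synthetic history in which CPU tracks traffic: cpu = 0.5 * rps, plus or minus 2.
const history: Point[] = [];
for (let i = 0; i < 40; i++) {
  const rps = 100 + 10 * ((i % 5) - 2);
  history.push([rps, 0.5 * rps + (i % 2 === 0 ? 2 : -2)]);
}

console.log(mahalanobis2D(history, [120, 60])); // busy and proportionally loaded: small distance
console.log(mahalanobis2D(history, [80, 58]));  // low traffic but high CPU: large distance
```

In this synthetic data, the point `[80, 58]` sits within about 1.5 standard deviations of the mean on each metric taken separately, so per-metric Z-scores would stay quiet, yet its joint distance is far larger than that of `[120, 60]`.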

ObservabilityOS uses ML for weekly trend analysis and capacity planning, where the longer time horizons make model training latency acceptable. Z-scores handle real-time incident detection, where you need a result in milliseconds, not seconds. This hybrid gives you the speed of statistical methods for immediate alerting and the depth of ML for proactive insights.
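
As a rough illustration of the slower, longer-horizon path, the sketch below fits a straight line to daily usage and projects when it would reach capacity. Ordinary least squares is used here only as the simplest possible stand-in for a trained model; it is not a description of how ObservabilityOS does capacity planning, and the function name `daysUntilCapacity` and the sample figures are hypothetical.

```typescript
// Hypothetical helper; a linear fit stands in for a real forecasting model.
// `dailyUsage[i]` is the usage on day i, in the same units as `capacity`.
function daysUntilCapacity(dailyUsage: number[], capacity: number): number {
  const n = dailyUsage.length;
  if (n < 2) return Infinity; // not enough points to fit a trend

  const meanX = (n - 1) / 2; // mean of the day indices 0..n-1
  const meanY = dailyUsage.reduce((a, b) => a + b, 0) / n;

  let sxy = 0;
  let sxx = 0;
  dailyUsage.forEach((y, x) => {
    sxy += (x - meanX) * (y - meanY);
    sxx += (x - meanX) ** 2;
  });

  const slope = sxy / sxx; // growth per day
  if (slope <= 0) return Infinity; // flat or shrinking usage never reaches capacity

  const intercept = meanY - slope * meanX;
  const crossingDay = (capacity - intercept) / slope; // day index where the fit hits capacity
  return Math.max(0, crossingDay - (n - 1)); // days remaining after the last sample
}

// One week of disk usage (percent), growing 2 points per day.
console.log(daysUntilCapacity([50, 52, 54, 56, 58, 60, 62], 100)); // 19
```

A check like this can run as a batch job over aggregated data, which is why its latency matters far less than that of the per-sample Z-score path.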


About the Author

ObservabilityOS Team

Core Engineering & DevRel

The core engineering, site reliability, and developer relations team behind ObservabilityOS. We build AI-native observability infrastructure to eliminate 3 AM firefighting.
