Mitacs Research · CatBoost / XGBoost · Anomaly Detection · Real-Time Monitoring · Published in Energies (2024)

Supermarket Refrigerant Leak Detection

Smart Leak Detection

A two-phase research program with Neelands Group tackling refrigerant leak detection in supermarket refrigeration. Phase 1 (published, Energies 2024) introduced CatBoost-based detection for HFC systems. Phase 2 scales to CO2 systems with 162 subsystems, four parallel detectors, and a hysteresis state machine.


Published in Energies 2024, 17, 736

Precision Leak Detection in Supermarket Refrigeration Systems

Integrating Categorical Gradient Boosting with Advanced Thresholding for HFC-based refrigeration.

Rashinda Wijethunga, Hooman Nouraei, Craig Zych, Jagath Samarabandu, Ayan Sadhu
doi.org/10.3390/en17030736 · Open Access (CC BY)

Slow-leak average F1: 0.92
Subcooling F1: 1.00
Average early detection: 5 days
Target features: 6
Data: real supermarket operation

Motivation

Why leak detection matters

Refrigeration accounts for roughly 50% of a supermarket's total energy consumption, and refrigerant leaks have severe economic and environmental consequences.

💸

$150K Annual Energy Cost

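Refrigeration energy costs about 1% of total sales, roughly equal to the average net profit margin, so energy savings flow almost directly to profit: a 10% energy reduction adds about 0.1% of sales, raising net profit by roughly 10%. Annual leak rates average 11% of the refrigerant charge and can spike to 30%.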

🌍

Environmental Impact

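HFCs are potent greenhouse gases that persist in the atmosphere for years to decades (about 14 years for HFC-134a). Regulations such as the Kigali Amendment to the Montreal Protocol (HFC phase-down) and the EU F-gas Regulation (mandatory leak checks and repairs) are tightening leak-monitoring and repair requirements worldwide.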

🔍

Detection Gap

Most existing automated fault detection and diagnosis (AFDD) methods depend on a refrigerant level sensor in the receiver tank, which most supermarkets lack, and many address only one leak type.

Contributions

Three novel contributions

A complete framework for both slow and catastrophic leak detection, independent of receiver-level sensors.

🔍

Dual Leak Detection

The first framework to detect both slow and catastrophic leaks within a single system. The two leak types behave very differently and require fundamentally different detection strategies.

⚙

Receiver-Independent

Uses only universally available parameters: coefficient of performance (COP), subcooling, superheat, mass flow rate, compression ratio, and energy consumption. No receiver-level sensor is required.

🚨

False Alarm Mitigation

First leak detection solution for supermarkets with built-in false alarm mitigation, handling control mode transitions (defrost, heat reclaim, condenser splits) that mimic anomalies.

Architecture

Overall algorithm architecture

Two parallel pipelines share a common CatBoost prediction backbone, with specialised anomaly detection for each leak type.

Fig. 4: Overall algorithm architecture for leak detection. Left branch handles catastrophic leaks (15-min data, dynamic thresholding). Right branch handles slow leaks (1-hour data, cumulative error analysis).

Catastrophic Detection

Non-parametric dynamic thresholding

A CatBoost model predicts each target feature during normal operation. During a leak, prediction errors spike — the challenge is distinguishing genuine anomalies from noise.

Step 1: Smoothed Error Series

Prediction errors e(t) = (y(t) − ŷ(t))² are computed in real time and smoothed with an exponentially weighted moving average (EWMA) over the last h samples, which suppresses sharp noise spikes.

e_s = EWMA(e(t−h), ..., e(t))

Step 2: Optimal Threshold Selection

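A set of candidate thresholds ε(z_i) = μ(e_s) + z_iσ(e_s) is generated for z_i between 2 and 10. For each candidate, Δμ and Δσ are the drops in the mean and standard deviation of e_s when the values above ε are removed; e_a is the set of errors above ε and E_seq the set of continuous sequences they form. The optimal ε* maximises the relative drop in mean and spread while penalising the number of anomalous points and sequences:

ε* = argmax over ε ∈ ε(z) of [Δμ/μ(e_s) + Δσ/σ(e_s)] / [|e_a| + |E_seq|²]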
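Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal sketch, not the paper's code: it assumes a plain EWMA with an assumed smoothing factor `alpha=0.3` in place of the windowed EWMA over h samples, population standard deviations, and a candidate grid z = 2.0, 2.5, ..., 10.0 (the step size is an assumption). `ewma`, `runs_above` and `select_threshold` are hypothetical helper names.

```python
import statistics

Z_GRID = [2 + 0.5 * k for k in range(17)]  # assumed grid: z = 2.0, 2.5, ..., 10.0


def ewma(errors, alpha=0.3):
    """Smooth raw errors e(t); alpha is an assumed smoothing factor."""
    smoothed, s = [], errors[0]
    for e in errors:
        s = alpha * e + (1 - alpha) * s
        smoothed.append(s)
    return smoothed


def runs_above(e_s, eps):
    """Count continuous sequences of values strictly above eps (|E_seq|)."""
    count, inside = 0, False
    for e in e_s:
        if e > eps and not inside:
            count += 1
        inside = e > eps
    return count


def select_threshold(e_s, z_grid=Z_GRID):
    """Return eps* = mu + z*sigma maximising the separation score over the grid."""
    mu, sigma = statistics.mean(e_s), statistics.pstdev(e_s)
    if sigma == 0:
        return mu  # flat series: nothing to separate
    best_eps, best_score = mu + max(z_grid) * sigma, float("-inf")
    for z in z_grid:
        eps = mu + z * sigma
        e_a = [e for e in e_s if e > eps]      # anomalous values
        below = [e for e in e_s if e <= eps]   # values kept as normal
        if not e_a or not below:
            continue
        d_mu = mu - statistics.mean(below)
        d_sigma = sigma - statistics.pstdev(below)
        score = (d_mu / mu + d_sigma / sigma) / (len(e_a) + runs_above(e_s, eps) ** 2)
        if score > best_score:  # ties keep the lowest z
            best_eps, best_score = eps, score
    return best_eps
```

Usage: `eps_star = select_threshold(ewma(raw_errors))`. Keeping the lowest z on ties is one reasonable convention among several.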

Step 3: Anomaly Scoring

Each continuous sequence e_seq⁽ⁱ⁾ ∈ E_seq of errors above ε* receives an anomaly severity score: how far its peak exceeds the threshold, normalised by the error statistics.

s⁽ⁱ⁾ = [max(e_seq⁽ⁱ⁾) − ε*] / [μ(e_s) + σ(e_s)]
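A sketch of Step 3 under the same assumptions (population standard deviation; the function name `score_sequences` is hypothetical; runs are returned as half-open index ranges):

```python
import statistics


def score_sequences(e_s, eps_star):
    """Score each continuous run of smoothed errors above eps_star.

    Returns [((start, end), score)] with end exclusive and
    score = (max(run) - eps_star) / (mean(e_s) + std(e_s)).
    """
    e_s = list(e_s)
    denom = statistics.mean(e_s) + statistics.pstdev(e_s)
    results, start = [], None
    for t, e in enumerate(e_s + [float("-inf")]):  # sentinel closes a trailing run
        if e > eps_star and start is None:
            start = t
        elif e <= eps_star and start is not None:
            results.append(((start, t), (max(e_s[start:t]) - eps_star) / denom))
            start = None
    return results
```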

Step 4: False Positive Mitigation

A pruning vector e_max holds the peak error of each anomalous subsequence, sorted in descending order. Walking through this vector, whenever the percentage decrease d_i between consecutive peaks exceeds a threshold p, all sequences ranked before position i are confirmed as true anomalies; sequences after the last such drop are reclassified as normal noise.

d_i = (e_maxⁱ⁻¹ − e_maxⁱ) / e_maxⁱ⁻¹
if d_i > p → confirm anomalies j < i
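The pruning rule of Step 4 as a minimal sketch. The value p = 0.13 is an assumed placeholder, since the page does not state p. Appending the largest non-anomalous error to e_max, so that the weakest sequence is also compared against the normal background, is likewise an assumption. `prune` is a hypothetical name.

```python
def prune(seq_peaks, max_normal_error, p=0.13):
    """Return indices (into seq_peaks) of sequences confirmed as anomalies.

    seq_peaks: peak smoothed error of each candidate anomalous sequence.
    max_normal_error: largest smoothed error not flagged as anomalous (assumed extra entry).
    p: minimum percentage decrease between consecutive peaks (assumed value).
    """
    order = sorted(range(len(seq_peaks)), key=lambda k: seq_peaks[k], reverse=True)
    e_max = [seq_peaks[k] for k in order] + [max_normal_error]
    keep = 0  # number of top-ranked sequences confirmed so far
    for i in range(1, len(e_max)):
        if e_max[i - 1] > 0 and (e_max[i - 1] - e_max[i]) / e_max[i - 1] > p:
            keep = i  # confirm every sequence ranked j < i
    return sorted(order[:keep])
```

For peaks [10.0, 9.5, 3.0] and a background of 2.8, the only large drop is 9.5 → 3.0, so the first two sequences are confirmed and the third is treated as noise.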

Slow Leak Detection

Cumulative error analysis for gradual leaks

Slow leaks don't significantly affect system efficiency until substantial refrigerant has been lost. A fundamentally different detection strategy is needed.

The Core Insight

As the system enters a slow leak state, the partial correlation between input variables (X) and target variables (Y) shifts. The CatBoost model, trained on healthy data, produces a gradually increasing prediction error that accumulates over time. The key innovation is tracking the bi-weekly gradual rise of cumulative squared errors while removing rapid spikes caused by noise.

1. Predict COP

CatBoost predicts COP from 1-hour sampled data. Only COP responds to slow leaks.

2. Cumulative Sum

Squared errors are accumulated: C_e²(t) = Σ_{k≤t} e²(k). A change in the gradient of this curve marks leak onset.

3. Remove Rapid Shifts

Rapid shifts (ΔRapid), where the gradient G(t) exceeds μ + 3σ of the gradient series, are removed. Only gradual, consistent accumulation (ΔGradual) is retained.

4. Daily Threshold

The retained gradual component is resampled to daily increments (D_ΔGradual). A per-store threshold T = 1.8 × the maximum daily value on the validation data is then applied.
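The four slow-leak steps can be sketched as follows, assuming hourly squared COP errors as input and computing μ and σ of the gradient on the same series (the page does not say which data these statistics come from). Because the discrete gradient of the cumulative sum C_e²(t) is just e²(t), the sketch filters the increments directly instead of differencing the cumulative curve. All function names are hypothetical.

```python
import statistics


def daily_gradual(sq_errors, samples_per_day=24):
    """Daily increase of the cumulative squared error after removing rapid shifts.

    sq_errors: hourly squared COP prediction errors e^2(t).
    The discrete gradient of C(t) = sum of e^2(k) for k <= t is e^2(t) itself;
    increments above mean + 3*std of that gradient are treated as rapid shifts.
    """
    grad = list(sq_errors)
    cutoff = statistics.mean(grad) + 3 * statistics.pstdev(grad)
    gradual = [g if g <= cutoff else 0.0 for g in grad]  # drop rapid, keep gradual
    return [sum(gradual[d:d + samples_per_day]) for d in range(0, len(gradual), samples_per_day)]


def calibrate(val_daily, factor=1.8):
    """Per-store threshold T = 1.8 x max(daily gradual increase on validation data)."""
    return factor * max(val_daily)


def leak_days(test_daily, threshold):
    """Indices of days whose gradual increase exceeds the store threshold."""
    return [d for d, v in enumerate(test_daily) if v > threshold]
```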

Results

Detection performance

Evaluated on real-world data from Canadian supermarkets with HFC refrigerant.

⚡ Catastrophic Leak (1 event)

Refrigerant level dropped from 40% to 30% within hours. Five of the six target features showed clear anomaly signals, with subcooling temperature achieving a perfect F1 = 1.0. Energy consumption (kW) showed no response, indicating that it is insufficient as a standalone leak indicator.

🐢 Slow Leaks (3 events, 3 stores)

Leaks persisted for weeks to months before manual detection. The algorithm detected them within an average of 5 days. Only COP responded; the other features showed no gradual change. Average F1 = 0.92 across all stores.

Target Parameter	Precision	Recall	F1 Score
Subcooling Temperature	1.0000	1.0000	1.0000
Compression Ratio	0.7894	1.0000	0.8824
COP	0.9000	0.6000	0.7200
Mass Flow Rate	1.0000	0.5333	0.6957
Superheat Temperature	0.3684	0.4667	0.4118
kW Value	0.0000	0.0000	0.0000

Catastrophic leak detection performance (point-adjusted F1 scores). Slow leak: COP F1 = 0.92 avg across 3 stores.
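The table reports point-adjusted scores. A minimal sketch, assuming the common point-adjust protocol: if any point inside a labelled leak segment is flagged, the whole segment counts as detected. The function name is hypothetical, and the paper's exact evaluation code may differ.

```python
def point_adjusted_prf(pred, label):
    """Precision, recall and F1 after point adjustment of whole anomaly segments."""
    pred = [bool(p) for p in pred]
    label = [bool(l) for l in label]
    t, n = 0, len(label)
    while t < n:
        if not label[t]:
            t += 1
            continue
        end = t
        while end < n and label[end]:
            end += 1
        if any(pred[t:end]):  # one hit credits the entire labelled segment
            pred[t:end] = [True] * (end - t)
        t = end
    tp = sum(p and l for p, l in zip(pred, label))
    fp = sum(p and not l for p, l in zip(pred, label))
    fn = sum(l and not p for p, l in zip(pred, label))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

As a consistency check on the table, F1 = 2PR/(P + R) with the COP values gives 2 × 0.9 × 0.6 / 1.5 = 0.72.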

Parameter	MAPE (%)	R² Score	False Alarm Rate (%)
COP	1.47	0.877	0.27
Subcooling	—	0.827	0.13
Superheat	1.38	0.904	0.40
Mass Flow Rate	2.55	0.802	0.33
Compression Ratio	1.27	0.764	0.10
kW Value	2.13	0.600	0.10

Prediction model & anomaly detection performance during non-leak operation (low false alarm rates confirm robust calibration).

Phase 2 (In Progress) · Residual-Based Multi-Detector Voting

Scaling to CO2 Transcritical Refrigeration

An intelligent anomaly detection system monitoring 162 subsystems with XGBoost residuals, four parallel detectors, and a hysteresis state machine.

Subsystems monitored: 162
Parallel detectors: 4
Training samples: 7.6M
Validation R²: 0.936
Alert severity levels: graded from NORMAL to CRITICAL
False positive rate: <1%

Motivation

Why automated leak detection?

CO2 refrigerant leaks in commercial systems are costly, dangerous, and invisible to conventional monitoring.

💸

Massive Financial Loss

A single leak event can lose 350–1,400+ lbs of CO2 refrigerant. The Neelands store has experienced 12 documented incidents in 3 years, each requiring emergency service calls and risking product spoilage.

🔍

No Direct Leak Sensor

Existing refrigerant-level sensors have limited coverage and unreliable readings. The system must infer leaks indirectly from operational telemetry — specifically from how valve positions deviate from expected behaviour.

⚡

Speed Saves Money

Slow leaks can persist for weeks before manual detection. Every hour of earlier detection saves refrigerant, prevents compressor damage, and avoids food safety incidents from rising case temperatures.

Data Pipeline

From raw sensors to model-ready features

A 6-step pipeline transforms 15-minute sensor readings into a panel dataset with 35 engineered features per subsystem.

Step 2b–2c — Merge & Interpolate

Merge subsystem data (CT and VPO for 168 units) with system-level features (environmental, pressure, power). Linearly interpolate indoor temperature, outdoor temperature, and indoor humidity (7–10% missing; over 99% of gaps are ≤1 hour).

Step 3 — Anomaly Labeling

Mark 12 known service-record periods as anomalous. Apply sensor-threshold rules (e.g., kW < 10, Pressure < 500) with ±1-hour buffers. The result is a binary HealthLabel column used to filter training and test data.
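A minimal sketch of the labeling step, assuming rows are dicts with hypothetical keys `ts`, `kW` and `Pressure`, and that HealthLabel uses 1 for healthy and 0 for anomalous (the encoding is not stated). The ±1-hour buffer is applied around sensor-rule violations; service periods are taken as given.

```python
from datetime import datetime, timedelta

BUFFER = timedelta(hours=1)  # +/-1-hour buffer around rule violations


def health_labels(rows, service_periods, kw_min=10.0, pressure_min=500.0):
    """Return HealthLabel per row: 1 = healthy, 0 = anomalous (assumed encoding).

    rows: list of dicts with hypothetical keys "ts", "kW", "Pressure".
    service_periods: list of (start, end) datetimes from service records.
    """
    violations = [r["ts"] for r in rows if r["kW"] < kw_min or r["Pressure"] < pressure_min]
    labels = []
    for r in rows:
        ts = r["ts"]
        in_service = any(start <= ts <= end for start, end in service_periods)
        near_violation = any(abs(ts - v) <= BUFFER for v in violations)  # O(n*m): fine for a sketch
        labels.append(0 if in_service or near_violation else 1)
    return labels


# Example: 15-minute rows over 5 hours with one low-kW reading at 02:00.
start = datetime(2024, 1, 1)
demo_rows = [
    {"ts": start + i * timedelta(minutes=15), "kW": 5.0 if i == 8 else 50.0, "Pressure": 800.0}
    for i in range(21)
]
demo_labels = health_labels(demo_rows, service_periods=[])
```

In the example, every row from 01:00 to 03:00 is marked anomalous: the violating reading plus four 15-minute rows on each side.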

Step 4 — Clean & Filter

Exclude problematic subsystems (high missing data or sensor anomalies), reducing 168 → 162 active subsystems. Drop unused columns (power, pressure, PIDs). Interpolate small NaN gaps (≤2 rows) in CT/VPO.

Step 5 — Feature Engineering

Create 6 lag features per signal (6 × 15 min = 1.5 h of history) for CT, VPO, and the 3 environmental columns; the VPO lags are left out of the model inputs (see Step 7). Add cyclical sin/cos encodings for hour, day-of-week, and month. Total: 35 features per subsystem-timestamp pair.
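A sketch of the lag and cyclical features. The column names in `LAGGED_SIGNALS` are hypothetical. The breakdown behind `N_FEATURES` is an assumption that happens to reproduce the stated total: 4 signals (CT plus the 3 environmental columns) × (current value + 6 lags), plus 6 sin/cos encodings, plus `sub_idx`, gives 35.

```python
import math
from datetime import datetime

N_LAGS = 6  # 6 x 15-min steps = 1.5 h of history
LAGGED_SIGNALS = ("CT", "IndoorTemp", "OutdoorTemp", "IndoorHumidity")  # hypothetical names; VPO excluded

# Assumed breakdown of the 35 model features (not confirmed by the source):
# 4 signals x (current + 6 lags) + 6 cyclical encodings + sub_idx
N_FEATURES = len(LAGGED_SIGNALS) * (1 + N_LAGS) + 6 + 1


def lags(series, name, n_lags=N_LAGS):
    """Per-row dict of lagged values; None where history is not yet available."""
    return [
        {f"{name}_lag{k}": series[t - k] if t >= k else None for k in range(1, n_lags + 1)}
        for t in range(len(series))
    ]


def cyclical(ts):
    """sin/cos encodings so that 23:45 sits next to 00:00, December next to January, etc."""
    parts = {"hour": (ts.hour + ts.minute / 60, 24), "dow": (ts.weekday(), 7), "month": (ts.month - 1, 12)}
    feats = {}
    for name, (value, period) in parts.items():
        angle = 2 * math.pi * value / period
        feats[f"{name}_sin"], feats[f"{name}_cos"] = math.sin(angle), math.cos(angle)
    return feats


example = cyclical(datetime(2024, 1, 1, 0, 0))  # midnight on a Monday in January
```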

Step 6 — Split & Panelise

Temporal split: Train (2022–2023), Val (Jan–Jun 2024), Test (Jun 2024+). Keep only "truly healthy" rows, where the current row and the 6 prior rows are all healthy, so that no lag feature carries anomalous data. Convert wide → panel format (one row per subsystem × timestamp). Final: 7.6M train, 1.9M val rows.
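A sketch of the "truly healthy" filter and the wide-to-panel reshape, using plain Python lists and dicts. The 7-row window (current + 6 prior) matches the 6 lag features, so a kept row never draws on unhealthy history. The dict layout and the sorted-order `sub_idx` encoding are hypothetical.

```python
def truly_healthy(labels, history=6):
    """True where the row and its `history` predecessors are all healthy (label 1).

    Rows without a full history are dropped too, since their lag features are missing.
    """
    return [t >= history and all(labels[t - history:t + 1]) for t in range(len(labels))]


def wide_to_panel(timestamps, wide):
    """Convert {subsystem_id: {"CT": [...], "VPO": [...]}} into one row per subsystem x timestamp.

    sub_idx is an integer code per subsystem (hypothetical encoding: sorted order of ids).
    """
    rows = []
    for sub_idx, sub_id in enumerate(sorted(wide)):
        cols = wide[sub_id]
        for t, ts in enumerate(timestamps):
            rows.append({"ts": ts, "sub_idx": sub_idx, "CT": cols["CT"][t], "VPO": cols["VPO"][t]})
    return rows
```

On real data, pandas reshaping tools such as `DataFrame.melt` or `DataFrame.stack` do the same job at the 7.6M-row scale.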

Step 7 — Train XGBoost

GPU-accelerated XGBoost regressor (max_depth=6, min_child_weight=100, learning_rate=0.05) trained on healthy-only data. It predicts VPO from CT, environmental, and temporal features plus the subsystem index (sub_idx). Val R² = 0.936, RMSE = 3.80. VPO lags are intentionally excluded to preserve anomaly sensitivity.

Architecture

Four-layer detection pipeline

From raw predictions to system-level alerts: each layer progressively filters noise and increases confidence.

Fig. 1: Four-layer detection pipeline. Layer 1 computes per-subsystem z-scored residuals from XGBoost predictions. Layer 2 runs four parallel detectors with majority voting. Layer 3 aggregates flagged subsystems by refrigeration family. Layer 4 escalates alerts through a hysteresis state machine.
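Layer 1 can be sketched as follows, assuming residual = actual − predicted VPO and that each subsystem's residual mean and standard deviation come from its healthy training rows. The function names are hypothetical.

```python
import statistics


def residual_stats(actual, predicted):
    """Mean and population std of residuals on healthy data for one subsystem."""
    res = [a - p for a, p in zip(actual, predicted)]
    return statistics.mean(res), statistics.pstdev(res)


def z_residuals(actual, predicted, stats):
    """Layer 1: z-score each residual (actual - predicted) with the subsystem's healthy stats."""
    mu, sigma = stats
    return [((a - p) - mu) / sigma for a, p in zip(actual, predicted)]
```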

Detectors

Four parallel anomaly detectors

Each detector is tuned to catch a different leak profile. They vote together — a subsystem is flagged only when ≥2 agree.

Bilateral CUSUM

drift=0.5 · decay=0.85 · h=786

Accumulates positive and negative deviations beyond a drift allowance in two one-sided sums. Catches slow, sustained leaks over hours to days. Exponential decay prevents stale accumulation after an anomaly clears.

S⁺[t] = max(0, S⁺[t-1] + z[t] - 0.5)
S⁻[t] = max(0, S⁻[t-1] - z[t] - 0.5)
if |z| < 3: apply ×0.85 decay
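A runnable version of the CUSUM update above. It assumes the decay is applied after the update and that a flag is raised when either sum exceeds h; `bilateral_cusum` and the `calm` parameter name are hypothetical.

```python
def bilateral_cusum(z, drift=0.5, decay=0.85, h=786.0, calm=3.0):
    """Two one-sided CUSUM sums with decay; returns a flag per timestep."""
    s_pos = s_neg = 0.0
    flags = []
    for zt in z:
        s_pos = max(0.0, s_pos + zt - drift)  # accumulates upward deviations
        s_neg = max(0.0, s_neg - zt - drift)  # accumulates downward deviations
        if abs(zt) < calm:  # quiet step: let stale accumulation fade
            s_pos *= decay
            s_neg *= decay
        flags.append(s_pos > h or s_neg > h)
    return flags
```

Under this ordering, a steady |z| just below 3 saturates each sum near 0.85 × 2.4 / 0.15 ≈ 13.6, so reaching h = 786 takes a long run of |z| ≥ 3.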

EWMA

λ=0.1 · h=4.32

Exponentially weighted moving average with ~2.5-hour effective memory. Catches gradual trend changes that CUSUM might take longer to accumulate.

EWMA[t] = 0.1 · z[t] + 0.9 · EWMA[t-1]
flag if |EWMA| > 4.32
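The EWMA detector as a sketch, assuming the average starts at 0 (`ewma_detector` is a hypothetical name):

```python
def ewma_detector(z, lam=0.1, h=4.32):
    """EWMA of the z-scored residuals; flags when |EWMA| exceeds h."""
    state, flags = 0.0, []
    for zt in z:
        state = lam * zt + (1 - lam) * state
        flags.append(abs(state) > h)
    return flags
```

The ~2.5-hour memory follows from 1/λ = 10 samples × 15 minutes. With a constant z = 5, the average first exceeds 4.32 at the 19th step.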

Rolling Z-Score

window=96 (24h) · h=1.94

Mean z-score over a 24-hour sliding window. Catches persistent subtle elevation that is individually small but collectively significant over a full day.

rolling_z[t] = mean(z[t-95], ..., z[t])
flag if |rolling_z| > 1.94
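A streaming version of the rolling mean, assuming no flag is raised before a full 24-hour window is available (`rolling_z_detector` is a hypothetical name):

```python
from collections import deque


def rolling_z_detector(z, window=96, h=1.94):
    """Flag when the mean z over the last `window` steps (96 x 15 min = 24 h) exceeds h."""
    buf, total, flags = deque(), 0.0, []
    for zt in z:
        buf.append(zt)
        total += zt
        if len(buf) > window:
            total -= buf.popleft()  # drop the value that left the window
        flags.append(len(buf) == window and abs(total / window) > h)
    return flags
```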

Spike Detector

4σ · 3 consecutive steps (45 min)

Fires when |z| exceeds 4 (i.e., 4σ) for 3 or more consecutive timesteps. Catches sudden catastrophic events: pipe bursts, major valve failures, compressor shutdowns.

if |z[t]| > 4.0 for 3 consecutive steps
→ spike_flagged = True
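The spike rule as a runnable sketch (`spike_detector` is a hypothetical name):

```python
def spike_detector(z, k=4.0, run=3):
    """Flag once |z| > k has held for `run` consecutive steps (3 x 15 min = 45 min)."""
    flags, streak = [], 0
    for zt in z:
        streak = streak + 1 if abs(zt) > k else 0
        flags.append(streak >= run)
    return flags
```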

Interactive Demo

Leak detection simulator

Watch the detection pipeline respond in real time. Inject different leak types and see how detectors, voting, and the state machine react. All four layers run live in your browser.

(Simulator panels: speed control; z-score signal averaged across affected subsystems; flagged-subsystem counts per family for SGrLT (36 subsystems), SGrMT (83), SGrSC (9) and SGrTC (34); system alert level, starting at NORMAL; detector states for CUSUM, EWMA, RollingZ and Spike; and the voting and aggregation status.)

Results

Detection performance on historical events

Evaluated on 8 historical events from 2022–2025, ranging from confirmed major leaks to periods of suspicious activity. The system detected all of them, with time-to-detection ranging from hours to days depending on leak severity.

Event	Period	Description	Severity	Peak Level
02	Mar–Jun 2022	Slow leak (~800 lb)	Major	CRITICAL
04	Sep–Nov 2022	System #30 leak	Moderate	ALERT
05	Dec 2022	Major leak (bakery)	Major	CRITICAL
06	Apr–Jun 2023	Power outage + leaks	Severe	CRITICAL
07	Aug 2023	Multi-day leak (350 lb)	Moderate	ALERT
08	Dec 2023–Jan 2024	Suspicious activity	Mild	WARNING
10	Jul 2024	Extended abnormal	Moderate	ALERT
12	Dec 2024–Jan 2025	Suspicious activity	Mild	WARNING

Design Decisions

Why we built it this way

Each architectural choice balances detection sensitivity, false-positive control, and operational practicality.

🎯

Single Global Model

One XGBoost model with sub_idx as a feature serves all 162 subsystems, sharing the CT→VPO physics while learning per-unit offsets. More data-efficient and maintainable than 162 separate models.

🚫

No VPO Lags

Deliberately excludes autoregressive VPO features. During slow leaks, VPO lags would let the model “explain away” anomalous values, suppressing the residual signal. This sacrifices about 0.02 of R² for far better anomaly sensitivity.

🗳

Multi-Detector Voting

Four detectors with different time horizons vote together. Requiring ≥2/4 agreement dramatically reduces false positives while maintaining detection power across leak profiles (slow, sudden, sustained).
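The ≥2-of-4 vote can be sketched as below; the dict layout keyed by detector name is an assumption.

```python
def vote(detector_flags, min_votes=2):
    """Per-timestep vote: a subsystem is flagged when at least `min_votes` detectors agree."""
    series = list(detector_flags.values())
    return [sum(step) >= min_votes for step in zip(*series)]
```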

🏢

Family Aggregation

Leaks affect shared piping, so multiple subsystems deviate together. The dual threshold (>5% AND ≥3 absolute) filters out single-unit faults while catching real leaks that propagate across families.
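The dual family threshold as a sketch. `FAMILY_SIZES` uses the per-family subsystem counts shown in the demo; the helper names are hypothetical.

```python
from collections import Counter

FAMILY_SIZES = {"SGrLT": 36, "SGrMT": 83, "SGrSC": 9, "SGrTC": 34}  # counts shown in the demo


def family_flagged(n_flagged, family_size, min_frac=0.05, min_count=3):
    """Dual threshold: more than 5% of the family AND at least 3 subsystems."""
    return n_flagged / family_size > min_frac and n_flagged >= min_count


def flagged_families(sub_flags, sub_family):
    """sub_flags: {subsystem: voted flag}; sub_family: {subsystem: family name}."""
    counts = Counter(sub_family[s] for s, flagged in sub_flags.items() if flagged)
    return sorted(f for f, c in counts.items() if family_flagged(c, FAMILY_SIZES[f]))
```

In the 83-unit SGrMT family the 5% rule dominates (at least 5 units must be flagged), while in the smaller families the absolute minimum of 3 dominates.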

⏳

Hysteresis State Machine

Escalation requires sustained conditions (4h→12h→24h→48h). The 6-hour downgrade delay prevents alert fatigue from oscillating borderline conditions. Fast-track triggers bypass timers for severe cases.
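A minimal hysteresis sketch. It assumes five levels so that each of the four durations gates one escalation, with durations counted from the start of a continuous anomalous run. NORMAL, WARNING, ALERT and CRITICAL appear on this page, but WATCH is a hypothetical placeholder, and `fast_track` stands in for the unspecified fast-track triggers. Downgrades step one level per 6 hours of calm.

```python
LEVELS = ("NORMAL", "WATCH", "WARNING", "ALERT", "CRITICAL")  # "WATCH" is a hypothetical placeholder
ESCALATE_AFTER_H = (4, 12, 24, 48)  # sustained hours needed to reach LEVELS[1], ..., LEVELS[4]
DOWNGRADE_AFTER_H = 6
STEPS_PER_HOUR = 4  # 15-minute samples


class AlertStateMachine:
    def __init__(self):
        self.level = 0
        self.active_steps = 0  # consecutive steps with a system-level anomaly
        self.clear_steps = 0   # consecutive steps without one

    def step(self, anomalous, fast_track=False):
        if anomalous:
            self.active_steps += 1
            self.clear_steps = 0
            if fast_track:  # hypothetical severe-case trigger: bypass the timers
                self.level = len(LEVELS) - 1
            hours = self.active_steps / STEPS_PER_HOUR
            while self.level < len(LEVELS) - 1 and hours >= ESCALATE_AFTER_H[self.level]:
                self.level += 1
        else:
            self.active_steps = 0
            self.clear_steps += 1
            if self.level > 0 and self.clear_steps >= DOWNGRADE_AFTER_H * STEPS_PER_HOUR:
                self.level -= 1  # step down one level per 6 h of calm
                self.clear_steps = 0
        return LEVELS[self.level]
```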

🧪

Healthy-Only Calibration

Thresholds are calibrated at the 99th percentile of healthy-data statistics, never tuned on leak events. This preserves statistical validity of the FPR guarantee and ensures generalisation to novel leak patterns.
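Healthy-only calibration can be sketched with a percentile and an empirical false-positive check, assuming each threshold applies to the absolute value of its detector statistic. The helper names are hypothetical.

```python
import math


def percentile(values, q):
    """Linearly interpolated q-th percentile (same convention as NumPy's default method)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def calibrate_threshold(healthy_stat, q=99.0):
    """Threshold h at the q-th percentile of |statistic| on healthy data only."""
    return percentile([abs(v) for v in healthy_stat], q)


def empirical_fpr(healthy_stat, h):
    """Fraction of healthy samples that would still be flagged at threshold h."""
    return sum(abs(v) > h for v in healthy_stat) / len(healthy_stat)
```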

Stack

Technical foundation

Built with robust ML and data engineering libraries, GPU-accelerated training, and config-driven detection.

ML Model

XGBoost (GPU)

Gradient-boosted regression trees with CUDA-accelerated histogram split finding. Up to 100K boosting rounds with early stopping, depth 6, and aggressive regularisation.

Data Processing

pandas + NumPy

Vectorised residual computation, rolling-window detectors, and wide-to-panel format conversion for 7.6M+ rows.

Visualisation

Plotly + Matplotlib

Interactive Dash dashboards for exploration. Static matplotlib figures for stakeholder reports and publication-quality PDF exports.

Unsupervised

Isolation Forest + PCA

Optional unsupervised scoring layer on the residual vector for additional anomaly evidence beyond the four primary detectors.

Configuration

JSON Config-Driven

All detector parameters, voting rules, family weights, and state machine timings in detection_config.json — no hardcoded thresholds.
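What such a file might contain, expressed as a Python dict serialised to JSON. The key names are hypothetical; the values are the parameters quoted on this page. Family weights are left out because their values are not given here.

```python
import json

# Hypothetical layout for detection_config.json; values are those quoted on this page.
DETECTION_CONFIG = {
    "detectors": {
        "cusum": {"drift": 0.5, "decay": 0.85, "decay_below_abs_z": 3.0, "h": 786},
        "ewma": {"lambda": 0.1, "h": 4.32},
        "rolling_z": {"window": 96, "h": 1.94},
        "spike": {"k_sigma": 4.0, "consecutive_steps": 3},
    },
    "voting": {"min_votes": 2},
    "family_aggregation": {"min_fraction": 0.05, "min_count": 3},
    "state_machine": {"escalate_after_hours": [4, 12, 24, 48], "downgrade_delay_hours": 6},
    "calibration": {"healthy_percentile": 99},
}


def load_config(text):
    """Parse the JSON text and run a basic consistency check."""
    cfg = json.loads(text)
    if not 1 <= cfg["voting"]["min_votes"] <= len(cfg["detectors"]):
        raise ValueError("min_votes must be between 1 and the number of detectors")
    return cfg


config_text = json.dumps(DETECTION_CONFIG, indent=2)
```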

Environment

Conda + CUDA

Reproducible conda environment with GPU-accelerated training on NVIDIA hardware. Full pipeline from raw data to dashboard in numbered scripts.

Roadmap

What's next

The detection engine is operational. Next steps focus on data quality, deployment, and expansion.

📡

Clean CT Data Recovery

119/162 subsystems reported frozen CT values from Apr–Aug 2025 due to a data export issue. Obtaining clean CT data from an alternative source will unlock full-period blind testing.

☁️

Real-Time Deployment

Deploy the online engine for live 15-minute monitoring with email/SMS alerts to facility managers. The current pipeline processes events retrospectively; the next step is live streaming.

🏪

Multi-Store Generalisation

Adapt the model to additional grocery stores with different system configurations. Because a single model conditions on sub_idx, transfer to new stores should be straightforward.