Mitacs Research · CatBoost / XGBoost · Anomaly Detection · Real-Time Monitoring · Published: Energies 2024

Supermarket Refrigerant Leak Detection

Smart Leak Detection

A two-phase research program with Neelands Group tackling refrigerant leak detection in supermarket refrigeration. Phase 1 (published, Energies 2024) introduced CatBoost-based detection for HFC systems. Phase 2 scales to CO2 systems with 162 subsystems, four parallel detectors, and a hysteresis state machine.

Published: Energies 2024, 17(3), 736

Precision Leak Detection in Supermarket Refrigeration Systems

Integrating Categorical Gradient Boosting with Advanced Thresholding for HFC-based refrigeration.

Rashinda Wijethunga, Hooman Nouraei, Craig Zych, Jagath Samarabandu, Ayan Sadhu
doi.org/10.3390/en17030736 · Open Access (CC BY)

Slow Leak Avg F1: 0.92
Subcooling F1: 1.00
Avg Early Detection: 5 days
Target Features: 6
Real Supermarkets: 3

Why leak detection matters

Refrigeration accounts for ~50% of a supermarket's total energy consumption, so refrigerant leaks have severe economic and environmental consequences.

💸

$150K Annual Energy Cost

Refrigeration energy costs about 1% of total sales, roughly equal to the average net profit margin, so a 10% energy reduction lifts profit by about 10%. Annual leak rates average 11% and can spike to 30%.

🌍

Environmental Impact

HFCs have atmospheric lifespans up to 14 years. Regulatory bodies worldwide (Kigali Amendment, EU F-gas) mandate stricter leak monitoring and repair timelines.

🔍

Detection Gap

Most existing AFDD methods depend on the refrigerant level sensor in the receiver tank, which is unavailable in most supermarkets. Many only address one leak type.


Three novel contributions

A complete framework for both slow and catastrophic leak detection, independent of receiver-level sensors.

🔍

Dual Leak Detection

First framework to detect both slow and catastrophic leaks within a single system. The two leak types behave very differently and require fundamentally different detection strategies.

Receiver-Independent

Uses only universally available parameters: COP, subcooling, superheat, mass flow rate, compression ratio, and energy consumption. No receiver-level sensor required.

🚨

False Alarm Mitigation

First leak detection solution for supermarkets with built-in false alarm mitigation, handling control mode transitions (defrost, heat reclaim, condenser splits) that mimic anomalies.


Overall algorithm architecture

Two parallel pipelines share a common CatBoost prediction backbone, with specialised anomaly detection for each leak type.

[Figure: Raw dataset → data preprocessing (identification of leak events, service dates, and normal operation), then two branches.]
Catastrophic leak branch: 15-min sampled data → feature engineering → train / val / test sets → CatBoost regressor → prediction error for target feature i → dynamic thresholding algorithm → false-positive mitigation → catastrophic leak detection.
Slow leak branch: 1-hour sampled data → feature engineering → train / val / test sets → CatBoost regressor → prediction error for target feature i → cumulative sum of squared prediction error → bi-weekly difference of the cumulative sum (rapid shifts removed) → daily values (last of day) → threshold value.

Fig. 4: Overall algorithm architecture for leak detection. Left branch handles catastrophic leaks (15-min data, dynamic thresholding). Right branch handles slow leaks (1-hour data, cumulative error analysis).


Non-parametric dynamic thresholding

A CatBoost model predicts each target feature during normal operation. During a leak, prediction errors spike — the challenge is distinguishing genuine anomalies from noise.

Step 1: Smoothed Error Series

Prediction errors e(t) = (y(t) − ŷ(t))² are computed in real time and smoothed with an exponentially weighted moving average (EWMA) to suppress sharp spikes caused by noise.

e_s = EWMA(e(t−h), ..., e(t))

Step 2: Optimal Threshold Selection

A set of candidate thresholds ε(z_i) = μ(e_s) + z_i·σ(e_s) is generated for z_i ranging from 2 to 10. The optimal ε* maximises the separation between normal and anomalous data. Here Δμ and Δσ are the drops in mean and standard deviation of e_s when values above the candidate threshold are removed, e_a is the set of values above the threshold, and E_seq is the set of continuous sequences formed by e_a.

ε* = argmax over ε(z_i) of [Δμ/μ(e_s) + Δσ/σ(e_s)] / [|e_a| + |E_seq|²]

Step 3: Anomaly Scoring

Each continuous sequence e_seq(i) in E_seq of error values above ε* receives an anomaly severity score based on how far its peak exceeds the threshold relative to the error statistics.

s(i) = [max(e_seq(i)) − ε*] / [μ(e_s) + σ(e_s)]

Step 4: False Positive Mitigation

A pruning vector e_max holds the peak error of each anomalous sequence, sorted in descending order. Walking through this vector, if the fractional decrease d_i between consecutive peaks exceeds a threshold p, all earlier sequences are confirmed as true anomalies; the rest are reclassified as normal noise.

d_i = (e_max(i−1) − e_max(i)) / e_max(i−1)
if d_i > p → confirm anomalies j < i
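The four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the published implementation: the function names, the smoothing factor `alpha`, the pruning value `p=0.13`, and the convention that `e_max` ends with the largest non-anomalous error are all hypothetical choices made for the example.

```python
from statistics import mean, pstdev

def ewma(values, alpha=0.3):
    """Step 1: exponentially weighted smoothing (alpha is an assumed value)."""
    out, s = [], values[0]
    for v in values:
        s = alpha * v + (1 - alpha) * s
        out.append(s)
    return out

def count_sequences(flags):
    """Number of contiguous runs of True, i.e. |E_seq|."""
    return sum(1 for i, f in enumerate(flags) if f and (i == 0 or not flags[i - 1]))

def select_threshold(es, z_candidates=range(2, 11)):
    """Step 2: pick eps = mu + z*sigma that maximises the separation score."""
    mu, sigma = mean(es), pstdev(es)
    if mu == 0 or sigma == 0:
        return None
    best_eps, best_score = None, float("-inf")
    for z in z_candidates:
        eps = mu + z * sigma
        flags = [e > eps for e in es]
        below = [e for e in es if e <= eps]
        n_anom = sum(flags)
        if n_anom == 0 or len(below) < 2:
            continue  # nothing above this candidate, so nothing to score
        d_mu = mu - mean(below)          # drop in mean once anomalies are removed
        d_sigma = sigma - pstdev(below)  # drop in spread once anomalies are removed
        score = (d_mu / mu + d_sigma / sigma) / (n_anom + count_sequences(flags) ** 2)
        if score > best_score:
            best_eps, best_score = eps, score
    return best_eps

def severity(seq_errors, eps, es):
    """Step 3: severity score of one anomalous sequence."""
    return (max(seq_errors) - eps) / (mean(es) + pstdev(es))

def prune(e_max, p=0.13):
    """Step 4: e_max is sorted descending; assumed to end with the largest
    non-anomalous error so the last sequence has something to compare against."""
    keep = 0
    for i in range(1, len(e_max)):
        d = (e_max[i - 1] - e_max[i]) / e_max[i - 1]
        if d > p:
            keep = i  # every sequence j < i is confirmed
    return e_max[:keep]
```

In this sketch, ties between candidate thresholds resolve to the smallest z, which is a design choice of the example rather than something the source specifies.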

Cumulative error analysis for gradual leaks

Slow leaks don't significantly affect system efficiency until substantial refrigerant has been lost. A fundamentally different detection strategy is needed.

The Core Insight

As the system enters a slow leak state, the partial correlation between input variables (X) and target variables (Y) shifts. The CatBoost model, trained on healthy data, produces a gradually increasing prediction error that accumulates over time. The key innovation is tracking the bi-weekly gradual rise of cumulative squared errors while removing rapid spikes caused by noise.

1. Predict COP

CatBoost predicts COP from 1-hour sampled data. COP is the only target feature that responds to slow leaks.

2. Cumulative Sum

Squared prediction errors are accumulated over time: C(t) = Σ (y(τ) − ŷ(τ))² for τ ≤ t. A change in the gradient of C(t) reveals leak onset.

3. Remove Rapid Shifts

Rapid shifts (Δ_rapid), identified where the gradient G(t) exceeds μ + 3σ, are removed. Only the gradual, consistent accumulation (Δ_gradual) is retained.

4. Daily Threshold

Δ_gradual is resampled to one value per day (the last value of each day). A per-store threshold T = 1.8 × max(validation) is then applied.
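A minimal sketch of these four steps, assuming hourly squared COP errors are already available as a list. The function names are hypothetical, and "removing" a rapid shift is interpreted here as zeroing that hour's contribution before accumulation; the source does not spell out that detail.

```python
def daily_gradual_signal(sq_err, rapid_limit, per_day=24, window_days=14):
    """Bi-weekly rise of the cumulative squared error, sampled once per day.

    sq_err:      hourly squared COP prediction errors
    rapid_limit: gradient limit (mu + 3*sigma, estimated on healthy data)
    """
    # Step 3: the gradient of a cumulative sum is the per-step increment,
    # so rapid shifts are increments above the limit (assumed: set to zero).
    kept = [e if e <= rapid_limit else 0.0 for e in sq_err]

    # Step 2: cumulative sum of the retained squared errors.
    cum, total = [], 0.0
    for e in kept:
        total += e
        cum.append(total)

    # Bi-weekly difference, read at the last sample of each day (step 4).
    lag = per_day * window_days
    daily = []
    for end in range(per_day - 1, len(cum), per_day):
        start = end - lag
        daily.append(cum[end] - (cum[start] if start >= 0 else 0.0))
    return daily

def store_threshold(validation_daily, factor=1.8):
    """Per-store threshold T = 1.8 x max of the validation-period signal."""
    return factor * max(validation_daily)
```

A day is flagged when its value in `daily_gradual_signal(...)` exceeds `store_threshold(...)` computed from the same store's validation period.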


Detection performance

Evaluated on real-world data from Canadian supermarkets with HFC refrigerant.

⚡ Catastrophic Leak (1 event)

Refrigerant level dropped from 40% to 30% within hours. Five of six target features showed clear anomaly signals, and subcooling temperature achieved a perfect F1 of 1.0. Energy consumption (kW) showed no response, indicating that it is insufficient as a standalone leak indicator.

🐢 Slow Leaks (3 events, 3 stores)

Leaks persisted for weeks/months before manual detection. The algorithm detected them within an average of 5 days. Only COP responded; other features showed no gradual change. Average F1 = 0.92 across all stores.

Target Parameter          Precision   Recall   F1 Score
Subcooling Temperature    1.0000      1.0000   1.0000
Compression Ratio         0.7894      1.0000   0.8824
COP                       0.9000      0.6000   0.7200
Mass Flow Rate            1.0000      0.5333   0.6957
Superheat Temperature     0.3684      0.4667   0.4118
kW Value                  0.0000      0.0000   0.0000

Catastrophic leak detection performance (point-adjusted F1 scores). Slow leak: COP F1 = 0.92 avg across 3 stores.
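As a quick consistency check, the F1 column can be recomputed from the precision and recall columns. The helper below is a small illustrative function, not part of the project code; it uses the standard harmonic-mean definition and returns 0 when both inputs are 0, as in the kW row.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Rows taken from the catastrophic-leak table: (precision, recall, reported F1)
rows = {
    "COP": (0.9000, 0.6000, 0.7200),
    "Compression Ratio": (0.7894, 1.0000, 0.8824),
    "Mass Flow Rate": (1.0000, 0.5333, 0.6957),
}
for name, (p, r, reported) in rows.items():
    # Inputs are rounded to four decimals, so compare with a small tolerance.
    assert abs(f1_score(p, r) - reported) < 1e-3, name
```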

Parameter            MAPE (%)   R² Score   False Alarm Rate (%)
COP                  1.47       0.877      0.27
Subcooling           -          0.827      0.13
Superheat            1.38       0.904      0.40
Mass Flow Rate       2.55       0.802      0.33
Compression Ratio    1.27       0.764      0.10
kW Value             2.13       0.600      0.10

Prediction model & anomaly detection performance during non-leak operation (low false alarm rates confirm robust calibration).

Phase 2 — In Progress Residual-Based Multi-Detector Voting

Scaling to CO2 Transcritical Refrigeration

An intelligent anomaly detection system monitoring 162 subsystems with XGBoost residuals, four parallel detectors, and a hysteresis state machine.

Subsystems Monitored: 162
Parallel Detectors: 4
Training Samples: 7.6M
R² Validation: 0.936
Alert Severity Levels: 5
False Positive Rate: <1%

Why automated leak detection?

CO2 refrigerant leaks in commercial systems are costly, dangerous, and invisible to conventional monitoring.

💸

Massive Financial Loss

A single leak event can lose 350–1,400+ lbs of CO2 refrigerant. The Neelands store has experienced 12 documented incidents in 3 years, each requiring emergency service calls and risking product spoilage.

🔍

No Direct Leak Sensor

Existing refrigerant-level sensors have limited coverage and unreliable readings. The system must infer leaks indirectly from operational telemetry — specifically from how valve positions deviate from expected behaviour.

Speed Saves Money

Slow leaks can persist for weeks before manual detection. Every hour of earlier detection saves refrigerant, prevents compressor damage, and avoids food safety incidents from rising case temperatures.


From raw sensors to model-ready features

A 6-step pipeline transforms 15-minute sensor readings into a panel dataset with 35 engineered features per subsystem.

Step 2b–2c — Merge & Interpolate
Merge subsystem data (CT, VPO for 168 units) with system-level features (environmental, pressure, power). Linear-interpolate Indoor Temp, Outdoor Temp, Indoor Humidity (7–10% missing, 99%+ gaps ≤1 hour).
Step 3 — Anomaly Labeling
Mark 12 known service-record periods as anomalous. Apply sensor-threshold rules (e.g., kW<10, Pressure<500) with ±1-hour buffers. Creates binary HealthLabel column for train/test filtering.
Step 4 — Clean & Filter
Exclude 6 problematic subsystems (high missing data or sensor anomalies). Drop unused columns (power, pressure, PIDs). Interpolate small NaN gaps (≤2 rows) in CT/VPO. 168 → 162 active subsystems.
Step 5 — Feature Engineering
Create 6 lag features per signal (1.5 hrs history) for CT, VPO, and 3 environmental columns. Add cyclical sin/cos encodings for hour, day-of-week, and month. Total: 35 features per subsystem-timestamp pair.
Step 6 — Split & Panelise
Temporal split: Train (2022–2023), Val (Jan–Jun 2024), Test (Jun 2024+). Filter to "truly healthy" rows (current + 6 prior rows all healthy). Convert wide → panel format (one row per subsystem×timestamp). Final: 7.6M train, 1.9M val rows.
Step 7 — Train XGBoost
GPU-accelerated XGBoost regressor (depth=6, min_child=100, lr=0.05) trained on healthy-only data. Predicts VPO from CT + environmental + temporal features. Val R²=0.936, RMSE=3.80. VPO lags intentionally excluded to preserve anomaly sensitivity.
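Three of the pipeline steps above are easy to illustrate in isolation: cyclical time encodings, lag features, and the "truly healthy" row filter. The sketch below uses plain lists and hypothetical function names; it assumes a HealthLabel of 1 means healthy, which the source does not state explicitly.

```python
import math

def cyclical(value, period):
    """Encode a periodic value (hour, day-of-week, month) as (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def lag_features(series, n_lags=6):
    """Map each index t >= n_lags to [x[t-1], ..., x[t-n_lags]].

    With 15-minute sampling, 6 lags cover 1.5 hours of history.
    """
    return {
        t: [series[t - k] for k in range(1, n_lags + 1)]
        for t in range(n_lags, len(series))
    }

def truly_healthy(labels, n_prior=6):
    """True where the current row and the n_prior rows before it are healthy.

    labels: sequence of 1 (healthy) / 0 (anomalous); this coding is an assumption.
    """
    return [
        t >= n_prior and all(labels[t - n_prior : t + 1])
        for t in range(len(labels))
    ]
```

The sin/cos pair keeps hour 23 close to hour 0, which a raw integer hour would not. The healthy filter matters because a row's lag features would otherwise carry anomalous history into the healthy-only training set.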

Four-layer detection pipeline

From raw predictions to system-level alerts: each layer progressively filters noise and increases confidence.

[Figure: four-layer detection pipeline.]
Layer 1, residual computation: XGBoost model → residual (VPO_actual − VPO_pred) → per-subsystem z-score normalisation (×162).
Layer 2, per-subsystem anomaly scoring (×162): CUSUM (bilateral + decay), EWMA (λ=0.1), Rolling Z (24h window), Spike (4σ × 3 consecutive); vote: ≥2 of 4 agree.
Layer 3, family aggregation (×4 families): SGrLT (36 freezers), SGrMT (83 coolers), SGrSC (9 subcoolers), SGrTC (34 transcritical); trigger: >5% AND ≥3 flagged.
Layer 4, alert state machine: NORMAL →(4h) WATCH →(12h) WARNING →(24h) ALERT →(48h) CRITICAL; downgrade one level after 6h unmet.

Fig. 1: Four-layer detection pipeline. Layer 1 computes per-subsystem z-scored residuals from XGBoost predictions. Layer 2 runs four parallel detectors with majority voting. Layer 3 aggregates flagged subsystems by refrigeration family. Layer 4 escalates alerts through a hysteresis state machine.
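Layers 3 and 4 can be sketched as follows. This is an illustrative reading of the figure, not the production engine: the class and function names are hypothetical, the escalation timers are interpreted as cumulative hours of a continuously met condition, and fast-track triggers are not modelled.

```python
FAMILY_SIZES = {"SGrLT": 36, "SGrMT": 83, "SGrSC": 9, "SGrTC": 34}

def family_triggered(n_flagged, family_size, min_fraction=0.05, min_count=3):
    """Layer 3: a family fires only if >5% AND >=3 of its subsystems are flagged."""
    return n_flagged / family_size > min_fraction and n_flagged >= min_count

LEVELS = ["NORMAL", "WATCH", "WARNING", "ALERT", "CRITICAL"]
STEPS_PER_HOUR = 4  # 15-minute data
ESCALATE_STEPS = [h * STEPS_PER_HOUR for h in (4, 12, 24, 48)]
DOWNGRADE_STEPS = 6 * STEPS_PER_HOUR

class AlertStateMachine:
    """Layer 4: escalate on sustained conditions, downgrade slowly (hysteresis)."""

    def __init__(self):
        self.level = 0
        self.met_steps = 0    # consecutive steps with the condition met
        self.unmet_steps = 0  # consecutive steps with the condition unmet

    def update(self, condition_met):
        if condition_met:
            self.met_steps += 1
            self.unmet_steps = 0
            if self.level < len(LEVELS) - 1 and self.met_steps >= ESCALATE_STEPS[self.level]:
                self.level += 1
        else:
            self.unmet_steps += 1
            self.met_steps = 0
            if self.level > 0 and self.unmet_steps >= DOWNGRADE_STEPS:
                self.level -= 1       # drop one level only
                self.unmet_steps = 0  # and wait another 6 h before the next drop
        return LEVELS[self.level]
```

Because a single unmet step resets only the escalation counter and not the level, a borderline signal that flickers cannot bounce the alert up and down.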


Four parallel anomaly detectors

Each detector is tuned to catch a different leak profile. They vote together — a subsystem is flagged only when ≥2 agree.

Bilateral CUSUM
drift=0.5 · decay=0.85 · h=786

Accumulates deviations above a drift allowance. Catches slow, sustained leaks over hours/days. Exponential decay prevents stale accumulation after anomaly clears.

S+[t] = max(0, S+[t-1] + z[t] - 0.5)
S−[t] = max(0, S−[t-1] - z[t] - 0.5)
if |z[t]| < 3: apply ×0.85 decay to S+ and S−
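A minimal sketch of the bilateral CUSUM recursion, using the drift and decay values quoted above. The function name is hypothetical, and applying the decay after the update within the same step is an assumption; the source does not specify the order.

```python
def bilateral_cusum(z_scores, drift=0.5, decay=0.85, quiet_z=3.0):
    """Return a list of (s_pos, s_neg) accumulator pairs, one per timestep."""
    s_pos = s_neg = 0.0
    out = []
    for z in z_scores:
        s_pos = max(0.0, s_pos + z - drift)  # accumulates upward deviations
        s_neg = max(0.0, s_neg - z - drift)  # accumulates downward deviations
        if abs(z) < quiet_z:
            # Signal looks normal again: let stale evidence fade out.
            s_pos *= decay
            s_neg *= decay
        out.append((s_pos, s_neg))
    return out
```

A subsystem would then be flagged by this detector when either accumulator exceeds the decision limit h.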
EWMA
λ=0.1 · h=4.32

Exponentially weighted moving average with ~2.5-hour effective memory. Catches gradual trend changes that CUSUM might take longer to accumulate.

EWMA[t] = 0.1 · z[t] + 0.9 · EWMA[t-1]
flag if |EWMA| > 4.32
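The EWMA detector is a one-line recursion. The sketch below assumes the average starts at 0, which the source does not state; the function name is hypothetical. With λ=0.1 the effective memory is about 1/λ = 10 steps, or 2.5 hours at 15-minute sampling.

```python
def ewma_flags(z_scores, lam=0.1, h=4.32):
    """Flag timesteps where the exponentially weighted mean of z exceeds h."""
    ewma, flags = 0.0, []  # assumed initial state: 0.0
    for z in z_scores:
        ewma = lam * z + (1 - lam) * ewma
        flags.append(abs(ewma) > h)
    return flags
```

A single large spike barely moves the average, while a sustained shift crosses the limit after a few hours; this is what separates the EWMA detector from the spike detector.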
Rolling Z-Score
window=96 (24h) · h=1.94

Mean z-score over a 24-hour sliding window. Catches persistent subtle elevation that is individually small but collectively significant over a full day.

rolling_z[t] = mean(z[t-95], ..., z[t])
flag if |rolling_z| > 1.94
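A sketch of the rolling z-score detector with a running sum, so each step costs O(1). The function name is hypothetical, and not flagging until the 96-sample window is full is an assumption of this example.

```python
from collections import deque

def rolling_z_flags(z_scores, window=96, h=1.94):
    """Flag timesteps where the mean z over the last `window` samples exceeds h."""
    buf, total, flags = deque(), 0.0, []
    for z in z_scores:
        buf.append(z)
        total += z
        if len(buf) > window:
            total -= buf.popleft()  # slide the window forward
        full = len(buf) == window
        flags.append(full and abs(total / window) > h)
    return flags
```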
Spike Detector
4σ · 3 consecutive steps (45 min)

Fires when z-score exceeds 4σ for 3+ consecutive timesteps. Catches sudden catastrophic events — pipe bursts, major valve failures, compressor shutdowns.

if |z[t]| > 4.0 for 3 consecutive steps
→ spike_flagged = True
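The spike rule and the ≥2-of-4 vote can be sketched together. Function names are hypothetical; the four boolean inputs to `vote` stand for the per-timestep outputs of the detectors described above.

```python
def spike_flags(z_scores, sigma=4.0, min_run=3):
    """Flag once |z| has exceeded sigma for min_run consecutive timesteps."""
    run, flags = 0, []
    for z in z_scores:
        run = run + 1 if abs(z) > sigma else 0  # any normal sample resets the run
        flags.append(run >= min_run)
    return flags

def vote(cusum, ewma, rolling_z, spike, min_agree=2):
    """A subsystem is flagged only when at least min_agree detectors agree."""
    return sum((cusum, ewma, rolling_z, spike)) >= min_agree
```

Requiring three consecutive samples means a lone 15-minute glitch never fires the spike detector, and requiring two detectors means a lone detector never flags the subsystem.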


Detection performance on historical events

Evaluated on 8 real leak events from 2022–2025. The system detected all events, with time-to-detection ranging from hours to days depending on leak severity.

Event   Period                 Description               Severity   Peak Level
02      Mar–Jun 2022           Slow leak (~800 lb)       Major      CRITICAL
04      Sep–Nov 2022           System #30 leak           Moderate   ALERT
05      Dec 2022               Major leak (bakery)       Major      CRITICAL
06      Apr–Jun 2023           Power outage + leaks      Severe     CRITICAL
07      Aug 2023               Multi-day leak (350 lb)   Moderate   ALERT
08      Dec 2023–Jan 2024      Suspicious activity       Mild       WARNING
10      Jul 2024               Extended abnormal         Moderate   ALERT
12      Dec 2024–Jan 2025      Suspicious activity       Mild       WARNING

Why we built it this way

Each architectural choice balances detection sensitivity, false-positive control, and operational practicality.

🎯

Single Global Model

One XGBoost model with sub_idx as a feature serves all 162 subsystems, sharing the CT→VPO physics while learning per-unit offsets. More data-efficient and maintainable than 162 separate models.

🚫

No VPO Lags

Deliberately excludes autoregressive VPO features. During slow leaks, VPO lags would let the model “explain away” anomalous values, suppressing the residual signal. This sacrifices ~2 R² points for far better anomaly sensitivity.

🗳

Multi-Detector Voting

Four detectors with different time horizons vote together. Requiring ≥2/4 agreement dramatically reduces false positives while maintaining detection power across leak profiles (slow, sudden, sustained).

🏢

Family Aggregation

Leaks affect shared piping, so multiple subsystems deviate together. The dual threshold (>5% AND ≥3 absolute) filters out single-unit faults while catching real leaks that propagate across families.

Hysteresis State Machine

Escalation requires sustained conditions (4h→12h→24h→48h). The 6-hour downgrade delay prevents alert fatigue from oscillating borderline conditions. Fast-track triggers bypass timers for severe cases.

🧪

Healthy-Only Calibration

Thresholds are calibrated at the 99th percentile of healthy-data statistics, never tuned on leak events. This preserves statistical validity of the FPR guarantee and ensures generalisation to novel leak patterns.
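Healthy-only calibration can be illustrated with a percentile over a detector's statistic on healthy data. This is a sketch under assumptions: the function names are hypothetical, the statistic is taken in absolute value, and the nearest-rank percentile is one of several valid definitions; the source does not say which is used.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def calibrate_threshold(healthy_stats, q=99.0):
    """Decision limit h = q-th percentile of |statistic| on healthy-only data."""
    return percentile([abs(v) for v in healthy_stats], q)
```

By construction, at most about 1% of healthy samples exceed a limit set this way, which is why tuning on leak events is not needed to bound the false-positive rate of a single detector.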


Technical foundation

Built with robust ML and data engineering libraries, GPU-accelerated training, and config-driven detection.

ML Model
XGBoost (GPU)
Gradient-boosted regression trees with CUDA histogram splitting. 100K trees with early stopping, depth 6, aggressive regularisation.
Data Processing
pandas + NumPy
Vectorised residual computation, rolling-window detectors, and wide-to-panel format conversion for 7.6M+ rows.
Visualisation
Plotly + Matplotlib
Interactive Dash dashboards for exploration. Static matplotlib figures for stakeholder reports and publication-quality PDF exports.
Unsupervised
Isolation Forest + PCA
Optional unsupervised scoring layer on the residual vector for additional anomaly evidence beyond the four primary detectors.
Configuration
JSON Config-Driven
All detector parameters, voting rules, family weights, and state machine timings in detection_config.json — no hardcoded thresholds.
Environment
Conda + CUDA
Reproducible conda environment with GPU-accelerated training on NVIDIA hardware. Full pipeline from raw data to dashboard in numbered scripts.

What's next

The detection engine is operational. Next steps focus on data quality, deployment, and expansion.

📡

Clean CT Data Recovery

119 of 162 subsystems reported frozen CT values from Apr–Aug 2025 due to a data export issue. Obtaining clean CT data from an alternative source will unlock full-period blind testing.

☁️

Real-Time Deployment

Deploy the online engine for live 15-minute monitoring with email/SMS alerting to facility managers. Current pipeline processes events retrospectively — next step is live streaming.

🏪

Multi-Store Generalisation

Adapt the model to additional grocery stores with different system configurations. The single-model architecture with sub_idx embedding makes transfer learning straightforward.