Supermarket Refrigerant Leak Detection
Smart Leak Detection
A two-phase research program with Neelands Group tackling refrigerant leak detection in supermarket refrigeration. Phase 1 (published, Energies 2024) introduced CatBoost-based detection for HFC systems. Phase 2 scales to CO2 systems with 162 subsystems, four parallel detectors, and a hysteresis state machine.
Precision Leak Detection in Supermarket Refrigeration Systems
Integrating Categorical Gradient Boosting with Advanced Thresholding for HFC-based refrigeration.
Rashinda Wijethunga, Hooman Nouraei, Craig Zych, Jagath Samarabandu, Ayan Sadhu
doi.org/10.3390/en17030736 · Open Access (CC BY)
Why leak detection matters
Refrigeration accounts for ~50% of a supermarket's total energy consumption, so refrigerant leaks carry severe economic and environmental consequences.
$150K Annual Energy Cost
Refrigeration energy costs ~1% of total sales, roughly equal to the average net profit margin, so a 10% energy reduction raises profit by about 10%. Leak rates average 11% annually and can spike to 30%.
Environmental Impact
HFCs have atmospheric lifespans up to 14 years. Regulatory bodies worldwide (Kigali Amendment, EU F-gas) mandate stricter leak monitoring and repair timelines.
Detection Gap
Most existing automated fault detection and diagnosis (AFDD) methods depend on a refrigerant level sensor in the receiver tank, which is unavailable in most supermarkets. Many address only one leak type.
Three novel contributions
A complete framework for both slow and catastrophic leak detection, independent of receiver-level sensors.
Dual Leak Detection
First framework to detect both slow and catastrophic leaks within a single system. The two leak types behave in contrasting ways and require fundamentally different detection strategies.
Receiver-Independent
Uses only universally available parameters: COP, subcooling, superheat, mass flow rate, compression ratio, and energy consumption. No receiver-level sensor required.
False Alarm Mitigation
First leak detection solution for supermarkets with built-in false alarm mitigation, handling control mode transitions (defrost, heat reclaim, condenser splits) that mimic anomalies.
Overall algorithm architecture
Two parallel pipelines share a common CatBoost prediction backbone, with specialised anomaly detection for each leak type.
Fig. 4: Overall algorithm architecture for leak detection. Left branch handles catastrophic leaks (15-min data, dynamic thresholding). Right branch handles slow leaks (1-hour data, cumulative error analysis).
Non-parametric dynamic thresholding
A CatBoost model predicts each target feature during normal operation. During a leak, prediction errors spike — the challenge is distinguishing genuine anomalies from noise.
Prediction errors e(t) = (y(t) − ŷ(t))² are computed in real-time and smoothed using an exponentially weighted average to eliminate sharp spikes from noise.
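The error computation and smoothing step can be sketched as follows. This is a minimal illustration, not the published implementation: the smoothing factor `alpha` and the sample values are assumptions, since the source does not state them.

```python
def squared_errors(actual, predicted):
    """e(t) = (y(t) - y_hat(t))^2 for paired observations."""
    return [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]


def ewma_smooth(errors, alpha=0.2):
    """Exponentially weighted average: s(t) = alpha*e(t) + (1 - alpha)*s(t-1).

    alpha is a hypothetical value; smaller alpha smooths more aggressively.
    """
    smoothed = []
    prev = None
    for e in errors:
        prev = e if prev is None else alpha * e + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed


# Illustrative readings: one isolated spike in an otherwise well-predicted signal.
actual = [3.0, 3.1, 2.9, 3.0, 1.0, 3.0]
predicted = [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]

errors = squared_errors(actual, predicted)
smoothed = ewma_smooth(errors, alpha=0.2)
```

The isolated spike of 4.0 in `errors` is damped to about 0.8 in `smoothed`, which is the effect the smoothing step is meant to have on noise.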
A set of candidate thresholds ε(z_i) = μ(e_s) + z_i·σ(e_s) is generated for z ranging from 2 to 10, where e_s is the smoothed error series. The optimal ε* maximises separation between normal and anomalous data.
All error values above ε* in continuous sequences E_seq receive an anomaly severity score based on how far they exceed the threshold relative to the error statistics.
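A rough sketch of the threshold search and severity scoring follows. The source does not spell out the separation objective or the severity formula, so the versions below are assumptions borrowed from the non-parametric dynamic thresholding of Hundman et al. (2018): the score rewards a large drop in mean and standard deviation once anomalous points are removed, and penalises flagging many points or many sequences. All function names are hypothetical.

```python
from statistics import mean, pstdev


def anomalous_runs(errors, threshold):
    """Return (start, end) pairs, end exclusive, of consecutive values above threshold."""
    runs, start = [], None
    for i, e in enumerate(errors):
        if e > threshold and start is None:
            start = i
        elif e <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(errors)))
    return runs


def select_threshold(errors, z_values=range(2, 11)):
    """Pick eps* among mu + z*sigma candidates (assumed Hundman-style objective)."""
    mu, sigma = mean(errors), pstdev(errors)
    if sigma == 0:
        return None  # flat error series: nothing can exceed mu + z*sigma
    best_eps, best_score = None, float("-inf")
    for z in z_values:
        eps = mu + z * sigma
        below = [e for e in errors if e <= eps]
        runs = anomalous_runs(errors, eps)
        if not runs or len(below) < 2:
            continue  # candidate flags nothing (or everything)
        n_anomalous = len(errors) - len(below)
        delta_mu = mu - mean(below)
        delta_sigma = sigma - pstdev(below)
        score = (delta_mu / mu + delta_sigma / sigma) / (n_anomalous + len(runs) ** 2)
        if score > best_score:
            best_eps, best_score = eps, score
    return best_eps


def severity(errors, run, eps):
    """Assumed severity: (peak error in the run - eps) / (mu + sigma)."""
    start, end = run
    return (max(errors[start:end]) - eps) / (mean(errors) + pstdev(errors))


# Illustrative smoothed errors: a quiet baseline with one three-sample burst.
smoothed_errors = [0.01] * 30 + [1.1, 1.2, 1.1] + [0.01] * 30
eps_star = select_threshold(smoothed_errors)
runs = anomalous_runs(smoothed_errors, eps_star)
scores = [severity(smoothed_errors, run, eps_star) for run in runs]
```

On this toy series the search isolates the single burst at indices 30 to 32 and assigns it a positive severity.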
A pruning vector e_max ranks anomalous subsequences by peak error. Walking through this vector, if the percentage decrease d_i between consecutive peaks exceeds threshold p, earlier sequences are confirmed as true anomalies; the rest are reclassified as normal noise.
if d_i > p → confirm anomalies j < i
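The pruning rule might be sketched as below. Two details are assumptions taken from Hundman et al. (2018) rather than from this page: the largest non-anomalous error is appended to the end of the ranked vector, and `p = 0.13` is used as an illustrative default. The sequence labels are hypothetical.

```python
def prune_anomalies(peaks, normal_max, p=0.13):
    """Confirm anomalous sequences whose peaks stand clearly above what follows.

    peaks:      dict mapping a sequence label to its peak smoothed error
    normal_max: largest smoothed error NOT flagged as anomalous (assumed extra entry)
    p:          minimum fractional decrease between consecutive ranked peaks
    """
    ranked = sorted(peaks.items(), key=lambda kv: kv[1], reverse=True)
    e_max = [peak for _, peak in ranked] + [normal_max]

    last_drop = 0
    for i in range(1, len(e_max)):
        d_i = (e_max[i - 1] - e_max[i]) / e_max[i - 1]
        if d_i > p:
            last_drop = i  # every sequence ranked before i stays anomalous

    return {label for label, _ in ranked[:last_drop]}


# Sequence "C" peaks barely above the normal background, so it is pruned.
confirmed = prune_anomalies({"A": 1.2, "B": 0.9, "C": 0.31}, normal_max=0.30)
```

Here the drops from A to B (25%) and B to C (about 66%) both exceed `p`, while the drop from C to the normal background (about 3%) does not, so only A and B are confirmed.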
Cumulative error analysis for gradual leaks
Slow leaks don't significantly affect system efficiency until substantial refrigerant has been lost. A fundamentally different detection strategy is needed.
As the system enters a slow leak state, the partial correlation between input variables (X) and target variables (Y) shifts. The CatBoost model, trained on healthy data, produces a gradually increasing prediction error that accumulates over time. The key innovation is tracking the bi-weekly gradual rise of cumulative squared errors while removing rapid spikes caused by noise.
CatBoost predicts COP from 1-hour sampled data. Only COP responds to slow leaks.
Squared errors are accumulated: C_e²(t) = Σ_{τ ≤ t} e²(τ). A change in the gradient of this curve reveals leak onset.
Rapid changes (Δ_Rapid) are removed wherever the gradient G(t) exceeds μ + 3σ. Only gradual, consistent accumulation is retained.
The gradual component (Δ_Gradual) is resampled daily, and a threshold T = 1.8 × max(validation) is applied per store.
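The four steps above can be illustrated with a simplified sketch. Several choices here are assumptions rather than details from the paper: the gradient G(t) is taken as the hourly increment of the cumulative curve, μ and σ are computed on the series being filtered, and the "bi-weekly rise" is the change in the daily cumulative error over a trailing 14-day window.

```python
from statistics import mean, pstdev


def gradual_increments(sq_errors):
    """Zero out rapid jumps: increments G(t) above mu + 3*sigma count as noise spikes."""
    limit = mean(sq_errors) + 3 * pstdev(sq_errors)
    return [g if g <= limit else 0.0 for g in sq_errors]


def daily_cumulative(increments, samples_per_day=24):
    """Accumulate hourly increments; keep the running total at the end of each full day."""
    totals, running = [], 0.0
    for i, g in enumerate(increments, start=1):
        running += g
        if i % samples_per_day == 0:
            totals.append(running)
    return totals


def biweekly_rise(daily_totals, window_days=14):
    """Rise of the daily cumulative error over a trailing window, keyed by day index."""
    return {day: daily_totals[day] - daily_totals[day - window_days]
            for day in range(window_days, len(daily_totals))}


def leak_days(test_rise, validation_rise, factor=1.8):
    """Days whose rise exceeds T = factor * max(validation rise)."""
    threshold = factor * max(validation_rise.values())
    return sorted(day for day, rise in test_rise.items() if rise > threshold)


# Illustrative hourly squared COP errors (hypothetical values).
validation_sq_errors = [0.01] * (24 * 28)          # 28 healthy days
test_sq_errors = [0.01] * (24 * 20) + [0.05] * (24 * 20)  # leak starts on day 20
test_sq_errors[21 * 24] = 5.0                       # one noise spike on day 21

validation_rise = biweekly_rise(daily_cumulative(gradual_increments(validation_sq_errors)))
test_rise = biweekly_rise(daily_cumulative(gradual_increments(test_sq_errors)))
flagged_days = leak_days(test_rise, validation_rise)
```

In this toy series the spike is discarded and the slow rise is first flagged on day 22, two days after the simulated leak begins.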
Detection performance
Evaluated on real-world data from Canadian supermarkets with HFC refrigerant.
Catastrophic leak: refrigerant level dropped from 40% to 30% within hours. Five of six target features showed clear anomaly signals. Subcooling temperature achieved a perfect F1 = 1.0. Energy consumption (kW) showed no response, indicating that it is insufficient on its own.
Slow leaks: leaks persisted for weeks or months before manual detection. The algorithm detected them within an average of 5 days. Only COP responded; other features showed no gradual change. Average F1 = 0.92 across all stores.
| Target Parameter | Precision | Recall | F1 Score |
|---|---|---|---|
| Subcooling Temperature | 1.0000 | 1.0000 | 1.0000 |
| Compression Ratio | 0.7894 | 1.0000 | 0.8824 |
| COP | 0.9000 | 0.6000 | 0.7200 |
| Mass Flow Rate | 1.0000 | 0.5333 | 0.6957 |
| Superheat Temperature | 0.3684 | 0.4667 | 0.4118 |
| kW Value | 0.0000 | 0.0000 | 0.0000 |
Catastrophic leak detection performance (point-adjusted F1 scores). Slow leak: COP F1 = 0.92 avg across 3 stores.
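For readers unfamiliar with point-adjusted scoring, the sketch below shows the usual convention: if any timestamp inside a true anomaly segment is flagged, the whole segment counts as detected before precision, recall, and F1 are computed. It illustrates the metric only; the labels are made up and the function names are hypothetical.

```python
def point_adjust(pred, truth):
    """Mark an entire true-anomaly segment as detected if any point in it was flagged."""
    adjusted = list(pred)
    i, n = 0, len(truth)
    while i < n:
        if truth[i]:
            j = i
            while j < n and truth[j]:
                j += 1
            if any(pred[i:j]):
                adjusted[i:j] = [1] * (j - i)
            i = j
        else:
            i += 1
    return adjusted


def precision_recall_f1(pred, truth):
    """Point-wise precision, recall, and F1 (harmonic mean of the two)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


truth = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
pred = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]

adjusted = point_adjust(pred, truth)
precision, recall, f1 = precision_recall_f1(adjusted, truth)
```

The F1 column in the table is consistent with this harmonic-mean definition: for COP, 2 × 0.9 × 0.6 / (0.9 + 0.6) = 0.72.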
| Parameter | MAPE (%) | R² Score | False Alarm Rate (%) |
|---|---|---|---|
| COP | 1.47 | 0.877 | 0.27 |
| Subcooling | — | 0.827 | 0.13 |
| Superheat | 1.38 | 0.904 | 0.40 |
| Mass Flow Rate | 2.55 | 0.802 | 0.33 |
| Compression Ratio | 1.27 | 0.764 | 0.10 |
| kW Value | 2.13 | 0.600 | 0.10 |
Prediction model & anomaly detection performance during non-leak operation (low false alarm rates confirm robust calibration).
Scaling to CO2 Transcritical Refrigeration
An intelligent anomaly detection system monitoring 162 subsystems with XGBoost residuals, four parallel detectors, and a hysteresis state machine.
Why automated leak detection?
CO2 refrigerant leaks in commercial systems are costly, dangerous, and invisible to conventional monitoring.
Massive Financial Loss
A single leak event can lose 350–1,400+ lbs of CO2 refrigerant. The Neelands store has experienced 12 documented incidents in 3 years, each requiring emergency service calls and risking product spoilage.
No Direct Leak Sensor
Existing refrigerant-level sensors have limited coverage and unreliable readings. The system must infer leaks indirectly from operational telemetry — specifically from how valve positions deviate from expected behaviour.
Speed Saves Money
Slow leaks can persist for weeks before manual detection. Every hour of earlier detection saves refrigerant, prevents compressor damage, and avoids food safety incidents from rising case temperatures.
From raw sensors to model-ready features
A 6-step pipeline transforms 15-minute sensor readings into a panel dataset with 35 engineered features per subsystem.
Four-layer detection pipeline
From raw predictions to system-level alerts: each layer progressively filters noise and increases confidence.
Fig. 1: Four-layer detection pipeline. Layer 1 computes per-subsystem z-scored residuals from XGBoost predictions. Layer 2 runs four parallel detectors with majority voting. Layer 3 aggregates flagged subsystems by refrigeration family. Layer 4 escalates alerts through a hysteresis state machine.
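Layer 1 can be pictured with the short sketch below. It assumes the z-score is formed by standardising each residual (actual minus predicted) against that subsystem's own healthy-period mean and standard deviation; the subsystem label and numbers are hypothetical, and the prediction itself would come from the XGBoost model.

```python
from statistics import mean, pstdev


def fit_residual_stats(healthy_residuals):
    """Per-subsystem (mean, std) of residuals observed during healthy operation."""
    return {sub: (mean(r), pstdev(r)) for sub, r in healthy_residuals.items()}


def z_score(sub, actual, predicted, stats):
    """Standardised residual for one reading of one subsystem."""
    mu, sigma = stats[sub]
    if sigma == 0:
        raise ValueError(f"no residual variance recorded for {sub}")
    return (actual - predicted - mu) / sigma


# Hypothetical healthy residuals for a single subsystem.
stats = fit_residual_stats({"sub_007": [-1.0, 0.0, 1.0, 0.0]})

# A reading 3 units below the model's prediction.
z = z_score("sub_007", actual=40.0, predicted=43.0, stats=stats)
```

Standardising per subsystem puts all 162 residual streams on a common scale, so the downstream detectors can share thresholds.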
Four parallel anomaly detectors
Each detector is tuned to catch a different leak profile. They vote together — a subsystem is flagged only when ≥2 agree.
CUSUM: accumulates deviations beyond a drift allowance. Catches slow, sustained leaks over hours or days. Exponential decay prevents stale accumulation after the anomaly clears.
S⁻[t] = max(0, S⁻[t−1] − z[t] − 0.5)
if |z| < 3: apply ×0.85 decay
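A minimal sketch of this lower-sided CUSUM recursion follows. The order of operations (update first, then decay when |z| < 3) is an assumption, and no flagging threshold is shown because the page does not state one.

```python
def cusum_lower(z_scores, k=0.5, decay=0.85, quiet_z=3.0):
    """Lower-sided CUSUM with decay: grows while z sits below -k, fades when z is quiet."""
    s, path = 0.0, []
    for z in z_scores:
        s = max(0.0, s - z - k)      # S-[t] = max(0, S-[t-1] - z[t] - k)
        if abs(z) < quiet_z:
            s *= decay               # assumed: decay applied after the update
        path.append(s)
    return path


# Five quiet samples, five strongly negative samples, then quiet again.
z_scores = [0.0] * 5 + [-4.0] * 5 + [0.0] * 10
path = cusum_lower(z_scores)
```

The statistic stays at zero during normal operation, climbs by 3.5 per sample while z = −4, and then decays once the residuals return to normal.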
EWMA: an exponentially weighted moving average with ~2.5-hour effective memory. Catches gradual trend changes that CUSUM might take longer to accumulate.
flag if |EWMA| > 4.32
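The EWMA detector can be sketched as below. The smoothing factor is an assumption: with 15-minute data, a ~2.5-hour memory corresponds to about 10 samples, which is approximated here as `alpha = 0.1`.

```python
def ewma_flags(z_scores, alpha=0.1, threshold=4.32):
    """Flag timesteps where the exponentially weighted mean of z leaves +/- threshold."""
    ewma, flags = 0.0, []
    for z in z_scores:
        ewma = alpha * z + (1 - alpha) * ewma
        flags.append(abs(ewma) > threshold)
    return flags


# Ten normal samples followed by a sustained shift to z = -6.
z_scores = [0.0] * 10 + [-6.0] * 30
flags = ewma_flags(z_scores)
first_flag = flags.index(True)
```

With these assumed settings the shift is flagged on its 13th sample (index 22), a little over three hours after onset at 15-minute sampling.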
Rolling z-score: the mean z-score over a 24-hour sliding window. Catches persistent, subtle elevation that is individually small but collectively significant over a full day.
flag if |rolling_z| > 1.94
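A sketch of the rolling detector, assuming 15-minute sampling (96 samples per 24 hours) and assuming no flag is raised until the window is full:

```python
from collections import deque


def rolling_z_flags(z_scores, window=96, threshold=1.94):
    """Flag timesteps where the mean z over the last `window` samples leaves +/- threshold."""
    buf, total, flags = deque(), 0.0, []
    for z in z_scores:
        buf.append(z)
        total += z
        if len(buf) > window:
            total -= buf.popleft()
        window_full = len(buf) == window
        flags.append(window_full and abs(total / window) > threshold)
    return flags


# One normal day followed by one day of modest but persistent deviation.
z_scores = [0.0] * 96 + [-2.5] * 96
flags = rolling_z_flags(z_scores)
first_flag = flags.index(True)
```

A deviation of z = −2.5 never looks extreme on its own, yet the 24-hour mean crosses 1.94 after 75 such samples (index 170).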
Spike: fires when the z-score exceeds 4σ for 3+ consecutive timesteps. Catches sudden catastrophic events such as pipe bursts, major valve failures, and compressor shutdowns.
→ spike_flagged = True
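The spike rule reduces to counting consecutive exceedances. The sketch below assumes the 4σ test is applied to the absolute z-score, which the page does not state explicitly.

```python
def spike_flags(z_scores, sigma_limit=4.0, min_run=3):
    """Flag once |z| has exceeded sigma_limit for at least min_run consecutive samples."""
    run, flags = 0, []
    for z in z_scores:
        run = run + 1 if abs(z) > sigma_limit else 0
        flags.append(run >= min_run)
    return flags


# A two-sample blip is ignored; a four-sample excursion fires on its third sample.
z_scores = [0.0, 5.0, 5.0, 0.0, -6.0, -6.0, -6.0, -6.0, 0.0]
spike_flagged = spike_flags(z_scores)
```

Requiring three consecutive exceedances means a single bad reading cannot trip the detector.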
Detection performance on historical events
Evaluated on 8 real leak events from 2022–2025. The system detected all events, with time-to-detection ranging from hours to days depending on leak severity.
| Event | Period | Description | Severity | Peak Level |
|---|---|---|---|---|
| 02 | Mar–Jun 2022 | Slow leak (~800 lb) | Major | CRITICAL |
| 04 | Sep–Nov 2022 | System #30 leak | Moderate | ALERT |
| 05 | Dec 2022 | Major leak (bakery) | Major | CRITICAL |
| 06 | Apr–Jun 2023 | Power outage + leaks | Severe | CRITICAL |
| 07 | Aug 2023 | Multi-day leak (350 lb) | Moderate | ALERT |
| 08 | Dec 2023–Jan 2024 | Suspicious activity | Mild | WARNING |
| 10 | Jul 2024 | Extended abnormal | Moderate | ALERT |
| 12 | Dec 2024–Jan 2025 | Suspicious activity | Mild | WARNING |
Why we built it this way
Each architectural choice balances detection sensitivity, false-positive control, and operational practicality.
Single Global Model
One XGBoost model with sub_idx as a feature serves all 162 subsystems, sharing the CT→VPO physics while learning per-unit offsets. More data-efficient and maintainable than 162 separate models.
No VPO Lags
Deliberately excludes autoregressive VPO features. During slow leaks, VPO lags would let the model “explain away” anomalous values, suppressing the residual signal. This sacrifices ~2 R² points for far better anomaly sensitivity.
Multi-Detector Voting
Four detectors with different time horizons vote together. Requiring ≥2/4 agreement dramatically reduces false positives while maintaining detection power across leak profiles (slow, sudden, sustained).
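The voting rule itself is small enough to show directly. The detector names below are illustrative labels rather than identifiers from the project's code.

```python
def subsystem_flagged(detector_flags, min_votes=2):
    """Flag a subsystem when at least `min_votes` detectors agree at this timestep."""
    return sum(bool(v) for v in detector_flags.values()) >= min_votes


# Only one detector fires: treated as noise.
single = subsystem_flagged({"cusum": True, "ewma": False, "rolling_z": False, "spike": False})

# A slow leak typically trips the long-horizon detectors together.
agreeing = subsystem_flagged({"cusum": True, "ewma": False, "rolling_z": True, "spike": False})
```

Because the detectors cover different time horizons, a lone vote is usually a transient, while agreement between two suggests a deviation that persists across scales.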
Family Aggregation
Leaks affect shared piping, so multiple subsystems deviate together. The dual threshold (>5% AND ≥3 absolute) filters out single-unit faults while catching real leaks that propagate across families.
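The dual threshold can be sketched as follows, assuming both conditions are evaluated per refrigeration family. The family names and subsystem counts are hypothetical.

```python
from collections import defaultdict


def flagged_families(flags, family_of, min_fraction=0.05, min_count=3):
    """Families where flagged subsystems exceed min_fraction AND number at least min_count."""
    total, hit = defaultdict(int), defaultdict(int)
    for sub, family in family_of.items():
        total[family] += 1
        if flags.get(sub, False):
            hit[family] += 1
    return {
        family
        for family in total
        if hit[family] >= min_count and hit[family] / total[family] > min_fraction
    }


# Hypothetical layout: family "A" has 40 units, "B" has 10, "C" has 100.
family_of = {f"A{i}": "A" for i in range(40)}
family_of.update({f"B{i}": "B" for i in range(10)})
family_of.update({f"C{i}": "C" for i in range(100)})

flags = {"A0": True, "A1": True, "A2": True,   # 3 of 40 = 7.5%: passes both tests
         "B0": True, "B1": True,               # 2 of 10 = 20%: fails the count test
         "C0": True, "C1": True, "C2": True}   # 3 of 100 = 3%: fails the fraction test

families = flagged_families(flags, family_of)
```

The absolute count guards small families against one or two faulty units, and the fraction guards large families against scattered, unrelated flags.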
Hysteresis State Machine
Escalation requires sustained conditions (4h→12h→24h→48h). The 6-hour downgrade delay prevents alert fatigue from oscillating borderline conditions. Fast-track triggers bypass timers for severe cases.
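A simplified sketch of the hysteresis logic follows. The mapping of the four timers to named levels is an assumption (the results table shows only WARNING, ALERT, and CRITICAL, so the WATCH level is hypothetical), downgrades are assumed to step down one level at a time, and fast-track triggers are omitted.

```python
LEVELS = ["NORMAL", "WATCH", "WARNING", "ALERT", "CRITICAL"]  # WATCH is hypothetical
ESCALATE_AFTER_H = [4, 12, 24, 48]   # sustained hours needed to reach levels 1..4
DOWNGRADE_AFTER_H = 6                # clear hours needed before stepping down


class AlertStateMachine:
    def __init__(self, step_hours=0.25):
        self.step = step_hours   # 15-minute timesteps
        self.level = 0
        self.active_h = 0.0      # continuous hours with the condition present
        self.clear_h = 0.0       # continuous hours with the condition absent

    def update(self, condition_active):
        """Advance one timestep and return the current alert level."""
        if condition_active:
            self.active_h += self.step
            self.clear_h = 0.0
            while (self.level < len(ESCALATE_AFTER_H)
                   and self.active_h >= ESCALATE_AFTER_H[self.level]):
                self.level += 1
        else:
            self.clear_h += self.step
            self.active_h = 0.0
            if self.level > 0 and self.clear_h >= DOWNGRADE_AFTER_H:
                self.level -= 1
                self.clear_h = 0.0
        return LEVELS[self.level]


machine = AlertStateMachine()
during_leak = [machine.update(True) for _ in range(48)]     # 12 hours active
after_clear = [machine.update(False) for _ in range(24)]    # 6 hours clear
```

In this run the level reaches WATCH after 4 hours and WARNING after 12, then holds at WARNING for 5.75 clear hours before stepping down, so brief gaps in the signal do not reset the alert.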
Healthy-Only Calibration
Thresholds are calibrated at the 99th percentile of healthy-data statistics, never tuned on leak events. This preserves statistical validity of the FPR guarantee and ensures generalisation to novel leak patterns.
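As an illustration of healthy-only calibration, the sketch below derives a threshold as the 99th percentile of a detector statistic computed on healthy data. The linear-interpolation percentile and the sample values are assumptions chosen for clarity.

```python
def percentile(values, q):
    """Linearly interpolated percentile, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


# Hypothetical detector statistic observed over a healthy period.
healthy_statistic = [float(v) for v in range(1, 101)]

threshold = percentile(healthy_statistic, 99)
exceedances = sum(1 for v in healthy_statistic if v > threshold)
```

By construction about 1% of healthy readings exceed the threshold, which is what gives the per-detector false-positive rate its interpretation without ever looking at leak data.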
Technical foundation
Built with robust ML and data engineering libraries, GPU-accelerated training, and config-driven detection.
What's next
The detection engine is operational. Next steps focus on data quality, deployment, and expansion.
Clean CT Data Recovery
119/162 subsystems reported frozen CT values from Apr–Aug 2025 due to a data export issue. Obtaining clean CT data from an alternative source will unlock full-period blind testing.
Real-Time Deployment
Deploy the online engine for live 15-minute monitoring with email/SMS alerting to facility managers. The current pipeline processes events retrospectively; the next step is live streaming.
Multi-Store Generalisation
Adapt the model to additional grocery stores with different system configurations. Because a single model already handles all subsystems through the sub_idx feature, extending it to new stores should be comparatively straightforward.