Safety Intelligence
February 17, 2026
By Patrick Songore, Founder, GangoAI

Deterministic vs Probabilistic: Why Safety-Critical Systems Cannot Guess

When an AI system flags a worker as unfit, what happens next? Someone has a conversation. A shift gets reassigned. A vehicle stays in the yard. These are real decisions with real consequences - for the business and for the person being flagged. The question is: was that flag based on a measurement, or a guess?

How Safety Systems Work Today

There are several categories of fatigue and impairment detection technology deployed across transport, mining, and industrial sectors today. Each takes a different approach. Each has specific limitations that are worth understanding before you invest.

In-cab cameras are the dominant approach. Systems from Seeing Machines (Guardian), Lytx, Netradyne (Driveri), and Caterpillar's MineStar DSS use driver-facing cameras to track eye closure, head position, yawning, and facial expressions in real time. The core metric most rely on is PERCLOS - the percentage of eyelid closure over the pupil over time. When the system detects signs of drowsiness, it triggers an audible alarm, a seat vibration, or both. These systems can trace their alerts to specific observations. A camera can tell you that a driver's eyes were closed for more than one second, or that their head nodded at a specific timestamp. That traceability is real.
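To make the PERCLOS idea concrete, here is a minimal sketch of the calculation: the fraction of time, over a sliding window, that the eyelid covers most of the pupil. The 80% closure definition is the standard research formulation; the window length and alarm threshold below are illustrative assumptions, not any vendor's tuned values.

```python
def perclos(eyelid_closure: list[float], window: int = 60) -> float:
    """eyelid_closure: per-frame closure fraction (0.0 = fully open, 1.0 = closed).
    Returns the fraction of the last `window` frames with more than 80% closure."""
    recent = eyelid_closure[-window:]
    closed = sum(1 for c in recent if c > 0.8)
    return closed / len(recent)

# A drowsy sample: eyes mostly shut during the second half of the window.
samples = [0.1] * 30 + [0.9] * 30
score = perclos(samples)   # 0.5 - eyes closed half the time
alarm = score > 0.15       # illustrative cut-off; real systems tune their own
```

Note how every input here is an optical observation: if head movement or glare corrupts the per-frame closure values, the score degrades silently - which is exactly the accuracy problem discussed later in this article.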

Vehicle behaviour monitoring takes a different approach entirely. Systems like Mercedes-Benz Attention Assist, Volvo Driver Alert Control, and similar OEM features do not watch the driver at all. They monitor the vehicle - steering patterns, lane position, pedal inputs. Mercedes Attention Assist analyses over 70 parameters of driving behaviour in the first few minutes of a journey to build a baseline, then watches for deviations that suggest fatigue. These systems are affected by external factors like crosswinds and road conditions, and they measure vehicle behaviour as a proxy for human state rather than measuring the human directly.

Lane Departure Warning Systems (LDWS) use forward-facing cameras to detect lane markings and alert the driver if the vehicle drifts across lanes without indicating. Mandatory on new EU vehicles since 2022, these are a last-resort catch for when fatigue has already caused the vehicle to drift. They do not detect fatigue. They detect the consequences of it.

Wearable devices take a physiological approach. Smart helmets embed EEG electrodes to measure brainwave activity associated with fatigue. Optalert uses specialised glasses with an infrared LED that measures eyelid velocity 500 times per second, producing a drowsiness score on their Johns Drowsiness Scale (JDS). Fatigue Science's ReadiWatch tracks sleep patterns using actigraphy and feeds the data into a biomathematical model originally developed by the US Army to predict fatigue risk up to 18 hours ahead.

Predictive software like Fatigue Science's Readi platform uses machine learning to estimate fatigue risk from shift schedules and sleep data - in some cases without wearables at all, using demographic data and historical patterns from a database of over four million de-identified sleeps. The output is a predicted fatigue score for each worker before their shift begins.

Smartphone-based tests such as the Psychomotor Vigilance Task (PVT), a reaction-time test used by NASA to monitor astronaut alertness, measure reaction times as a proxy for cognitive alertness. These are typically administered before a shift as a point-in-time snapshot.

Each of these approaches has value. Some are well-validated. But they all share one or more of three fundamental limitations: they are reactive, invasive, or probabilistic. Understanding which limitation applies to your current system - and what it means for the decisions being made based on its output - is the point of this article.

The Reactive Problem

In-cab cameras, vehicle behaviour monitoring, lane departure warnings, and wearable devices like Optalert's glasses all share the same structural limitation: they can only assess a person once they are already performing the task. They detect fatigue after it has manifested - after the driver is behind the wheel, on the road, and already a risk.

Peer-reviewed research into in-vehicle fatigue detection systems notes explicitly that because these systems focus on detecting fatigue only after it has manifested, there is an opportunity to develop more proactive measures. That is not a criticism of the technology. It does what it is designed to do. But if you are a fleet operator asking "is this person fit to drive today?", a system that answers the question twenty minutes into the journey is answering it too late. The vehicle has already left the yard. The risk has already begun.

Predictive software like Readi takes a different approach - it estimates fatigue risk before the shift begins. That is the right timing. But it does so by prediction, not measurement. It estimates your likely fatigue state based on how statistically similar people have slept and worked. The wearable-free version goes further, predicting your sleep from your schedule and demographics without measuring you at all. Prediction is valuable for planning. It is not the same as evidence.

The False Positive Problem Is Worse Than You Think

Lytx, one of the largest in-cab camera providers, recently launched a fatigue detection feature that they claim achieves 90% accuracy - but only after a human analyst reviews the AI's output. Their own marketing states that most fatigue solutions on the market offer accuracy rates below 50%. That means for the majority of systems in commercial deployment today, more than half of all fatigue alerts are wrong.

Even Lytx's 90% figure requires a person in a monitoring centre to watch the video clip and confirm whether the driver was actually fatigued before the alert reaches the fleet manager. The AI alone is not trusted to make the decision. The system flags, a human checks the flag, and then the alert is sent. That process adds latency and cost - and it does not scale. If you are running a fleet of 500 vehicles, you need a team of analysts watching video clips around the clock.
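A back-of-envelope calculation shows why human review does not scale. Every number below is an illustrative assumption for a hypothetical 500-vehicle fleet - the alert rate, precision, and review time are not Lytx's figures.

```python
fleet_size = 500
alerts_per_vehicle_per_day = 4      # assumed raw AI alert rate
precision_without_review = 0.5      # the "below 50% accuracy" claim above
review_seconds_per_clip = 45        # assumed analyst time per video clip

daily_alerts = fleet_size * alerts_per_vehicle_per_day           # 2,000 clips/day
false_alerts = daily_alerts * (1 - precision_without_review)     # 1,000 wrong flags
analyst_hours = daily_alerts * review_seconds_per_clip / 3600    # 25 hours/day
```

Under these assumptions, keeping false alerts away from fleet managers costs roughly 25 analyst-hours every day - a standing team of reviewers, around the clock, forever. Doubling the fleet doubles the bill.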

The research literature explains why camera-based accuracy is so difficult. PERCLOS - the core metric most in-cab systems rely on - is susceptible to head movement taking the eyes out of the camera's field of view, glasses and sunglasses occluding the face, and sunlight reflections, glare, dust, and changes in humidity. These are not edge cases. These are everyday conditions in a commercial vehicle. A peer-reviewed narrative review published in SLEEP Advances concluded that PERCLOS alone may not be sufficiently sensitive for detecting drowsiness caused by factors other than falling asleep, such as inattention or distraction.

Optalert's glasses fare better on accuracy - their JDS produced a third as many false alerts as the Karolinska Sleepiness Scale at 100% sensitivity in comparative testing. But in a controlled laboratory validation study, only 67.2% of data sessions passed signal quality checks. Nearly a third of all data had to be excluded due to improper positioning of the glasses, the wearer looking past the sensor, or other signal quality issues. That was in a laboratory. Real-world conditions would be worse.

Wearable approaches face a different accuracy problem. Physiological signals like heart rate, skin conductance, and EEG vary significantly between individuals and are susceptible to environmental conditions, emotions, and other non-fatigue factors. A worker who has just had a coffee, walked up a flight of stairs, or is anxious about a personal matter will produce different physiological readings - none of which have anything to do with fatigue.

Every false positive erodes trust. A worker flagged incorrectly once will tolerate it. Flagged incorrectly twice, they start to resent the system. Flagged incorrectly three times, and the entire workforce views the technology as unreliable. Research into fatigue detection systems acknowledges that frequent false alarms lead to "alarm fatigue" - where drivers ignore or even deactivate the systems entirely. When your safety system gets switched off because workers do not trust it, you have spent money to make your operation less safe.

The Population Threshold Problem

Most of these systems - cameras, vehicle monitoring, PERCLOS-based wearables - compare against population-level thresholds. The system decides you are fatigued because your eye closure pattern, steering behaviour, or physiological readings look like other people who were fatigued. But what counts as "normal" varies enormously between individuals.

A driver who naturally blinks slowly is not the same as a driver who is falling asleep. A driver who makes frequent small steering corrections is not the same as a driver who is drifting. Without a personal baseline - a measurement of what normal looks like for that specific individual - the system cannot tell the difference. It will consistently flag some people who are perfectly fine and consistently miss others who are genuinely impaired.

Mercedes-Benz Attention Assist is one of the few systems that does build a per-session baseline - it learns your driving patterns in the first few minutes of the journey. But it resets every time you turn off the engine. It has no memory of what your driving normally looks like across days, weeks, or months. It also only works above 37mph and is measuring the vehicle, not the person.
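The difference between the two approaches can be sketched in a few lines. The metric, window, and cut-offs below are illustrative assumptions; the point is structural - a fixed population threshold flags the same reading that a personal baseline correctly treats as that individual's normal.

```python
from statistics import mean, stdev

POPULATION_THRESHOLD = 2.0  # one fixed cut-off applied to everyone

def flags_population(reading: float) -> bool:
    return reading > POPULATION_THRESHOLD

def flags_personal(reading: float, history: list[float], z_cutoff: float = 3.0) -> bool:
    """Flag only when today's reading deviates strongly from THIS person's norm."""
    mu, sigma = mean(history), stdev(history)
    return abs(reading - mu) > z_cutoff * sigma

# A naturally slow blinker: readings around 2.2 are normal for this person.
slow_blinker_history = [2.1, 2.2, 2.3, 2.2, 2.1, 2.25, 2.15]
today = 2.2
pop_flag = flags_population(today)                            # flagged - every day
personal_flag = flags_personal(today, slow_blinker_history)   # not flagged
```

The population rule flags this worker on a perfectly ordinary day, and would keep flagging them every day; the personal rule only fires when today genuinely departs from their own history.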

What Deterministic Means in Practice

A deterministic system does not guess. It measures. Every output traces directly to a specific input. When the system flags someone, it can tell you exactly what was measured, how that measurement compares to what is normal for that specific individual, and by how much it deviated. There is no confidence score. There is a measurement and a threshold - calibrated to that person.

The difference between "your eye closure exceeded a population threshold" and "your behavioural pattern this morning deviated significantly from your own established baseline" is not semantic. It is the difference between a system that treats every person the same and one that understands what normal looks like for each individual.

This matters for three reasons that go beyond technical preference.

The conversation with the worker

When a supervisor approaches a worker who has been flagged, the conversation is fundamentally different. "The system thinks you might be fatigued" is accusatory and vague. "Your movement pattern this morning is significantly different from your normal baseline" is specific and objective. One feels like surveillance. The other feels like measurement. Workers respond very differently to each.

The audit trail

When an incident occurs and the investigation begins, regulators do not want to hear that your system was "78% confident" or that a human analyst reviewed a video clip and agreed the driver looked tired. They want to see what was measured, when, and how the decision was made. A deterministic system produces an audit trail that traces every decision to a specific data point for a specific individual. A probabilistic system produces a number that even its own developers may not be able to fully explain.
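As a sketch of what such an audit trail can contain: every field below is a concrete measurement, so the decision can be reconstructed exactly from the record. The field names and values are illustrative, not any vendor's schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass(frozen=True)
class FlagRecord:
    worker_id: str
    measured_at: datetime
    metric: str                # what was measured
    value: float               # the measurement itself
    personal_baseline: float   # what is normal for this individual
    allowed_deviation: float   # the calibrated threshold

    @property
    def flagged(self) -> bool:
        # Deterministic rule: the same inputs always produce the same decision.
        return abs(self.value - self.personal_baseline) > self.allowed_deviation

record = FlagRecord(
    worker_id="W-1042",
    measured_at=datetime(2026, 2, 17, 6, 30, tzinfo=timezone.utc),
    metric="pre_shift_response_time_ms",
    value=412.0,
    personal_baseline=310.0,
    allowed_deviation=75.0,
)
# asdict(record) serialises the full evidence chain for an investigator.
```

An investigator re-running `flagged` on the stored record gets the same answer the system gave on the morning of the shift - no confidence score, no model weights, nothing to interpret.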

The timing

An in-cab camera, a pair of drowsiness glasses, or a steering pattern monitor can only assess a driver once they are driving. By definition, they detect impairment after the risk has already begun. A pre-shift system answers the question before the keys are in the ignition. The difference is not incremental. It is the difference between responding to a problem and preventing one.

The Black Box Problem

The EU AI Act introduces specific requirements around explainability for AI systems used in safety-critical applications. High-risk AI systems must provide outputs that are interpretable by the humans overseeing them. Notably, the regulation bans emotion recognition in the workplace but explicitly permits fatigue detection - a distinction that is technical, legal, and increasingly consequential for how these systems are designed.

Any organisation deploying AI in a safety context should be asking: if this system flags someone, can we explain exactly why to that person, to their union representative, to a regulator, and to a court? If the answer involves confidence scores, weighted features, or "the model detected a pattern," that is not an explanation. That is a restatement of the problem.

This is particularly important for union fleets. Some of the largest fleet operators in the world cannot deploy camera-based monitoring because their union contracts prohibit biometric data collection or continuous video surveillance of workers. A system that requires a camera watching the driver's face for the entire shift, or specialised glasses that must be worn throughout the working day, is a non-starter in these environments. A system that performs a brief, non-biometric behavioural assessment before the shift begins does not trigger the same objections.

What to Ask Your Provider

The distinction between deterministic and probabilistic is not always obvious from marketing materials. Most providers describe their systems as "AI-powered" without specifying how decisions are made. Four questions will clarify where any system stands:

  1. Does your system assess fitness before or after the worker starts their shift? If the answer is "during," the system is reactive by design. It can only catch impairment after the risk has begun. If the answer is "we predict it based on sleep data or shift patterns," the system is probabilistic - it estimates rather than measures.
  2. Does the system compare against a personal baseline or a population threshold? If it uses population-level thresholds - including generic PERCLOS values, standard EEG bands, or fixed steering deviation limits - it will consistently flag some people who are perfectly fine and consistently miss others who are genuinely impaired.
  3. Can a non-technical supervisor understand the output without additional interpretation? If the explanation requires a data scientist, a monitoring centre analyst reviewing video, or a biomathematical model to interpret, it fails the human oversight requirement. A supervisor should be able to look at the output and understand what was measured and what changed.
  4. What is your false positive rate, and how do you measure it? A provider who cannot answer this question precisely has not validated their system rigorously. If the answer involves human review to confirm the AI's output, ask what the accuracy is without that step - because the human review does not scale. If the industry leader claims most competing solutions are below 50% accuracy, ask where your current system sits on that spectrum.

The Standard Should Be Higher

We do not accept guesswork from the systems that monitor structural integrity of bridges, or the instruments that measure air quality in mines, or the gauges that track pressure in pipelines. These are measurement systems. They produce readings, not predictions. They trace to calibrated inputs, not learned patterns.

Safety monitoring of people should meet the same standard. If the output affects whether someone works today, it should be based on what was measured against what is normal for that individual - not what was predicted based on how other people behaved, or what a camera inferred after the shift had already started.

Every Flag Traces to a Measurement

Pre-shift. Deterministic. Personal baselines. Zero biometrics. Zero false positives in validation. Patent pending.