MSc THESIS · UvA SNE × KPMG CYBER
The clean-up step that broke the classifier
An obvious fix that quietly erases the evidence.
Stable-Edge Filtering for Passive OT Device Classification under Operational Change
Jonathan van den Heuvel · jvdhthesis.tech
THE PROBLEM
We classify what we can't scan
- Passive OT asset discovery is a routine KPMG deliverable.
- You can't active-scan a live plant — so we classify devices from captured traffic.
- But real segments are messy: laptops connect, scanners sweep, maintenance happens.
Does a simple clean-up step keep the classifier robust under that change?
THE IDEA + THE LAB
Keep only the connections that last
- Build the graph of who-talks-to-whom, then drop edges that don't persist over time.
- Intuition: a one-off engineering session or a scan sweep is noise — filter it out.
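The filtering idea above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function name `stable_edges`, the fixed-window scheme, and the `window_s` and `min_fraction` parameters are all assumptions made for this example. An edge is kept only if it appears in at least a given fraction of the time windows in the capture.

```python
from collections import defaultdict


def stable_edges(flows, window_s, min_fraction):
    """Return the (src, dst) edges seen in at least `min_fraction` of windows.

    `flows` is an iterable of (timestamp_s, src, dst) tuples. The windowing
    scheme and threshold are illustrative assumptions, not the thesis's
    actual definition of persistence.
    """
    flows = list(flows)
    if not flows:
        return set()
    t0 = min(t for t, _, _ in flows)
    t1 = max(t for t, _, _ in flows)
    n_windows = int((t1 - t0) // window_s) + 1

    # Record the set of window indices in which each directed edge appears.
    seen = defaultdict(set)
    for t, src, dst in flows:
        seen[(src, dst)].add(int((t - t0) // window_s))

    return {e for e, w in seen.items() if len(w) / n_windows >= min_fraction}


# Hypothetical capture: an HMI polls a PLC every minute for ten minutes,
# and a laptop opens a single one-off session.
flows = [(t, "hmi", "plc") for t in range(0, 600, 60)]
flows.append((130, "laptop", "plc"))

kept = stable_edges(flows, window_s=60, min_fraction=0.5)
print(kept)
```

In this toy capture the recurring poll survives and the one-off laptop session is dropped, which is exactly the behaviour the intuition asks for.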
THE TESTBED
- 20 hosts
- 5 classes: ctrl · sup · eng · hist · it
- 4 change scenarios
- 10×10 lab × model seeds
- Inductive (held-out hosts) · GraphSAGE
builds on Heo & Shin (2025) · artifacts released
RQ2 · THE RESULT (held-out macro-F1)
Neutral on four.
It breaks the fifth.
- Neutral on 4 of 5 scenarios: steady, onboarding, config drift, benign scanning.
- No robustness benefit anywhere — the upside we hoped for simply isn't there.
- Under maintenance it significantly hurts.
Macro-F1: 0.45 → 0.36 (Δ = −0.089 · p = 0.027 · worse in 8 of 10 runs)
THE MECHANISM · MAINTENANCE
A paused controller looks like an idle laptop
1. A controller (PLC) is paused for 40 minutes.
2. Its polls stop, so the “keep persistent edges” filter deletes all 20 of its inbound edges (in-degree 20→0 · in-bytes 2.1M→0).
3. Stripped bare, it has zero traffic, identical to an idle IT laptop, so it is misread as “IT endpoint.”
A plain random forest breaks the same way — the filter destroys the evidence, not the model.
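The mechanism can be reproduced in a toy setting. The sketch below is illustrative only: the 60-minute capture, the one-minute windows, the 0.5 persistence threshold, and the breakdown of the 20 inbound edges into 4 hypothetical sources on 5 hypothetical ports are all assumptions, since the slides do not state them. Only the 40-minute pause and the in-degree of 20 come from the text above.

```python
from collections import defaultdict

WINDOW_S = 60          # assumed window length
CAPTURE_S = 60 * 60    # assumed 60-minute capture
PAUSE_S = 40 * 60      # 40-minute pause, as stated on the slide
MIN_FRACTION = 0.5     # assumed persistence threshold


def persistent(flows, n_windows):
    """Keep edges that appear in at least MIN_FRACTION of the windows."""
    seen = defaultdict(set)
    for t, edge in flows:
        seen[edge].add(t // WINDOW_S)
    return {e for e, w in seen.items() if len(w) / n_windows >= MIN_FRACTION}


def in_degree(edges, host):
    """Count edges whose destination is `host`."""
    return sum(1 for (_, dst, _) in edges if dst == host)


n_windows = CAPTURE_S // WINDOW_S

# 20 hypothetical inbound poll edges to the PLC: (src, dst, port).
poll_edges = [(f"src{i}", "plc", port) for i in range(4) for port in range(5)]

# Polls run once per window while the PLC is up, then stop during the pause.
active_s = CAPTURE_S - PAUSE_S
flows = [(t, e) for e in poll_edges for t in range(0, active_s, WINDOW_S)]

# One unrelated edge that is active for the whole capture, for contrast.
steady = ("hmi", "historian", 0)
flows += [(t, steady) for t in range(0, CAPTURE_S, WINDOW_S)]

raw = {e for _, e in flows}
kept = persistent(flows, n_windows)

raw_deg = in_degree(raw, "plc")
kept_deg = in_degree(kept, "plc")
laptop_deg = in_degree(kept, "idle-laptop")
print(raw_deg, kept_deg, laptop_deg)
```

Under these assumptions each poll edge is present in only a third of the windows, so the filter removes every one of them. After filtering, the PLC's in-degree matches that of a host that never received traffic, which is why a classifier working from graph features alone has nothing left to separate the two.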
THE TAKEAWAY
Persistence is the wrong thing to filter on.
A cheap clean-up step can quietly erase the evidence your classifier depends on.
- Don't pre-filter passive OT graphs by recurrence.
- Edge meaning — protocol, direction, endpoint roles — beats persistence.
Reproducible lab · 4 scenarios · code released | jvdhthesis.tech · Jonathan van den Heuvel · jonathan50@live.nl