How to Detect Configuration Drift in MikroTik Routers
Configuration drift is one of the quieter failure modes in network management. Routers that were identical at deployment gradually diverge — through manual fixes, firmware upgrades, emergency changes, and accumulated tweaks. This article explains why drift happens specifically with RouterOS, why it is difficult to detect, and what an effective solution looks like in practice.
The Problem
Configuration drift describes the gap between the intended state of a device and its actual running configuration. For a single router this is manageable. Across a fleet of dozens or hundreds of MikroTik devices, it becomes a real operational hazard.
The pattern is familiar: an engineer connects via WinBox to resolve an outage, adds a static route or adjusts a firewall rule, and moves on. The fix never makes it into documentation or a change ticket. Later, a firmware upgrade silently adds new default values. Someone else modifies the same firewall rule "temporarily" during a maintenance window and forgets to revert it.
After six months, the device is running something no one fully understands. If the hardware fails, reproducing that config from scratch is guesswork.
Why RouterOS Makes This Hard
RouterOS does not include a native mechanism for tracking configuration changes or comparing configs across devices. This is not a complaint — it is just a fact of how the platform is designed, and it matters when you are trying to build operational processes around it.
A few specific pain points:
- No built-in config versioning. RouterOS does not maintain a history of what changed, when, or who changed it. The running config is the only version.
- Export output is not deterministic across firmware versions. The /export command can produce different ordering, include new default keys, or drop previously explicit values when you upgrade RouterOS. Naively diffing exports from devices on different firmware versions produces noise.
- WinBox changes leave no audit trail. Actions taken through the GUI are not logged in a way that survives reboots or is easily queryable.
- No desired-state model. RouterOS does not have a concept of "this is what the config should be." The running config is authoritative by definition. There is nothing to check against.
- Fleet comparison requires external tooling. There is no native way to look across twenty devices and ask which ones have diverged from each other.
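To make the export-noise and fleet-comparison points concrete, here is a minimal Python sketch of what external tooling might do before comparing anything. It assumes that export header lines start with "#" and that wrapped lines end with a backslash; both should be verified against your own exports. The function names are hypothetical, and the sketch does nothing about reordering or new default keys introduced by a firmware upgrade.

```python
import hashlib
from collections import defaultdict


def normalize_export(text: str) -> list[str]:
    """Return export lines with header comments dropped and wrapped lines joined.

    Assumptions (verify against your own exports): header lines start with
    '#', and a trailing backslash marks a line that continues on the next one.
    """
    lines: list[str] = []
    pending = ""
    for raw in text.splitlines():
        line = raw.rstrip()
        if not pending and line.lstrip().startswith("#"):
            continue  # timestamp/version header, differs on every export
        if line.endswith("\\"):
            # Keep the text before the backslash and wait for the next line.
            pending += line[:-1].lstrip() if pending else line[:-1]
            continue
        joined = pending + (line.lstrip() if pending else line)
        pending = ""
        if joined.strip():
            lines.append(joined)
    if pending:
        lines.append(pending)
    return lines


def fingerprint(text: str) -> str:
    """SHA-256 over the normalized lines, so header noise does not change it."""
    body = "\n".join(normalize_export(text))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def group_by_config(exports: dict[str, str]) -> list[list[str]]:
    """Group device names whose normalized exports are identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for device, text in exports.items():
        groups[fingerprint(text)].append(device)
    return sorted((sorted(g) for g in groups.values()), key=len, reverse=True)
```

Devices that land in a group of one are the first candidates for a closer look. Whole-config fingerprints only make sense for sections that are supposed to be identical across sites; per-device values such as addresses would need to be excluded or compared separately.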
Common Workarounds
Engineers who have hit this problem have developed several approaches, each with real limitations.
Scheduled /export to FTP or SFTP. This is the most common approach and it does produce periodic snapshots. The problem is what happens next: text dumps pile up in a directory, and comparing them requires either manual inspection or custom scripting. When a device exports 800 lines of config, spotting a single changed firewall rule by eye is unreliable.
The Dude. MikroTik's own monitoring tool tracks device health and topology well. It does not track configuration changes. It will tell you a router is up; it will not tell you its firewall rules changed overnight.
Custom diff scripts. Some teams build shell scripts that pull exports, normalize whitespace, strip firmware-version noise, and run diff. This can work, but these scripts are fragile. They break on RouterOS upgrades, fail silently when a device is unreachable, and tend to accumulate exceptions and special cases until the person who wrote them is the only one who understands them.
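One of the failure modes above, silently skipping an unreachable device, is cheap to avoid even in a homegrown script. The following Python sketch shows one way to do it; `fetch` is a hypothetical callable that the caller supplies (SSH, API, or reading a file from the FTP drop), and the report structure is an illustration rather than any established tool's format.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CollectionReport:
    exports: dict[str, str] = field(default_factory=dict)   # device -> export text
    failures: dict[str, str] = field(default_factory=dict)  # device -> reason


def collect_exports(devices: list[str], fetch: Callable[[str], str]) -> CollectionReport:
    """Fetch every device's export, recording failures instead of hiding them.

    `fetch` is a hypothetical, caller-supplied function that returns the
    export text for one device or raises on any error.
    """
    report = CollectionReport()
    for device in devices:
        try:
            text = fetch(device)
        except Exception as exc:  # broad on purpose: every failure must be reported
            report.failures[device] = f"{type(exc).__name__}: {exc}"
            continue
        if not text.strip():
            report.failures[device] = "empty export"
            continue
        report.exports[device] = text
    return report
```

A non-empty `failures` map is something to alert on in its own right: a device that has not produced a snapshot for several cycles is an unknown, not a clean result.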
Spreadsheets. For small deployments, a spreadsheet tracking what each site should have configured is better than nothing. It does not scale, and it is only as accurate as the last time someone updated it.
What a Proper Solution Requires
Solving configuration drift effectively requires a few things working together.
First, automated, periodic snapshots from every device. Manual processes do not hold up — the snapshot needs to happen whether or not an engineer remembers to trigger it. The interval should be configurable; some environments need hourly snapshots, others daily.
Second, version history with diff visibility. Storing snapshots is only useful if you can compare them. You need to be able to see exactly what changed between two points in time — not just that something changed, but which lines were added, removed, or modified. A side-by-side diff view makes this fast to review.
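As a rough illustration of line-level diff visibility, the Python standard library's difflib is enough to list what was added and removed between two snapshots. This is a sketch with hypothetical function names, not a description of how any particular product renders its diff view.

```python
import difflib


def config_diff(old: str, new: str, old_label: str = "before",
                new_label: str = "after") -> list[str]:
    """Unified diff between two config snapshots, one context line per hunk."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=old_label, tofile=new_label, lineterm="", n=1,
    ))


def changed_lines(old: str, new: str) -> tuple[list[str], list[str]]:
    """Return (removed, added) config lines between two snapshots."""
    removed: list[str] = []
    added: list[str] = []
    # The first two lines of a unified diff are the '---' / '+++' file headers.
    for line in config_diff(old, new)[2:]:
        if line.startswith("@@"):
            continue  # hunk position marker
        if line.startswith("-"):
            removed.append(line[1:])
        elif line.startswith("+"):
            added.append(line[1:])
    return removed, added
```

A modified line shows up as one removal plus one addition, which is usually what a reviewer wants to see for a firewall rule: the old rule and the new one next to each other.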
Third, alerts when configs change unexpectedly. Drift you don't know about is the dangerous kind. An alert when a device's config changes between polling cycles lets you investigate before that change causes a problem, rather than after.
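Change detection between polling cycles can be as simple as comparing a digest of each device's config with the digest from the previous cycle. The Python sketch below assumes a caller-supplied `alert` callable (email, chat webhook, ticket); all names are hypothetical.

```python
import hashlib
from typing import Callable


def digest(config: str) -> str:
    """Stable fingerprint of a config text."""
    return hashlib.sha256(config.encode("utf-8")).hexdigest()


def detect_changes(previous: dict[str, str],
                   current_configs: dict[str, str],
                   alert: Callable[[str], None]) -> dict[str, str]:
    """Compare this cycle's configs with last cycle's digests and alert on changes.

    Returns the digest map to carry into the next polling cycle.
    """
    latest = dict(previous)  # devices missing this cycle keep their old digest
    for device, config in current_configs.items():
        new_digest = digest(config)
        old_digest = previous.get(device)
        # A device seen for the first time has no baseline, so it is not alerted.
        if old_digest is not None and old_digest != new_digest:
            alert(device)
        latest[device] = new_digest
    return latest
```

In practice the config text would likely be normalized before hashing, so that export headers or other known noise do not trigger alerts on every cycle.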
Fourth, an audit trail tied to user actions. When a config change comes from a push made through your management platform, you want to know which user initiated it, when, and what it contained. This is separate from detecting drift caused by out-of-band changes — you need both.
How The Other Dude Handles Configuration Drift
The Other Dude polls RouterOS devices on a configurable interval using the RouterOS binary API (port 8729, TLS). On each poll cycle it retrieves the full running configuration and stores it in PostgreSQL alongside a complete version history. Every stored snapshot is compared to the previous one; if anything changed, the difference is recorded.
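To illustrate the general idea of a per-device version history, here is a small Python sketch. It uses the standard library's sqlite3 as a stand-in for PostgreSQL so that it is self-contained, and the table layout and function names are assumptions made for the example, not The Other Dude's actual schema. The sketch also chooses to store a new version only when the config differs from the latest one, which may not match how the platform stores snapshots.

```python
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the snapshot store and create the (hypothetical) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS snapshots (
               device   TEXT    NOT NULL,
               version  INTEGER NOT NULL,
               taken_at TEXT    NOT NULL,
               config   TEXT    NOT NULL,
               PRIMARY KEY (device, version)
           )"""
    )
    return conn


def latest_snapshot(conn: sqlite3.Connection, device: str) -> Optional[tuple]:
    """Return (version, config) for the newest snapshot, or None if there is none."""
    return conn.execute(
        "SELECT version, config FROM snapshots "
        "WHERE device = ? ORDER BY version DESC LIMIT 1",
        (device,),
    ).fetchone()


def record_snapshot(conn: sqlite3.Connection, device: str, config: str,
                    taken_at: str) -> Optional[int]:
    """Store a new version only if the config changed; return its number or None."""
    latest = latest_snapshot(conn, device)
    if latest is not None and latest[1] == config:
        return None  # unchanged since the last poll
    version = 1 if latest is None else latest[0] + 1
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO snapshots (device, version, taken_at, config) "
            "VALUES (?, ?, ?, ?)",
            (device, version, taken_at, config),
        )
    return version
```

Because every version is kept with its timestamp, any two rows for a device (or one row each from two devices) can be fed to a diff routine later.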
The web UI includes a side-by-side diff viewer. You can select any two snapshots for a device — or compare two different devices — and see exactly which lines differ. This makes it straightforward to answer questions like "what changed on this router between Tuesday and Thursday" or "why does this branch site have different firewall rules than the others."
Config changes pushed through the platform are recorded in an audit trail with full user attribution. If someone pushes a new firewall ruleset or modifies an interface address, that action is logged with the user, timestamp, and the exact config diff applied. Out-of-band changes made directly via WinBox or SSH will show up in the next polling cycle as an unexpected diff.
For safe config pushes, The Other Dude uses a two-phase approach: changes are applied to the device, and the platform waits for a confirmation that the device is still reachable. If the device goes silent after the change — which can happen if a firewall or routing change cuts off the management path — the platform automatically reverts to the previous config. This significantly reduces the risk of locking yourself out of a remote device.
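The control flow of an apply-then-confirm push can be sketched in a few lines of Python. The three callables (`apply`, `is_reachable`, `revert`) are hypothetical placeholders for whatever actually talks to the router, and the timeout values are arbitrary; this shows the shape of the technique, not The Other Dude's implementation.

```python
import time
from typing import Callable


def push_with_rollback(apply: Callable[[], None],
                       is_reachable: Callable[[], bool],
                       revert: Callable[[], None],
                       confirm_timeout: float = 30.0,
                       poll_interval: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Apply a change, then confirm the device still answers; revert if not.

    Returns True if the change was kept, False if it was reverted.
    """
    apply()
    deadline = clock() + confirm_timeout
    while True:
        if is_reachable():
            return True  # confirmation received, keep the change
        if clock() >= deadline:
            break  # device stayed silent for the whole confirmation window
        sleep(poll_interval)
    revert()
    return False
```

`revert` is deliberately abstract here. If the change has cut the management path, a rollback sent over that same path will not arrive, so the rollback presumably has to be arranged on the router itself before the change is applied; the mechanism is not described above and is left as an assumption.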
For fleet-scale work, the platform supports config templates with variable substitution. You can define a template for a class of site (branch office, retail location, distribution hub) and push it across a batch of devices with per-device values filled in. This makes it easier to maintain consistency across similar sites and to identify which devices have diverged from that common baseline. To be clear: the current implementation detects config changes between snapshots. Full desired-state compliance checking — where the system continuously validates each device against a canonical template and flags deviations — is not yet implemented, but the snapshot and diff infrastructure is designed to support it.
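Template rendering with per-device values can be illustrated with Python's string.Template. The template text, variable names, and device values below are made up for the example and do not reflect the platform's own template syntax.

```python
from string import Template

# Hypothetical branch-office template; $name placeholders are filled per device.
BRANCH_TEMPLATE = Template(
    "/system identity set name=$site_name\n"
    "/ip address add address=$lan_address interface=$lan_interface\n"
)


def render_for_fleet(template: Template,
                     devices: dict[str, dict[str, str]]) -> dict[str, str]:
    """Render one config per device; raises KeyError if a variable is missing."""
    return {name: template.substitute(values) for name, values in devices.items()}


rendered = render_for_fleet(BRANCH_TEMPLATE, {
    "branch-01": {"site_name": "branch-01",
                  "lan_address": "192.168.10.1/24",
                  "lan_interface": "bridge"},
    "branch-02": {"site_name": "branch-02",
                  "lan_address": "192.168.20.1/24",
                  "lan_interface": "bridge"},
})
```

Using `substitute` rather than `safe_substitute` means a missing per-device value fails loudly instead of pushing a literal placeholder to a router. One caveat with this particular library: RouterOS scripts use `$` for their own variables, so any literal `$` in a template would need to be written as `$$`.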