
How to Monitor MikroTik Routers at Scale

If you manage more than a handful of MikroTik routers, "monitoring" stops meaning "is this device pingable" and starts meaning something harder. You need to know which of your 200 routers is spiking CPU before a user files a ticket. You need to find the access point with degraded wireless signal before the site calls in. You need bandwidth utilization trends to make capacity decisions, not just point-in-time readings. And you need to know the moment a device goes offline at 2am — not when someone shows up for work.

That's what real mikrotik router monitoring looks like in production.

The Problem with MikroTik Monitoring at Scale

Individual devices are easy. RouterOS has good per-device tooling. The problem is the fleet. When you're managing dozens or hundreds of routers across multiple sites, you have no single place to answer questions like:

- Which routers are running hot on CPU or memory right now?
- Which access points have degraded signal or low CCQ?
- How has bandwidth on this uplink trended over the past month?
- Which devices dropped offline overnight, and for how long?

These are fleet-level questions. They require a centralized data store, consistent polling, and a UI that surfaces the signal instead of burying you in noise.

Native RouterOS Monitoring Options

RouterOS gives you several monitoring tools, and each has real limitations when applied at fleet scale. Winbox and WebFig graphs are strictly per-device and per-session. /tool graphing keeps history on the router itself, so the data lives and dies with the device. Netwatch tells you whether a host answers, nothing about CPU, wireless health, or traffic trends. SNMP works, but it means enabling and configuring it on every device, managing OIDs, and running a separate collector anyway. The Dude, MikroTik's own monitoring server, covers more ground but is its own piece of infrastructure to host and maintain.

What MikroTik Monitoring Software Should Include

A purpose-built mikrotik monitoring software solution should handle the full picture, not just availability pings: device health (CPU, memory, disk, temperature), per-interface bandwidth with real history, wireless quality (client counts, signal strength, CCQ), alerting that separates real problems from transient spikes, notification channels your team actually watches, and a dashboard that rolls all of it up per fleet and per device.

How The Other Dude Monitors MikroTik Routers

The Other Dude was built specifically for MikroTik fleet management. The monitoring stack is not bolted on — it's the core of what the platform does.

Collection via the RouterOS binary API. The Go-based poller connects to each device over the RouterOS binary API on TLS port 8729. This is not SNMP. There are no OIDs, no MIB files, no polling configuration per metric type. The API returns structured data directly from RouterOS resources, which is faster, more reliable, and requires no per-device SNMP configuration.
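To make the collection path concrete, here is a minimal sketch of one poll using the community go-routeros client (shown with its v3 module path). The library choice, address, and credentials are illustrative assumptions, not a description of The Other Dude's internal poller.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"log"

	"github.com/go-routeros/routeros/v3"
)

func main() {
	// Dial the RouterOS binary API over TLS (api-ssl service, port 8729).
	// Self-signed certificates are common on routers; a real poller would
	// verify or pin them instead of skipping verification.
	client, err := routeros.DialTLS("192.0.2.1:8729", "monitor", "secret",
		&tls.Config{InsecureSkipVerify: true})
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// One structured command replaces a whole tree of SNMP OIDs.
	reply, err := client.Run("/system/resource/print")
	if err != nil {
		log.Fatal(err)
	}
	for _, re := range reply.Re {
		fmt.Println("cpu-load:", re.Map["cpu-load"],
			"free-memory:", re.Map["free-memory"])
	}
}
```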

Three metric families. Each poll cycle collects health metrics (CPU, memory, disk, temperature), interface metrics (per-interface traffic rates calculated from cumulative counter deltas), and wireless metrics (client count, signal strength in dBm, CCQ per wireless interface). All three are stored in TimescaleDB hypertables with automatic time-based bucketing for efficient range queries.
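The per-interface rate calculation is the one non-obvious step: RouterOS exposes cumulative byte counters, so rates come from diffing successive polls. A sketch of that delta logic follows, with hypothetical type and field names (the guide doesn't show the actual schema):

```go
package main

import (
	"fmt"
	"time"
)

// sample is one interface counter reading; names are illustrative.
type sample struct {
	rxBytes uint64
	txBytes uint64
	at      time.Time
}

// rate converts two cumulative samples into bits-per-second values.
// If a counter went backwards (device reboot or counter reset), the
// delta is meaningless, so the interval is skipped rather than
// charted as a huge spike.
func rate(prev, cur sample) (rxBps, txBps float64, ok bool) {
	if cur.rxBytes < prev.rxBytes || cur.txBytes < prev.txBytes {
		return 0, 0, false // counter reset: drop this interval
	}
	secs := cur.at.Sub(prev.at).Seconds()
	if secs <= 0 {
		return 0, 0, false
	}
	rxBps = float64(cur.rxBytes-prev.rxBytes) * 8 / secs
	txBps = float64(cur.txBytes-prev.txBytes) * 8 / secs
	return rxBps, txBps, true
}

func main() {
	prev := sample{rxBytes: 1_000_000, txBytes: 500_000, at: time.Now().Add(-30 * time.Second)}
	cur := sample{rxBytes: 4_750_000, txBytes: 800_000, at: time.Now()}
	if rx, tx, ok := rate(prev, cur); ok {
		fmt.Printf("rx %.0f bps, tx %.0f bps\n", rx, tx)
	}
}
```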

Real-time browser updates. Metrics flow from the poller into NATS JetStream, then out to connected browsers via Server-Sent Events. The dashboard reflects current device state without polling the database on every page load.
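A hedged sketch of the delivery end of that pipeline, using the nats.go client: each browser connection gets an ephemeral JetStream subscription whose messages are re-framed as SSE events. The subject name metrics.>, the /events endpoint, and the delivery options are assumptions for illustration, and the sketch presumes a JetStream stream covering those subjects already exists.

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	http.HandleFunc("/events", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}

		// Ephemeral live feed: new messages only, no acks needed.
		msgs := make(chan *nats.Msg, 64)
		sub, err := js.ChanSubscribe("metrics.>", msgs,
			nats.DeliverNew(), nats.AckNone())
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		defer sub.Unsubscribe()

		for {
			select {
			case <-r.Context().Done():
				return
			case m := <-msgs:
				// SSE frame: one "data:" line per metric payload.
				fmt.Fprintf(w, "data: %s\n\n", m.Data)
				flusher.Flush()
			}
		}
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```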

Fleet health dashboard. The main view shows aggregate fleet health — how many devices are online, which have active alerts, uptime sparklines per device, and bandwidth charts for the busiest links. The "APs Needing Attention" card surfaces wireless access points with degraded signal or low CCQ so you can find problems before users do.

Per-device detail. Each device has its own page with health graphs over configurable time windows, per-interface traffic charts, and wireless metrics broken down by interface. You can see exactly what a device was doing at any point in its history.

Alert rules with duration thresholds. Alert rules combine a metric, a threshold, and a duration_polls count. A rule for "CPU > 90%" with duration_polls = 5 only fires after five consecutive polling intervals above the threshold. This eliminates noise from transient spikes. New tenants receive a default set of alert rules covering CPU, memory, disk, offline detection, wireless signal, and CCQ — sensible baselines that you can tune without starting from zero.
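The debounce logic is simple enough to show in full. This is a sketch with invented names (rule, evaluator) rather than the platform's actual code, but it captures the duration_polls semantics described above:

```go
package main

import "fmt"

type rule struct {
	threshold     float64
	durationPolls int
}

type evaluator struct {
	rule   rule
	streak int // consecutive polls above threshold
	firing bool
}

// observe feeds one poll's value into the rule and reports whether the
// alert is firing. A single below-threshold poll resets the streak, so
// transient spikes and flapping never fire.
func (e *evaluator) observe(value float64) bool {
	if value > e.rule.threshold {
		e.streak++
	} else {
		e.streak = 0
		e.firing = false
	}
	if e.streak >= e.rule.durationPolls {
		e.firing = true
	}
	return e.firing
}

func main() {
	e := evaluator{rule: rule{threshold: 90, durationPolls: 5}}
	for _, cpu := range []float64{95, 97, 88, 92, 93, 94, 96, 99} {
		fmt.Printf("cpu=%.0f firing=%v\n", cpu, e.observe(cpu))
	}
	// The 95/97 spike resets at 88; the alert fires only on the fifth
	// consecutive poll above 90 (the final value, 99).
}
```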

Notification channels. Alerts are delivered via email, webhook, or Slack. Maintenance windows let you suppress alerts during planned work without disabling the rules themselves.
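Suppressing without disabling suggests the maintenance-window check happens at notification time, not at rule-evaluation time, so alert state stays correct the moment the window closes. A minimal sketch of that check, with a hypothetical window type:

```go
package main

import (
	"fmt"
	"time"
)

// window is a hypothetical maintenance-window record: notifications are
// suppressed between start and end while rules keep evaluating.
type window struct {
	start, end time.Time
}

func suppressed(windows []window, now time.Time) bool {
	for _, w := range windows {
		if !now.Before(w.start) && now.Before(w.end) {
			return true
		}
	}
	return false
}

func main() {
	tonight := window{
		start: time.Date(2025, 6, 1, 22, 0, 0, 0, time.UTC),
		end:   time.Date(2025, 6, 2, 2, 0, 0, 0, time.UTC),
	}
	now := time.Date(2025, 6, 1, 23, 30, 0, 0, time.UTC)
	fmt.Println("suppress notifications:", suppressed([]window{tonight}, now))
}
```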

Network topology map. An interactive topology view shows device interconnections across your fleet, giving you structural context for interpreting monitoring data.
