TOD — The Other Dude

Fleet management for MikroTik RouterOS devices. Built for MSPs who manage hundreds of routers across multiple tenants. Think “UniFi Controller, but for MikroTik.”

TOD is a self-hosted, multi-tenant platform that gives you centralized visibility, configuration management, real-time monitoring, and zero-knowledge security across your entire MikroTik fleet.

Features

  • Fleet — Dashboard with at-a-glance fleet health, virtual-scrolled device table, geographic map, and subnet scanner for device discovery.
  • Configuration — Config Editor with two-phase safe apply, batch configuration across devices, bulk CLI commands, reusable templates, Simple Config (Linksys/Ubiquiti-style UI), and git-backed config backup with diff viewer.
  • Monitoring — Interactive network topology (ReactFlow + Dagre), real-time metrics via SSE/NATS, configurable alert rules, notification channels (email, webhook, Slack), audit trail, KMS transparency dashboard, and PDF reports.
  • Security — 1Password-style zero-knowledge architecture with SRP-6a auth, 2SKD key derivation, Secret Key with Emergency Kit, OpenBao KMS for per-tenant envelope encryption, Internal CA with SFTP cert deployment, WireGuard VPN, and AES-256-GCM credential encryption.
  • Administration — Full multi-tenancy with PostgreSQL RLS, user management with RBAC, API keys (mktp_ prefix), firmware management, maintenance windows, and setup wizard.
  • UX — Command palette (Cmd+K), Vim-style keyboard shortcuts, dark/light mode, Framer Motion page transitions, and shimmer skeleton loaders.

Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | React 19, TanStack Router + Query, Tailwind CSS 3.4, Vite |
| Backend | Python 3.12, FastAPI 0.115, SQLAlchemy 2.0, asyncpg |
| Poller | Go 1.24, go-routeros/v3, pgx/v5, nats.go |
| Database | PostgreSQL 17 + TimescaleDB, Row-Level Security |
| Cache | Redis 7 |
| Message Bus | NATS with JetStream |
| KMS | OpenBao 2.1 (Transit) |
| Auth | SRP-6a (zero-knowledge), JWT |

Quick Start

# Clone and configure
git clone <repository-url> tod
cd tod
cp .env.example .env

# Start infrastructure
docker compose up -d

# Build app images (one at a time to avoid OOM)
docker compose build api
docker compose build poller
docker compose build frontend

# Start the full stack
docker compose up -d

# Verify
curl http://localhost:8001/health
open http://localhost:3000

Environment Profiles

| Environment | Frontend | API | Notes |
| --- | --- | --- | --- |
| Dev | localhost:3000 | localhost:8001 | Hot-reload, volume-mounted source |
| Staging | localhost:3080 | localhost:8081 | Built images, staging secrets |
| Production | localhost (port 80) | Internal (proxied) | Gunicorn workers, log rotation |

Deployment

Prerequisites

  • Docker Engine 24+ with Docker Compose v2
  • At least 4 GB RAM (2 GB absolute minimum — builds are memory-intensive)
  • Fast storage recommended for Docker volumes
  • Network access to RouterOS devices on ports 8728 (API) and 8729 (API-SSL)

1. Clone and Configure

git clone <repository-url> tod
cd tod

# Copy environment template
cp .env.example .env.prod

2. Generate Secrets

# Generate JWT secret
python3 -c "import secrets; print(secrets.token_urlsafe(64))"

# Generate credential encryption key (32 bytes, base64-encoded)
python3 -c "import secrets, base64; print(base64.b64encode(secrets.token_bytes(32)).decode())"

Edit .env.prod with the generated values:

ENVIRONMENT=production
JWT_SECRET_KEY=<generated-jwt-secret>
CREDENTIAL_ENCRYPTION_KEY=<generated-encryption-key>
POSTGRES_PASSWORD=<strong-password>

# First admin user (created on first startup)
FIRST_ADMIN_EMAIL=admin@example.com
FIRST_ADMIN_PASSWORD=<strong-password>
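Before starting the stack, it can help to sanity-check the two generated values. The following is a minimal sketch, not part of TOD itself: the function name `check_secrets` is hypothetical, and the checks only mirror the constraints stated in this guide (a JWT secret of at least 32 characters, and a base64 value that decodes to 32 bytes for AES-256).

```python
import base64
import binascii


def check_secrets(jwt_secret: str, credential_key_b64: str) -> list[str]:
    """Return a list of problems; an empty list means both values look usable."""
    problems = []
    # The guide asks for a JWT signing secret of at least 32 characters.
    if len(jwt_secret) < 32:
        problems.append("JWT_SECRET_KEY is shorter than 32 characters")
    try:
        raw = base64.b64decode(credential_key_b64, validate=True)
    except (binascii.Error, ValueError):
        problems.append("CREDENTIAL_ENCRYPTION_KEY is not valid base64")
    else:
        # An AES-256 key is exactly 32 bytes of key material.
        if len(raw) != 32:
            problems.append("CREDENTIAL_ENCRYPTION_KEY does not decode to 32 bytes")
    return problems
```

An empty result means the values match the documented shape; it says nothing about whether they are the values the running stack was actually started with.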

3. Build Images

Build images one at a time to avoid out-of-memory crashes on constrained hosts:

docker compose -f docker-compose.yml -f docker-compose.prod.yml build api
docker compose -f docker-compose.yml -f docker-compose.prod.yml build poller
docker compose -f docker-compose.yml -f docker-compose.prod.yml build frontend

4. Start the Stack

docker compose -f docker-compose.yml -f docker-compose.prod.yml --env-file .env.prod up -d

5. Verify

# Check all services are running
docker compose ps

# Check API health (liveness)
curl http://localhost:8000/health

# Check readiness (PostgreSQL, Redis, NATS connected)
curl http://localhost:8000/health/ready

# Access the portal
open http://localhost

Log in with the FIRST_ADMIN_EMAIL and FIRST_ADMIN_PASSWORD credentials set in step 2.

Required Environment Variables

| Variable | Description | Example |
| --- | --- | --- |
| ENVIRONMENT | Deployment environment | production |
| JWT_SECRET_KEY | JWT signing secret (min 32 chars) | <generated> |
| CREDENTIAL_ENCRYPTION_KEY | AES-256 key for device credentials (base64) | <generated> |
| POSTGRES_PASSWORD | PostgreSQL superuser password | <strong-password> |
| FIRST_ADMIN_EMAIL | Initial admin account email | admin@example.com |
| FIRST_ADMIN_PASSWORD | Initial admin account password | <strong-password> |

Optional Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| GUNICORN_WORKERS | 2 | API worker process count |
| DB_POOL_SIZE | 20 | App database connection pool size |
| DB_MAX_OVERFLOW | 40 | Max overflow connections above pool |
| DB_ADMIN_POOL_SIZE | 10 | Admin database connection pool size |
| DB_ADMIN_MAX_OVERFLOW | 20 | Admin max overflow connections |
| POLL_INTERVAL_SECONDS | 60 | Device polling interval |
| CONNECTION_TIMEOUT_SECONDS | 10 | RouterOS connection timeout |
| COMMAND_TIMEOUT_SECONDS | 30 | RouterOS per-command timeout |
| CIRCUIT_BREAKER_MAX_FAILURES | 5 | Consecutive failures before backoff |
| CIRCUIT_BREAKER_BASE_BACKOFF_SECONDS | 30 | Initial backoff duration |
| CIRCUIT_BREAKER_MAX_BACKOFF_SECONDS | 900 | Maximum backoff (15 min) |
| LOG_LEVEL | info | Logging verbosity (debug/info/warn/error) |
| CORS_ORIGINS | http://localhost:3000 | Comma-separated CORS origins |

Storage Configuration

Docker volumes mount to the host filesystem. Default locations:

  • PostgreSQL data: ./docker-data/postgres
  • Redis data: ./docker-data/redis
  • NATS data: ./docker-data/nats
  • Git store (config backups): ./docker-data/git-store

To change storage locations, edit the volume mounts in docker-compose.yml.

Resource Limits

Container memory limits are enforced in docker-compose.prod.yml to prevent OOM crashes:

| Service | Memory Limit |
| --- | --- |
| PostgreSQL | 512 MB |
| Redis | 128 MB |
| NATS | 128 MB |
| API | 512 MB |
| Poller | 256 MB |
| Frontend | 64 MB |

Adjust under deploy.resources.limits.memory in docker-compose.prod.yml.

Monitoring (Optional)

Enable Prometheus and Grafana monitoring with the observability compose overlay:

docker compose \
  -f docker-compose.yml \
  -f docker-compose.prod.yml \
  -f docker-compose.observability.yml \
  --env-file .env.prod up -d

  • Prometheus: http://localhost:9090
  • Grafana: http://localhost:3001 (default: admin/admin)

Exported Metrics

| Metric | Source | Description |
| --- | --- | --- |
| http_requests_total | API | HTTP request count by method, path, status |
| http_request_duration_seconds | API | Request latency histogram |
| mikrotik_poll_total | Poller | Poll cycles by status (success/error/skipped) |
| mikrotik_poll_duration_seconds | Poller | Poll cycle duration histogram |
| mikrotik_devices_active | Poller | Number of devices being polled |
| mikrotik_circuit_breaker_skips_total | Poller | Polls skipped due to backoff |
| mikrotik_nats_publish_total | Poller | NATS publishes by subject and status |

Troubleshooting

| Issue | Solution |
| --- | --- |
| API won’t start with secret error | Generate production secrets (see step 2 above) |
| Build crashes with OOM | Build images one at a time (see step 3 above) |
| Device shows offline | Check network access to device API port (8728/8729) |
| Health check fails | Check docker compose logs api for startup errors |
| Rate limited (429) | Wait 60 seconds or check Redis connectivity |
| Migration fails | Check docker compose logs api for Alembic errors |
| NATS subscriber won’t start | Non-fatal: the API runs without NATS; check NATS container health |
| Poller circuit breaker active | Device unreachable; check CIRCUIT_BREAKER_* env vars to tune backoff |

System Overview

TOD is a containerized MSP fleet management platform for MikroTik RouterOS devices. It uses a three-service architecture: a React frontend, a Python FastAPI backend, and a Go poller. All services communicate through PostgreSQL, Redis, and NATS JetStream. Multi-tenancy is enforced at the database level via PostgreSQL Row-Level Security (RLS).

Architecture Diagram

+--------------+     +------------------+     +---------------+
|   Frontend   |---->|   Backend API    |<--->|   Go Poller   |
|  React/nginx |     |    FastAPI       |     |  go-routeros  |
+--------------+     +--------+---------+     +-------+-------+
                              |                       |
               +--------------+-------------------+---+
               |              |                   |
      +--------v---+   +-----v-------+   +-------v-------+
      |   Redis    |   | PostgreSQL  |   |    NATS       |
      |  locks,    |   | 17+Timescale|   |  JetStream    |
      |  cache     |   | DB + RLS    |   |  pub/sub      |
      +------------+   +-------------+   +-------+-------+
                                                 |
                                          +------v-------+
                                          |   OpenBao    |
                                          | Transit KMS  |
                                          +--------------+

Services

Frontend (React / nginx)

  • Stack: React 19, TypeScript, TanStack Router (file-based routing), TanStack Query (data fetching), Tailwind CSS 3.4, Vite
  • Production: Static build served by nginx on port 80 (exposed as port 3000)
  • Development: Vite dev server with hot module replacement
  • Design system: Geist Sans + Geist Mono fonts, HSL color tokens via CSS custom properties, class-based dark/light mode
  • Real-time: Server-Sent Events (SSE) for live device status updates, alerts, and operation progress
  • Client-side encryption: SRP-6a authentication flow with 2SKD key derivation; Emergency Kit PDF generation
  • UX features: Command palette (Cmd+K), Framer Motion page transitions, collapsible sidebar, skeleton loaders
  • Memory limit: 64 MB

Backend API (FastAPI)

  • Stack: Python 3.12+, FastAPI 0.115+, SQLAlchemy 2.0 async, asyncpg, Gunicorn
  • Two database engines:
    • admin_engine (superuser) — used only for auth/bootstrap and NATS subscribers that need cross-tenant access
    • app_engine (non-superuser app_user role) — used for all device/data routes, enforces RLS
  • Authentication: JWT tokens (15min access, 7d refresh), SRP-6a zero-knowledge proof, RBAC (super_admin, admin, operator, viewer)
  • NATS subscribers: Three independent subscribers for device status, metrics, and firmware events. Non-fatal startup — API serves requests even if NATS is unavailable
  • Background services: APScheduler for nightly config backups and daily firmware version checks
  • Middleware stack (LIFO): RequestID → SecurityHeaders → RateLimiting → CORS → Route handler
  • Health endpoints: /health (liveness), /health/ready (readiness — checks PostgreSQL, Redis, NATS)
  • Memory limit: 512 MB

API Routers

The backend exposes route groups under the /api prefix:

| Router | Purpose |
| --- | --- |
| auth | Login (SRP-6a + legacy), token refresh, registration |
| tenants | Tenant CRUD (super_admin only) |
| users | User management, RBAC |
| devices | Device CRUD, status, commands |
| device_groups | Logical device grouping |
| device_tags | Tagging and filtering |
| metrics | Time-series metrics (TimescaleDB) |
| config_backups | Configuration backup history |
| config_editor | Live RouterOS config editing |
| firmware | Firmware version tracking and upgrades |
| alerts | Alert rules and active alerts |
| events | Device event log |
| device_logs | RouterOS system logs |
| templates | Configuration templates |
| clients | Connected client devices |
| topology | Network topology (ReactFlow data) |
| sse | Server-Sent Events streams |
| audit_logs | Immutable audit trail |
| reports | PDF report generation (Jinja2 + WeasyPrint) |
| api_keys | API key management (mktp_ prefix) |
| maintenance_windows | Scheduled maintenance with alert suppression |
| vpn | WireGuard VPN management |
| certificates | Internal CA and device TLS certificates |
| transparency | KMS access event dashboard |

Go Poller

  • Stack: Go 1.24, go-routeros/v3, pgx/v5, nats.go
  • Polling model: Synchronous per-device polling on a configurable interval (default 60s)
  • Device communication: RouterOS binary API over TLS (port 8729), InsecureSkipVerify for self-signed certs
  • TLS fallback: Three-tier strategy — CA-verified → InsecureSkipVerify → plain API
  • Distributed locking: Redis locks prevent concurrent polling of the same device (safe for multi-instance deployment)
  • Circuit breaker: Backs off from unreachable devices to avoid wasting poll cycles
  • Credential decryption: OpenBao Transit with LRU cache (1024 entries, 5min TTL) to minimize KMS calls
  • Output: Publishes poll results to NATS JetStream; the API’s NATS subscribers process and persist them
  • Database access: Uses poller_user role which bypasses RLS (needs cross-tenant device access)
  • Memory limit: 256 MB
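The circuit breaker settings (CIRCUIT_BREAKER_MAX_FAILURES, CIRCUIT_BREAKER_BASE_BACKOFF_SECONDS, CIRCUIT_BREAKER_MAX_BACKOFF_SECONDS) suggest a capped backoff schedule. The sketch below illustrates one plausible reading in Python rather than the poller's actual Go code. It assumes the delay doubles with each failure past the threshold, which this document does not state, and the function name `backoff_seconds` is hypothetical.

```python
def backoff_seconds(consecutive_failures: int,
                    max_failures: int = 5,
                    base_backoff: int = 30,
                    max_backoff: int = 900) -> int:
    """Seconds to skip polling a device; 0 means keep polling normally."""
    if consecutive_failures < max_failures:
        return 0
    # Assumption: the delay doubles for each failure past the threshold.
    exponent = consecutive_failures - max_failures
    return min(base_backoff * 2 ** exponent, max_backoff)
```

Under these assumptions and the default values, a device reaches the 15-minute cap on its tenth consecutive failure.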

Infrastructure Services

PostgreSQL 17 + TimescaleDB

  • Image: timescale/timescaledb:2.17.2-pg17
  • Row-Level Security (RLS): Enforces tenant isolation at the database level. All data tables have a tenant_id column; RLS policies filter by current_setting('app.tenant_id')
  • Database roles:
    • postgres (superuser) — admin engine, auth/bootstrap, migrations
    • app_user (non-superuser) — RLS-enforced, used by API for data routes
    • poller_user — bypasses RLS, used by Go poller for cross-tenant device access
  • TimescaleDB hypertables: Time-series storage for device metrics (CPU, memory, interface traffic, etc.)
  • Migrations: Alembic, run automatically on API startup
  • Memory limit: 512 MB

Redis

  • Image: redis:7-alpine
  • Distributed locking for the Go poller (prevents concurrent polling of the same device)
  • Rate limiting on auth endpoints (5 requests/min)
  • Credential cache for OpenBao Transit responses
  • Memory limit: 128 MB
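The auth rate limit of 5 requests per minute can be pictured as a counter per client per time window. This is a minimal in-memory sketch, assuming a fixed-window algorithm (the document does not say which algorithm TOD uses) and a plain dictionary standing in for Redis. The class name `FixedWindowLimiter` is hypothetical.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each fixed time window."""

    def __init__(self, limit=5, window_seconds=60, clock=time.time):
        self._limit, self._window, self._clock = limit, window_seconds, clock
        self._counts = {}  # (key, window index) -> request count

    def allow(self, key):
        # Requests in the same window share one counter bucket.
        window_index = int(self._clock() // self._window)
        bucket = (key, window_index)
        self._counts[bucket] = self._counts.get(bucket, 0) + 1
        return self._counts[bucket] <= self._limit
```

A Redis-backed version would typically increment a key with an expiry instead of keeping old buckets in memory, which this sketch never cleans up.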

NATS JetStream

  • Image: nats:2-alpine
  • Role: Message bus between the Go poller and the Python API
  • Streams: DEVICE_EVENTS (poll results, status changes), ALERT_EVENTS (SSE delivery), OPERATION_EVENTS (SSE delivery)
  • Durable consumers: Ensure no message loss during API restarts
  • Memory limit: 128 MB

OpenBao (HashiCorp Vault fork)

  • Image: openbao/openbao:2.1
  • Transit secrets engine: Provides envelope encryption for device credentials at rest
  • Per-tenant keys: Each tenant gets a dedicated Transit encryption key
  • Memory limit: 256 MB

WireGuard

  • Image: lscr.io/linuxserver/wireguard
  • Role: VPN gateway for reaching RouterOS devices on remote networks
  • Port: 51820/UDP
  • Memory limit: 128 MB

Container Memory Limits

| Service | Limit |
| --- | --- |
| PostgreSQL | 512 MB |
| API | 512 MB |
| Go Poller | 256 MB |
| OpenBao | 256 MB |
| Redis | 128 MB |
| NATS | 128 MB |
| WireGuard | 128 MB |
| Frontend (nginx) | 64 MB |

Network Ports

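| Service | Internal Port | External Port | Protocol |
| --- | --- | --- | --- |
| Frontend | 80 | 3000 | HTTP |
| API | 8000 | 8001 | HTTP |
| PostgreSQL | 5432 | 5432 | TCP |
| Redis | 6379 | 6379 | TCP |
| NATS | 4222 | 4222 | TCP |
| NATS Monitor | 8222 | 8222 | HTTP |
| OpenBao | 8200 | 8200 | HTTP |
| WireGuard | 51820 | 51820 | UDP |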

Data Flow

Device Polling Cycle

Go Poller        Redis      OpenBao    RouterOS     NATS        API        PostgreSQL
   |               |           |           |           |           |            |
   +--query list-->|           |           |           |           |            |
   |<--------------+           |           |           |           |            |
   +--acquire lock->|          |           |           |           |            |
   |<--lock granted-+          |           |           |           |            |
   +--decrypt creds (miss)---->|           |           |           |            |
   |<--plaintext creds--------+           |           |           |            |
   +--binary API (8729 TLS)--------------->|           |           |            |
   |<--system info, interfaces, metrics---+           |           |            |
   +--publish poll result--------------------------------->|       |            |
   |               |           |           |           |  subscribe>|           |
   |               |           |           |           |           +--upsert--->|
   +--release lock->|          |           |           |           |            |

  1. Poller queries PostgreSQL for the list of active devices
  2. Acquires a Redis distributed lock per device (prevents duplicate polling)
  3. Decrypts device credentials via OpenBao Transit (LRU cache avoids repeated KMS calls)
  4. Connects to the RouterOS binary API on port 8729 over TLS
  5. Collects system info, interface stats, routing tables, and metrics
  6. Publishes results to NATS JetStream
  7. API NATS subscriber processes results and upserts into PostgreSQL
  8. Releases Redis lock

Config Push (Two-Phase with Panic Revert)

Frontend        API           RouterOS
   |              |               |
   +--push config->|              |
   |              +--apply config->|
   |              +--set revert--->|
   |              |<--ack---------+
   |<--pending----+               |
   |              |               |  (timer counting down)
   +--confirm----->|              |
   |              +--cancel timer->|
   |              |<--ack---------+
   |<--confirmed--+               |

  1. Frontend sends config commands to the API
  2. API connects to the device and applies the configuration
  3. Sets a revert timer on the device (RouterOS safe mode / scheduler)
  4. Returns pending status to the frontend
  5. User confirms the change works (e.g., connectivity still up)
  6. If confirmed: API cancels the revert timer, config is permanent
  7. If timeout or rejected: device automatically reverts to the previous configuration

This pattern prevents lockouts from misconfigured firewall rules or IP changes.
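The confirm-or-revert logic above can be modelled as a small state machine. The sketch below is illustrative only: the class `SafeApply`, its field names, and the 60-second window are hypothetical, and the real revert happens on the RouterOS device rather than in Python.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SafeApply:
    """Tracks one pending change and the deadline for confirming it."""
    previous_config: str
    new_config: str
    timeout_seconds: float = 60.0  # hypothetical confirmation window
    started_at: float = field(default_factory=time.monotonic)
    state: str = "pending"

    def confirm(self, now: Optional[float] = None) -> str:
        """Return the config that is active after a confirm attempt."""
        now = time.monotonic() if now is None else now
        if self.state == "pending" and now - self.started_at <= self.timeout_seconds:
            self.state = "confirmed"  # revert timer cancelled, change is permanent
            return self.new_config
        return self.expire(now)

    def expire(self, now: Optional[float] = None) -> str:
        """Revert if the window passed without confirmation; return active config."""
        now = time.monotonic() if now is None else now
        if self.state == "pending" and now - self.started_at > self.timeout_seconds:
            self.state = "reverted"
        return self.previous_config if self.state == "reverted" else self.new_config
```

A confirmation that arrives after the window has closed is treated the same as no confirmation, which matches step 7: the previous configuration wins.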

SRP-6a Authentication Flow

Browser                     API                   PostgreSQL
   |                          |                       |
   +--register---------------->|                      |
   |  (email, salt, verifier) +--store verifier------>|
   |                          |                       |
   +--login step 1------------>|                      |
   |  (email, client_public)  +--lookup verifier----->|
   |<--(salt, server_public)--+<----------------------+
   |                          |                       |
   +--login step 2------------>|                      |
   |  (client_proof)          +--verify proof---------+
   |<--(server_proof, JWT)----+                       |

  1. Registration: Client derives a verifier from password + secret_key using PBKDF2 (650K iterations) + HKDF + XOR (2SKD). Only the salt and verifier are sent to the server, never the password.
  2. Login step 1: Client sends email and ephemeral public value; server responds with stored salt and its own ephemeral public value.
  3. Login step 2: Client computes a proof from the shared session key; server validates the proof without ever seeing the password.
  4. Token issuance: On successful proof, server issues JWT (15min access + 7d refresh).
  5. Emergency Kit: A downloadable PDF containing the user’s secret key for account recovery.

Multi-Tenancy

TOD enforces tenant isolation at the database level using PostgreSQL Row-Level Security (RLS), so queries made through the app_user role cannot return another tenant's rows.

How It Works

  • Every data table includes a tenant_id column.
  • PostgreSQL RLS policies filter rows by current_setting('app.tenant_id').
  • The API sets tenant context (SET app.tenant_id = ...) on each database session, derived from the authenticated user’s JWT.
  • super_admin role has NULL tenant_id and can access all tenants.
  • poller_user bypasses RLS intentionally (needs cross-tenant device access for polling).
  • Tenant isolation is enforced at the database level, not the application level — even a compromised API cannot leak cross-tenant data through app_user connections.
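Setting the tenant context safely usually means binding the tenant ID as a parameter rather than formatting it into SQL. The sketch below shows one way to build such a statement. It assumes tenant IDs are UUIDs (not stated in this document) and uses the hypothetical helper name `tenant_context_statement`. It relies on PostgreSQL's `set_config(name, value, is_local)` function, which scopes the setting to the current transaction when `is_local` is true.

```python
import uuid


def tenant_context_statement(tenant_id: str) -> tuple[str, dict[str, str]]:
    """Build a parameterized statement that scopes a transaction to one tenant."""
    # Assumption: tenant IDs are UUIDs; parsing rejects malformed values early.
    normalized = str(uuid.UUID(tenant_id))
    # set_config() is the function form of SET and accepts bind parameters,
    # which a plain SET statement does not.
    sql = "SELECT set_config('app.tenant_id', :tenant_id, true)"
    return sql, {"tenant_id": normalized}
```

The returned pair can be passed to any driver or ORM that supports named bind parameters, executed at the start of each transaction before tenant-scoped queries run.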

Database Roles

| Role | RLS | Purpose |
| --- | --- | --- |
| postgres | Bypasses (superuser) | Admin engine, auth/bootstrap, migrations |
| app_user | Enforced | All device/data routes in the API |
| poller_user | Bypasses | Cross-tenant device access for Go poller |

Security Layers

| Layer | Mechanism | Purpose |
| --- | --- | --- |
| Authentication | SRP-6a | Zero-knowledge proof; password never transmitted or stored |
| Key Derivation | 2SKD (PBKDF2 650K + HKDF + XOR) | Two-secret key derivation from password + secret key |
| Encryption at Rest | OpenBao Transit | Envelope encryption for device credentials |
| Tenant Isolation | PostgreSQL RLS | Database-level row filtering by tenant_id |
| Access Control | JWT + RBAC | Role-based permissions (super_admin, admin, operator, viewer) |
| Rate Limiting | Redis-backed | Auth endpoints limited to 5 requests/min |
| TLS Certificates | Internal CA | Certificate management and deployment to RouterOS devices |
| Security Headers | Middleware | CSP, SRI hashes on JS bundles, X-Frame-Options, etc. |
| Secret Validation | Startup check | Rejects known-insecure defaults in non-dev environments |

First Login

  1. Navigate to the portal URL provided by your administrator.
  2. Log in with the admin credentials created during initial deployment.
  3. Complete SRP security enrollment — the portal uses zero-knowledge authentication (SRP-6a), so a unique Secret Key is generated for your account.
  4. Save your Emergency Kit PDF immediately. This PDF contains your Secret Key, which you will need to log in from any new browser or device. Without it, you cannot recover access.
  5. Complete the Setup Wizard to create your first organization and add your first device.

Setup Wizard

The Setup Wizard launches automatically for first-time super_admin users. It walks through three steps:

  • Step 1 — Create Organization: Enter a name for your tenant (organization). This is the top-level container for all your devices, users, and configuration.
  • Step 2 — Add Device: Enter the IP address, API port (default 8729 for TLS), and RouterOS credentials for your first device. The portal will attempt to connect and verify the device.
  • Step 3 — Verify & Complete: The portal polls the device to confirm connectivity. Once verified, you are taken to the dashboard.

You can always add more organizations and devices later from the sidebar.

Device Management

Adding Devices

There are three ways to add devices to your fleet:

  1. Setup Wizard — automatically offered on first login.
  2. Fleet Table — click the “Add Device” button from the Devices page.
  3. Subnet Scanner — enter a CIDR range (e.g., 192.168.1.0/24) to auto-discover MikroTik devices on the network.

When adding a device, provide:

  • IP Address — the management IP of the RouterOS device.
  • API Port — default is 8729 (TLS). The portal connects via the RouterOS binary API protocol.
  • Credentials — username and password for the device. Credentials are encrypted at rest with AES-256-GCM.

Device Detail Tabs

| Tab | Description |
| --- | --- |
| Overview | System info, uptime, hardware model, RouterOS version, resource usage, and interface status summary. |
| Interfaces | Real-time traffic graphs for each network interface. |
| Config | Browse the full device configuration tree by RouterOS path. |
| Firewall | View and manage firewall filter rules, NAT rules, and address lists. |
| DHCP | Active DHCP leases, server configuration, and address pools. |
| Backups | Configuration backup timeline with side-by-side diff viewer to compare changes over time. |
| Clients | Connected clients and wireless registrations. |

Simple Config

Simple Config provides a consumer-router-style interface modeled after Linksys and Ubiquiti UIs. It is designed for operators who prefer guided configuration over raw RouterOS paths.

Seven category tabs:

  1. Internet — WAN connection type, PPPoE, DHCP client settings.
  2. LAN / DHCP — LAN addressing, DHCP server and pool configuration.
  3. WiFi — Wireless SSID, security, and channel settings.
  4. Port Forwarding — NAT destination rules for inbound services.
  5. Firewall — Simplified firewall rule management.
  6. DNS — DNS server and static DNS entries.
  7. System — Device identity, timezone, NTP, admin password.

Toggle between Simple (guided) and Standard (full config editor) modes at any time. Per-device settings are stored in browser localStorage.

Config Editor

The Config Editor provides direct access to RouterOS configuration paths (e.g., /ip/address, /ip/firewall/filter, /interface/bridge).

  • Select a device from the header dropdown.
  • Navigate the configuration tree to browse, add, edit, or delete entries.

Apply Modes

  • Standard Apply: changes are applied immediately.
  • Safe Apply: two-phase commit with automatic panic-revert. Changes are applied, and you then have a confirmation window to accept them. If the window expires without confirmation (for example, because the change made the device unreachable), the changes automatically revert to prevent lockouts.

Safe Apply is strongly recommended for firewall rules and routing changes on remote devices.

Monitoring & Alerts

Alert Rules

Create threshold-based rules that fire when device metrics cross defined boundaries:

  • Select the metric to monitor (CPU, memory, disk, interface traffic, uptime, etc.).
  • Set the threshold value and comparison operator.
  • Choose severity: info, warning, or critical.
  • Assign one or more notification channels.
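A threshold rule boils down to comparing one metric sample against a configured value. The following is a minimal sketch of that evaluation. The dictionary keys (`metric`, `operator`, `threshold`) and the function name `rule_fires` are hypothetical and do not reflect TOD's actual schema.

```python
import operator

# Map the comparison operator stored on a rule to a Python function.
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def rule_fires(rule: dict, sample: dict) -> bool:
    """Return True when the sample's metric crosses the rule's threshold."""
    value = sample.get(rule["metric"])
    if value is None:
        return False  # the metric is absent from this sample, nothing to compare
    return COMPARATORS[rule["operator"]](value, rule["threshold"])
```

For example, a rule `{"metric": "cpu", "operator": ">", "threshold": 90}` fires for a sample of `{"cpu": 95}` but not for `{"cpu": 90}`.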

Notification Channels

| Channel | Description |
| --- | --- |
| Email | SMTP-based email notifications. Configure server, port, and recipients. |
| Webhook | HTTP POST to any URL with a JSON payload containing alert details. |
| Slack | Slack incoming webhook with Block Kit formatting for rich alert messages. |

Maintenance Windows

  • Define start and end times.
  • Apply to specific devices or fleet-wide.
  • Alerts generated during the window are recorded but do not trigger notifications.
  • Maintenance windows can be recurring or one-time.
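The suppression rule described above (record the alert, skip the notification) can be sketched as a simple time-range check. The window shape used here, a dictionary with `start`, `end`, and `device_ids` where `None` means fleet-wide, is an assumption for illustration. Recurring windows are not modelled.

```python
from datetime import datetime, timezone


def should_notify(alert_time: datetime, device_id: str, windows: list) -> bool:
    """Return False when the alert falls inside a window covering the device."""
    for window in windows:
        # device_ids of None is treated as a fleet-wide window (assumption).
        covers_device = window["device_ids"] is None or device_id in window["device_ids"]
        if covers_device and window["start"] <= alert_time < window["end"]:
            return False  # the alert is still recorded; only notification is skipped
    return True
```

The caller would store the alert in either case and consult `should_notify` only before dispatching to notification channels.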

Reports

Generate PDF reports from the Reports page. Four report types are available:

| Report | Content |
| --- | --- |
| Fleet Summary | Overall fleet health, device counts by status, top alerts, and aggregate statistics. |
| Device Health | Per-device detailed report with hardware info, resource trends, and recent events. |
| Compliance | Security posture audit: firmware versions, default credentials, firewall policy checks. |
| SLA | Uptime and availability metrics over a selected period with percentage calculations. |

Reports are generated as downloadable PDFs using server-side rendering (Jinja2 + WeasyPrint).

Security Model

TOD implements a 1Password-inspired zero-knowledge security architecture. The server never stores or sees user passwords. All data is stored on infrastructure you own and control — no external telemetry, analytics, or third-party data transmission.

Data Protection

  • Config backups: Encrypted at rest via OpenBao Transit envelope encryption before database storage.
  • Audit logs: Encrypted at rest via Transit encryption — audit log content is protected even from database administrators.
  • Subresource Integrity (SRI): SHA-384 hashes on JavaScript bundles prevent tampering with frontend code.
  • Content Security Policy (CSP): Strict CSP headers prevent XSS, code injection, and unauthorized resource loading.
  • No external dependencies: Fully self-hosted with no external analytics, telemetry, CDNs, or third-party services. The only outbound connections are:
    • RouterOS firmware update checks (no device data sent)
    • SMTP for email notifications (if configured)
    • Webhooks for alerts (if configured)

Security Headers

| Header | Value | Purpose |
| --- | --- | --- |
| Strict-Transport-Security | max-age=31536000; includeSubDomains | Force HTTPS connections |
| X-Content-Type-Options | nosniff | Prevent MIME-type sniffing |
| X-Frame-Options | DENY | Prevent clickjacking via iframes |
| Content-Security-Policy | Strict policy | Prevent XSS and code injection |
| Referrer-Policy | strict-origin-when-cross-origin | Limit referrer information leakage |

Audit Trail

  • Immutable audit log: All significant actions are recorded — logins, configuration changes, device operations, admin actions.
  • Fire-and-forget logging: The log_action() function records audit events asynchronously without blocking the main request.
  • Per-tenant access: Tenants can only view their own audit logs (enforced by RLS).
  • Encryption at rest: Audit log content is encrypted via OpenBao Transit.
  • CSV export: Audit logs can be exported in CSV format for compliance and reporting.
  • Account deletion: When a user deletes their account, audit log entries are anonymized (PII removed) but the action records are retained for security compliance.

Data Retention

| Data Type | Retention | Notes |
| --- | --- | --- |
| User accounts | Until deleted | Users can self-delete from Settings |
| Device metrics | 90 days | Purged by TimescaleDB retention policy |
| Configuration backups | Indefinite | Stored in git repositories on your server |
| Audit logs | Indefinite | Anonymized on account deletion |
| API keys | Until revoked | Cascade-deleted with user account |
| Encrypted key material | Until user deleted | Cascade-deleted with user account |
| Session data (Redis) | 15 min / 7 days | Auto-expiring access/refresh tokens |
| Password reset tokens | 30 minutes | Auto-expire |
| SRP session state | Short-lived | Auto-expire in Redis |

GDPR Compliance

  • Right of Access (Art. 15): Users can view their account information on the Settings page.
  • Right to Data Portability (Art. 20): Users can export all personal data in JSON format from Settings.
  • Right to Erasure (Art. 17): Users can permanently delete their account and all associated data. Audit logs are anonymized (PII removed) with a deletion receipt generated for compliance verification.
  • Right to Rectification (Art. 16): Account information can be updated by the tenant administrator.

As a self-hosted application, the deployment operator is the data controller and is responsible for compliance with applicable data protection laws.

Authentication

SRP-6a Zero-Knowledge Proof

TOD uses the Secure Remote Password (SRP-6a) protocol for authentication, ensuring the server never receives, transmits, or stores user passwords.

  • SRP-6a protocol: Password is verified via a zero-knowledge proof — only a cryptographic verifier derived from the password is stored on the server, never the password itself.
  • Session management: JWT tokens with 15-minute access token lifetime and 7-day refresh token lifetime, delivered via httpOnly cookies.
  • SRP session state: Ephemeral SRP handshake data stored in Redis with automatic expiration.

Authentication Flow

Client                                Server
  |                                     |
  |  POST /auth/srp/init {email}        |
  |------------------------------------>|
  |  {salt, server_ephemeral_B}         |
  |<------------------------------------|
  |                                     |
  |  [Client derives session key from   |
  |   password + Secret Key + salt + B] |
  |                                     |
  |  POST /auth/srp/verify {A, M1}      |
  |------------------------------------>|
  |  [Server verifies M1 proof]         |
  |  {M2, access_token, refresh_token}  |
  |<------------------------------------|

Two-Secret Key Derivation (2SKD)

Combines the user password with a 128-bit Secret Key using a multi-step derivation process, ensuring that compromise of either factor alone is insufficient:

  • PBKDF2 with 650,000 iterations stretches the password.
  • HKDF expansion derives the final key material.
  • XOR combination of both factors produces the verifier input.
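The three steps can be sketched with the Python standard library. This is an illustration of the general 2SKD idea, not TOD's client code (which runs in the browser). How the Secret Key is fed into HKDF, the `info` label, and the function names are assumptions. The HKDF helper follows the extract-then-expand construction from RFC 5869.

```python
import hashlib
import hmac


def hkdf_sha256(key_material: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) extract-then-expand using HMAC-SHA256."""
    prk = hmac.new(salt, key_material, hashlib.sha256).digest()
    output, block, counter = b"", b"", 1
    while len(output) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        output += block
        counter += 1
    return output[:length]


def two_secret_key(password: str, secret_key: str, salt: bytes,
                   iterations: int = 650_000) -> bytes:
    """Combine a stretched password with a Secret Key-derived value via XOR."""
    # Step 1: PBKDF2 stretches the password.
    stretched = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations, dklen=32)
    # Step 2 (assumption): HKDF expands the Secret Key with the same salt.
    expanded = hkdf_sha256(secret_key.encode(), salt, b"2skd-example", 32)
    # Step 3: XOR mixes both factors, so neither alone determines the output.
    return bytes(a ^ b for a, b in zip(stretched, expanded))
```

Because the output depends on both inputs, an attacker who obtains only the password or only the Secret Key cannot reproduce the derived value.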

Secret Key & Emergency Kit

  • Secret Key format: A3-XXXXXX (128-bit), stored exclusively in the browser’s IndexedDB. The server never sees or stores the Secret Key.
  • Emergency Kit: Downloadable PDF containing the Secret Key for account recovery. Generated client-side.

Encryption

Credential Encryption

Device credentials (RouterOS usernames and passwords) are encrypted at rest using envelope encryption:

  • Encryption algorithm: AES-256-GCM.
  • Key management: OpenBao Transit secrets engine provides the master encryption keys.
  • Per-tenant isolation: Each tenant has its own encryption key in OpenBao Transit.
  • Envelope encryption: Data is encrypted with a data encryption key (DEK), which is itself encrypted by the tenant’s Transit key.

Go Poller LRU Cache

The Go poller decrypts credentials at runtime via the Transit API, with an LRU cache (1,024 entries, 5-minute TTL) to reduce KMS round-trips. Cache hits avoid OpenBao API calls entirely.
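An LRU cache with a TTL combines two eviction rules: entries expire after a fixed time, and the least recently used entry is dropped when the cache is full. The poller implements this in Go. The Python sketch below only illustrates the behaviour with the documented defaults, and the class name `TTLLRUCache` is hypothetical.

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """Least-recently-used cache whose entries also expire after a fixed TTL."""

    def __init__(self, max_entries=1024, ttl_seconds=300.0, clock=time.monotonic):
        self._max, self._ttl, self._clock = max_entries, ttl_seconds, clock
        self._items = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: treat as a miss
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._items[key] = (self._clock() + self._ttl, value)
        self._items.move_to_end(key)
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # evict the least recently used entry
```

A miss (a `None` result) is the point where the caller would fall back to the KMS and then call `put` with the decrypted value.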

Additional Encryption

  • CA private keys: Encrypted with AES-256-GCM before database storage. PEM key material is never logged.
  • Config backups: Encrypted at rest via OpenBao Transit before database storage.
  • Audit logs: Content is encrypted via Transit, so it stays protected even from database administrators.

RBAC & Tenants

Role-Based Access Control

| Role | Scope | Capabilities |
|---|---|---|
| super_admin | Global | Full system access, tenant management, user management across all tenants |
| admin | Tenant | Manage devices, users, settings, certificates within their tenant |
| operator | Tenant | Device operations, configuration changes, monitoring |
| viewer | Tenant | Read-only access to devices, metrics, and dashboards |

  • RBAC is enforced at both the API middleware layer and database level.
  • API keys inherit the operator permission level and are scoped to a single tenant.
  • API key tokens use the mktp_ prefix and are stored as SHA-256 hashes (the plaintext token is shown once at creation and never stored).
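A hash-only storage scheme like the one described for API keys could look roughly like this. The token length, the hex encoding of the digest, and the function names are assumptions; only the `mktp_` prefix and the use of SHA-256 come from the text above.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

PREFIX = "mktp_"


def create_api_key() -> Tuple[str, str]:
    """Return (plaintext_token, sha256_hex). Only the hash is persisted."""
    token = PREFIX + secrets.token_urlsafe(32)
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return token, digest


def verify_api_key(presented: str, stored_digest: str) -> bool:
    """Hash the presented token and compare it to the stored hash."""
    if not presented.startswith(PREFIX):
        return False
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(candidate, stored_digest)
```

Since the plaintext token is returned once and never stored, a lost key cannot be recovered and has to be replaced with a new one.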

Tenant Isolation via RLS

Multi-tenancy is enforced at the database level via PostgreSQL Row-Level Security (RLS). The application connects as the non-superuser app_user database role, so RLS policies filter every query by the authenticated user's tenant_id. Super admins operate outside tenant scope.

Internal CA & TLS Fallback

TOD includes a per-tenant Internal Certificate Authority for managing TLS certificates on RouterOS devices:

  • Per-tenant CA: Each tenant can generate its own self-signed Certificate Authority.
  • Deployment: Certificates are deployed to devices via SFTP.
  • Three-tier TLS fallback: The Go poller attempts connections in order:
    1. CA-verified TLS (using the tenant’s CA certificate)
    2. InsecureSkipVerify TLS (for self-signed RouterOS certs)
    3. Plain API connection (fallback)
  • Key protection: CA private keys are encrypted with AES-256-GCM before database storage.
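The fallback order above can be sketched as a loop over connection strategies. The real poller is written in Go; this Python version is illustrative only, and the tier names and `connect` callables are hypothetical stand-ins for the actual RouterOS API dialers.

```python
import ssl
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def insecure_context() -> ssl.SSLContext:
    """TLS without certificate verification (tier 2)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before verify_mode is relaxed
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def connect_with_fallback(tiers: Sequence[Tuple[str, Callable[[], T]]]) -> Tuple[str, T]:
    """Try each (name, connect) pair in order; return the first that succeeds."""
    errors: List[str] = []
    for name, connect in tiers:
        try:
            return name, connect()
        except OSError as exc:  # ssl.SSLError and socket errors are OSError subclasses
            errors.append(f"{name}: {exc}")
    raise ConnectionError("all tiers failed: " + "; ".join(errors))
```

Returning the tier name alongside the connection lets the caller record which tier succeeded, which is useful for spotting devices that still rely on unverified TLS or the plain API.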

API Endpoints

Overview

TOD exposes a REST API built with FastAPI. Interactive documentation is available at:

  • Swagger UI: http://<host>:<port>/docs (dev environment only)
  • ReDoc: http://<host>:<port>/redoc (dev environment only)

Both Swagger and ReDoc are disabled in staging/production environments.

Endpoint Groups

All API routes are mounted under the /api prefix. The health checks are the one exception and are also served at the root.

| Group | Prefix | Description |
|---|---|---|
| Auth | /api/auth/* | Login, register, SRP exchange, password reset, token refresh |
| Tenants | /api/tenants/* | Tenant/organization CRUD |
| Users | /api/users/* | User management, RBAC role assignment |
| Devices | /api/devices/* | Device CRUD, scanning, status |
| Device Groups | /api/device-groups/* | Logical device grouping |
| Device Tags | /api/device-tags/* | Tag-based device labeling |
| Metrics | /api/metrics/* | TimescaleDB device metrics (CPU, memory, traffic) |
| Config Backups | /api/config-backups/* | Automated RouterOS config backup history |
| Config Editor | /api/config-editor/* | Live RouterOS config browsing and editing |
| Firmware | /api/firmware/* | RouterOS firmware version management and upgrades |
| Alerts | /api/alerts/* | Alert rule CRUD, alert history |
| Events | /api/events/* | Device event log |
| Device Logs | /api/device-logs/* | RouterOS syslog entries |
| Templates | /api/templates/* | Config templates for batch operations |
| Clients | /api/clients/* | Connected client (DHCP lease) data |
| Topology | /api/topology/* | Network topology map data |
| SSE | /api/sse/* | Server-Sent Events for real-time updates |
| Audit Logs | /api/audit-logs/* | Immutable audit trail |
| Reports | /api/reports/* | PDF report generation (Jinja2 + WeasyPrint) |
| API Keys | /api/api-keys/* | API key CRUD |
| Maintenance Windows | /api/maintenance-windows/* | Scheduled maintenance window management |
| VPN | /api/vpn/* | WireGuard VPN tunnel management |
| Certificates | /api/certificates/* | Internal CA and device certificate management |
| Transparency | /api/transparency/* | KMS access event dashboard |

Health Checks

| Endpoint | Type | Description |
|---|---|---|
| GET /health | Liveness | Always returns 200 if the API process is alive. Response includes version. |
| GET /health/ready | Readiness | Returns 200 only when PostgreSQL, Redis, and NATS are all healthy. Returns 503 otherwise. |
| GET /api/health | Liveness | Backward-compatible alias under /api prefix. |
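The readiness rule (200 only when every dependency is healthy, 503 otherwise) reduces to a small aggregation. The response body shape and the function name below are assumptions for illustration; only the status codes and the three dependencies come from the table.

```python
from typing import Dict, Tuple


def readiness(checks: Dict[str, bool]) -> Tuple[int, dict]:
    """Map per-dependency health flags to an HTTP status and a body."""
    failed = sorted(name for name, ok in checks.items() if not ok)
    status = 200 if not failed else 503
    body = {"status": "ready" if not failed else "unavailable", "failed": failed}
    return status, body


status, body = readiness({"postgres": True, "redis": True, "nats": False})
```

Listing the failed dependencies in the body makes a 503 actionable for whoever is reading the probe output.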

API Authentication

SRP-6a Login

  • POST /api/auth/login — SRP-6a authentication (returns JWT access + refresh tokens)
  • POST /api/auth/refresh — Refresh an expired access token
  • POST /api/auth/logout — Invalidate the current session

All authenticated endpoints require one of:

  • Authorization: Bearer <token> header
  • httpOnly cookie (set automatically by the login flow)

Access tokens expire after 15 minutes. Refresh tokens are valid for 7 days.

API Key Authentication

  • Create API keys in Admin > API Keys
  • Use header: X-API-Key: mktp_<key>
  • Keys have operator-level RBAC permissions
  • Prefix: mktp_, stored as SHA-256 hash
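A client might attach either credential type like this. The host `tod.example`, the `/api/devices/` path, and the helper name are placeholders; the header names are the ones listed above.

```python
import urllib.request
from typing import Optional


def build_request(url: str, *, bearer_token: Optional[str] = None,
                  api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a request carrying exactly one of the two supported credentials."""
    if (bearer_token is None) == (api_key is None):
        raise ValueError("provide exactly one of bearer_token or api_key")
    headers = {"Accept": "application/json"}
    if bearer_token is not None:
        headers["Authorization"] = f"Bearer {bearer_token}"  # JWT from the login flow
    else:
        headers["X-API-Key"] = api_key  # token created in Admin > API Keys
    return urllib.request.Request(url, headers=headers)


req = build_request("http://tod.example/api/devices/", api_key="mktp_example")
```

Passing the result to `urllib.request.urlopen` would send the request; that step is left out so the sketch does not depend on a running server.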

Rate Limiting

  • Auth endpoints: 5 requests/minute per IP
  • General endpoints: no global rate limit (per-route limits may apply)

Rate limit violations return HTTP 429 with a JSON error body.
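A per-IP limit of 5 requests per minute can be modeled with a sliding window. This in-memory sketch is an assumption about the general technique, not TOD's implementation (the Redis URL is documented as being used for rate limiting, so the real limiter is presumably shared across processes); the class name is hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within `window_seconds`."""

    def __init__(self, limit: int = 5, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        now = self._clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            return False  # caller should respond with HTTP 429
        hits.append(now)
        return True
```

Each client IP gets its own window, so one noisy client does not consume another client's budget.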

RBAC Roles

| Role | Scope | Description |
|---|---|---|
| super_admin | Global (no tenant) | Full platform access, tenant management |
| admin | Tenant | Full access within their tenant |
| operator | Tenant | Device operations, config changes |
| viewer | Tenant | Read-only access |

Error Handling

Error Format

All error responses use a standard JSON format:

{
  "detail": "Human-readable error message"
}

Status Codes

| Code | Meaning |
|---|---|
| 400 | Bad request / validation error |
| 401 | Unauthorized (missing or expired token) |
| 403 | Forbidden (insufficient RBAC permissions) |
| 404 | Resource not found |
| 409 | Conflict (duplicate resource) |
| 422 | Unprocessable entity (Pydantic validation) |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
| 503 | Service unavailable (readiness check failed) |
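A client can turn these responses into a single message with a helper along these lines. One caveat is an assumption on my part: FastAPI's stock handler for 422 validation errors returns `detail` as a list of objects rather than a string, and whether TOD overrides that is not stated here, so the sketch accepts both shapes. The function name is hypothetical.

```python
import json


def error_message(status: int, body: bytes) -> str:
    """Extract a readable message from an error response body."""
    try:
        detail = json.loads(body).get("detail")
    except (ValueError, AttributeError):
        detail = None  # body was not JSON, or not a JSON object
    if isinstance(detail, list):
        # Validation errors: join the per-field messages.
        detail = "; ".join(
            str(item.get("msg", item)) if isinstance(item, dict) else str(item)
            for item in detail
        )
    return f"HTTP {status}: {detail or 'no detail provided'}"
```

For example, a 404 body of `{"detail": "Device not found"}` becomes `HTTP 404: Device not found`.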

Environment Variables

TOD uses Pydantic Settings for configuration. All values can be set via environment variables or a .env file in the backend working directory.

Application

| Variable | Default | Description |
|---|---|---|
| APP_NAME | TOD - The Other Dude | Application display name |
| APP_VERSION | 0.1.0 | Semantic version string |
| ENVIRONMENT | dev | Runtime environment: dev, staging, or production |
| DEBUG | false | Enable debug mode |
| CORS_ORIGINS | http://localhost:3000,... | Comma-separated list of allowed CORS origins |
| APP_BASE_URL | http://localhost:5173 | Frontend base URL (used in password reset emails) |

Authentication & JWT

| Variable | Default | Description |
|---|---|---|
| JWT_SECRET_KEY | (insecure dev default) | HMAC signing key for JWTs. Must be changed in production. |
| JWT_ALGORITHM | HS256 | JWT signing algorithm |
| JWT_ACCESS_TOKEN_EXPIRE_MINUTES | 15 | Access token lifetime in minutes |
| JWT_REFRESH_TOKEN_EXPIRE_DAYS | 7 | Refresh token lifetime in days |
| PASSWORD_RESET_TOKEN_EXPIRE_MINUTES | 30 | Password reset link validity in minutes |

Database

| Variable | Default | Description |
|---|---|---|
| DATABASE_URL | postgresql+asyncpg://postgres:postgres@localhost:5432/mikrotik | Admin (superuser) async database URL. Used for migrations and bootstrap. |
| SYNC_DATABASE_URL | postgresql+psycopg2://postgres:postgres@localhost:5432/mikrotik | Synchronous URL used by Alembic migrations only. |
| APP_USER_DATABASE_URL | postgresql+asyncpg://app_user:app_password@localhost:5432/mikrotik | Non-superuser async URL. Enforces PostgreSQL RLS for tenant isolation. |
| DB_POOL_SIZE | 20 | App user connection pool size |
| DB_MAX_OVERFLOW | 40 | App user pool max overflow connections |
| DB_ADMIN_POOL_SIZE | 10 | Admin connection pool size |
| DB_ADMIN_MAX_OVERFLOW | 20 | Admin pool max overflow connections |

Security

| Variable | Default | Description |
|---|---|---|
| CREDENTIAL_ENCRYPTION_KEY | (insecure dev default) | AES-256-GCM encryption key for device credentials at rest. Must be exactly 32 bytes, base64-encoded. Must be changed in production. |
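One way to produce a value that meets the "exactly 32 bytes, base64-encoded" requirement is shown below. It assumes the standard Base64 alphabet; if the backend expects the URL-safe variant, `base64.urlsafe_b64encode` would be the substitute. Both function names are hypothetical.

```python
import base64
import secrets


def generate_credential_key() -> str:
    """Return 32 random bytes encoded as standard Base64."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


def is_valid_credential_key(value: str) -> bool:
    """Check that a value decodes to exactly 32 bytes."""
    try:
        return len(base64.b64decode(value, validate=True)) == 32
    except ValueError:  # binascii.Error is a ValueError subclass
        return False
```

The encoded key is 44 characters long, including one `=` of padding.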

OpenBao / Vault (KMS)

| Variable | Default | Description |
|---|---|---|
| OPENBAO_ADDR | http://localhost:8200 | OpenBao Transit server address for per-tenant envelope encryption |
| OPENBAO_TOKEN | (insecure dev default) | OpenBao authentication token. Must be changed in production. |

NATS

| Variable | Default | Description |
|---|---|---|
| NATS_URL | nats://localhost:4222 | NATS JetStream server URL for pub/sub between Go poller and Python API |

Redis

| Variable | Default | Description |
|---|---|---|
| REDIS_URL | redis://localhost:6379/0 | Redis URL for caching, distributed locks, and rate limiting |

SMTP (Notifications)

| Variable | Default | Description |
|---|---|---|
| SMTP_HOST | localhost | SMTP server hostname |
| SMTP_PORT | 587 | SMTP server port |
| SMTP_USER | (none) | SMTP authentication username |
| SMTP_PASSWORD | (none) | SMTP authentication password |
| SMTP_USE_TLS | false | Enable STARTTLS for SMTP connections |
| SMTP_FROM_ADDRESS | noreply@mikrotik-portal.local | Sender address for outbound emails |

Firmware

| Variable | Default | Description |
|---|---|---|
| FIRMWARE_CACHE_DIR | /data/firmware-cache | Path to firmware download cache (PVC mount in production) |
| FIRMWARE_CHECK_INTERVAL_HOURS | 24 | Hours between automatic RouterOS version checks |

Storage Paths

| Variable | Default | Description |
|---|---|---|
| GIT_STORE_PATH | ./git-store | Path to bare git repos for config backup history. In production: /data/git-store on a ReadWriteMany PVC. |
| WIREGUARD_CONFIG_PATH | /data/wireguard | Shared volume path for WireGuard configuration files |

Bootstrap

| Variable | Default | Description |
|---|---|---|
| FIRST_ADMIN_EMAIL | (none) | Email for the initial super_admin user. Only used if no users exist in the database. |
| FIRST_ADMIN_PASSWORD | (none) | Password for the initial super_admin user. The user is created with must_upgrade_auth=True, triggering SRP registration on first login. |

Production Safety

TOD refuses to start in staging or production environments if any of these variables still has its insecure dev default:

  • JWT_SECRET_KEY
  • CREDENTIAL_ENCRYPTION_KEY
  • OPENBAO_TOKEN

The process exits with code 1 and a clear error message indicating which variable needs to be rotated.
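A startup guard with this behavior could be sketched as follows. The placeholder values in `INSECURE_DEFAULTS` are invented for illustration (the real dev defaults are defined in TOD's settings), and the function names are hypothetical.

```python
import os
import sys
from typing import List, Mapping

# Hypothetical placeholder values, not TOD's real dev defaults.
INSECURE_DEFAULTS = {
    "JWT_SECRET_KEY": "dev-jwt-secret",
    "CREDENTIAL_ENCRYPTION_KEY": "dev-credential-key",
    "OPENBAO_TOKEN": "dev-openbao-token",
}


def insecure_variables(env: Mapping[str, str]) -> List[str]:
    """Return the variables still at their dev default in staging/production."""
    if env.get("ENVIRONMENT", "dev") not in ("staging", "production"):
        return []
    return [name for name, default in INSECURE_DEFAULTS.items()
            if env.get(name, default) == default]


def enforce_production_safety(env: Mapping[str, str] = os.environ) -> None:
    """Exit with code 1 and name the offending variables, if any."""
    offenders = insecure_variables(env)
    if offenders:
        print("Refusing to start; rotate: " + ", ".join(offenders), file=sys.stderr)
        sys.exit(1)
```

An unset variable counts as insecure here, since it would fall back to the dev default.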

Docker Compose

Profiles

| Profile | Command | Services |
|---|---|---|
| (default) | docker compose up -d | Infrastructure only: PostgreSQL, Redis, NATS, OpenBao |
| full | docker compose --profile full up -d | All services: infrastructure + API, Poller, Frontend |

Container Memory Limits

All containers have enforced memory limits to prevent OOM on the host:

| Service | Memory Limit |
|---|---|
| PostgreSQL | 512 MB |
| Redis | 128 MB |
| NATS | 128 MB |
| API | 512 MB |
| Poller | 256 MB |
| Frontend | 64 MB |

Build Docker images sequentially (not in parallel) to avoid OOM during builds.