TOD — The Other Dude
Fleet management for MikroTik RouterOS devices. Built for MSPs who manage hundreds of routers across multiple tenants. Think “UniFi Controller, but for MikroTik.”
TOD is a self-hosted, multi-tenant platform that gives you centralized visibility, configuration management, real-time monitoring, and zero-knowledge security across your entire MikroTik fleet.
Features
- Fleet — Dashboard with at-a-glance fleet health, virtual-scrolled device table, geographic map, and subnet scanner for device discovery.
- Configuration — Config Editor with two-phase safe apply, batch configuration across devices, bulk CLI commands, reusable templates, Simple Config (Linksys/Ubiquiti-style UI), and git-backed config backup with diff viewer.
- Monitoring — Interactive network topology (ReactFlow + Dagre), real-time metrics via SSE/NATS, configurable alert rules, notification channels (email, webhook, Slack), audit trail, KMS transparency dashboard, and PDF reports.
- Security — 1Password-style zero-knowledge architecture with SRP-6a auth, 2SKD key derivation, Secret Key with Emergency Kit, OpenBao KMS for per-tenant envelope encryption, Internal CA with SFTP cert deployment, WireGuard VPN, and AES-256-GCM credential encryption.
- Administration — Full multi-tenancy with PostgreSQL RLS, user management with RBAC, API keys (mktp_ prefix), firmware management, maintenance windows, and setup wizard.
- UX — Command palette (Cmd+K), Vim-style keyboard shortcuts, dark/light mode, Framer Motion page transitions, and shimmer skeleton loaders.
Tech Stack
| Layer | Technology |
|---|---|
| Frontend | React 19, TanStack Router + Query, Tailwind CSS 3.4, Vite |
| Backend | Python 3.12, FastAPI 0.115, SQLAlchemy 2.0, asyncpg |
| Poller | Go 1.24, go-routeros/v3, pgx/v5, nats.go |
| Database | PostgreSQL 17 + TimescaleDB, Row-Level Security |
| Cache | Redis 7 |
| Message Bus | NATS with JetStream |
| KMS | OpenBao 2.1 (Transit) |
| Auth | SRP-6a (zero-knowledge), JWT |
Quick Start
# Configure (run from the root of the cloned repository)
cp .env.example .env
# Start infrastructure
docker compose up -d
# Build app images (one at a time to avoid OOM)
docker compose build api
docker compose build poller
docker compose build frontend
# Start the full stack
docker compose up -d
# Verify
curl http://localhost:8001/health
open http://localhost:3000
Environment Profiles
| Environment | Frontend | API | Notes |
|---|---|---|---|
| Dev | localhost:3000 | localhost:8001 | Hot-reload, volume-mounted source |
| Staging | localhost:3080 | localhost:8081 | Built images, staging secrets |
| Production | localhost (port 80) | Internal (proxied) | Gunicorn workers, log rotation |
Deployment
Prerequisites
- Docker Engine 24+ with Docker Compose v2
- At least 4 GB RAM (2 GB absolute minimum — builds are memory-intensive)
- Fast storage recommended for Docker volumes
- Network access to RouterOS devices on ports 8728 (API) and 8729 (API-SSL)
1. Clone and Configure
git clone <repository-url> tod
cd tod
# Copy environment template
cp .env.example .env.prod
2. Generate Secrets
# Generate JWT secret
python3 -c "import secrets; print(secrets.token_urlsafe(64))"
# Generate credential encryption key (32 bytes, base64-encoded)
python3 -c "import secrets, base64; print(base64.b64encode(secrets.token_bytes(32)).decode())"
Edit .env.prod with the generated values:
ENVIRONMENT=production
JWT_SECRET_KEY=<generated-jwt-secret>
CREDENTIAL_ENCRYPTION_KEY=<generated-encryption-key>
POSTGRES_PASSWORD=<strong-password>
# First admin user (created on first startup)
FIRST_ADMIN_EMAIL=admin@example.com
FIRST_ADMIN_PASSWORD=<strong-password>
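Before starting the stack, it can help to sanity-check the values in .env.prod against the constraints stated in this guide (JWT secret of at least 32 characters, a base64 key that decodes to 32 bytes, no leftover placeholders). The following is a minimal sketch; `validate_secrets` is a hypothetical helper, not TOD's own startup check.

```python
import base64
import binascii

# Hypothetical helper: mirrors the documented constraints, not TOD's actual startup check.
def validate_secrets(env: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the values look sane."""
    problems = []
    if len(env.get("JWT_SECRET_KEY", "")) < 32:
        problems.append("JWT_SECRET_KEY must be at least 32 characters")
    try:
        # validate=True rejects characters outside the base64 alphabet.
        key = base64.b64decode(env.get("CREDENTIAL_ENCRYPTION_KEY", ""), validate=True)
    except (binascii.Error, ValueError):
        key = b""
    if len(key) != 32:
        problems.append("CREDENTIAL_ENCRYPTION_KEY must decode to exactly 32 bytes")
    for name in ("POSTGRES_PASSWORD", "FIRST_ADMIN_PASSWORD"):
        value = env.get(name, "")
        if not value or value.startswith("<"):
            problems.append(f"{name} is unset or still a placeholder")
    return problems
```

Passing the parsed contents of .env.prod as a dictionary returns an empty list when all four values satisfy the documented rules.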
3. Build Images
Build images one at a time to avoid out-of-memory crashes on constrained hosts:
docker compose -f docker-compose.yml -f docker-compose.prod.yml build api
docker compose -f docker-compose.yml -f docker-compose.prod.yml build poller
docker compose -f docker-compose.yml -f docker-compose.prod.yml build frontend
4. Start the Stack
docker compose -f docker-compose.yml -f docker-compose.prod.yml --env-file .env.prod up -d
5. Verify
# Check all services are running
docker compose ps
# Check API health (liveness)
curl http://localhost:8000/health
# Check readiness (PostgreSQL, Redis, NATS connected)
curl http://localhost:8000/health/ready
# Access the portal
open http://localhost
Log in with the FIRST_ADMIN_EMAIL and FIRST_ADMIN_PASSWORD credentials set in step 2.
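The liveness and readiness checks from step 5 can be scripted when the deployment is automated. This sketch polls both endpoints until they return 200. It assumes the readiness endpoint answers with a non-200 status while a dependency is down, which is not confirmed here; `wait_until_ready` and `http_status` are illustrative names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code
    except (urllib.error.URLError, OSError):
        return 0

def wait_until_ready(base_url: str,
                     attempts: int = 30,
                     delay: float = 2.0,
                     fetch: Callable[[str], int] = http_status) -> bool:
    """Poll /health, then /health/ready, until both return 200 or attempts run out."""
    for _ in range(attempts):
        if fetch(f"{base_url}/health") == 200 and fetch(f"{base_url}/health/ready") == 200:
            return True
        time.sleep(delay)
    return False
```

For example, `wait_until_ready("http://localhost:8000")` waits up to about a minute with the defaults. The `fetch` parameter exists so the loop can be exercised without a running stack.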
Required Environment Variables
| Variable | Description | Example |
|---|---|---|
| ENVIRONMENT | Deployment environment | production |
| JWT_SECRET_KEY | JWT signing secret (min 32 chars) | <generated> |
| CREDENTIAL_ENCRYPTION_KEY | AES-256 key for device credentials (base64) | <generated> |
| POSTGRES_PASSWORD | PostgreSQL superuser password | <strong-password> |
| FIRST_ADMIN_EMAIL | Initial admin account email | admin@example.com |
| FIRST_ADMIN_PASSWORD | Initial admin account password | <strong-password> |
Optional Environment Variables
| Variable | Default | Description |
|---|---|---|
| GUNICORN_WORKERS | 2 | API worker process count |
| DB_POOL_SIZE | 20 | App database connection pool size |
| DB_MAX_OVERFLOW | 40 | Max overflow connections above pool |
| DB_ADMIN_POOL_SIZE | 10 | Admin database connection pool size |
| DB_ADMIN_MAX_OVERFLOW | 20 | Admin max overflow connections |
| POLL_INTERVAL_SECONDS | 60 | Device polling interval |
| CONNECTION_TIMEOUT_SECONDS | 10 | RouterOS connection timeout |
| COMMAND_TIMEOUT_SECONDS | 30 | RouterOS per-command timeout |
| CIRCUIT_BREAKER_MAX_FAILURES | 5 | Consecutive failures before backoff |
| CIRCUIT_BREAKER_BASE_BACKOFF_SECONDS | 30 | Initial backoff duration |
| CIRCUIT_BREAKER_MAX_BACKOFF_SECONDS | 900 | Maximum backoff (15 min) |
| LOG_LEVEL | info | Logging verbosity (debug/info/warn/error) |
| CORS_ORIGINS | http://localhost:3000 | Comma-separated CORS origins |
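To make the three CIRCUIT_BREAKER_* settings concrete, here is a sketch of how a backoff delay could be computed from them. The table only gives the failure threshold, the initial backoff, and the maximum; the doubling policy below is an assumption, and the poller's actual growth curve may differ.

```python
def backoff_seconds(consecutive_failures: int,
                    max_failures: int = 5,
                    base: int = 30,
                    cap: int = 900) -> int:
    """Seconds to skip polling a device; 0 while the breaker is still closed."""
    if consecutive_failures < max_failures:
        return 0
    # Assumed policy: double the delay for each failure past the threshold.
    exponent = consecutive_failures - max_failures
    return min(base * 2 ** exponent, cap)
```

Under this assumption and the default values, the delay starts at 30 seconds on the fifth consecutive failure and reaches the 900-second ceiling on the tenth.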
Storage Configuration
Docker volumes mount to the host filesystem. Default locations:
- PostgreSQL data: ./docker-data/postgres
- Redis data: ./docker-data/redis
- NATS data: ./docker-data/nats
- Git store (config backups): ./docker-data/git-store
To change storage locations, edit the volume mounts in docker-compose.yml.
Resource Limits
Container memory limits are enforced in docker-compose.prod.yml to prevent OOM crashes:
| Service | Memory Limit |
|---|---|
| PostgreSQL | 512 MB |
| Redis | 128 MB |
| NATS | 128 MB |
| API | 512 MB |
| Poller | 256 MB |
| Frontend | 64 MB |
Adjust under deploy.resources.limits.memory in docker-compose.prod.yml.
Monitoring (Optional)
Enable Prometheus and Grafana monitoring with the observability compose overlay:
docker compose \
-f docker-compose.yml \
-f docker-compose.prod.yml \
-f docker-compose.observability.yml \
--env-file .env.prod up -d
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3001 (default: admin/admin)
Exported Metrics
| Metric | Source | Description |
|---|---|---|
| http_requests_total | API | HTTP request count by method, path, status |
| http_request_duration_seconds | API | Request latency histogram |
| mikrotik_poll_total | Poller | Poll cycles by status (success/error/skipped) |
| mikrotik_poll_duration_seconds | Poller | Poll cycle duration histogram |
| mikrotik_devices_active | Poller | Number of devices being polled |
| mikrotik_circuit_breaker_skips_total | Poller | Polls skipped due to backoff |
| mikrotik_nats_publish_total | Poller | NATS publishes by subject and status |
Troubleshooting
| Issue | Solution |
|---|---|
| API won’t start with secret error | Generate production secrets (see step 2 above) |
| Build crashes with OOM | Build images one at a time (see step 3 above) |
| Device shows offline | Check network access to device API port (8728/8729) |
| Health check fails | Check docker compose logs api for startup errors |
| Rate limited (429) | Wait 60 seconds or check Redis connectivity |
| Migration fails | Check docker compose logs api for Alembic errors |
| NATS subscriber won’t start | Non-fatal — API runs without NATS; check NATS container health |
| Poller circuit breaker active | Device unreachable; check CIRCUIT_BREAKER_* env vars to tune backoff |
System Overview
TOD is a containerized MSP fleet management platform for MikroTik RouterOS devices. It uses a three-service architecture: a React frontend, a Python FastAPI backend, and a Go poller. All services communicate through PostgreSQL, Redis, and NATS JetStream. Multi-tenancy is enforced at the database level via PostgreSQL Row-Level Security (RLS).
Architecture Diagram
+--------------+     +------------------+     +---------------+
|   Frontend   |---->|   Backend API    |<--->|   Go Poller   |
| React/nginx  |     |     FastAPI      |     |  go-routeros  |
+--------------+     +--------+---------+     +-------+-------+
                              |                       |
               +--------------+-------------------+---+
               |              |                   |
      +--------v---+    +-----v-------+   +-------v-------+
      |   Redis    |    | PostgreSQL  |   |     NATS      |
      |  locks,    |    | 17+Timescale|   |   JetStream   |
      |  cache     |    | DB + RLS    |   |   pub/sub     |
      +------------+    +-------------+   +-------+-------+
                                                  |
                                           +------v-------+
                                           |   OpenBao    |
                                           | Transit KMS  |
                                           +--------------+
Services
Frontend (React / nginx)
- Stack: React 19, TypeScript, TanStack Router (file-based routing), TanStack Query (data fetching), Tailwind CSS 3.4, Vite
- Production: Static build served by nginx on port 80 (exposed as port 3000)
- Development: Vite dev server with hot module replacement
- Design system: Geist Sans + Geist Mono fonts, HSL color tokens via CSS custom properties, class-based dark/light mode
- Real-time: Server-Sent Events (SSE) for live device status updates, alerts, and operation progress
- Client-side encryption: SRP-6a authentication flow with 2SKD key derivation; Emergency Kit PDF generation
- UX features: Command palette (Cmd+K), Framer Motion page transitions, collapsible sidebar, skeleton loaders
- Memory limit: 64 MB
Backend API (FastAPI)
- Stack: Python 3.12+, FastAPI 0.115+, SQLAlchemy 2.0 async, asyncpg, Gunicorn
- Two database engines:
  - admin_engine (superuser) — used only for auth/bootstrap and NATS subscribers that need cross-tenant access
  - app_engine (non-superuser app_user role) — used for all device/data routes, enforces RLS
- Authentication: JWT tokens (15min access, 7d refresh), SRP-6a zero-knowledge proof, RBAC (super_admin, admin, operator, viewer)
- NATS subscribers: Three independent subscribers for device status, metrics, and firmware events. Non-fatal startup — API serves requests even if NATS is unavailable
- Background services: APScheduler for nightly config backups and daily firmware version checks
- Middleware stack (LIFO): RequestID → SecurityHeaders → RateLimiting → CORS → Route handler
- Health endpoints: /health (liveness), /health/ready (readiness — checks PostgreSQL, Redis, NATS)
- Memory limit: 512 MB
API Routers
The backend exposes route groups under the /api prefix:
| Router | Purpose |
|---|---|
| auth | Login (SRP-6a + legacy), token refresh, registration |
| tenants | Tenant CRUD (super_admin only) |
| users | User management, RBAC |
| devices | Device CRUD, status, commands |
| device_groups | Logical device grouping |
| device_tags | Tagging and filtering |
| metrics | Time-series metrics (TimescaleDB) |
| config_backups | Configuration backup history |
| config_editor | Live RouterOS config editing |
| firmware | Firmware version tracking and upgrades |
| alerts | Alert rules and active alerts |
| events | Device event log |
| device_logs | RouterOS system logs |
| templates | Configuration templates |
| clients | Connected client devices |
| topology | Network topology (ReactFlow data) |
| sse | Server-Sent Events streams |
| audit_logs | Immutable audit trail |
| reports | PDF report generation (Jinja2 + WeasyPrint) |
| api_keys | API key management (mktp_ prefix) |
| maintenance_windows | Scheduled maintenance with alert suppression |
| vpn | WireGuard VPN management |
| certificates | Internal CA and device TLS certificates |
| transparency | KMS access event dashboard |
Go Poller
- Stack: Go 1.23, go-routeros/v3, pgx/v5, nats.go
- Polling model: Synchronous per-device polling on a configurable interval (default 60s)
- Device communication: RouterOS binary API over TLS (port 8729), InsecureSkipVerify for self-signed certs
- TLS fallback: Three-tier strategy — CA-verified → InsecureSkipVerify → plain API
- Distributed locking: Redis locks prevent concurrent polling of the same device (safe for multi-instance deployment)
- Circuit breaker: Backs off from unreachable devices to avoid wasting poll cycles
- Credential decryption: OpenBao Transit with LRU cache (1024 entries, 5min TTL) to minimize KMS calls
- Output: Publishes poll results to NATS JetStream; the API’s NATS subscribers process and persist them
- Database access: Uses the poller_user role, which bypasses RLS (needs cross-tenant device access)
- Memory limit: 256 MB
Infrastructure Services
PostgreSQL 17 + TimescaleDB
- Image: timescale/timescaledb:2.17.2-pg17
- Row-Level Security (RLS): Enforces tenant isolation at the database level. All data tables have a tenant_id column; RLS policies filter by current_setting('app.tenant_id')
- Database roles:
  - postgres (superuser) — admin engine, auth/bootstrap, migrations
  - app_user (non-superuser) — RLS-enforced, used by API for data routes
  - poller_user — bypasses RLS, used by Go poller for cross-tenant device access
- TimescaleDB hypertables: Time-series storage for device metrics (CPU, memory, interface traffic, etc.)
- Migrations: Alembic, run automatically on API startup
- Memory limit: 512 MB
Redis
- Image: redis:7-alpine
- Distributed locking for the Go poller (prevents concurrent polling of the same device)
- Rate limiting on auth endpoints (5 requests/min)
- Credential cache for OpenBao Transit responses
- Memory limit: 128 MB
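The 5 requests/min limit on auth endpoints can be pictured as a per-client counter that resets each minute. The sketch below uses an in-memory dictionary as a stand-in for Redis (where the same idea is usually an INCR on a key with an EXPIRE). Whether TOD uses a fixed window, a sliding window, or another scheme is not stated, so treat the fixed-window choice and the class name as assumptions.

```python
import time
from typing import Callable

class FixedWindowLimiter:
    """In-memory stand-in for a Redis counter keyed by client and time window."""

    def __init__(self, limit: int = 5, window_seconds: int = 60,
                 clock: Callable[[], float] = time.time) -> None:
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        # Old windows are never purged here; a Redis key expiry would handle that.
        self._counts: dict[tuple[str, int], int] = {}

    def allow(self, client_id: str) -> bool:
        """Count this request and report whether it is within the limit."""
        bucket = int(self._clock() // self._window)
        key = (client_id, bucket)
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key] <= self._limit
```

A caller would return HTTP 429 whenever `allow()` is false, which matches the "Rate limited (429)" entry in the troubleshooting table.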
NATS JetStream
- Image: nats:2-alpine
- Role: Message bus between the Go poller and the Python API
- Streams: DEVICE_EVENTS (poll results, status changes), ALERT_EVENTS (SSE delivery), OPERATION_EVENTS (SSE delivery)
- Durable consumers: Ensure no message loss during API restarts
- Memory limit: 128 MB
OpenBao (HashiCorp Vault fork)
- Image: openbao/openbao:2.1
- Transit secrets engine: Provides envelope encryption for device credentials at rest
- Per-tenant keys: Each tenant gets a dedicated Transit encryption key
- Memory limit: 256 MB
WireGuard
- Image: lscr.io/linuxserver/wireguard
- Role: VPN gateway for reaching RouterOS devices on remote networks
- Port: 51820/UDP
- Memory limit: 128 MB
Container Memory Limits
| Service | Limit |
|---|---|
| PostgreSQL | 512 MB |
| API | 512 MB |
| Go Poller | 256 MB |
| OpenBao | 256 MB |
| Redis | 128 MB |
| NATS | 128 MB |
| WireGuard | 128 MB |
| Frontend (nginx) | 64 MB |
Network Ports
| Service | Internal Port | External Port | Protocol |
|---|---|---|---|
| Frontend | 80 | 3000 | HTTP |
| API | 8000 | 8001 | HTTP |
| PostgreSQL | 5432 | 5432 | TCP |
| Redis | 6379 | 6379 | TCP |
| NATS | 4222 | 4222 | TCP |
| NATS Monitor | 8222 | 8222 | HTTP |
| OpenBao | 8200 | 8200 | HTTP |
| WireGuard | 51820 | 51820 | UDP |
Data Flow
Device Polling Cycle
Go Poller      Redis     OpenBao    RouterOS          NATS         API      PostgreSQL
|                |          |           |               |           |            |
+--query list--->|          |           |               |           |            |
|<---------------+          |           |               |           |            |
+--acquire lock->|          |           |               |           |            |
|<--lock granted-+          |           |               |           |            |
+--decrypt creds (miss)---->|           |               |           |            |
|<--plaintext creds---------+           |               |           |            |
+--binary API (8729 TLS)--------------->|               |           |            |
|<--system info, interfaces, metrics----+               |           |            |
+--publish poll result--------------------------------->|           |            |
|                |          |           |               | subscribe>|            |
|                |          |           |               |           +--upsert--->|
+--release lock->|          |           |               |           |            |
- Poller queries PostgreSQL for the list of active devices
- Acquires a Redis distributed lock per device (prevents duplicate polling)
- Decrypts device credentials via OpenBao Transit (LRU cache avoids repeated KMS calls)
- Connects to the RouterOS binary API on port 8729 over TLS
- Collects system info, interface stats, routing tables, and metrics
- Publishes results to NATS JetStream
- API NATS subscriber processes results and upserts into PostgreSQL
- Releases Redis lock
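The cycle above can be sketched as a single function that takes a lock, does the work, and always releases the lock. The production poller is written in Go; this Python version is only an illustration. `InMemoryLocks` stands in for a Redis lock (typically `SET key value NX EX ttl`), and the key format, callback signatures, and names are all hypothetical.

```python
import uuid
from typing import Callable, Optional

class InMemoryLocks:
    """Stand-in for a Redis lock; lock expiry is not modeled."""

    def __init__(self) -> None:
        self._held: dict[str, str] = {}

    def acquire(self, key: str) -> Optional[str]:
        if key in self._held:
            return None
        token = uuid.uuid4().hex
        self._held[key] = token
        return token

    def release(self, key: str, token: str) -> None:
        # Only the holder may release, so a late worker cannot free another's lock.
        if self._held.get(key) == token:
            del self._held[key]

def poll_device(device_id: str, locks: InMemoryLocks,
                decrypt: Callable[[str], str],
                collect: Callable[[str, str], dict],
                publish: Callable[[dict], None]) -> bool:
    """Run one poll; return False if another worker already holds the lock."""
    key = f"poll-lock:{device_id}"  # hypothetical key format
    token = locks.acquire(key)
    if token is None:
        return False
    try:
        creds = decrypt(device_id)          # OpenBao Transit (or cache) in the real system
        result = collect(device_id, creds)  # RouterOS API call in the real system
        publish(result)                     # NATS JetStream publish in the real system
        return True
    finally:
        locks.release(key, token)
```

Releasing in a `finally` block means a failed decrypt or device timeout does not leave the device locked until expiry.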
Config Push (Two-Phase with Panic Revert)
Frontend       API           RouterOS
|               |                |
+--push config->|                |
|               +--apply config->|
|               +--set revert--->|
|               |<--ack----------+
|<--pending-----+                |
|               |                |  (timer counting down)
+--confirm----->|                |
|               +--cancel timer->|
|               |<--ack----------+
|<--confirmed---+                |
- Frontend sends config commands to the API
- API connects to the device and applies the configuration
- Sets a revert timer on the device (RouterOS safe mode / scheduler)
- Returns pending status to the frontend
- User confirms the change works (e.g., connectivity still up)
- If confirmed: API cancels the revert timer, config is permanent
- If timeout or rejected: device automatically reverts to the previous configuration
This pattern prevents lockouts from misconfigured firewall rules or IP changes.
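The pending, confirmed, and reverted states can be modeled as a small state machine. This sketch tracks the states only: in the flow described above the revert timer lives on the device itself, so the rollback still happens when the API can no longer reach it. The 60-second window, the class name, and the callbacks are assumptions for illustration.

```python
import time
from typing import Callable

class SafeApply:
    """Tracks one pending change; apply and revert are caller-supplied callbacks."""

    def __init__(self, apply: Callable[[], None], revert: Callable[[], None],
                 window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._revert = revert
        self._clock = clock
        apply()                                     # phase 1: push the change
        self._deadline = clock() + window_seconds   # arm the revert timer
        self.state = "pending"

    def tick(self) -> None:
        """Revert once the window lapses without confirmation."""
        if self.state == "pending" and self._clock() >= self._deadline:
            self._revert()
            self.state = "reverted"

    def confirm(self) -> bool:
        """Phase 2: keep the change if confirmed before the deadline."""
        self.tick()
        if self.state == "pending":
            self.state = "confirmed"
        return self.state == "confirmed"
```

A late confirmation returns false because the change has already been rolled back by then.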
SRP-6a Authentication Flow
Browser                    API                 PostgreSQL
|                           |                       |
+--register---------------->|                       |
|  (email, salt, verifier)  +--store verifier------>|
|                           |                       |
+--login step 1------------>|                       |
|  (email, client_public)   +--lookup verifier----->|
|<--(salt, server_public)---+<----------------------+
|                           |                       |
+--login step 2------------>|                       |
|  (client_proof)           +--verify proof---------+
|<--(server_proof, JWT)-----+                       |
- Registration: Client derives a verifier from password + secret_key using PBKDF2 (650K iterations) + HKDF + XOR (2SKD). Only the salt and verifier are sent to the server, never the password.
- Login step 1: Client sends email and ephemeral public value; server responds with stored salt and its own ephemeral public value.
- Login step 2: Client computes a proof from the shared session key; server validates the proof without ever seeing the password.
- Token issuance: On successful proof, server issues JWT (15min access + 7d refresh).
- Emergency Kit: A downloadable PDF containing the user’s secret key for account recovery.
Multi-Tenancy
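TOD enforces tenant isolation at the database level using PostgreSQL Row-Level Security (RLS). Queries issued through the RLS-enforced app_user role can only see rows that belong to the current tenant; the postgres and poller_user roles bypass RLS by design.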
How It Works
- Every data table includes a tenant_id column.
- PostgreSQL RLS policies filter rows by current_setting('app.tenant_id').
- The API sets tenant context (SET app.tenant_id = ...) on each database session, derived from the authenticated user’s JWT.
- super_admin role has NULL tenant_id and can access all tenants.
- poller_user bypasses RLS intentionally (needs cross-tenant device access for polling).
- Tenant isolation is enforced at the database level, not the application level — even a compromised API cannot leak cross-tenant data through app_user connections.
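As an illustration of the mechanism, the sketch below shows what a policy of this kind could look like and how the tenant context could be set safely. Two facts drive the design: PostgreSQL's plain SET statement does not accept bind parameters, while the built-in `set_config(name, value, is_local)` function does. The policy name, the `devices` table, the assumption that tenant IDs are UUIDs, and the helper name are hypothetical.

```python
import uuid

# Hypothetical policy DDL, shown only to make the mechanism concrete.
EXAMPLE_POLICY = """
CREATE POLICY tenant_isolation ON devices
    USING (tenant_id = current_setting('app.tenant_id')::uuid);
"""

def tenant_context_statement(tenant_id: str) -> tuple[str, dict[str, str]]:
    """Build a parameterized statement that sets app.tenant_id for the current transaction."""
    canonical = str(uuid.UUID(tenant_id))  # raises ValueError on malformed input
    # is_local=true scopes the setting to the transaction, like SET LOCAL.
    return "SELECT set_config('app.tenant_id', :tenant_id, true)", {"tenant_id": canonical}
```

The returned SQL and parameters can be passed to any driver that supports named parameters. Validating the ID first keeps untrusted JWT contents out of the SQL text entirely.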
Database Roles
| Role | RLS | Purpose |
|---|---|---|
| postgres | Bypasses (superuser) | Admin engine, auth/bootstrap, migrations |
| app_user | Enforced | All device/data routes in the API |
| poller_user | Bypasses | Cross-tenant device access for Go poller |
Security Layers
| Layer | Mechanism | Purpose |
|---|---|---|
| Authentication | SRP-6a | Zero-knowledge proof — password never transmitted or stored |
| Key Derivation | 2SKD (PBKDF2 650K + HKDF + XOR) | Two-secret key derivation from password + secret key |
| Encryption at Rest | OpenBao Transit | Envelope encryption for device credentials |
| Tenant Isolation | PostgreSQL RLS | Database-level row filtering by tenant_id |
| Access Control | JWT + RBAC | Role-based permissions (super_admin, admin, operator, viewer) |
| Rate Limiting | Redis-backed | Auth endpoints limited to 5 requests/min |
| TLS Certificates | Internal CA | Certificate management and deployment to RouterOS devices |
| Security Headers | Middleware | CSP, SRI hashes on JS bundles, X-Frame-Options, etc. |
| Secret Validation | Startup check | Rejects known-insecure defaults in non-dev environments |
First Login
- Navigate to the portal URL provided by your administrator.
- Log in with the admin credentials created during initial deployment.
- Complete SRP security enrollment — the portal uses zero-knowledge authentication (SRP-6a), so a unique Secret Key is generated for your account.
- Save your Emergency Kit PDF immediately. This PDF contains your Secret Key, which you will need to log in from any new browser or device. Without it, you cannot recover access.
- Complete the Setup Wizard to create your first organization and add your first device.
Setup Wizard
The Setup Wizard launches automatically for first-time super_admin users. It walks through three steps:
- Step 1 — Create Organization: Enter a name for your tenant (organization). This is the top-level container for all your devices, users, and configuration.
- Step 2 — Add Device: Enter the IP address, API port (default 8729 for TLS), and RouterOS credentials for your first device. The portal will attempt to connect and verify the device.
- Step 3 — Verify & Complete: The portal polls the device to confirm connectivity. Once verified, you are taken to the dashboard.
You can always add more organizations and devices later from the sidebar.
Device Management
Adding Devices
There are three ways to add devices to your fleet:
- Setup Wizard — automatically offered on first login.
- Fleet Table — click the “Add Device” button from the Devices page.
- Subnet Scanner — enter a CIDR range (e.g., 192.168.1.0/24) to auto-discover MikroTik devices on the network.
When adding a device, provide:
- IP Address — the management IP of the RouterOS device.
- API Port — default is 8729 (TLS). The portal connects via the RouterOS binary API protocol.
- Credentials — username and password for the device. Credentials are encrypted at rest with AES-256-GCM.
Device Detail Tabs
| Tab | Description |
|---|---|
| Overview | System info, uptime, hardware model, RouterOS version, resource usage, and interface status summary. |
| Interfaces | Real-time traffic graphs for each network interface. |
| Config | Browse the full device configuration tree by RouterOS path. |
| Firewall | View and manage firewall filter rules, NAT rules, and address lists. |
| DHCP | Active DHCP leases, server configuration, and address pools. |
| Backups | Configuration backup timeline with side-by-side diff viewer to compare changes over time. |
| Clients | Connected clients and wireless registrations. |
Simple Config
Simple Config provides a consumer-router-style interface modeled after Linksys and Ubiquiti UIs. It is designed for operators who prefer guided configuration over raw RouterOS paths.
Seven category tabs:
- Internet — WAN connection type, PPPoE, DHCP client settings.
- LAN / DHCP — LAN addressing, DHCP server and pool configuration.
- WiFi — Wireless SSID, security, and channel settings.
- Port Forwarding — NAT destination rules for inbound services.
- Firewall — Simplified firewall rule management.
- DNS — DNS server and static DNS entries.
- System — Device identity, timezone, NTP, admin password.
Toggle between Simple (guided) and Standard (full config editor) modes at any time. Per-device settings are stored in browser localStorage.
Config Editor
The Config Editor provides direct access to RouterOS configuration paths (e.g., /ip/address, /ip/firewall/filter, /interface/bridge).
- Select a device from the header dropdown.
- Navigate the configuration tree to browse, add, edit, or delete entries.
Apply Modes
- Standard Apply — changes are applied immediately.
- Safe Apply — two-phase commit with automatic panic-revert. Changes are applied, and you have a confirmation window to accept them. If the window expires without confirmation (for example, because the change made the device unreachable), the device automatically reverts to prevent lockouts.
Safe Apply is strongly recommended for firewall rules and routing changes on remote devices.
Monitoring & Alerts
Alert Rules
Create threshold-based rules that fire when device metrics cross defined boundaries:
- Select the metric to monitor (CPU, memory, disk, interface traffic, uptime, etc.).
- Set the threshold value and comparison operator.
- Choose severity: info, warning, or critical.
- Assign one or more notification channels.
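A threshold rule of this shape reduces to a metric name, a comparison operator, and a value. The sketch below shows one way to evaluate such a rule against a metrics sample. The field names, the operator set, and the metric name used in the usage note are assumptions, not TOD's schema.

```python
import operator
from dataclasses import dataclass

_OPS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "==": operator.eq,
}

@dataclass(frozen=True)
class AlertRule:
    metric: str
    op: str
    threshold: float
    severity: str  # "info", "warning", or "critical"

def evaluate(rule: AlertRule, sample: dict[str, float]) -> bool:
    """True when the sample's value for rule.metric crosses the threshold."""
    value = sample.get(rule.metric)
    if value is None:
        return False  # metric missing from this sample: nothing to compare
    return _OPS[rule.op](value, rule.threshold)
```

For instance, `AlertRule("cpu_percent", ">", 90.0, "critical")` fires for a sample of `{"cpu_percent": 95}` and stays quiet at exactly 90.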
Notification Channels
| Channel | Description |
|---|---|
| Email | SMTP-based email notifications. Configure server, port, and recipients. |
| Webhook | HTTP POST to any URL with a JSON payload containing alert details. |
| Slack | Slack incoming webhook with Block Kit formatting for rich alert messages. |
Maintenance Windows
- Define start and end times.
- Apply to specific devices or fleet-wide.
- Alerts generated during the window are recorded but do not trigger notifications.
- Maintenance windows can be recurring or one-time.
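The suppression rule (record the alert, skip the notification) comes down to checking whether an alert's timestamp falls inside a window that covers the device. This is a minimal sketch with hypothetical names; recurring windows are not modeled and would need to be expanded into concrete start and end times first.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class MaintenanceWindow:
    start: datetime
    end: datetime
    device_ids: Optional[frozenset[str]] = None  # None means fleet-wide

def should_notify(device_id: str, fired_at: datetime,
                  windows: list[MaintenanceWindow]) -> bool:
    """False when any window covering this device is active at fired_at."""
    for w in windows:
        in_scope = w.device_ids is None or device_id in w.device_ids
        if in_scope and w.start <= fired_at < w.end:
            return False
    return True
```

The alert itself would still be stored in either case; only the call to the notification channels depends on the result.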
Reports
Generate PDF reports from the Reports page. Four report types are available:
| Report | Content |
|---|---|
| Fleet Summary | Overall fleet health, device counts by status, top alerts, and aggregate statistics. |
| Device Health | Per-device detailed report with hardware info, resource trends, and recent events. |
| Compliance | Security posture audit — firmware versions, default credentials, firewall policy checks. |
| SLA | Uptime and availability metrics over a selected period with percentage calculations. |
Reports are generated as downloadable PDFs using server-side rendering (Jinja2 + WeasyPrint).
Security Model
TOD implements a 1Password-inspired zero-knowledge security architecture. The server never stores or sees user passwords. All data is stored on infrastructure you own and control — no external telemetry, analytics, or third-party data transmission.
Data Protection
- Config backups: Encrypted at rest via OpenBao Transit envelope encryption before database storage.
- Audit logs: Encrypted at rest via Transit encryption — audit log content is protected even from database administrators.
- Subresource Integrity (SRI): SHA-384 hashes on JavaScript bundles prevent tampering with frontend code.
- Content Security Policy (CSP): Strict CSP headers prevent XSS, code injection, and unauthorized resource loading.
- No external dependencies: Fully self-hosted with no external analytics, telemetry, CDNs, or third-party services. The only outbound connections are:
- RouterOS firmware update checks (no device data sent)
- SMTP for email notifications (if configured)
- Webhooks for alerts (if configured)
Security Headers
| Header | Value | Purpose |
|---|---|---|
| Strict-Transport-Security | max-age=31536000; includeSubDomains | Force HTTPS connections |
| X-Content-Type-Options | nosniff | Prevent MIME-type sniffing |
| X-Frame-Options | DENY | Prevent clickjacking via iframes |
| Content-Security-Policy | Strict policy | Prevent XSS and code injection |
| Referrer-Policy | strict-origin-when-cross-origin | Limit referrer information leakage |
Audit Trail
- Immutable audit log: All significant actions are recorded — logins, configuration changes, device operations, admin actions.
- Fire-and-forget logging: The log_action() function records audit events asynchronously without blocking the main request.
- Per-tenant access: Tenants can only view their own audit logs (enforced by RLS).
- Encryption at rest: Audit log content is encrypted via OpenBao Transit.
- CSV export: Audit logs can be exported in CSV format for compliance and reporting.
- Account deletion: When a user deletes their account, audit log entries are anonymized (PII removed) but the action records are retained for security compliance.
Data Retention
| Data Type | Retention | Notes |
|---|---|---|
| User accounts | Until deleted | Users can self-delete from Settings |
| Device metrics | 90 days | Purged by TimescaleDB retention policy |
| Configuration backups | Indefinite | Stored in git repositories on your server |
| Audit logs | Indefinite | Anonymized on account deletion |
| API keys | Until revoked | Cascade-deleted with user account |
| Encrypted key material | Until user deleted | Cascade-deleted with user account |
| Session data (Redis) | 15 min / 7 days | Auto-expiring access/refresh tokens |
| Password reset tokens | 30 minutes | Auto-expire |
| SRP session state | Short-lived | Auto-expire in Redis |
GDPR Compliance
- Right of Access (Art. 15): Users can view their account information on the Settings page.
- Right to Data Portability (Art. 20): Users can export all personal data in JSON format from Settings.
- Right to Erasure (Art. 17): Users can permanently delete their account and all associated data. Audit logs are anonymized (PII removed) with a deletion receipt generated for compliance verification.
- Right to Rectification (Art. 16): Account information can be updated by the tenant administrator.
As a self-hosted application, the deployment operator is the data controller and is responsible for compliance with applicable data protection laws.
Authentication
SRP-6a Zero-Knowledge Proof
TOD uses the Secure Remote Password (SRP-6a) protocol for authentication, ensuring the server never receives, transmits, or stores user passwords.
- SRP-6a protocol: Password is verified via a zero-knowledge proof — only a cryptographic verifier derived from the password is stored on the server, never the password itself.
- Session management: JWT tokens with 15-minute access token lifetime and 7-day refresh token lifetime, delivered via httpOnly cookies.
- SRP session state: Ephemeral SRP handshake data stored in Redis with automatic expiration.
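For reference, these are the standard SRP-6a relations in their simplest textbook form. The symbols come from the protocol definition rather than from TOD's code: N is a large safe prime, g a generator, H a hash function, s the salt, p the password-derived secret, I the user identity, and a, b the client and server ephemeral secrets. How TOD feeds the 2SKD output into x, and which hash and group parameters it uses, are not specified here.

```latex
\begin{aligned}
k &= H(N \,\|\, g), & x &= H(s \,\|\, p), & v &= g^{x} \bmod N \\
A &= g^{a} \bmod N, & B &= (k v + g^{b}) \bmod N, & u &= H(A \,\|\, B) \\
S_{\text{client}} &= (B - k g^{x})^{a + u x} \bmod N, &
S_{\text{server}} &= (A v^{u})^{b} \bmod N, & K &= H(S) \\
M_1 &= H\big(H(N) \oplus H(g) \,\|\, H(I) \,\|\, s \,\|\, A \,\|\, B \,\|\, K\big), &
M_2 &= H(A \,\|\, M_1 \,\|\, K)
\end{aligned}
```

Both sides arrive at the same value because $B - k g^{x} \equiv g^{b}$ and $A v^{u} \equiv g^{a + u x} \pmod N$, so $S_{\text{client}} = S_{\text{server}} = g^{b(a + u x)} \bmod N$. The server stores only s and v, and the proofs $M_1$ and $M_2$ show that each side holds K without revealing p.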
Authentication Flow
Client                             Server
|                                     |
|  POST /auth/srp/init {email}        |
|------------------------------------>|
|  {salt, server_ephemeral_B}         |
|<------------------------------------|
|                                     |
|  [Client derives session key from   |
|   password + Secret Key + salt + B] |
|                                     |
|  POST /auth/srp/verify {A, M1}      |
|------------------------------------>|
|  [Server verifies M1 proof]         |
|  {M2, access_token, refresh_token}  |
|<------------------------------------|
Two-Secret Key Derivation (2SKD)
Combines the user password with a 128-bit Secret Key using a multi-step derivation process, ensuring that compromise of either factor alone is insufficient:
- PBKDF2 with 650,000 iterations stretches the password.
- HKDF expansion derives the final key material.
- XOR combination of both factors produces the verifier input.
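The three steps can be sketched with the Python standard library alone. This is an illustration of the general two-secret pattern, not TOD's implementation: the use of SHA-256, the HKDF info label, the 32-byte output, and the way the two branches are combined are all assumptions.

```python
import hashlib
import hmac

def hkdf_sha256(key_material: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) extract-then-expand with HMAC-SHA256."""
    prk = hmac.new(salt, key_material, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def two_secret_derive(password: str, secret_key: bytes, salt: bytes,
                      iterations: int = 650_000) -> bytes:
    """Combine a stretched password with an expanded Secret Key (illustrative only)."""
    # Stretch the low-entropy password so each guess is expensive.
    stretched = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt,
                                    iterations, dklen=32)
    # Expand the high-entropy Secret Key to the same length.
    expanded = hkdf_sha256(secret_key, salt, b"2skd-example", 32)
    # XOR the two: the result cannot be predicted without both inputs.
    return bytes(a ^ b for a, b in zip(stretched, expanded))
```

Because the final value depends on both branches, an attacker who obtains the server-side verifier cannot run a password-guessing attack without also knowing the Secret Key.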
Secret Key & Emergency Kit
- Secret Key format: A3-XXXXXX (128-bit), stored exclusively in the browser’s IndexedDB. The server never sees or stores the Secret Key.
- Emergency Kit: Downloadable PDF containing the Secret Key for account recovery. Generated client-side.
Encryption
Credential Encryption
Device credentials (RouterOS usernames and passwords) are encrypted at rest using envelope encryption:
- Encryption algorithm: AES-256-GCM authenticated encryption.
- Key management: OpenBao Transit secrets engine provides the master encryption keys.
- Per-tenant isolation: Each tenant has its own encryption key in OpenBao Transit.
- Envelope encryption: Data is encrypted with a data encryption key (DEK), which is itself encrypted by the tenant’s Transit key.
Go Poller LRU Cache
The Go poller decrypts credentials at runtime via the Transit API, with an LRU cache (1,024 entries, 5-minute TTL) to reduce KMS round-trips. Cache hits avoid OpenBao API calls entirely.
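The caching behavior described here (bounded size, least-recently-used eviction, per-entry expiry) can be sketched as follows. The real cache lives in the Go poller; this Python version is only meant to show the semantics, and the class and method names are hypothetical.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional

class TTLLRUCache:
    """LRU cache whose entries also expire after a fixed time-to-live."""

    def __init__(self, max_entries: int = 1024, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: "OrderedDict[str, tuple[float, bytes]]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]       # expired: caller must decrypt again
            return None
        self._items.move_to_end(key)   # mark as most recently used
        return value

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = (self._clock() + self._ttl, value)
        self._items.move_to_end(key)
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # evict the least recently used entry
```

The TTL bounds how long a decrypted credential stays in memory after a key rotation or credential change, while the size cap bounds memory use on large fleets.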
Additional Encryption
- CA private keys: Encrypted with AES-256-GCM before database storage. PEM key material is never logged.
- Config backups: Encrypted at rest via OpenBao Transit before database storage.
- Audit logs: Content encrypted via Transit — protected even from database administrators.
RBAC & Tenants
Role-Based Access Control
| Role | Scope | Capabilities |
|---|---|---|
| super_admin | Global | Full system access, tenant management, user management across all tenants |
| admin | Tenant | Manage devices, users, settings, certificates within their tenant |
| operator | Tenant | Device operations, configuration changes, monitoring |
| viewer | Tenant | Read-only access to devices, metrics, and dashboards |
- RBAC is enforced at both the API middleware layer and database level.
- API keys inherit the operator permission level and are scoped to a single tenant.
- API key tokens use the mktp_ prefix and are stored as SHA-256 hashes (the plaintext token is shown once at creation and never stored).
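The show-once, store-the-hash pattern for API keys can be sketched like this. Only the mktp_ prefix and the use of SHA-256 come from the description above; the token length and the function names are assumptions.

```python
import hashlib
import hmac
import secrets

PREFIX = "mktp_"

def create_api_key() -> tuple[str, str]:
    """Return (plaintext_token, sha256_hex). Only the hash should be persisted."""
    token = PREFIX + secrets.token_urlsafe(32)
    return token, hashlib.sha256(token.encode("utf-8")).hexdigest()

def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented token and compare it to the stored hash in constant time."""
    if not presented.startswith(PREFIX):
        return False
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)
```

An unsalted SHA-256 is a reasonable fit here because the token is long and random, unlike a user-chosen password, so there is nothing for a dictionary attack to guess.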
Tenant Isolation via RLS
Multi-tenancy is enforced at the database level via PostgreSQL Row-Level Security (RLS). The app_user database role automatically filters all queries by the authenticated user’s tenant_id. Super admins operate outside tenant scope.
Internal CA & TLS Fallback
TOD includes a per-tenant Internal Certificate Authority for managing TLS certificates on RouterOS devices:
- Per-tenant CA: Each tenant can generate its own self-signed Certificate Authority.
- Deployment: Certificates are deployed to devices via SFTP.
- Three-tier TLS fallback: The Go poller attempts connections in order:
  1. CA-verified TLS (using the tenant’s CA certificate)
  2. InsecureSkipVerify TLS (for self-signed RouterOS certs)
  3. Plain API connection (fallback)
- Key protection: CA private keys are encrypted with AES-256-GCM before database storage.
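The three-tier fallback order can be illustrated with a small Python sketch. The poller is written in Go, so this shows only the control flow, not the real implementation. `connect_with_fallback`, the tier names, and the connector callables are hypothetical.

```python
import ssl
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def unverified_context() -> ssl.SSLContext:
    """TLS context that skips certificate checks (the Python analogue of InsecureSkipVerify)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before relaxing verify_mode
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def connect_with_fallback(tiers: Sequence[Tuple[str, Callable[[], T]]]) -> Tuple[str, T]:
    """Try each (name, connect) tier in order; return the first that succeeds."""
    errors = []
    for name, connect in tiers:
        try:
            return name, connect()
        except OSError as exc:  # ssl.SSLError and socket errors are OSError subclasses
            errors.append(f"{name}: {exc}")
    raise ConnectionError("all connection tiers failed: " + "; ".join(errors))
```

Returning the tier name alongside the connection lets the caller record which security level was actually negotiated for a device.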
API Endpoints
Overview
TOD exposes a REST API built with FastAPI. Interactive documentation is available at:
- Swagger UI: `http://<host>:<port>/docs` (dev environment only)
- ReDoc: `http://<host>:<port>/redoc` (dev environment only)
Both Swagger and ReDoc are disabled in staging/production environments.
Endpoint Groups
All API routes are mounted under the /api prefix.
| Group | Prefix | Description |
|---|---|---|
| Auth | /api/auth/* | Login, register, SRP exchange, password reset, token refresh |
| Tenants | /api/tenants/* | Tenant/organization CRUD |
| Users | /api/users/* | User management, RBAC role assignment |
| Devices | /api/devices/* | Device CRUD, scanning, status |
| Device Groups | /api/device-groups/* | Logical device grouping |
| Device Tags | /api/device-tags/* | Tag-based device labeling |
| Metrics | /api/metrics/* | TimescaleDB device metrics (CPU, memory, traffic) |
| Config Backups | /api/config-backups/* | Automated RouterOS config backup history |
| Config Editor | /api/config-editor/* | Live RouterOS config browsing and editing |
| Firmware | /api/firmware/* | RouterOS firmware version management and upgrades |
| Alerts | /api/alerts/* | Alert rule CRUD, alert history |
| Events | /api/events/* | Device event log |
| Device Logs | /api/device-logs/* | RouterOS syslog entries |
| Templates | /api/templates/* | Config templates for batch operations |
| Clients | /api/clients/* | Connected client (DHCP lease) data |
| Topology | /api/topology/* | Network topology map data |
| SSE | /api/sse/* | Server-Sent Events for real-time updates |
| Audit Logs | /api/audit-logs/* | Immutable audit trail |
| Reports | /api/reports/* | PDF report generation (Jinja2 + WeasyPrint) |
| API Keys | /api/api-keys/* | API key CRUD |
| Maintenance Windows | /api/maintenance-windows/* | Scheduled maintenance window management |
| VPN | /api/vpn/* | WireGuard VPN tunnel management |
| Certificates | /api/certificates/* | Internal CA and device certificate management |
| Transparency | /api/transparency/* | KMS access event dashboard |
Health Checks
| Endpoint | Type | Description |
|---|---|---|
| `GET /health` | Liveness | Always returns 200 if the API process is alive. Response includes version. |
| `GET /health/ready` | Readiness | Returns 200 only when PostgreSQL, Redis, and NATS are all healthy. Returns 503 otherwise. |
| `GET /api/health` | Liveness | Backward-compatible alias under /api prefix. |
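The readiness rule (200 only when every dependency is healthy, 503 otherwise) can be expressed as a small pure function. This is a sketch under assumptions: the function name and the response body shape are hypothetical, not TOD's actual payload.

```python
from typing import Dict, Tuple


def readiness(checks: Dict[str, bool]) -> Tuple[int, dict]:
    """Return (http_status, body); 200 only if every dependency check passed."""
    healthy = all(checks.values())
    body = {
        "status": "ready" if healthy else "not_ready",  # body shape is an assumption
        "checks": dict(checks),
    }
    return (200 if healthy else 503), body
```

For example, `readiness({"postgres": True, "redis": True, "nats": False})` yields a 503, so an orchestrator would hold traffic until NATS recovers.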
API Authentication
SRP-6a Login
- `POST /api/auth/login`: SRP-6a authentication (returns JWT access + refresh tokens)
- `POST /api/auth/refresh`: Refresh an expired access token
- `POST /api/auth/logout`: Invalidate the current session

All authenticated endpoints require one of:
- `Authorization: Bearer <token>` header
- httpOnly cookie (set automatically by the login flow)
Access tokens expire after 15 minutes. Refresh tokens are valid for 7 days.
API Key Authentication
- Create API keys in Admin > API Keys
- Use header: `X-API-Key: mktp_<key>`
- Keys have operator-level RBAC permissions
- Prefix: `mktp_`, stored as SHA-256 hash
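A client can choose between the two header-based schemes like this. The helper names are hypothetical and the request is only constructed, not sent. The header names come from the sections above.

```python
import urllib.request
from typing import Dict, Optional


def auth_headers(bearer_token: Optional[str] = None,
                 api_key: Optional[str] = None) -> Dict[str, str]:
    """Build the header for exactly one of the two documented auth schemes."""
    if (bearer_token is None) == (api_key is None):
        raise ValueError("provide exactly one of bearer_token or api_key")
    if bearer_token is not None:
        return {"Authorization": f"Bearer {bearer_token}"}
    return {"X-API-Key": api_key}


def build_request(base_url: str, path: str, **auth: Optional[str]) -> urllib.request.Request:
    """Create (but do not send) an authenticated request to a TOD API path."""
    return urllib.request.Request(base_url.rstrip("/") + path, headers=auth_headers(**auth))
```

A script would typically pass `api_key=...`, while an interactive client would pass the short-lived `bearer_token` obtained from the login flow.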
Rate Limiting
- Auth endpoints: 5 requests/minute per IP
- General endpoints: no global rate limit (per-route limits may apply)
Rate limit violations return HTTP 429 with a JSON error body.
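The per-IP limit on auth endpoints can be pictured with an in-memory sliding-window limiter. This is a sketch only: TOD's actual algorithm and storage are not described here (the Redis section mentions rate limiting, so the real state presumably lives there), and the class and method names are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within `window_seconds`."""

    def __init__(self, limit: int = 5, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        """Return True if the request may proceed; False maps to HTTP 429."""
        now = self._clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Clients that receive a 429 should back off rather than retry immediately, since each retry inside the window is rejected as well.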
RBAC Roles
| Role | Scope | Description |
|---|---|---|
| `super_admin` | Global (no tenant) | Full platform access, tenant management |
| `admin` | Tenant | Full access within their tenant |
| `operator` | Tenant | Device operations, config changes |
| `viewer` | Tenant | Read-only access |
Error Handling
Error Format
All error responses use a standard JSON format:
{
"detail": "Human-readable error message"
}
Status Codes
| Code | Meaning |
|---|---|
| 400 | Bad request / validation error |
| 401 | Unauthorized (missing or expired token) |
| 403 | Forbidden (insufficient RBAC permissions) |
| 404 | Resource not found |
| 409 | Conflict (duplicate resource) |
| 422 | Unprocessable entity (Pydantic validation) |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
| 503 | Service unavailable (readiness check failed) |
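On the client side, the `detail` field can be turned into an exception as sketched below. `ApiError` and `parse_error` are hypothetical names. The sketch also tolerates a non-string `detail`, because FastAPI's default 422 validation responses typically carry a list there, and falls back to the raw body if the response is not JSON.

```python
import json


class ApiError(Exception):
    """Client-side representation of a TOD error response (hypothetical)."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(f"{status}: {detail}")
        self.status = status
        self.detail = detail


def parse_error(status: int, body: bytes) -> ApiError:
    """Extract `detail` from an error body, tolerating unexpected shapes."""
    try:
        detail = json.loads(body).get("detail", "")
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: keep the raw text.
        detail = body.decode("utf-8", errors="replace")
    if not isinstance(detail, str):
        detail = json.dumps(detail)  # e.g. a list of validation errors
    return ApiError(status, detail)
```

Callers can then branch on `status` (for example, refresh the token on 401 or back off on 429) while still logging the human-readable `detail`.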
Environment Variables
TOD uses Pydantic Settings for configuration. All values can be set via environment variables or a .env file in the backend working directory.
Application
| Variable | Default | Description |
|---|---|---|
| `APP_NAME` | TOD - The Other Dude | Application display name |
| `APP_VERSION` | 0.1.0 | Semantic version string |
| `ENVIRONMENT` | dev | Runtime environment: dev, staging, or production |
| `DEBUG` | false | Enable debug mode |
| `CORS_ORIGINS` | http://localhost:3000,... | Comma-separated list of allowed CORS origins |
| `APP_BASE_URL` | http://localhost:5173 | Frontend base URL (used in password reset emails) |
Authentication & JWT
| Variable | Default | Description |
|---|---|---|
| `JWT_SECRET_KEY` | (insecure dev default) | HMAC signing key for JWTs. Must be changed in production. |
| `JWT_ALGORITHM` | HS256 | JWT signing algorithm |
| `JWT_ACCESS_TOKEN_EXPIRE_MINUTES` | 15 | Access token lifetime in minutes |
| `JWT_REFRESH_TOKEN_EXPIRE_DAYS` | 7 | Refresh token lifetime in days |
| `PASSWORD_RESET_TOKEN_EXPIRE_MINUTES` | 30 | Password reset link validity in minutes |
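To show how `JWT_SECRET_KEY`, `JWT_ALGORITHM`, and the access token lifetime fit together, here is a standard-library sketch of HS256 signing and verification. It is for illustration only: the function names and the `sub` claim are assumptions, and a real service would normally rely on a maintained JWT library rather than hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_access_token(subject: str, secret: str, lifetime_minutes: int = 15,
                       now: Optional[float] = None) -> str:
    """Create an HS256 token whose `exp` is `lifetime_minutes` after issuance."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iat": issued, "exp": issued + lifetime_minutes * 60}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_access_token(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Return the payload if the signature is valid and the token has not expired."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(header_b64 + "." + payload_b64, secret)
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if current >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

Because HS256 is symmetric, anyone holding `JWT_SECRET_KEY` can mint valid tokens, which is why the insecure dev default must be replaced outside development.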
Database
| Variable | Default | Description |
|---|---|---|
| `DATABASE_URL` | postgresql+asyncpg://postgres:postgres@localhost:5432/mikrotik | Admin (superuser) async database URL. Used for migrations and bootstrap. |
| `SYNC_DATABASE_URL` | postgresql+psycopg2://postgres:postgres@localhost:5432/mikrotik | Synchronous URL used by Alembic migrations only. |
| `APP_USER_DATABASE_URL` | postgresql+asyncpg://app_user:app_password@localhost:5432/mikrotik | Non-superuser async URL. Enforces PostgreSQL RLS for tenant isolation. |
| `DB_POOL_SIZE` | 20 | App user connection pool size |
| `DB_MAX_OVERFLOW` | 40 | App user pool max overflow connections |
| `DB_ADMIN_POOL_SIZE` | 10 | Admin connection pool size |
| `DB_ADMIN_MAX_OVERFLOW` | 20 | Admin pool max overflow connections |
Security
| Variable | Default | Description |
|---|---|---|
| `CREDENTIAL_ENCRYPTION_KEY` | (insecure dev default) | AES-256-GCM encryption key for device credentials at rest. Must be exactly 32 bytes, base64-encoded. Must be changed in production. |
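A key that satisfies the "exactly 32 bytes, base64-encoded" requirement can be generated and checked as follows. This is a sketch: the function names are hypothetical, and it assumes the standard base64 alphabet. If TOD expects the URL-safe variant, swap in `base64.urlsafe_b64encode` and `base64.urlsafe_b64decode`.

```python
import base64
import secrets


def generate_credential_key() -> str:
    """Return 32 random bytes encoded as standard base64 (alphabet is an assumption)."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


def validate_credential_key(value: str) -> bytes:
    """Decode the key and confirm it is exactly 32 bytes long."""
    raw = base64.b64decode(value, validate=True)
    if len(raw) != 32:
        raise ValueError(f"expected 32 bytes, got {len(raw)}")
    return raw
```

Running `print(generate_credential_key())` once and storing the result in a secrets manager avoids ever committing the key to a `.env` file.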
OpenBao / Vault (KMS)
| Variable | Default | Description |
|---|---|---|
| `OPENBAO_ADDR` | http://localhost:8200 | OpenBao Transit server address for per-tenant envelope encryption |
| `OPENBAO_TOKEN` | (insecure dev default) | OpenBao authentication token. Must be changed in production. |
NATS
| Variable | Default | Description |
|---|---|---|
| `NATS_URL` | nats://localhost:4222 | NATS JetStream server URL for pub/sub between Go poller and Python API |
Redis
| Variable | Default | Description |
|---|---|---|
| `REDIS_URL` | redis://localhost:6379/0 | Redis URL for caching, distributed locks, and rate limiting |
SMTP (Notifications)
| Variable | Default | Description |
|---|---|---|
| `SMTP_HOST` | localhost | SMTP server hostname |
| `SMTP_PORT` | 587 | SMTP server port |
| `SMTP_USER` | (none) | SMTP authentication username |
| `SMTP_PASSWORD` | (none) | SMTP authentication password |
| `SMTP_USE_TLS` | false | Enable STARTTLS for SMTP connections |
| `SMTP_FROM_ADDRESS` | noreply@mikrotik-portal.local | Sender address for outbound emails |
Firmware
| Variable | Default | Description |
|---|---|---|
| `FIRMWARE_CACHE_DIR` | /data/firmware-cache | Path to firmware download cache (PVC mount in production) |
| `FIRMWARE_CHECK_INTERVAL_HOURS` | 24 | Hours between automatic RouterOS version checks |
Storage Paths
| Variable | Default | Description |
|---|---|---|
| `GIT_STORE_PATH` | ./git-store | Path to bare git repos for config backup history. In production: /data/git-store on a ReadWriteMany PVC. |
| `WIREGUARD_CONFIG_PATH` | /data/wireguard | Shared volume path for WireGuard configuration files |
Bootstrap
| Variable | Default | Description |
|---|---|---|
| `FIRST_ADMIN_EMAIL` | (none) | Email for the initial super_admin user. Only used if no users exist in the database. |
| `FIRST_ADMIN_PASSWORD` | (none) | Password for the initial super_admin user. The user is created with must_upgrade_auth=True, triggering SRP registration on first login. |
Production Safety
TOD refuses to start in staging or production environments if any of these variables still have their insecure dev defaults:
- `JWT_SECRET_KEY`
- `CREDENTIAL_ENCRYPTION_KEY`
- `OPENBAO_TOKEN`
The process exits with code 1 and a clear error message indicating which variable needs to be rotated.
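The startup guard can be sketched as below. It is illustrative only: the function names are hypothetical and the values in `INSECURE_DEFAULTS` are placeholders, since the real dev defaults live in TOD's settings module and are not listed in this document.

```python
import sys
from typing import List, Mapping

# Placeholder values (assumptions); substitute the project's real dev defaults.
INSECURE_DEFAULTS = {
    "JWT_SECRET_KEY": "<dev-default-jwt-secret>",
    "CREDENTIAL_ENCRYPTION_KEY": "<dev-default-credential-key>",
    "OPENBAO_TOKEN": "<dev-default-openbao-token>",
}


def find_insecure_settings(env: Mapping[str, str]) -> List[str]:
    """List variables still at their dev default when not running in dev."""
    if env.get("ENVIRONMENT", "dev") == "dev":
        return []
    return [
        name for name, default in INSECURE_DEFAULTS.items()
        if env.get(name, default) == default  # unset counts as the default
    ]


def enforce_production_safety(env: Mapping[str, str]) -> None:
    """Exit with code 1 and name the offending variables, as described above."""
    insecure = find_insecure_settings(env)
    if insecure:
        print("Refusing to start; rotate: " + ", ".join(insecure), file=sys.stderr)
        sys.exit(1)
```

Treating an unset variable the same as the default means a missing secret fails fast instead of silently running with a known key.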
Docker Compose
Profiles
| Profile | Command | Services |
|---|---|---|
| (default) | docker compose up -d | Infrastructure only: PostgreSQL, Redis, NATS, OpenBao |
| `full` | docker compose --profile full up -d | All services: infrastructure + API, Poller, Frontend |
Container Memory Limits
All containers have enforced memory limits to prevent OOM on the host:
| Service | Memory Limit |
|---|---|
| PostgreSQL | 512 MB |
| Redis | 128 MB |
| NATS | 128 MB |
| API | 512 MB |
| Poller | 256 MB |
| Frontend | 64 MB |
Build Docker images sequentially (not in parallel) to avoid OOM during builds.