Avaya IP Office SNMP Trap Monitoring

Problem

Avaya IP Office (IPO) telephone systems are deployed across dozens of customer sites at a Nordic MSP. When an IPO system has a hardware fault, a trunk failure, or a license issue, it sends SNMP traps. These traps contain structured information using the IPO-MIB format — alarm severity, entity identifier, alarm description, and timestamps.

The challenge was routing these traps into Zabbix in a way that respects the MSP’s multi-tenant architecture: each customer’s IPO system should appear as its own host in Zabbix, traps should trigger alerts at the correct severity, and unknown IPO systems (new installations that are still unregistered in Zabbix) should be flagged for investigation rather than silently dropped.

Constraints

SNMP traps are unsolicited. Unlike polled monitoring, traps arrive whenever the device decides to send them. The receiver must run continuously, absorb bursts, and avoid dropping any trap it has received.
IPO-MIB parsing. The traps use Avaya’s proprietary IPO-MIB OID structure (.1.3.6.1.4.1.6889.*). The receiver must understand this MIB to extract meaningful fields rather than just forwarding raw OID/value pairs.
Multi-tenant routing. A single trap receiver serves all customers. Each trap must be routed to the correct Zabbix host based on the source IP address. Unknown source IPs must be handled gracefully.
Security-first host model. The MSP explicitly required that new IPO systems NOT be automatically created in Zabbix. Unknown devices must route to a designated discovery host for manual review and creation. This prevents rogue devices from appearing in the monitoring system.

Architecture

The system is a Python daemon managed by systemd, sitting between snmptrapd (the standard SNMP trap receiver) and Zabbix:

snmptrapd listens on UDP port 162 and writes incoming traps to a log file at /var/log/snmptrapd/traps.log. It handles the low-level SNMP protocol and trap reception.

trap_receiver.py (500 lines) is a Python daemon that continuously tails the trap log file, parsing each new entry. When it detects an IPO-MIB trap, it extracts:

Source IP: Which IPO system sent the trap
Alarm severity: Critical, Major, Minor, or Informational (from ipoGTEventStdSeverity OID)
Alarm entity: What component is affected (from ipoGTEventStdAlarmEntity)
Description: Human-readable description (from ipoGTEventStdAlarmDescription)
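
A minimal sketch of this extraction step follows. The log layout is an assumption (one trap per line, the sender in "UDP: [ip]:port" form, tab-separated "NAME = TYPE: value" varbinds); the real layout depends on how snmptrapd is configured, and parse_trap_line is a hypothetical helper name, not the function used in trap_receiver.py.

```python
import re

# Assumed (hypothetical) log layout: one trap per line, sender shown as
# "UDP: [ip]:port", varbinds separated by tabs as "NAME = TYPE: value".
SOURCE_RE = re.compile(r"UDP: \[(?P<ip>[0-9.]+)\]")
VARBIND_RE = re.compile(
    r"IPO-MIB::(?P<name>\w+)(?:\.\d+)* = \w+: (?P<value>[^\t]*)"
)

# OID names taken from the text above, mapped to short output keys.
FIELDS = {
    "ipoGTEventStdSeverity": "severity",
    "ipoGTEventStdAlarmEntity": "entity",
    "ipoGTEventStdAlarmDescription": "description",
}


def parse_trap_line(line):
    """Return the extracted fields as a dict, or None if this is not an IPO trap."""
    source = SOURCE_RE.search(line)
    if source is None:
        return None
    trap = {"source_ip": source.group("ip")}
    for match in VARBIND_RE.finditer(line):
        key = FIELDS.get(match.group("name"))
        if key:
            trap[key] = match.group("value").strip().strip('"')
    # Require a severity so that unrelated traps are skipped.
    return trap if "severity" in trap else None
```

Lines that return None are the ones a real daemon would log for manual review instead of forwarding.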

The parsed trap is formatted as JSON and sent to the appropriate Zabbix host via zabbix_sender.
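
As a rough illustration of the hand-off to zabbix_sender, the sketch below builds one input line per trap in the "hostname key value" format that zabbix_sender reads with -i, then pipes the lines to it on standard input. The item key ipo.trap and the server name are placeholders, and sender_line and send_batch are hypothetical helper names.

```python
import json
import subprocess

TRAP_ITEM_KEY = "ipo.trap"  # hypothetical trapper item key


def quote(field):
    """Quote a field for a zabbix_sender input line, escaping \\ and "."""
    return '"' + field.replace("\\", "\\\\").replace('"', '\\"') + '"'


def sender_line(zabbix_host, trap):
    """Build one '<hostname> <key> <value>' line carrying the trap as JSON."""
    payload = json.dumps(trap, sort_keys=True)
    return f"{quote(zabbix_host)} {TRAP_ITEM_KEY} {quote(payload)}"


def send_batch(lines, server="zabbix.example.com"):
    """Send all lines in a single zabbix_sender call ('-i -' reads stdin)."""
    return subprocess.run(
        ["zabbix_sender", "-z", server, "-i", "-"],
        input="\n".join(lines) + "\n",
        text=True,
        capture_output=True,
        check=False,  # the caller inspects returncode and decides what to log
    )
```

The JSON payload has to be quoted because it contains spaces and double quotes, both of which would otherwise split the value field.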

Host routing logic:

Look up the source IP in a configuration mapping of known hosts
If found → route the trap to that specific Zabbix host using its configured hostname
If not found → route to the IPO-Unknown-Devices host, which triggers an alert for the NOC team to investigate and register the new device
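
The lookup above reduces to a dictionary access with a fallback. This sketch assumes the known-host mapping has already been loaded into a dict keyed by source IP; route is a hypothetical function name and the addresses are documentation examples.

```python
UNKNOWN_HOST = "IPO-Unknown-Devices"  # discovery host named in the text


def route(source_ip, known_hosts):
    """Return (zabbix_host, is_known) for a trap's source IP."""
    hostname = known_hosts.get(source_ip)
    if hostname is None:
        # Never auto-create: unknown senders go to the discovery host.
        return UNKNOWN_HOST, False
    return hostname, True


known = {"192.0.2.10": "Customer-A-IPO"}
print(route("192.0.2.10", known))
print(route("198.51.100.7", known))
```

Returning the is_known flag alongside the hostname lets the caller log unknown senders separately without a second lookup.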

Batch processing. Traps are collected into batches (configurable size, default 50) with a maximum wait time (configurable, default 5 seconds). When either the batch size or the time interval is reached, the batch is sent to Zabbix in a single zabbix_sender call. This reduces the overhead of individual sends during trap storms while keeping latency low during quiet periods. Batch management is thread-safe.
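
One way to implement size-or-time batching under a lock is sketched below. The class name TrapBatcher, its method names, and the injectable clock are illustrative choices rather than the daemon's actual interface; the defaults mirror the ones stated above.

```python
import threading
import time


class TrapBatcher:
    """Collect items and hand them to `send` when the batch is full or stale."""

    def __init__(self, send, max_size=50, max_wait=5.0, clock=time.monotonic):
        self._send = send
        self._max_size = max_size
        self._max_wait = max_wait
        self._clock = clock
        self._lock = threading.Lock()
        self._items = []
        self._oldest = None  # arrival time of the first item in the batch

    def add(self, item):
        with self._lock:
            if not self._items:
                self._oldest = self._clock()
            self._items.append(item)
            batch = self._take_if_due()
        if batch:
            self._send(batch)  # send outside the lock so add() never blocks on I/O

    def poll(self):
        """Call periodically (for example once a second) to enforce max_wait."""
        with self._lock:
            batch = self._take_if_due()
        if batch:
            self._send(batch)

    def flush(self):
        """Send whatever is pending, regardless of size or age."""
        with self._lock:
            batch, self._items = self._items, []
        if batch:
            self._send(batch)

    def _take_if_due(self):
        # Caller must hold the lock.
        if not self._items:
            return []
        full = len(self._items) >= self._max_size
        stale = self._clock() - self._oldest >= self._max_wait
        if full or stale:
            batch, self._items = self._items, []
            return batch
        return []
```

Swapping the list out while holding the lock and sending afterwards keeps the critical section short, so a slow zabbix_sender call does not stall the thread that is reading the log.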

The end-to-end pipeline is therefore: Avaya IP Office emits an SNMP trap, snmptrapd writes it to the log file, the Python daemon parses the IPO-MIB entry, and the result is routed to the correct Zabbix host via zabbix_sender.

Key Engineering Decisions

Log file tailing, not direct SNMP reception. I chose to have the daemon tail snmptrapd’s log file rather than implementing direct SNMP trap reception in Python. This has two advantages. First, snmptrapd is battle-tested for high-volume trap reception and already handles UDP socket management, community string validation, and trap format normalization. Second, the Python daemon only needs to parse a text file, which is simpler to write and easier to debug.
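
The tailing loop can be sketched as a generator over an open file handle. This is a simplified illustration, not the daemon's code: follow and should_stop are hypothetical names, and log rotation is deliberately left out.

```python
import time


def follow(handle, poll_interval=0.5, should_stop=lambda: False):
    """Yield complete lines appended to an open text file, like `tail -f`."""
    pending = ""
    while not should_stop():
        chunk = handle.readline()
        if not chunk:
            # Nothing new yet: wait and try again.
            time.sleep(poll_interval)
            continue
        pending += chunk
        if pending.endswith("\n"):
            yield pending.rstrip("\n")
            pending = ""
        # Otherwise the writer is mid-line; keep the fragment until it completes.
```

To process only new traps, a caller would open the log and seek to its end (handle.seek(0, 2)) before passing the handle in. Buffering partial lines matters because the daemon can read while snmptrapd is still writing an entry.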

Manual host creation model. The easiest approach would have been to auto-create Zabbix hosts for unknown IPs. I deliberately chose not to do this. In an MSP environment, automatic host creation is a security risk — a misconfigured device or a device on a newly acquired network could create spurious hosts in the monitoring system. The unknown-device routing pattern ensures that every monitored device has been explicitly approved.

Severity mapping. IPO-MIB alarm severities map to Zabbix trigger priorities: Critical → Disaster/High, Major → Average, Minor → Warning, Informational → Information. The Zabbix template includes trigger prototypes at each severity level, so a Critical alarm from an IPO creates a High-priority trigger in Zabbix, which routes through the correct escalation path.
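
For illustration, the mapping can be written as a lookup table using Zabbix's numeric trigger severities (0 Not classified through 5 Disaster). In the deployment described here the mapping lives in the template's triggers, so this table is only a sketch; the "name(number)" input form and the fallback to Not classified are assumptions.

```python
# Zabbix numeric severities: 0 Not classified, 1 Information, 2 Warning,
# 3 Average, 4 High, 5 Disaster.
IPO_TO_ZABBIX = {
    "critical": 4,       # High; the text also lists Disaster (5) as an option
    "major": 3,          # Average
    "minor": 2,          # Warning
    "informational": 1,  # Information
}


def zabbix_priority(ipo_severity):
    """Map an IPO severity such as 'Major' or 'major(2)' to a Zabbix severity."""
    name = ipo_severity.split("(")[0].strip().lower()
    # Assumption: anything unrecognised is reported as Not classified.
    return IPO_TO_ZABBIX.get(name, 0)


print(zabbix_priority("Critical"))
print(zabbix_priority("minor(3)"))
```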

Graceful shutdown with trap buffering. When the daemon receives a SIGTERM (e.g., during a systemd service restart), it flushes any pending trap batch to Zabbix before exiting. This prevents trap loss during daemon restarts or system maintenance.
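
A common pattern for this kind of shutdown is to have the signal handler set a flag and let the main loop do the flush. The sketch below assumes that shape; request_stop, install_handlers, and run are hypothetical names, and the next_item, add, and flush callables stand in for the daemon's log reader and batch queue.

```python
import signal
import threading

stop_requested = threading.Event()


def request_stop(signum, frame):
    """Signal handler: only set a flag; the main loop does the real work."""
    stop_requested.set()


def install_handlers():
    # Must be called from the main thread.
    signal.signal(signal.SIGTERM, request_stop)
    signal.signal(signal.SIGINT, request_stop)


def run(next_item, add, flush):
    """Feed items into the batch until a stop is requested, then flush."""
    try:
        while not stop_requested.is_set():
            item = next_item()  # returns None when nothing new is available
            if item is not None:
                add(item)
    finally:
        flush()  # pending traps are sent even if the loop exits on an error
```

Keeping the handler down to a single Event.set() call avoids doing I/O inside a signal handler, and the finally block means the flush also runs when the loop ends because of an exception.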

Challenges and Trade-offs

IPO-MIB complexity. The Avaya IPO-MIB is a 2,500-line MIB definition with nested trap structures. Not all traps follow the same field structure — some include extended enterprise-specific fields, others use the standard alarm structure. The parser handles the common cases and logs unparseable traps for manual review rather than attempting to handle every possible trap variation.

Configuration management. The host routing configuration (IP-to-hostname mappings) is maintained in a config.ini file. When a new IPO system is installed at a customer site, someone must add the mapping manually. I considered integrating with the Zabbix API to pull host-to-IP mappings dynamically, but the added complexity wasn’t justified for the number of IPO systems currently managed. A separate configure_zabbix.py script automates the Zabbix side of the setup (host creation, template linking, group assignment) when the manual registration happens.
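
A minimal sketch of loading such a mapping with Python's configparser is shown below. The [hosts] section name and the "ip = hostname" layout are assumptions about config.ini, and load_known_hosts is a hypothetical helper.

```python
import configparser
import ipaddress

SAMPLE_CONFIG = """
[hosts]
192.0.2.10 = Customer-A-IPO
192.0.2.20 = Customer-B-IPO
"""


def load_known_hosts(text):
    """Parse the assumed [hosts] section into a {source_ip: zabbix_host} dict."""
    # Restrict delimiters to "=" so a ":" inside a key is not treated as one.
    parser = configparser.ConfigParser(delimiters=("=",))
    parser.read_string(text)
    hosts = {}
    for raw_ip, hostname in parser.items("hosts"):
        # ip_address() raises ValueError on a mistyped address, so a bad
        # entry fails at startup instead of silently never matching.
        ip = str(ipaddress.ip_address(raw_ip))
        hosts[ip] = hostname.strip()
    return hosts


print(load_known_hosts(SAMPLE_CONFIG))
```

Validating addresses at load time is a small guard against the main risk of a hand-maintained file: a typo that sends a known customer's traps to the unknown-devices host.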

Single point of failure. The trap receiver daemon runs on one host. If that host goes down, traps sent in the meantime are lost, because they travel over UDP and there is no retry. The mitigations are partial: systemd’s Restart=always with a 10-second delay brings the daemon back after a crash, and the Zabbix template includes a nodata trigger that fires if no trap data arrives for 10 minutes, so a silent receiver gets noticed.

Outcome

Brings Avaya IP Office systems across the MSP’s customer sites under unified Zabbix monitoring
Real-time trap processing with severity-appropriate alerting
Unknown device detection prevents unregistered devices from creating noise in the monitoring system
Batch processing handles trap storms without overwhelming Zabbix
systemd service with security hardening (NoNewPrivileges, PrivateTmp, ProtectSystem=strict)
Open-sourced with documentation for deployment across the MSP

Tech Stack

Daemon: Python 3 (500-line trap receiver, systemd managed)
SNMP: snmptrapd (trap reception), IPO-MIB (trap parsing)
Monitoring: Zabbix 7.0 (trapper items, severity-mapped triggers)
Data Delivery: zabbix_sender (batch mode)
Configuration: config.ini (IP-to-hostname mappings), Python helper script (configure_zabbix.py) for Zabbix API host setup
Security: systemd hardening, IP allowlisting, manual host creation model