Problem
“Are the backups OK?” is a question an MSP has to answer for every customer, on every Veeam backup server, all the time. The honest pre-automation answer was that someone checked the Veeam console occasionally, or noticed when a job alert came through. There was no consistent, on-demand, per-server signal that fed into the same ITSM system as everything else — so backup health lived in a different place from the tickets, the changes, and the rest of the automation estate.
I wanted a building block: given a customer’s Veeam backup server, produce a clean success/failure signal with a short summary, on demand or on a schedule, using the infrastructure that’s already there — no new agent on the Veeam box beyond the BigFix agent it already has, and the same MID Server / BigFix / drop-and-read pattern as the other automations. This is a deliberately small, composable automation, and it doubled as the reference template for building new full flows on the stack.
Constraints
- Only BigFix on the target. The Veeam server is a customer production box. The only thing I can count on is its BigFix agent — so the check has to be delivered as a script through BigFix and clean up after itself, not installed.
- Cross trust boundary, credentials stay home. The BigFix API credential must never sit on the customer side, and must not be a standing ServiceNow credential record either.
- No callback. As with everything on this stack, BigFix runs the action eventually and writes a file; there’s no completion event. The result is retrieved by reading that file back.
- Testable without a Veeam server. I needed to build and wire the whole ServiceNow → BigFix → endpoint → read-back chain before, and independently of, having a real Veeam server with the PowerShell module installed to point it at. The pipeline’s correctness shouldn’t depend on live backup infrastructure being present.
- Reuse, don’t reinvent. The result read-back, the credential handling, and the script-delivery pattern already exist as shared components. This automation should consume them, not fork them.
Architecture
A ServiceNow flow resolves the customer’s Veeam server from the CMDB, ships a small PowerShell check to it through BigFix, reads the result back through the shared polling subflow, and returns a success/failure plus a summary.
Press play to walk a single check end to end — resolving the server, the credential, the drop-and-read on the Veeam box, and the result coming back. Click any node to see what’s running there.
The flow is a 7-step orchestration. It calculates the BigFix site from the customer number, resolves the Veeam server from cmdb_ci_server (asset_tag='VEEAM', scoped to the customer), fetches the versioned veeam_backup_integrity_check script from the database, builds and escapes the input payload, Base64-encodes the script and POSTs a BES action through the MID Server, then hands the read-back to the shared BigFix – Poll for Result (AKV) subflow and parses the result.
On the endpoint, BigFix stages payload.json and the Base64 script in a per-run working directory, certutil -decodes the script, runs it hidden, and the script writes result.json. The poll subflow resolves the server’s BigFix Computer ID and reads that file back through the BigFix Client Query API, validating it belongs to this run.
What the check actually verifies
The check is intentionally session-result-centric, not a full restore test. It probes for the Veeam.Backup.PowerShell module, connects locally with Connect-VBRServer, enumerates jobs with Get-VBRJob (all jobs, or one named job), and for each job inspects the backup sessions whose EndTimeUTC falls inside a date window (24h → 24 hours, 7d → 168, 30d → 720; default 7d). It sorts those sessions newest-first, takes the latest, and marks the job OK if and only if that session’s Result is Success. Overall success means every job is OK. It returns a per-job summary ({job_name, sessions_in_window, latest_result, latest_end, ok}) plus a human-readable detail string.
Restore-point integrity, repository free-space/health, and SureBackup application-group verification are deliberately out of scope — this check is session-result-centric by design. The case study is honest about that line: the latest backup session per job must report Success within the window, which is a meaningful and cheap signal, not a guarantee that a restore would succeed.
Key Engineering Decisions
Module-probe → shape-stable mock fallback
The decision that earns its keep is the first thing the script does: probe for Veeam.Backup.PowerShell, and if it’s absent, return a shape-identical mock success (success=true, data.mock=true, echoing the host, requested job and window) and exit 0. The real path and the mock path emit the same {success, detail, data, ritm} envelope.
This buys three things:
- Pipeline validation without a Veeam server. The entire chain — BigFix POST, drop-and-read, poll, parse, flow outputs — can be exercised on any Windows endpoint. Building and testing the ServiceNow/BigFix integration is decoupled from having live backup infrastructure.
- A parser with no special case. Because the mock returns the same shape, the
Check Veeam Resultaction needs zero branching for it — one parser handles both. - Safe-by-default onboarding. A new customer can be wired and smoke-tested whether or not the Veeam module is present on the target; the real check engages automatically when the module is installed — no flow change.
The trade-off is deliberate and worth stating plainly: the mock returns success. So any alerting built on top has to key on data.mock rather than on success alone, or a validated-but-mocked run would read as a real pass. The mock and real paths emit the same JSON shape, so the parser never branches; that data.mock signal is preserved end-to-end precisely so dashboards can tell the two apart.
Script-from-database, delivered via BigFix
The PowerShell body lives in a database table (x_msp_bigfix_scripts), not on the MID Server or baked into the BES XML. The flow fetches it at runtime, Base64-encodes it in the POST action, and BigFix decodes it on the endpoint with certutil. There’s no inline script in the action, the script is the single source of truth, and tightening the check is a body edit plus a version bump — no flow re-publish. This is the same pattern the Disk Cleanup and Compliance Audit automations use; here it’s the headline because the whole flow is small enough to see the pattern clearly.
Reuse the shared credential and polling components
BigFix Basic Auth comes from the ServiceNow Credential Resolver (com.snc.discovery.CredentialResolver, WCM) with the secret in Azure Key Vault (and a discovery_credentials fallback): the token is GlideEncrypter-encrypted in transit and decrypted in-action for a single HTTPS call, then nulled. The result read-back is the shared BigFix – Poll for Result (AKV) subflow. This automation owns almost none of its own plumbing — it composes the credential resolution and the polling loop that Disk Cleanup and Compliance Audit also use, which is exactly the point of building those as shared subflows.
Built as the reference deployer
The whole flow — three custom actions plus the reused poll subflow, seven inputs and three outputs — was created end-to-end through the Table API by an idempotent deployer, with each action’s data pills pre-wired as gzipped values payloads and attached to both the live flow and its master snapshot. That deployer is the canonical template for spinning up a new full flow on this stack, so this small automation pulls double duty as the worked example.
Challenges and Trade-offs
Session result is not a restore guarantee. The biggest honest caveat: a green result here means recent backup sessions reported Success, not that a restore would actually work. SureBackup-style verification is the deeper answer; the check deliberately trades that depth for a cheap, frequent, no-extra-infrastructure signal. The summary makes the scope explicit so nobody reads more into a pass than is there.
The mock can mask reality. The mock fallback is what makes the pipeline testable, but a misconfigured server with the module uninstalled would return a mock success — which is why data.mock is propagated all the way out and why alerting must consume it. It’s a sharp edge that’s been deliberately surfaced rather than hidden.
Programmatic flow deployment has its own failure modes. Building Flow Designer actions through the Table API means dealing with the snapshot model directly: each action’s inputs/outputs have to be duplicated under the snapshot namespace or the UI renders the action box as “corrupted”, and PowerShell 5.1’s habit of unwrapping single-element arrays has to be guarded when shaping the JSON. These are deployment-engineering details, but they’re the difference between a flow that opens cleanly in the designer and one that doesn’t.
PowerShell version quirks on the endpoint. The endpoint runs Windows PowerShell 5.1; a single-job or single-session result can come back as a scalar rather than an array and has to be normalised before it’s serialised, or the result JSON shape drifts between the one-job and many-jobs cases.
Outcome
This is built and deployed on the MSP’s ServiceNow estate — the flow, the three custom actions, the endpoint script, and the CMDB/credential wiring are in place, reusing the shared BigFix engine, Credential Resolver and Poll-for-Result subflow. The automation rolls out per customer as each Veeam server is wired into the CMDB.
- A composable, on-demand per-server backup-health signal that lands in the same ITSM system as everything else, without a new agent on the Veeam box.
- The module-probe → mock fallback made the entire integration testable before any live Veeam server was wired in — and keeps new-customer onboarding safe by default.
- The flow’s deployer is the reference template for building new full flows on the stack.
- Runs either on demand from a request or on a schedule, returning a clean per-server pass/fail and a per-job summary that slots straight into the existing ITSM workflows.
Tech Stack
- Orchestration: ServiceNow Flow Designer — a 7-step flow (
Integration - Veeam Backup Integrity Check) with three custom actions (Build Payload, Build BES XML & POST, Check Result) plus the sharedBigFix – Poll for Result (AKV)subflow - Auth: ServiceNow Credential Resolver (
com.snc.discovery.CredentialResolver) backed by Azure Key Vault (discovery_credentialsfallback);GlideEncryptervia a globalBigFixCredHelper, decrypted in-action and nulled - Endpoint Execution: IBM BigFix (BES XML actions, Base64 +
certutil -decodescript delivery, BigFix Client Query result retrieval) - Endpoint script: PowerShell
veeam_backup_integrity_check—Veeam.Backup.PowerShellmodule probe,Connect-VBRServer/Get-VBRJob/Get-VBRBackupSession, latest-session-Result check within a date window, shape-stable mock fallback - Resolution:
cmdb_ci_serverbyasset_tag='VEEAM'per customer; versioned script inx_msp_bigfix_scripts - Bridge: ServiceNow MID Server (ECC Queue cloud-to-on-prem transport)
- Deployment: idempotent Table-API deployer (9-phase action chain, snapshot dual-attach, gzipped pre-wired data pills) — the reference template for new full-flow builds