Compliance Audit Automation

Self-service CIS Level 1 security-baseline scanning of Windows Servers across an MSP fleet — read-only by design, BigFix-delivered, results persisted and trended in ServiceNow.

Tags: ServiceNow Flow Designer, BigFix Client Query API, PowerShell, CIS Benchmarks, Azure Key Vault, Security Compliance

Problem

An MSP that manages Windows Servers for regulated customers — banks, public sector, healthcare — periodically has to answer a simple-sounding question for any given server: is it configured to a security baseline? In practice the answer was assembled by hand. An engineer would remote onto the box, run a grab-bag of commands and registry queries, eyeball the output against a benchmark, and paste a summary into a ticket. It was slow, inconsistent between engineers, and — because it meant logging into production servers with privileged access — it was itself a risk surface.

I wanted to turn that into a self-service action: a customer or an internal operator requests a compliance audit of a server from a portal, and a few minutes later there’s a structured, repeatable, CIS Benchmarks Level 1 result attached to a ticket and stored for trending — without anyone logging into the box, and without the audit ever changing a single setting on the target.

Constraints

  • Read-only is non-negotiable. This is an audit, not a remediation. The automation must be incapable of modifying the target — not “configured not to”, but structurally read-only — because it runs against production servers for customers who would (rightly) never accept a “scanner” that can also change their configuration.
  • No new agent on the target. The only thing we can rely on being present is the BigFix agent. There’s no line-of-sight from ServiceNow’s cloud to the customer’s servers, and no appetite for installing yet another tool. Whatever runs on the endpoint has to be delivered through BigFix and cleaned up after itself.
  • No callback from BigFix. Same constraint as every other automation on this stack: BigFix accepts an action and runs it eventually; there is no completion event. The result has to be retrieved from a file the script writes.
  • Multi-tenant, credential-isolated. One scoped app, one BigFix Root, many customers. Customer scope flows through a config table; BigFix API credentials must never be stored as plaintext, or even as a standing ServiceNow credential record.
  • The output has to be queryable, not just readable. A PDF in an attachment is useless for trending. The result needs to land as structured data so the portal can show “how many servers, how many passed, how has this server trended over the last five audits”.
  • It has to scale to a whole fleet without a second codebase. Auditing one server and auditing a customer’s entire Windows estate should be the same pipeline, not two.

Architecture

A request from the Service Portal creates a Requested Item; an after-insert Business Rule triggers a 12-step Flow Designer subflow; the subflow ships a versioned PowerShell audit script to the target via BigFix “Drop-and-Read”, reads the result back through a shared polling subflow, and persists one row per run to a result table before closing the RITM.

[Interactive walkthrough of a single-server audit: Service Portal request → RITM → BigFix drop-and-read → 31 CIS L1 checks → result persisted → RITM closed]

The request — Service Portal, read-only on its face

The front-end is a Service Portal catalog item, Request Compliance Audit, rendered by a custom widget. Before the operator can do anything it shows live prereq pills (is there a MID Server, is the BigFix credential present, how many customers are registered, which endpoint-script version is deployed). It offers a mode (Single Server / Customer Fleet), customer and server pickers with a per-server history timeline, and read-only / simulation toggles. The read-only nature is surfaced as a badge and a banner — and, crucially, it’s not just UI copy; it’s enforced at the bottom of the stack by the endpoint script.

“Order Now” creates a normal sc_req_item. An after-insert Business Rule, filtered to this catalog item, resolves the customer’s name and number, reads the default MID Server and BigFix credential, and calls the Flow API — the single-server subflow, or the fleet driver in fleet mode.

The subflow — the canonical RITM-driven shape

The 12-step subflow (integration__compliance_audit) is the template every RITM-driven automation on this stack now clones. In order: Lookup Customer Context (the customer must exist in the config table or the run hard-fails; the target hostname is taken from the explicit input or dereferenced from the CMDB record) → post a “started” work-note → resolve the BigFix credential → Get Automation Script (compliance_audit_windows v2.0, fetched from a database table, not stored on the MID) → build the input payload → escape it for ActionScript → Build BES XML & POST through the MID Server → delegate the read-back to the shared poll subflow → Check Result → Finalize → update the RITM → Persist one result row.
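The first step's two rules (hard-fail on an unknown customer, explicit hostname first and CMDB second) can be sketched in a few lines. This is a language-neutral illustration in Python, not the Flow Designer action itself; the function name, the dictionary standing in for the config table, and the CMDB field "name" are all assumptions made for the example.

```python
from typing import Optional


class CustomerNotConfigured(Exception):
    """The customer has no row in the config table, so the run hard-fails."""


def lookup_customer_context(customer: str,
                            config_table: dict,
                            explicit_host: Optional[str] = None,
                            cmdb_record: Optional[dict] = None) -> dict:
    # Hard-fail: without a config row there is no tenant scope to run under.
    if customer not in config_table:
        raise CustomerNotConfigured(customer)

    # Prefer the explicit input; otherwise dereference the CMDB record.
    # "name" is a hypothetical field standing in for the CI's hostname.
    host = (explicit_host or "").strip()
    if not host and cmdb_record is not None:
        host = str(cmdb_record.get("name", "")).strip()
    if not host:
        raise ValueError("no target hostname: pass one explicitly or via a CMDB record")

    return {"customer": customer, "scope": config_table[customer], "target": host}
```

Failing here, before any credential is resolved or any action is posted, means a misconfigured tenant never reaches BigFix at all.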

Drop-and-read on the endpoint

The POST action drops two files into a per-RITM working directory on the target (payload.json and the Base64-encoded script), certutil -decodes the script, runs it hidden with Sysnative PowerShell, and deletes the encoded copy. The script runs 31 checks across 8 CIS Level 1 categories — account policy (6), audit policy (4), local security (4), Microsoft Defender (3), firewall (4), patch state (3), remote access (3), and TLS/Schannel (4) — and writes a result.json, where each check returns id, category, severity, pass, actual vs expected, and a detail string. Then the shared BigFix – Poll for Result (AKV) subflow resolves the target’s BigFix Computer ID and reads that file back through the BigFix Client Query API, validating that the result belongs to this RITM and not a previous run.
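To make the shape of result.json and the "belongs to this RITM" check concrete, here is a minimal sketch in Python (the real script is PowerShell). The per-check fields follow the list above; the top-level "ritm" key and the helper names are assumptions, since the source does not say how a result is tied to its run.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Check:
    id: str
    category: str
    severity: str
    passed: bool   # serialized as "pass"; renamed here because pass is a Python keyword
    actual: str
    expected: str
    detail: str


def build_result(ritm: str, checks: list) -> str:
    """Serialize one run. The top-level "ritm" key is a hypothetical run marker."""
    body = {"ritm": ritm, "checks": []}
    for check in checks:
        row = asdict(check)
        row["pass"] = row.pop("passed")
        body["checks"].append(row)
    return json.dumps(body)


def belongs_to_run(result_json: str, expected_ritm: str) -> bool:
    """Reject unparseable files and files left behind by a previous run."""
    try:
        data = json.loads(result_json)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("ritm") == expected_ritm
```

Because BigFix gives no completion event, a poller that skipped this comparison could read a stale file from an earlier audit of the same host and report it as the new result.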

The result — one queryable row per run

Check Result parses the JSON and surfaces the summary counts and the full structured data. Persist Compliance Result then inserts exactly one row into a custom table — RITM references, customer, target, framework (CIS_L1), baseline version, scan time, overall success, total/passed/failed counts, the BigFix action id, and the entire result.json as a structured_data column for drill-down. That row is what powers the portal’s trending: audited-server count, 30-day audit count, pass ratio, and the per-server last-five timeline.
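The summary counts and the portal's trend figures are simple aggregations over those rows. The sketch below is illustrative Python over plain dictionaries, not the ServiceNow queries; the keys "target", "scan_time", "total" and "passed" are assumed column names, and "pass ratio" is assumed to mean passed checks over total checks.

```python
from typing import Iterable


def summarize(checks: Iterable[dict]) -> dict:
    """Total/passed/failed counts for one run's list of checks."""
    checks = list(checks)
    passed = sum(1 for c in checks if c.get("pass") is True)
    return {"total": len(checks), "passed": passed, "failed": len(checks) - passed}


def pass_ratio(rows: list) -> float:
    """Passed checks divided by total checks across the given result rows."""
    total = sum(r["total"] for r in rows)
    return sum(r["passed"] for r in rows) / total if total else 0.0


def last_five(rows: list, target: str) -> list:
    """The five most recent runs for one server, newest first."""
    mine = [r for r in rows if r["target"] == target]
    mine.sort(key=lambda r: r["scan_time"], reverse=True)
    return mine[:5]
```

None of this is possible against a PDF attachment or free-text ticket notes, which is the practical argument for one structured row per run.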

Key Engineering Decisions

Read-only as a structural guarantee, not a promise

The most important decision was to make the audit incapable of writing. The endpoint script gathers everything it needs with reads only — net accounts, auditpol, registry reads, Get-LocalUser, Get-SmbServerConfiguration, Get-MpComputerStatus, Get-NetFirewallProfile, Get-HotFix, Get-Service — and contains no Set-*, no service control, no registry writes. That means even a bug can’t change the target. For an MSP scanning regulated customers’ production servers, “the scanner physically cannot modify the box” is a far stronger statement than “the scanner is configured not to”, and it’s the thing that made the automation acceptable to run unattended.
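A guarantee like this can also be checked mechanically before a script version is published. The source does not describe such a gate, so the following is purely a hypothetical illustration: a small Python lint that scans the PowerShell text for the kinds of calls the script is said not to contain. The deny-list is my own assumption and is deliberately narrow.

```python
import re

# Hypothetical deny-list for illustration only: Set-* cmdlets, service control,
# registry property writes, and reg.exe write verbs.
FORBIDDEN = re.compile(
    r"\b(?:Set-\w+"
    r"|(?:Start|Stop|Restart|Suspend|Resume)-Service"
    r"|(?:New|Remove|Rename|Clear)-ItemProperty"
    r"|reg(?:\.exe)?\s+(?:add|delete|import))\b",
    re.IGNORECASE,
)


def find_write_operations(script_text: str) -> list:
    """Return (line_number, matched_text) for every forbidden call found."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        # Drop trailing comments. Naive: a '#' inside a string also cuts the line.
        code = line.split("#", 1)[0]
        for match in FORBIDDEN.finditer(code):
            hits.append((number, match.group(0)))
    return hits
```

A text scan like this is only a tripwire, not a proof; the structural guarantee still comes from how the script is written.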

Finalize-and-notify collapsed into one decision action

Rather than scatter the end-of-run logic across flow logic blocks, a single Finalize action computes everything: the final_status (COMPLETED / COMPLETED_WITH_FAILURES / FAILED_DISPATCH / FAILED_PARSE), the final_message, the automation_error, the encoded-query update_values for the final RITM update, and — when enabled and an email domain exists — it sends the failure notification. One action owns the decision, which makes the flow easy to read and made it the clean template to clone for the Veeam check and others.
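The decision that single action owns can be sketched as one pure function. This is illustrative Python rather than the ServiceNow action script. The four status names come from the text; the rule that any failed check yields COMPLETED_WITH_FAILURES, the rule that only FAILED_* statuses trigger the notification, and the field names and state values in the encoded update string are assumptions.

```python
from typing import Optional


def finalize(dispatch_ok: bool, parsed: Optional[dict], ritm_number: str,
             notify_enabled: bool = False, email_domain: str = "") -> dict:
    """Compute every end-of-run output in one place."""
    if not dispatch_ok:
        status, error = "FAILED_DISPATCH", "BigFix action was not accepted"
    elif parsed is None:
        status, error = "FAILED_PARSE", "result could not be read or parsed"
    elif parsed.get("failed", 0) > 0:
        status, error = "COMPLETED_WITH_FAILURES", ""
    else:
        status, error = "COMPLETED", ""

    if parsed is not None and dispatch_ok:
        message = "%s: %s (%d/%d checks passed)" % (
            ritm_number, status, parsed.get("passed", 0), parsed.get("total", 0))
    else:
        message = "%s: %s (%s)" % (ritm_number, status, error)

    # Encoded-query style "field=value^field=value" string.
    # Field names and state values here are placeholders, not the real mapping.
    state = "4" if status.startswith("FAILED") else "3"
    update_values = "state=%s^work_notes=%s" % (state, message)

    return {
        "final_status": status,
        "final_message": message,
        "automation_error": error,
        "update_values": update_values,
        "send_notification": status.startswith("FAILED") and notify_enabled and bool(email_domain),
    }
```

Keeping the branching in one function means the flow itself only has to pass inputs in and apply the outputs, which is what makes it straightforward to clone.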

Credential Resolver + Azure Key Vault — no stored BigFix credential

BigFix API authentication doesn’t come from a standing ServiceNow credential record. It’s resolved through the ServiceNow Credential Resolver (com.snc.discovery.CredentialResolver), with the secret held in Azure Key Vault. A flow step pulls the secret, composes base64(user:pass), and returns it GlideEncrypter-encrypted so it travels between steps as an opaque string. Each BigFix action — the POST, and the Resolve/Fetch actions inside the poll subflow — decrypts it in-memory at the moment of the HTTPS call via a global BigFixCredHelper, then nulls it. Nothing readable lands in the flow run log or the exported snapshot.
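The base64(user:pass) composition is the standard HTTP Basic scheme, shown here as a small Python sketch. It covers only the composition step; the Key Vault lookup and the GlideEncrypter wrapping are ServiceNow-side and are not modelled, and both function names are hypothetical.

```python
import base64


def basic_auth_token(user: str, password: str) -> str:
    """Compose base64(user:pass) as used by HTTP Basic authentication."""
    return base64.b64encode(("%s:%s" % (user, password)).encode("utf-8")).decode("ascii")


def authorization_header(token: str) -> dict:
    """Header to attach at the moment of the HTTPS call."""
    return {"Authorization": "Basic " + token}
```

Base64 is an encoding, not encryption, which is why the token is wrapped before it travels between flow steps and only unwrapped inside the action that makes the call.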

One pipeline for one server or a whole fleet

Fleet mode reuses the exact single-server pipeline. A separate driver queries the CMDB for the customer’s operational Windows servers, then loops sequentially: per host it creates a child RITM and calls the single-server subflow synchronously, aggregating per-host outcomes, before a Finalize Fleet Audit computes the fleet status. Two decisions make this clean: child RITMs are created without a catalog item set, so the catalog Business Rule doesn’t re-fire for them (a deliberate recursion guard); and one RITM per host means every server keeps its own auditable history and reuses the identical scanning path — there is no second code path for the actual checks.
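The driver's loop can be sketched as follows. It is illustrative Python with the two ServiceNow operations passed in as callables; the function names are hypothetical, and the fleet status values are borrowed from the single-run statuses as an assumption, since the source does not list them.

```python
from typing import Callable


def finalize_fleet(outcomes: list) -> dict:
    """Aggregate per-host outcomes into one fleet status (names are assumed)."""
    clean = sum(1 for o in outcomes if o["status"] == "COMPLETED")
    if not outcomes:
        fleet_status = "FAILED"
    elif clean == len(outcomes):
        fleet_status = "COMPLETED"
    else:
        fleet_status = "COMPLETED_WITH_FAILURES"
    return {"fleet_status": fleet_status, "hosts": len(outcomes),
            "clean": clean, "outcomes": outcomes}


def run_fleet_audit(hosts: list,
                    create_child_ritm: Callable[[str], str],
                    run_single_audit: Callable[[str, str], str]) -> dict:
    outcomes = []
    # Strictly sequential: one host finishes before the next starts.
    for host in hosts:
        # The child RITM carries no catalog item, so the catalog
        # Business Rule does not fire again for it.
        ritm = create_child_ritm(host)
        # Same single-server pipeline, called synchronously.
        status = run_single_audit(ritm, host)
        outcomes.append({"host": host, "ritm": ritm, "status": status})
    return finalize_fleet(outcomes)
```

The driver contains no scanning logic of its own; everything about the checks lives behind run_single_audit, which is the point of the design.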

Challenges and Trade-offs

Sequential by design, with an honest ceiling. The fleet driver runs hosts one at a time, not in parallel. That keeps execution deterministic, traceable (one RITM per host), and gentle on the BigFix Root and the MID Server’s queue — but it means wall-clock scales linearly with the size of the fleet, so very large estates are segmented by region or environment and run in batches. I chose deterministic and quota-safe over fast-and-parallel, on the basis that a baseline audit is not time-critical and a flaky parallel fan-out across customer production is the worse failure mode.

The 1,024-character ceiling, again. Like the other automations on this stack, the Client Query read-back caps the returned value, while a full 31-check result is much larger. The endpoint writes a result the read-back can consume while the full structured data is preserved end-to-end into the structured_data column; the format has to be mindful of what gets returned versus what’s stored.
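The source does not spell out how the full result crosses the cap. One possible approach, shown here purely as an assumption and not as the method this pipeline uses, is to split the serialized result into pieces no longer than the limit and reassemble them after read-back. The Python below is a minimal sketch of that idea; both function names are hypothetical.

```python
LIMIT = 1024  # read-back cap from the text, in characters


def split_for_readback(payload: str, limit: int = LIMIT) -> list:
    """Cut a serialized result into ordered pieces of at most `limit` characters."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [payload[i:i + limit] for i in range(0, len(payload), limit)]


def reassemble(chunks: list) -> str:
    """Join the pieces back in order to recover the original payload."""
    return "".join(chunks)
```

Whatever the actual mechanism, the constraint is the same: anything returned in a single read has to fit the cap, and anything larger has to be stored or transported another way.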

“Pass” is only as good as the check definitions. A CIS L1 baseline is opinionated, and a handful of checks are environment-sensitive (a server legitimately running without a given role can “fail” a control that doesn’t apply to it). Each check records actual vs expected and a detail string precisely so a human reviewing the row can tell a real finding from a not-applicable one — the automation reports, it doesn’t adjudicate.

Adding a check is a script edit, deliberately. The 31 checks live in the versioned compliance_audit_windows script in the database, keyed by baseline version. Adding or tightening a check is a script-body edit plus a version bump — no flow change. That keeps the benchmark evolving independently of the orchestration, at the cost of the checks not being declaratively visible in ServiceNow.

Outcome

This is built and deployed on the MSP’s ServiceNow estate — the catalog item, the 12-step subflow, the result table, the per-customer config, and the fleet driver are all in place. It runs per-customer across the managed Windows fleet, sharing the BigFix engine and the Poll-for-Result subflow with the other automations.

  • One self-service action replaces an ad-hoc, log-into-the-box manual audit with a repeatable, structured CIS L1 result — without privileged interactive access to the target.
  • Every run lands as a queryable row, so the portal trends compliance per server and per customer rather than leaving results buried in ticket text — feeding an audited-server count, recent audit volume, fleet pass ratio, and a per-server history timeline.
  • The single-server pipeline and the fleet driver share one scanning path, so a fix to the checks or the orchestration lands once.
  • The subflow, built around its single Finalize-and-notify action, has become the canonical template for new RITM-driven automations on the stack.

Tech Stack

  • Front-end: ServiceNow Service Portal catalog item + custom widget (prereq pills, mode selector, customer/server pickers, history timeline, read-only / simulation toggles)
  • Orchestration: ServiceNow Flow Designer — a 12-step subflow (integration__compliance_audit) plus a fleet driver flow, an after-insert Business Rule for dispatch, and the shared BigFix – Poll for Result (AKV) subflow for result read-back
  • Auth: ServiceNow Credential Resolver (com.snc.discovery.CredentialResolver) backed by Azure Key Vault; GlideEncrypter via a global BigFixCredHelper, decrypted in-action and nulled
  • Endpoint Execution: HCL BigFix, formerly IBM BigFix (BES XML actions, Base64 + certutil -decode script delivery, BigFix Client Query result retrieval)
  • Endpoint script: PowerShell compliance_audit_windows v2.0 — 31 read-only checks across 8 CIS Level 1 categories (account policy, audit policy, local security, Defender, firewall, patch state, remote access, TLS/Schannel)
  • Data: custom …_compliance_result table (one row per run, ~22 columns incl. full structured_data JSON), per-customer config table for tenant resolution and group rules
  • Bridge: ServiceNow MID Server (ECC Queue cloud-to-on-prem transport)
  • Standard: CIS Benchmarks Level 1 (Windows Server)