automation

Hybrid Identity Automation

A self-service portal that lets a Nordic MSP's customers create, update and disable their own users, groups and shared mailboxes across on-prem Active Directory, Entra ID and Exchange Online — backed by one multi-tenant ServiceNow + BigFix engine.

PowerShell · ServiceNow · BigFix · Active Directory · Graph API · Exchange Online · REST API · OAuth 2.0 · Power Automate

A Nordic MSP runs Active Directory, Microsoft 365 and Exchange for 100+ enterprise customers. Every new hire, leaver and group change used to be a hand-worked ticket — per tenant, per engineer, slow and inconsistent. I turned it into a self-service product: one entity-first portal a customer’s own service desk drives to manage users, groups and shared mailboxes across their on-prem AD, Entra ID and Exchange Online — while the MSP keeps running the infrastructure behind it.

Here is the actual tool, end to end:

The Service Portal, live
Entity-first home — pick User, Group or Shared mailbox. The directory pre-indexes in the background so the next steps are instant.
User actions — create, reset password, manage group memberships, update details, or disable (block sign-in, never delete).
Create a user — live username Check Availability, a searchable multi-DC OU tree, and 30+ AD / Exchange-hybrid attributes.
Memberships by transfer-list, then a pre-submit Verify that runs uniqueness + validity checks across AD and Entra. Submit stays locked until Verify passes.
Reset password — type-ahead search resolves the user live against the directory.
Reset & send by SMS — a random password is set straight on the domain controller (never shown) and texted to the user.
Manage group memberships — find the user, then continue to edit.
Live state across AD + Entra + Exchange — AD groups, cloud groups and shared mailboxes loaded on demand (here, the Shared Mailboxes tab), then edited as Added / Removed deltas.
Update an existing user — Fetch pulls the account's current AD / Entra / Exchange state and all group memberships in seconds, then every attribute and the target OU become editable.

Captured in a test domain. Hostnames, customer OU names and distribution lists are redacted — everything else is the real tool.

The portal is just the front door — behind it, a multi-tenant ServiceNow + BigFix engine reaches each customer’s on-prem domain controllers, runs an Azure AD Connect sync, and finishes in the Microsoft cloud.

Problem

User-lifecycle work was manual, ticket-by-ticket, and done separately for every tenant. It didn’t scale with the customer base and it drifted between engineers. The brief: make it multi-tenant self-service — one engine any customer can drive to provision across their own AD, Entra ID and Exchange Online — and make it the foundation every later automation on the platform could reuse, including one a non-ServiceNow front-end could safely call over the public internet.

Constraints

  • No cloud-to-on-prem connectivity. ServiceNow is in the cloud; each customer’s domain controller sits on-prem behind its own firewalls, with no VPN between them.
  • BigFix is fire-and-forget. You POST an action; there is no webhook, no completion event, no return channel.
  • The MID Server is the only bridge — and the only host allowed to make outbound Exchange Online PowerShell calls (ServiceNow itself can’t load the module).
  • Multi-tenant by design. Every customer has its own AD domain, BigFix site, M365 tenant, EXO certificate and credential set.
  • Exchange Hybrid. Mail attributes must be written on the on-prem DC, not the cloud — which forces a specific operation order.

Architecture

One shared engine, two self-service front-ends, multi-tenant from the ground up. Every request resolves to its own customer’s AD domain, BigFix site, M365 tenant and credential set, so a single deployment serves the whole base. The engine is a 41-step Flow Designer subflow that owns four phases and the request lifecycle:

  1. Phase 1 — Active Directory on the customer’s DC, executed via BigFix (user/group/attribute deltas, Exchange-hybrid attributes).
  2. Phase 2 — Azure AD Connect delta sync, so the change is visible in Entra ID.
  3. Phase 3 — the cloud finish: Microsoft Graph for the M365 license (Create only) and Entra-group membership, and Exchange Online PowerShell for shared-mailbox delegation.

The Service Portal front-end — the one in the screenshots above — is fully ServiceNow-native: the operator fetches the target’s current state across AD + Entra + Exchange, edits it, and submits only the {Added, Removed} deltas. Role and tenant govern who sees what; a customer’s staff only ever touch their own directory. The same entity-first portal also manages groups and shared mailboxes directly.
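As a rough illustration of "submit only the deltas", the sketch below compares a fetched membership list with the operator's edited list and emits only the differences. It is written in Python for readability; the function name `membership_delta` and the group names are hypothetical, and the portal's real widget code is not reproduced here.

```python
def membership_delta(current, edited):
    """Return only what changed between the fetched and the edited membership."""
    current_set, edited_set = set(current), set(edited)
    return {
        "Added": sorted(edited_set - current_set),
        "Removed": sorted(current_set - edited_set),
    }


# Hypothetical group names, for illustration only.
fetched = ["GG-Sales", "GG-VPN", "GG-Intranet"]
edited = ["GG-Sales", "GG-Intranet", "GG-Finance"]
delta = membership_delta(fetched, edited)
print(delta)
```

An unchanged list produces two empty arrays, so submitting a form nobody edited changes nothing.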

[Interactive walkthrough, 16 steps: AD User Provisioning via the Service Portal. Catalog widget, fetch current state, edit deltas, same 41-step engine. Example run: operator opens the Provision form, Action=Update, JDOE01.]

Service Portal — fetch current state, edit deltas, submit into the shared 41-step engine

The second front-end is an OAuth-protected Scripted REST API for a customer with its own UI: Power Automate POSTs the payload, the API creates the RITM and triggers the subflow inBackground(), returning 202 Accepted with the RITM number immediately. Both front-ends converge on the identical engine — below is that engine walked end-to-end, every console line reproduced from the real sys_journal_field worknotes.

[Interactive walkthrough, 25 steps: AD User Provisioning, 4-Phase Orchestration. Example run: RITM0001234, Update JDOE01, Customer A, starting from a SharePoint form submit through Power Automate.]

The shared engine — RITM walked through all four phases (Scripted REST path shown)
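The accept-then-process shape of the Scripted REST entry path can be modelled in a few lines. This is a minimal sketch, not the ServiceNow implementation: the queue stands in for the background subflow trigger, the counter stands in for the platform's record numbering, and the payload field names `action` and `username` are assumptions.

```python
import itertools
import queue

background_work = queue.Queue()          # stand-in for the background subflow trigger
_ritm_numbers = itertools.count(1234)    # stand-in for the platform's record numbering


def accept_request(payload):
    """Validate, create the request record, queue the work, answer at once."""
    missing = [f for f in ("action", "username") if f not in payload]
    if missing:
        return 400, {"error": f"missing fields: {', '.join(missing)}"}
    ritm = f"RITM{next(_ritm_numbers):07d}"
    background_work.put((ritm, payload))   # the engine runs later, out of band
    return 202, {"ritm": ritm}


status, body = accept_request({"action": "Update", "username": "JDOE01"})
print(status, body)
```

The caller gets a request number it can track straight away, and no HTTP connection is held open for the minutes a directory sync can take.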

Key Engineering Decisions

Most of the real work was at the boundaries — multi-tenant security, the platform’s hard edges, and onboarding a hundred customers’ wildly different directories without a code change. For each: the decision, the alternative I rejected, and what it cost.

The secret never reaches the endpoint; the new password never reaches the platform

Each customer has its own BigFix, AD, Graph and Exchange credentials, resolved at runtime by a naming convention keyed off the customer identifier — so onboarding is a few credential records, not a code change. The decision that mattered was where the secret is allowed to exist. Credentials live in ServiceNow’s encrypted store and are resolved and injected by the MID Server, so they’re never written into the BigFix action that lands on the customer’s DC and never appear in an endpoint log — the endpoint receives a script and data, never a password. The reset path is the sharpest case: the new password is generated on the domain controller itself, so it never travels inbound, never sits in a payload, and is delivered to the user by SMS. The rejected alternative — embedding credentials in the dispatched action, as the naive BigFix integration does — would scatter every customer’s secrets across endpoint action-histories and logs. The newer sibling automations standardised this into a single Credential Resolver (WCM, with sources such as Azure Key Vault behind it); this flow predates that and resolves from the encrypted credential tables directly.
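Resolution by naming convention can be sketched as a pure lookup. The `CUSTOMER_SYSTEM` key format, the system names and the in-memory `store` below are all assumptions for illustration; the write-up does not state the real convention, and the real store is ServiceNow's encrypted credential tables.

```python
SYSTEMS = ("bigfix", "ad", "graph", "exo")  # hypothetical system keys


def credential_name(customer_id, system):
    """Build the lookup key for one customer's credential for one system."""
    if system not in SYSTEMS:
        raise ValueError(f"unknown system: {system}")
    return f"{customer_id.upper()}_{system.upper()}"


def resolve_credential(store, customer_id, system):
    """Look the record up at the moment of use; fail loudly if it is missing."""
    name = credential_name(customer_id, system)
    try:
        return store[name]
    except KeyError:
        raise LookupError(f"no credential record named {name}") from None


# Onboarding a customer adds records; the resolver itself does not change.
store = {"CUSTA_BIGFIX": "<encrypted record>", "CUSTA_GRAPH": "<encrypted record>"}
print(credential_name("custa", "graph"))
```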

The browser never names the target tenant

A request to act on “user X” still has to resolve to some customer’s domain controller. The easy version trusts the page — the front-end says which tenant, the engine obeys — which is a cross-tenant privilege-escalation hole: anyone who can shape a request points it at another customer’s AD. So for the portal the target customer, DC, BigFix site and credentials are resolved server-side from the signed-in user’s customer association, never from anything the browser sent. The browser is treated as hostile. (The one caller permitted to name a tenant is the server-to-server REST front-end, which authenticates as an OAuth 2.0 confidential client — a trusted backend, not an end user.) The cost is wiring each user to a customer at onboarding; the benefit is that no crafted request can cross a tenant boundary, because the boundary isn’t carried in the request.
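A minimal sketch of that rule, assuming a simple user-to-customer table (all names, hosts and sites below are invented): the resolver receives the request body but never reads a tenant from it.

```python
# Hypothetical association and target tables, for illustration only.
USER_CUSTOMER = {"alice": "CUSTA", "bob": "CUSTB"}
CUSTOMER_TARGETS = {
    "CUSTA": {"dc": "dc01.custa.example", "bigfix_site": "CustA"},
    "CUSTB": {"dc": "dc01.custb.example", "bigfix_site": "CustB"},
}


def resolve_target(session_user, request_body):
    """Pick the target tenant from the authenticated session only."""
    customer = USER_CUSTOMER.get(session_user)
    if customer is None:
        raise PermissionError("signed-in user has no customer association")
    # request_body may carry a 'customer' field; it is deliberately ignored.
    return {"customer": customer, **CUSTOMER_TARGETS[customer]}


# A crafted request that names another tenant still resolves to the caller's own.
target = resolve_target("alice", {"customer": "CUSTB", "username": "JDOE01"})
print(target)
```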

The domain controller describes itself — inference-based onboarding

Onboarding a customer means learning their directory: FQDN, NetBIOS name, base DN, the default OU for new users, the email domain, whether they’re Exchange-hybrid, and which OUs are privileged. The two obvious paths are a manual intake form (slow, error-prone, and customers often don’t know the answers) or blind auto-configuration (dangerous). I built a third: a PowerShell survey that runs on the DC itself and infers each fact, emitting it as a {value, confidence, evidence} triple — not a bare value, but the value, how sure it is, and the AD object it read to decide. A human adjudicates the low-confidence ones. The machine does the forensics; the person makes the call — and onboarding a new customer becomes a config row rather than a code change.
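The triple and the human-review step can be modelled as below. This is a sketch under assumptions: confidence is treated as a number between 0 and 1, the 0.8 review threshold is arbitrary, and the sample findings and evidence strings are invented rather than taken from the real survey.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inference:
    value: str         # what the survey concluded
    confidence: float  # assumed scale: 0.0 (guess) to 1.0 (certain)
    evidence: str      # what it read in the directory to decide


def needs_review(findings, threshold=0.8):
    """Names of the findings a human should adjudicate before they are saved."""
    return sorted(name for name, f in findings.items() if f.confidence < threshold)


findings = {
    "fqdn": Inference("corp.example.com", 0.99, "RootDSE defaultNamingContext"),
    "default_user_ou": Inference(
        "OU=Users,OU=Corp,DC=corp,DC=example,DC=com", 0.55,
        "OU holding the most enabled user objects"),
    "exchange_hybrid": Inference(
        "true", 0.70, "msExchRecipientTypeDetails populated on sampled users"),
}
print(needs_review(findings))
```

Keeping the evidence next to the value means the reviewer can check the reasoning without logging on to the customer's DC.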

Tier-0 is off-limits to everyone — enforced on the DC, not in the browser

The model is fail-closed from the first insert: creating a customer’s config stamps deny-first rules — MSP staff get scoped access, a customer’s own users default to deny. On top of that, a protected set — the well-known privileged SIDs (Domain Admins, Enterprise Admins, the Tier-0 accounts), plus any restricted OUs the onboarding survey flagged — is blocked for everyone, including MSP staff, and the protection is transitive: nested membership is caught with the LDAP matching-rule-in-chain OID 1.2.840.113556.1.4.1941, so a Tier-0 account can’t be smuggled in behind a benign-looking group. Crucially the gate is enforced in PowerShell on the domain controller — re-resolving SIDs, walking the chain, locale-independent — not in the UI. The browser guard only mirrors it; the real one runs where the change actually happens, because anything the client enforces, the client can bypass.
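Two small sketches of the transitive check. The first builds the LDAP filter that asks Active Directory whether an object is a nested member of a group, using the matching-rule-in-chain OID named above with RFC 4515 value escaping. The second is a plain-Python model of what that rule computes. The real gate runs in PowerShell on the DC and compares SIDs; the names and the example DN here are illustrative.

```python
IN_CHAIN = "1.2.840.113556.1.4.1941"  # LDAP_MATCHING_RULE_IN_CHAIN


def escape_filter_value(value):
    """Escape the characters RFC 4515 reserves inside a filter value."""
    out = value.replace("\\", "\\5c")  # backslash first, so escapes aren't re-escaped
    for ch, esc in (("*", "\\2a"), ("(", "\\28"), (")", "\\29"), ("\x00", "\\00")):
        out = out.replace(ch, esc)
    return out


def nested_member_filter(group_dn):
    """Filter that matches objects that are direct or nested members of group_dn."""
    return f"(memberOf:{IN_CHAIN}:={escape_filter_value(group_dn)})"


def is_transitive_member(principal, group, direct_members):
    """Model of the same check: walk group -> members, following nested groups."""
    seen, stack = set(), [group]
    while stack:
        current = stack.pop()
        if current in seen:        # guards against circular nesting
            continue
        seen.add(current)
        for member in direct_members.get(current, ()):
            if member == principal:
                return True
            stack.append(member)
    return False


nesting = {
    "Domain Admins": ["Ops-Leads"],
    "Ops-Leads": ["Benign-Looking"],
    "Benign-Looking": ["jdoe", "Ops-Leads"],  # includes a cycle on purpose
}
print(nested_member_filter("CN=Domain Admins,CN=Users,DC=example,DC=com"))
print(is_transitive_member("jdoe", "Domain Admins", nesting))
```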

“Done” means fresh data, not HTTP 200

BigFix is fire-and-forget: you POST an action and get a dispatch acknowledgement — no webhook, no completion event, no return channel. The trap is treating the 200 on dispatch as “finished”; it only means “accepted.” So completion is gated on data freshness, not transport status: the endpoint script writes a result.json stamped with this run’s RITM, and ServiceNow reads it back through the BigFix Client Query API in a Do-While loop — 15 s for AD ops, 5 min for the AAD sync — comparing the embedded RITM before trusting the file, so a stale result from a previous run can never be mistaken for this one. Client Query, not the older Analysis-property approach, because it’s built for one-shot interactive reads. This drop-and-read round-trip became the standard ServiceNow-to-BigFix pattern across the estate, factored into a reusable Poll for Result subflow.
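The freshness gate can be sketched as a small polling loop. This is a model of the idea, not the Flow Designer subflow: `read_result` stands in for the Client Query read-back, the `ritm` field name inside result.json is an assumption, and the interval and attempt limits are parameters rather than the production values.

```python
import json
import time


def poll_for_result(read_result, ritm, interval_s, max_attempts, sleep=time.sleep):
    """Return the result only once it carries this run's RITM."""
    for attempt in range(max_attempts):
        raw = read_result()
        if raw:
            try:
                result = json.loads(raw)
            except ValueError:
                result = None  # unreadable or half-written file: not ready yet
            if isinstance(result, dict) and result.get("ritm") == ritm:
                return result  # fresh: written by this run
            # otherwise it is a stale file from an earlier run; keep waiting
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"no fresh result for {ritm} after {max_attempts} reads")


# Simulated read-backs: nothing yet, then a stale file, then this run's file.
reads = iter([
    None,
    '{"ritm": "RITM0001233", "status": "ok"}',
    '{"ritm": "RITM0001234", "status": "ok"}',
])
result = poll_for_result(lambda: next(reads), "RITM0001234", 15, 5, sleep=lambda s: None)
print(result)
```

The stale file in the second read is well-formed and says "ok", which is exactly why transport success or file presence alone can't be the completion signal.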

The focused view below isolates just that round-trip — the ad-hoc BigFix action and the Client Query read-back that pre-populate the form:

[Interactive walkthrough, 13 steps: AD State Fetch, on-demand AD lookup (the fetch leg of the portal flow). Service Portal widget, BigFix action, Client Query. Example run: user clicks Fetch for username JDOE01.]

AD State Fetch — the BigFix Client Query round-trip in isolation

Attack the root cause (synchronous-on-load), not the symptom (slow polls)

The portal felt slow because an on-load call warmed a catalog synchronously, holding a shared platform semaphore for several seconds — and the hold scaled with viewers, not actions, so a handful of admins opening the portal at once could starve a node. The tempting fix is to tune the poll interval, but that’s a constant-factor win that leaves the thread-hold in place. The real fix was to take the synchronous work off the load path entirely: the portal does an instant, tenant-scoped cache read while the catalog warms on a background worker, out of band. Onboarding fires an async event and returns rather than blocking on the build, and if a load ever hits a cold cache it doesn’t error — it shows a graceful “indexing…” and triggers the warm itself, so the system repairs its own cold-start. A read that held a thread for seconds now returns in milliseconds, and concurrency stopped being a scaling axis.
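A minimal model of the read path, assuming an in-memory cache and an injected `submit` function that schedules work off the request path (on the platform that role is played by an async event and a background worker). The class and method names are hypothetical.

```python
import threading


class TenantCatalogCache:
    """Reads never build the catalog; a cold read schedules the build and returns."""

    def __init__(self, build_catalog, submit):
        self._build = build_catalog   # slow: tenant -> catalog
        self._submit = submit         # schedules a callable out of band
        self._cache = {}
        self._warming = set()
        self._lock = threading.Lock()

    def read(self, tenant):
        with self._lock:
            if tenant in self._cache:
                return {"status": "ready", "catalog": self._cache[tenant]}
            start_warm = tenant not in self._warming
            if start_warm:
                self._warming.add(tenant)   # at most one warm per tenant at a time
        if start_warm:
            self._submit(lambda: self._warm(tenant))
        return {"status": "indexing"}

    def _warm(self, tenant):
        try:
            catalog = self._build(tenant)   # the slow part, outside the lock
            with self._lock:
                self._cache[tenant] = catalog
        finally:
            with self._lock:
                self._warming.discard(tenant)


pending = []                                # stand-in for a background worker queue
cache = TenantCatalogCache(lambda t: [f"OU=Users,DC={t}"], pending.append)
first = cache.read("custa")                 # cold cache: schedules a warm
second = cache.read("custa")                # still cold: no duplicate warm
pending.pop()()                             # the "worker" runs the build
third = cache.read("custa")
print(first, second, third)
```

The number of concurrent viewers only affects how many cheap dictionary reads happen; the expensive build runs once per tenant.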

Endpoint scripts are versioned data, not escaped strings

The brittle way to run PowerShell on a DC through BigFix is to inline the script as a string — but BigFix ActionScript, the JSON payload and the BES XML envelope each claim {} and XML entities as their own delimiters, so an inline script means three nested layers of escaping fighting each other (plus the Windows-1252 characters in Scandinavian names that the BES parser rejects outright). Instead the scripts live in a table as versioned records, are Base64-transported so no layer can misread them, and are decoded and run on the endpoint. Escaping stops being a problem because nothing is ever escaped — the payload is opaque bytes until it lands on disk. The bonus: every endpoint runs a known, auditable, versioned script, not a string assembled at dispatch time.
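The transport idea is easy to demonstrate: once a script is Base64-encoded, the payload contains none of the characters the three layers argue over. The sketch assumes UTF-8 as the byte encoding (the write-up does not say which encoding is used), and the sample PowerShell line is illustrative.

```python
import base64
import string

BASE64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")


def pack_script(script_text):
    """Encode a script so no transport layer can misread it."""
    return base64.b64encode(script_text.encode("utf-8")).decode("ascii")


def unpack_script(payload):
    """Decode on the endpoint; validate=True rejects anything outside the alphabet."""
    return base64.b64decode(payload, validate=True).decode("utf-8")


# Braces, angle brackets, an ampersand and Scandinavian letters in one line.
script = 'Set-ADUser -Identity "JDOE01" -Replace @{displayName = "Søren Ågård"} # <note> & {braces}'
payload = pack_script(script)
alphabet_ok = set(payload) <= BASE64_ALPHABET
round_trip_ok = unpack_script(payload) == script
print(alphabet_ok, round_trip_ok)
```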

Deltas, not desired state — with a protected set that can’t be removed

Every membership field is {Added:[], Removed:[]}, never “the desired full list”. A replayed or partial HR update therefore can’t silently strip a user out of groups that weren’t part of the change — the worst case is a no-op, not data loss. The rejected alternative, full desired-state sync, is cleaner on paper but wipes any membership the caller didn’t know to include, which for a directory you don’t fully own is unacceptable. The protected groups are also un-removable through the payload: a well-formed Removed entry targeting them is dropped server-side. Names Phase 1 can’t find on-prem aren’t treated as failures either — they route forward: an Entra group goes to Graph, a mailbox to Exchange Online, so the caller never has to know which system owns each name.
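The handling rules in this section can be sketched as a planning step that sorts each name into where it will be executed. The function, the plan keys and all group and mailbox names are hypothetical; the real engine discovers ownership by querying each system at runtime rather than from in-memory sets.

```python
def plan_changes(delta, on_prem_groups, entra_groups, mailboxes, protected):
    """Sort each Added/Removed name into the phase that owns it."""
    plan = {"ad_add": [], "ad_remove": [], "graph": [], "exo": [],
            "dropped": [], "unknown": []}
    for op, key in (("add", "Added"), ("remove", "Removed")):
        for name in delta.get(key, []):
            if op == "remove" and name in protected:
                plan["dropped"].append(name)       # protected: never removable
            elif name in on_prem_groups:
                plan[f"ad_{op}"].append(name)      # Phase 1, on the DC
            elif name in entra_groups:
                plan["graph"].append((op, name))   # routed forward to Graph
            elif name in mailboxes:
                plan["exo"].append((op, name))     # routed forward to Exchange Online
            else:
                plan["unknown"].append(name)       # nobody owns this name
    return plan


delta = {
    "Added": ["GG-Finance", "SM-Invoices"],
    "Removed": ["Domain Admins", "GG-VPN", "Cloud-Team", "Typo-Grp"],
}
plan = plan_changes(
    delta,
    on_prem_groups={"GG-Finance", "GG-VPN", "Domain Admins"},
    entra_groups={"Cloud-Team"},
    mailboxes={"SM-Invoices"},
    protected={"Domain Admins"},
)
print(plan)
```

Because the input lists only changes, any group the caller never mentioned appears nowhere in the plan and is left alone.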

Config over code — and the platform-edge war stories

Config over code. Customers describe the same thing differently — DK vs Denmark, the c / co / countryCode triple that has to stay consistent — so attribute mapping is declarative, normalised by a resolver with a safe fallback rather than branching code per customer. Onboarding variance is absorbed as configuration, not engine changes.
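A sketch of a declarative resolver for the country case. In Active Directory, `c` holds the two-letter ISO 3166 code, `co` the country name and `countryCode` the ISO 3166 numeric code. The table covers four Nordic countries only, and returning `None` for unrecognised input (so the caller writes none of the three attributes) is an assumed fallback, not necessarily the one the engine uses.

```python
# ISO 3166 alpha-2 code -> (c, co, countryCode)
COUNTRIES = {
    "DK": ("DK", "Denmark", 208),
    "SE": ("SE", "Sweden", 752),
    "NO": ("NO", "Norway", 578),
    "FI": ("FI", "Finland", 246),
}
# Accept the full name as an alias for the code.
ALIASES = {name.lower(): code for code, (_c, name, _n) in COUNTRIES.items()}


def normalise_country(raw):
    """Map whatever the customer sent to a consistent c / co / countryCode triple."""
    key = (raw or "").strip()
    code = key.upper() if key.upper() in COUNTRIES else ALIASES.get(key.lower())
    if code is None:
        return None  # assumed safe fallback: leave all three attributes untouched
    c, co, number = COUNTRIES[code]
    return {"c": c, "co": co, "countryCode": number}


print(normalise_country("Denmark"))
print(normalise_country(" dk "))
print(normalise_country("Atlantis"))
```

Supporting another spelling or another country is a new table row, which is the point of keeping the mapping as data.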

A few platform edges I had to engineer around, kept here for the record:

  • Cookie stripping. Cookie-based session auth to BigFix worked in direct testing but failed through the MID Server — the ECC Queue strips Cookie headers when it serialises a REST call into a queue record. Fix: Basic auth with the credential decrypted in-script (GlideEncrypter) and the Base64 header built by hand, because a standard header survives serialisation.
  • Password(2Way) corruption. Flow Designer’s two-way password variables arrived truncated and garbled between steps. Rather than fight the platform, every script that needs a secret resolves it at the moment of use — more code, but deterministic and observable.
  • Exchange Online needs a real host. Connect-ExchangeOnline won’t load inside ServiceNow, so Phase 3’s mailbox work runs on the MID Server itself, authenticating with a per-customer certificate pinned in its Windows certificate store; the cert never leaves the box and the app registration is scoped to mailbox management only, so a compromise elsewhere can’t forge mailbox permissions.
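For the cookie-stripping fix, building the Basic header by hand comes down to one Base64 call. The sketch below shows only the header construction in Python; in the real flow the password is decrypted in a ServiceNow script first, which is not reproduced here. The sample username and password are the well-known example from RFC 7617, not real credentials.

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header: Base64 of 'username:password'."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


header = basic_auth_header("Aladdin", "open sesame")
print(header)
```

Unlike a session cookie, this is an ordinary request header, so it is carried with every call and there is no session state to lose between the queue and the MID Server.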

Challenges and Trade-offs

  • No rollback. If Phase 1 succeeds but a later phase fails, there’s no automatic undo across four target systems. The engine logs enough context for manual intervention and closes the RITM as Closed Incomplete. Building cross-system rollback wasn’t worth it for a rare failure mode.
  • Polling latency. Drop-and-read adds latency — quick for the on-prem AD phase, longer while it waits on the AAD sync to confirm. Fine for provisioning; it would not be for a real-time automation.
  • Shared-mailbox ambiguity. Form users treat distribution lists, Entra groups and shared mailboxes as interchangeable. The engine discovers at runtime which back-end owns each name, so a misspelled mailbox fails quietly at the cloud phase and has to be read off the RITM worknotes.
  • Testing in production. MID Server, BigFix, DCs and Exchange are all customer production infrastructure with no faithful staging clone. Testing leaned on a dedicated test OU and shared mailbox, a dev ServiceNow instance wired to the production MID Server, and a PowerShell test suite that exercises every operation.

Outcome

This was the foundational automation on the MSP’s ServiceNow + BigFix platform. It turned manual, per-tenant user administration into a multi-tenant self-service capability customers run themselves, and it established the patterns every later automation builds on — MID Server bridging, BigFix action execution, async drop-and-read polling, OAuth-fronted entry, delta payloads and MID-hosted Exchange Online.

  • Turned a multi-step manual provisioning job — separate hand-worked tickets for the directory, licensing and mailbox delegation — into a single self-service request that runs end to end without an engineer.
  • Runs across the customer base as one multi-tenant deployment, with every request scoped to its own tenant’s directory, BigFix site and credentials.
  • Customers self-serve users, groups and shared mailboxes — scoped to their own tenant — instead of raising tickets.
  • The drop-and-read polling pattern is now the standard for every ServiceNow-to-BigFix integration across the Nordic operation.
  • Onboarding a new customer is a set of credential records and a CMDB entry — no code change.

Tech Stack

  • Front-end: ServiceNow Service Portal catalog item + custom widget (primary, customer-operated); plus a SharePoint Online form + Power Automate for a customer with its own UI
  • Auth: OAuth 2.0 client-credentials into ServiceNow; per-tenant Microsoft Identity Platform app registrations for Graph; certificate auth for Exchange Online
  • Orchestration: ServiceNow Flow Designer — a shared 41-step subflow behind two entry paths
  • Endpoint execution: HCL BigFix, formerly IBM BigFix (BES XML actions, BigFix Client Query result retrieval)
  • Scripting: PowerShell 5.1 on the DC; ExchangeOnlineManagement on the MID Server
  • Cloud: Microsoft Graph (licensing, Entra group membership); Exchange Online PowerShell V3 (mailbox delegation)
  • Bridge: ServiceNow MID Server (ECC Queue cloud-to-on-prem transport; also the EXO PowerShell host)
  • Identity: Active Directory, Azure AD Connect, Exchange Hybrid, Entra ID
  • CMDB: cmdb_ci_server records drive per-customer DC and AAD Connect selection