Grounding an AI Coding Assistant in a Codebase It Can’t See

How I built iid-mcp: an MCP server that puts a private SailPoint IdentityIQ codebase in front of Claude Code, Copilot, and Claude Desktop, so they stop guessing.

I write a lot of SailPoint IdentityIQ code: rules, workflows, tasks, plugins, and XML configs that an importer validates against a DTD with more than a thousand elements. So when the good LLM coding assistants showed up, I did the obvious thing and asked one to write a rule for me.

What I got back looked completely plausible and was subtly wrong. It called a method that doesn’t exist on SailPointContext. It invented a helper in com.identityworksllc.iiq.common that nobody ever wrote. It set a type= attribute on a <Rule> that the importer rejects on sight. The code compiled in its head and fell over in mine.

The reason’s simple. The model has read a lot of public Java, but it’s never seen the libraries I actually build against: Instrumental Identity’s iiqcommon, the iiq-common-library, the in-house plugin set, and SailPoint’s own shipped API surface. So it does what models do when they don’t know. It produces a confident average of everything it’s seen and presents it as fact. For IIQ work that average is worse than useless, because it’s wrong in ways that take someone who knows the platform to catch.

The fix isn’t a better prompt. The fix is to stop letting the model lean on memory and hand it the real source instead. That’s what iid-mcp does.

What it is

iid-mcp is a Model Context Protocol server. MCP is the standard that lets an AI client (Claude Code, Claude Desktop, GitHub Copilot in agent mode) call tools you define, over a well-specified protocol, with no bespoke glue per client. You stand up one server, expose a set of tools, and every MCP-aware assistant can use them.

iid-mcp exposes seven tools. Two are live: they reach into Instrumental Identity’s GitLab and search or fetch real library source on demand. The other five are reference tools. They serve curated, queryable knowledge built straight from SailPoint’s own shipped artifacts: the database DDL, the Javadoc, the example configs, the XML DTD.

Tool	What it does
`search_iiq(query, max_results)`	GitLab blob search across every in-scope project, in parallel
`fetch_iiq_file(project, path, ref)`	Fetch one file’s full contents from GitLab
`get_iiq_patterns()`	Return the hand-curated patterns reference
`get_iiq_schema(table?)`	IIQ database schema: table inventory, or full column detail for one table
`get_iiq_api(name?)`	IIQ + Instrumental Identity Javadoc: package index, a package’s classes, or one class
`get_iiq_examples(key?)`	SailPoint’s bundled config examples by rule type, workflow, form
`get_iiq_dtd(element?)`	The IIQ XML DTD: element index, or one element’s children and attributes

The whole thing’s about 1,200 lines of Python plus the generated reference data. It’s small on purpose. The hard part was never the code. It was deciding what to feed the model and how to keep that data honest.

The shape of the problem

There are two different jobs hiding inside “help me write IIQ code,” and they want different tools.

The first job is find me real usage. When I’m writing against a helper, I want to see how it’s actually called in code that ships, not how the model imagines it’s called. That’s a search problem against private repos. It has to be live, because the libraries change on the order of days.

The second job is tell me the legal shape of this thing. What columns does spt_identity have on 8.5? What does SailPointContext.getObjects return? Which children is a <Workflow> allowed to have? What Rule types does the DTD actually permit? None of that changes between Tuesday and Thursday. It changes when SailPoint ships a new version. That’s a reference problem, and the right move is to precompute it from the source of truth and serve it fast.

I built both halves into one server because, from the agent’s point of view, they’re the same task: ground this artifact in reality before you generate it. The server’s own instructions even spell out the ordering. Load the patterns, pull the matching example, check the API signature, verify the XML shape against the DTD, then search the live code to confirm. Generate last.

Live search over GitLab

Scope resolution

You configure what the server can see with one environment variable, IID_MCP_IIQ_SCOPE, a comma-separated list. Each entry is either a project path (pub/iiqcommon, used as-is) or a group path (idw/idw-sailpoint/iiq-plugins, expanded to every non-archived project inside it, recursively into subgroups). The default scope resolves to roughly 38 projects.

Resolution caches in memory for the life of the process. The first search triggers it. Every search after that reuses it. Adding a new project to a configured group means restarting the server so the cache re-resolves, which is fine, because the process holds no other state and restarting it is free.

One detail I care about more than it probably deserves: the cold-resolve path is locked. If four search_iiq calls land at once on a fresh process, you don’t want all four fanning out the full scope-resolution traffic against GitLab. The first caller takes an asyncio.Lock, does the work, and populates the cache. The other three wait, then fall straight into the cache hit. The fast path, where the cache is already populated, does a single atomic pointer read and never touches the lock at all.
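A minimal sketch of that locking pattern, not the server’s actual code. The class name ScopeResolver and the injected expand coroutine are hypothetical stand-ins for whatever really performs the GitLab lookups.

```python
import asyncio


class ScopeResolver:
    """Resolve the project scope once per process, even under concurrent callers."""

    def __init__(self, expand):
        self._expand = expand        # hypothetical coroutine that does the GitLab lookups
        self._projects = None        # cache: None until the first resolve completes
        self._lock = asyncio.Lock()  # only ever taken on the cold path

    async def projects(self):
        cached = self._projects
        if cached is not None:       # fast path: one attribute read, no lock
            return cached
        async with self._lock:
            # Re-check under the lock: an earlier caller may have filled the cache
            # while this one was waiting.
            if self._projects is None:
                self._projects = await self._expand()
            return self._projects


async def demo_cold_resolve():
    calls = 0

    async def fake_expand():
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)    # pretend to talk to GitLab
        return ["pub/iiqcommon"]

    resolver = ScopeResolver(fake_expand)
    results = await asyncio.gather(*(resolver.projects() for _ in range(4)))
    return calls, results


if __name__ == "__main__":
    print(asyncio.run(demo_cold_resolve()))
```

Four concurrent callers on a cold resolver trigger one expansion. The other three wait on the lock and then read the cached list.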

Parallel fan-out, bounded twice

search_iiq runs a GitLab blob search against every project in scope at once. With 38 projects that’s a lot of concurrent HTTP, so it’s bounded in two places.

Within a single search call, a semaphore caps how many per-project requests are in flight (IID_MCP_SCOPE_MAX_CONCURRENT_SEARCHES, default 8). Across the whole process, a second semaphore caps how many search_iiq calls run at once (IID_MCP_MAX_CONCURRENT_SEARCH_CALLS, default 4). Callers over the limit queue on the semaphore. They don’t get rejected. At the ceiling that’s 8 times 4, so 32 in-flight GitLab requests, which is the number I tuned the timeout and the rate-limit headroom around. The second semaphore has to be built lazily, on first call, because asyncio primitives bind to the running event loop and there’s no loop at import time.
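The two bounds can be sketched roughly like this. The limits mirror the defaults named above, but the function names (search_all, _get_call_semaphore) and the injected search_one coroutine are assumptions for illustration, not the server’s real API.

```python
import asyncio

PER_CALL_LIMIT = 8   # default of IID_MCP_SCOPE_MAX_CONCURRENT_SEARCHES
CALL_LIMIT = 4       # default of IID_MCP_MAX_CONCURRENT_SEARCH_CALLS

_call_semaphore = None  # process-wide; built lazily, inside a running event loop


def _get_call_semaphore():
    """Create the process-wide semaphore on first use rather than at import time."""
    global _call_semaphore
    if _call_semaphore is None:
        _call_semaphore = asyncio.Semaphore(CALL_LIMIT)
    return _call_semaphore


async def search_all(projects, search_one):
    """Fan out search_one over projects, bounded per call and per process."""
    async with _get_call_semaphore():             # callers over the limit queue here
        per_call = asyncio.Semaphore(PER_CALL_LIMIT)

        async def bounded(project):
            async with per_call:                  # at most 8 in flight for this call
                return await search_one(project)

        return await asyncio.gather(*(bounded(p) for p in projects))
```

With these defaults, no more than 4 calls hold the outer semaphore and each runs at most 8 requests, which gives the ceiling of 32.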

One slow repo shouldn’t sink the search

This is the part I got wrong first and then fixed. The fix is the whole philosophy of the project in miniature.

The original search_iiq did the obvious thing. Gather all the per-project searches, return the combined hits. Then production logs showed a pattern where three of four search calls would fail outright. The cause was iiq-common-library. Its blob search legitimately runs around 11 seconds (every other project comes back in under 3.4 seconds), and under load it would tip past the timeout. One project’s ReadTimeout, propagating up through asyncio.gather, took the entire search down with it. The 37 healthy projects had already returned useful hits, and the caller saw nothing but an exception.

So the rule now is: a per-project failure is data, not an exception. Each project’s search is wrapped to catch GitLabError and any httpx.HTTPError, which covers ReadTimeout, ConnectError, and friends. On failure it returns the exception instead of raising. The aggregator splits outcomes into hits and a structured errors list. You get the 37 projects that worked plus a note saying which one timed out and why. Partial results stay useful. The same instinct shows up again in the cache and in the transport layer: infrastructure trouble degrades the answer, it never destroys it.
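Here is a small sketch of the “failure is data” shape, using only the standard library. ProjectSearchError is a hypothetical stand-in for GitLabError and httpx.HTTPError, and the output field names are assumptions, not the server’s real schema.

```python
import asyncio


class ProjectSearchError(Exception):
    """Stand-in for GitLabError / httpx.HTTPError in this sketch."""


async def search_project_safe(project, search_one):
    """Run one project's search and return the exception instead of raising it."""
    try:
        return project, await search_one(project)
    except (ProjectSearchError, asyncio.TimeoutError) as exc:
        return project, exc


async def search_all_tolerant(projects, search_one):
    """Aggregate hits from healthy projects and describe the ones that failed."""
    outcomes = await asyncio.gather(
        *(search_project_safe(p, search_one) for p in projects)
    )
    hits, errors = [], []
    for project, outcome in outcomes:
        if isinstance(outcome, Exception):
            errors.append({
                "project": project,
                "type": type(outcome).__name__,
                "message": str(outcome),
            })
        else:
            hits.extend(outcome)
    return {"hits": hits, "errors": errors}
```

asyncio.gather(..., return_exceptions=True) gives a similar effect, but it captures every exception, including programming errors. Catching a narrow set per project keeps real bugs loud.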

A returned hit carries everything the agent needs to act: project, path, ref, line number, the matched snippet, and a web_url that deep-links straight to the line in GitLab. The usual loop is search, read the snippets, fetch_iiq_file the one or two that matter, write code grounded in what came back.

The reference tools

The five reference tools share one idea. SailPoint already ships the ground truth for most of what an agent needs to know. It just ships it in formats built for a human with a browser, not a model with a tool call. So I wrote build scripts that parse those artifacts into markdown shaped for retrieval, and tools that serve slices of it on demand.

Schema (get_iiq_schema). SailPoint bundles the database DDL with every release: create scripts and upgrade patch scripts for Oracle, SQL Server, MySQL, and PostgreSQL. A build script parses the base files, applies the patch files in order to get the effective schema, diffs consecutive versions, and generates a reference covering IIQ 8.4 and 8.5 including patch levels. get_iiq_schema() returns the table inventory and common query patterns. get_iiq_schema(table='spt_link') returns the full column list with per-database type differences and the 8.4-to-8.5 diff.

It also flags a real footgun. Oracle and PostgreSQL put function-based UPPER() indexes on string columns, so a query that filters on native_identity without wrapping both the column and the bind parameter in UPPER() silently bypasses the index and table-scans. Columns that need it are marked in the reference, so the agent writes the indexed form the first time instead of the slow form you discover in production.
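The effect is easy to reproduce in miniature. The sketch below uses SQLite’s expression indexes as a stand-in for Oracle and PostgreSQL function-based indexes, so the planner output differs from what those databases print. The two-column table and the index name are toy assumptions, not the real spt_link definition.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Toy table: only the columns needed to show the effect.
conn.execute("CREATE TABLE spt_link (id TEXT PRIMARY KEY, native_identity TEXT)")
# Expression index, analogous to a function-based UPPER() index.
conn.execute(
    "CREATE INDEX spt_link_native_upper ON spt_link (UPPER(native_identity))"
)

# The predicate does not match the indexed expression, so the index is unusable.
plain = query_plan(
    conn, "SELECT id FROM spt_link WHERE native_identity = ?", ("jdoe",)
)
# The predicate matches the indexed expression, so the index can be used.
wrapped = query_plan(
    conn,
    "SELECT id FROM spt_link WHERE UPPER(native_identity) = UPPER(?)",
    ("jdoe",),
)

print("plain:  ", plain)
print("wrapped:", wrapped)
```

Only the second plan should mention spt_link_native_upper. The first falls back to scanning the table.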

API (get_iiq_api). Javadoc, converted to per-package markdown, merged across three sources: SailPoint IIQ 8.5, pub/iiqcommon, and iiq-common-library. About 940 classes across 93 packages. A class lookup like get_iiq_api('SailPointContext') spans all three sources, so the agent doesn’t need to know which library a helper lives in. Per-class sections get extracted on demand at request time, which means a package file can hold 569 classes (sailpoint.object does) without ever returning all of it in one response. The build script handles both the legacy table-based Javadoc HTML the SailPoint distribution still ships and the modern flexbox layout current JDKs emit, because Instrumental Identity’s libraries get freshly generated docs on a newer toolchain.
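On-demand extraction can be as simple as slicing one heading’s section out of the package file. This sketch assumes each class starts at a “## ClassName” heading, which is a guess about the generated layout. The sample markdown is invented for the example.

```python
def extract_class_section(package_markdown, class_name, heading_prefix="## "):
    """Return the markdown section for one class, or None if it is not present.

    Assumes (hypothetically) that each class begins at a heading made of
    heading_prefix plus the class name, and runs until the next such heading.
    """
    lines = package_markdown.splitlines()
    start = None
    for i, line in enumerate(lines):
        if line.startswith(heading_prefix):
            if start is not None:
                # Next class heading reached: the section ends here.
                return "\n".join(lines[start:i]).rstrip()
            if line[len(heading_prefix):].strip() == class_name:
                start = i
    if start is not None:
        return "\n".join(lines[start:]).rstrip()  # last section in the file
    return None


# Invented sample, shaped like the assumed layout.
SAMPLE_PACKAGE = (
    "# sailpoint.api\n\n"
    "## SailPointContext\nPlaceholder summary.\n### Methods\n- getObjects\n\n"
    "## Identitizer\nPlaceholder summary.\n"
)

if __name__ == "__main__":
    print(extract_class_section(SAMPLE_PACKAGE, "SailPointContext"))
```

The response size then depends on the class, not on how many classes share the package file.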

Examples (get_iiq_examples). SailPoint ships example configs with IIQ: rules, workflows, forms, quicklinks, dynamic scopes, scoring, email templates. The 8.5 build covers 185 rule examples across 93 rule types, plus the rest. When the agent’s about to write a Correlation rule, get_iiq_examples('Correlation') hands it SailPoint’s own example for that exact type, so the input and output contract and the idiom come from the vendor rather than from the model’s imagination.

DTD (get_iiq_dtd). Every IIQ artifact is XML, and the importer validates it against sailpoint.dtd. That file’s generated dynamically by IIQ’s own DTDGenerator, so I regenerate it straight from the matching jar. The 8.5 DTD has 1,108 elements and 110 legal values in the Rule.type enum. get_iiq_dtd('Workflow') returns the legal children and attributes of a <Workflow> before the agent generates one. Checking here prevents the single most common authoring failure, which is XML that looks right and gets rejected at import time.
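To give a feel for what an element lookup involves, here is a rough regex-based sketch that indexes ELEMENT and ATTLIST declarations. It is not a full DTD parser (no parameter entities, no comments), the function names are hypothetical, and the fragment is a toy written for this example, not copied from sailpoint.dtd.

```python
import re

# Toy fragment for illustration only; not copied from sailpoint.dtd.
SAMPLE_DTD = """
<!ELEMENT Rule (Description?, Signature?, Source?)>
<!ATTLIST Rule
  name CDATA #REQUIRED
  type (Correlation|ExampleType) #IMPLIED
>
<!ELEMENT Source (#PCDATA)>
"""

ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\S+)\s+(.*?)>", re.S)
ATTLIST_RE = re.compile(r"<!ATTLIST\s+(\S+)\s+(.*?)>", re.S)
ATTR_RE = re.compile(
    r"(\S+)\s+(\([^)]*\)|\S+)\s+(#REQUIRED|#IMPLIED|#FIXED\s+\S+|\"[^\"]*\"|'[^']*')"
)
NAME_RE = re.compile(r"[A-Za-z_][\w.\-]*")
KEYWORDS = {"EMPTY", "ANY", "PCDATA"}


def parse_dtd(text):
    """Index each element's child names and attribute declarations."""
    elements = {}
    for name, content in ELEMENT_RE.findall(text):
        children = [c for c in NAME_RE.findall(content) if c not in KEYWORDS]
        elements[name] = {"children": children, "attributes": {}}
    for name, body in ATTLIST_RE.findall(text):
        entry = elements.setdefault(name, {"children": [], "attributes": {}})
        for attr, attr_type, default in ATTR_RE.findall(body):
            entry["attributes"][attr] = {"type": attr_type, "default": default}
    return elements


def enum_values(attr_type):
    """Split an enumerated attribute type like (A|B) into its legal values."""
    if attr_type.startswith("("):
        return [v.strip() for v in attr_type[1:-1].split("|")]
    return []


if __name__ == "__main__":
    print(parse_dtd(SAMPLE_DTD)["Rule"])
```

Checking a candidate type value against enum_values(...) before emitting XML is the kind of test that catches a rejected import early.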

Patterns (get_iiq_patterns). The one tool that isn’t generated. It’s a hand-curated reference of the things that come up constantly when writing against these libraries: logging idioms, task base classes, plugin anatomy, the namespace gotcha where iiqcommon and iiq-common-library share a package prefix but hold different classes. The curation rules are strict and I enforce them on myself. Every entry is sourced (file path plus class or method), verified (someone actually ran it), concise (a paragraph and a short code block), and current. Stale entries get removed, not left to rot, because the whole point of the file is that the agent trusts it and skips re-verifying. A stale pattern is worse than no pattern. It turns “the model doesn’t know” into “the model is confidently wrong, and I told it to be.”

One shared trick across the generated references: the build scripts preserve hand-written prose. Anything I write inside a ... block survives every rebuild, while the mechanical tables get overwritten. So I can layer human context onto generated data and not lose it when the next IIQ version ships and I regenerate everything.

The cache, and why it’s allowed to fail

iiq-common-library taking 11 seconds isn’t just an error-handling problem. Because search fans out in parallel, the slowest project IS the floor of total search time. Caching that response collapses the warm cost of every repeated search.

So GitLab responses get cached in Redis when IID_MCP_REDIS_URL is set. Search results, plus the project and group lookups used during scope resolution. A few deliberate choices:

It degrades gracefully, always. No Redis configured: the cache is just a no-op singleton and every call passes straight through. Redis configured but unreachable mid-call: the exception is swallowed, a structured cache.error event is logged, and the code falls back to a live GitLab fetch. The server NEVER fails a tool call because the cache is unhappy. A cache is an optimization, and an optimization that can take down the thing it’s optimizing is a liability.
JSON values, not pickle. Cached payloads are pydantic model dumps, re-validated on read. redis-cli GET returns readable JSON, and there’s no pickle-version coupling to trip over on a redeploy.
Versioned, hashed keys. iid_mcp:v1:<namespace>:<sha256-of-args>. If a payload shape ever changes incompatibly, I bump the version prefix. The old entries become unreachable and expire on their TTL. No migration, no flush.
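Those three choices fit in a few lines. This is a sketch under assumptions: the canonical-JSON hashing of arguments, the helper names, and the event fields are mine, and the cache argument is any object with async get and set methods shaped like redis.asyncio’s.

```python
import asyncio
import hashlib
import json

KEY_PREFIX = "iid_mcp:v1"  # bump the version when a payload shape changes


def cache_key(namespace, **args):
    """Build a versioned key from a hash of the call arguments."""
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{KEY_PREFIX}:{namespace}:{digest}"


def log_cache_error(exc):
    """Emit a structured cache.error event (field names are assumptions)."""
    print(json.dumps({"event": "cache.error", "error_type": type(exc).__name__}))


async def cached_fetch(cache, key, fetch_live, ttl_seconds=4 * 60 * 60):
    """Serve from cache when possible; never let a cache failure fail the call."""
    try:
        raw = await cache.get(key)
        if raw is not None:
            return json.loads(raw)          # JSON, not pickle
    except Exception as exc:                # cache trouble degrades to a live fetch
        log_cache_error(exc)
    value = await fetch_live()
    try:
        await cache.set(key, json.dumps(value), ex=ttl_seconds)
    except Exception as exc:
        log_cache_error(exc)
    return value


class DictCache:
    """In-memory stand-in for Redis, used only to exercise the sketch."""

    def __init__(self):
        self.data = {}

    async def get(self, key):
        return self.data.get(key)

    async def set(self, key, value, ex=None):
        self.data[key] = value


if __name__ == "__main__":
    async def live():
        return {"hits": 2}

    key = cache_key("search", query="getObjects", max_results=10)
    print(key, asyncio.run(cached_fetch(DictCache(), key, live)))
```

Sorting the argument keys before hashing means the same logical call maps to the same cache entry regardless of argument order.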

Default TTL is four hours. Scoped repos change on the order of days, so a four-hour staleness window costs little, while the hit ratio on repeated searches stays high. The whole thing’s a drop-in for ElastiCache the day this moves to AWS, with no code change.

Observability that survives a redeploy

Every meaningful thing the server does emits one JSON object on stdout: server.startup, search_iiq.start, search_iiq.project, search_iiq.complete, cache.hit, cache.miss, fetch_iiq_file.complete, and the rest. A timed_event context manager wraps an operation. On exit it logs the event with duration_ms filled in automatically, or on exception it logs at error level with the exception type and message attached. So “how long do searches take in real use” is a jq one-liner against the logs, with no separate metrics stack.
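A context manager along these lines is enough to get that behavior. Treat it as a sketch: the field names other than duration_ms, and the sink parameter used to make it testable, are assumptions and not the server’s real signature.

```python
import json
import time
from contextlib import contextmanager


def emit_json(record):
    """Default sink: one JSON object per line on stdout."""
    print(json.dumps(record, default=str))


@contextmanager
def timed_event(event, sink=emit_json, **fields):
    """Log one structured event on exit, with duration and any error attached."""
    record = {"event": event, "level": "info", **fields}
    start = time.perf_counter()
    try:
        yield record                    # the body may add fields, e.g. hit counts
    except Exception as exc:
        record.update(
            level="error",
            error_type=type(exc).__name__,
            error_message=str(exc),
        )
        raise                           # log the failure, then let it propagate
    finally:
        record["duration_ms"] = round((time.perf_counter() - start) * 1000, 2)
        sink(record)


if __name__ == "__main__":
    with timed_event("search_iiq.complete", query="getObjects") as ev:
        ev["hits"] = 3
```

Because the record is yielded to the body, the wrapped code can attach result counts before the event is written.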

Two things I’m quietly proud of here. First, secrets never reach the logs. The startup event records gitlab_token_present as a boolean, not the token, and the Redis URL gets its credentials redacted to redis://***@host. You can answer “what config did this process boot with” months later without ever having logged a secret. Second, getting clean JSON out of an async web server is a fight, because uvicorn installs its own plain-text handlers that interleave INFO: Started server process lines into your structured stream and break every jq parse. Taming that took mutating uvicorn’s logging config dict in place (reassigning it is too late, the class default already captured the original object reference) and forcing its loggers to propagate up to the JSON handler. More annoying to track down than it should’ve been. 🙂
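The secret-handling half can be sketched with the standard library. The helper names and the exact startup fields beyond gitlab_token_present are assumptions. This version also keeps the port and path and drops any query string, which goes slightly beyond the redis://***@host form described above.

```python
from urllib.parse import urlsplit


def redact_url(url):
    """Replace any userinfo in a URL with ***, dropping the query string too."""
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        return url                       # nothing secret to hide
    host = parts.hostname or ""
    if parts.port:
        host += f":{parts.port}"
    return f"{parts.scheme}://***@{host}{parts.path}"


def startup_fields(gitlab_token, redis_url):
    """Describe the boot configuration without recording any secret value."""
    return {
        "event": "server.startup",
        "gitlab_token_present": bool(gitlab_token),
        "redis_url": redact_url(redis_url) if redis_url else None,
    }


if __name__ == "__main__":
    print(startup_fields("not-a-real-token", "redis://user:secret@cache.internal:6379/0"))
```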

In production the logs go to the host’s systemd journal via the journald Docker driver, not the default json-file. That’s deliberate. docker logs dies with the container, and I redeploy on every push to main. journald outlives the container, so a week of search-latency and error history survives a redeploy. You query it with journalctl CONTAINER_NAME=iid-mcp -o cat | jq -R 'fromjson? | select(...)', where fromjson? quietly drops any non-JSON startup banner that slipped through.

The transport migration: SSE to Streamable HTTP

The server originally spoke MCP over SSE, a long-lived server-sent-events stream at /sse. That was a mistake for this deployment, and the symptom taught me why. The production server sits behind a Cloudflare Tunnel, and long-lived SSE streams went stale through Cloudflare on container restart or after an idle stretch. The connection looked alive and was dead. Clients had to be restarted to recover.

Streamable HTTP fixes this by construction. It uses short-lived, per-call HTTP requests against a single /mcp endpoint instead of one stream held open for the session’s life. There’s nothing to keep alive, so there’s nothing to go stale. I cut production over to streamable-http only. SSE still exists in the codebase and can be switched back on by changing one line in the Dockerfile, but the deployed container runs the new transport exclusively.

The cutover surfaced a spec-versus-reality wrinkle worth recording. The Streamable HTTP spec says a client MUST send Accept: application/json, text/event-stream on every POST, and the reference SDK enforces it with a 406 Not Acceptable when either type is missing. Real clients don’t all comply. Claude Desktop’s mcp-remote bridge, among others, sends only one type, or a bare */*, and got bounced. The spec’s actual intent is “don’t hand a client a representation it won’t accept,” so I relaxed the guard to exactly that. Accept the request if it’ll take either representation we might return. Reject only a request that accepts neither and offers no wildcard. It’s a small monkeypatch over one SDK method that both code paths derive from, and it logs accept_header.patch_applied once at startup so the deviation from stock behavior is never a mystery.
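The relaxed rule itself is small. The sketch below shows only the decision logic, not the SDK monkeypatch. The function names are hypothetical, q-values are ignored for brevity, and a missing header is treated as accept-anything per general HTTP semantics, which is my assumption and not something the patch is known to do.

```python
SUPPORTED = ("application/json", "text/event-stream")
WILDCARDS = ("*/*", "application/*", "text/*")


def media_types(accept_header):
    """Split an Accept header into lowercase media types, dropping parameters."""
    types = []
    for part in accept_header.split(","):
        media = part.split(";", 1)[0].strip().lower()
        if media:
            types.append(media)
    return types


def is_acceptable(accept_header):
    """True if the client will take at least one representation we might return."""
    if accept_header is None or not accept_header.strip():
        return True  # assumption: no Accept header means any type is fine
    return any(
        media in SUPPORTED or media in WILDCARDS
        for media in media_types(accept_header)
    )


if __name__ == "__main__":
    for header in ("application/json, text/event-stream", "*/*", "text/html"):
        print(repr(header), is_acceptable(header))
```

The strict check would require both supported types to be listed. This one rejects only a header that names neither and offers no matching wildcard.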

Deploy and ops

The design rule is one sentence: state is externalized, the process is killable. There’s no local database, no session store, no on-disk cache. Everything lives in GitLab and Redis. That makes the container disposable, which is why redeploy-on-every-push and journald-for-logs both work, and it’s why moving from a Proxmox VM today to ECS or Fargate tomorrow is a deploy change, not a code change.

The current production setup:

A single-stage Docker image built on the official uv Python 3.12 base, running as a non-root user, with dependencies layered separately from source so they cache independently.
A Proxmox VM behind a Cloudflare Tunnel and Cloudflare Access. Browsers get gated by M365 Entra SSO. Programmatic clients use per-user Cloudflare Access service tokens, so revocation is per-user instead of all-or-nothing.
GitLab CI on every push to main: lint, test, build, deploy. A docs-only push skips the build and deploy stages.
118 tests, every one of them mocking GitLab at the HTTP layer with respx and faking Redis with fakeredis, so the suite’s deterministic and never touches the network or a developer’s real .env.

Auth on the server itself is currently a pass-through stub. Cloudflare Access does the real gating at the edge. When this server eventually leaves the controlled network, Entra OAuth replaces the stub, and the seam for it is already there.

What I kept coming back to

A few principles ended up driving most of the decisions. They’re worth stating plainly, because they generalize past this one server.

Ground the model in real source. Never trust its memory of your private code. Every tool exists to replace a confident guess with a fact the model can cite.
Infrastructure degrades the answer, it never destroys it. A slow repo becomes one line in an error list. A dead cache becomes a live fetch. A single-type Accept header still connects. The tool call comes back.
Externalize all state. A killable process is a deployable process, a redeployable process, and a portable process.
Curated knowledge has to stay honest or it’s worse than nothing. The model trusts what you give it and skips re-checking. That trust is the whole value. It’s also the liability the moment the data goes stale, so the discipline around the references isn’t optional.

What’s next

IIQ is the first product, not the only one. The package layout already reserves space for sibling toolsets (tools/isc/, and later Evolveum midPoint and Fischer Identity) that plug into the same scope, cache, transport, and observability machinery. Each new product brings its own GitLab paths and its own reference data and reuses everything else.

Looking back

If I were starting over, I’d build the partial-failure handling on day one instead of bolting it on after the logs embarrassed me into it. 🙂 Everything good about this server is downstream of one decision: treat the model as something to ground, not something to trust, and never let a piece of infrastructure between it and the truth fail loudly enough to matter.

For now it’s doing the job it was built for. Claude Code sessions write IIQ code against real library source instead of a plausible hallucination of it, the cost of a wrong line shows up at authoring time instead of at import time, and I stopped being the human who has to catch the method that doesn’t exist. That last part is the whole point.