Grounding an AI Coding Assistant in a Codebase It Can’t See

How I built iid-mcp: an MCP server that puts a private SailPoint IdentityIQ codebase in front of Claude Code, Copilot, and Claude Desktop, so they stop guessing.

I write a lot of SailPoint IdentityIQ code: rules, workflows, tasks, plugins, and XML configs that an importer validates against a DTD with more than a thousand elements. So when the good LLM coding assistants showed up, I did the obvious thing and asked one to write a rule for me.

What I got back looked completely plausible and was subtly wrong. It called a method that doesn’t exist on SailPointContext. It invented a helper in com.identityworksllc.iiq.common that nobody ever wrote. It set a type= attribute on a <Rule> that the importer rejects on sight. The code compiled in its head and fell over in mine.

The reason’s simple. The model has read a lot of public Java, but it’s never seen the libraries I actually build against: Instrumental Identity’s iiqcommon, the iiq-common-library, the in-house plugin set, and SailPoint’s own shipped API surface. So it does what models do when they don’t know. It produces a confident average of everything it’s seen and presents it as fact. For IIQ work that average is worse than useless, because it’s wrong in ways that take someone who knows the platform to catch.

The fix isn’t a better prompt. The fix is to stop letting the model lean on memory and hand it the real source instead. That’s what iid-mcp does.

What it is

iid-mcp is a Model Context Protocol server. MCP is the standard that lets an AI client (Claude Code, Claude Desktop, GitHub Copilot in agent mode) call tools you define, over a well-specified protocol, with no bespoke glue per client. You stand up one server, expose a set of tools, and every MCP-aware assistant can use them.

iid-mcp exposes seven tools. Two are live: they reach into Instrumental Identity’s GitLab and search or fetch real library source on demand. The other five are reference tools. They serve curated, queryable knowledge built straight from SailPoint’s own shipped artifacts: the database DDL, the Javadoc, the example configs, the XML DTD.

| Tool | What it does |
| --- | --- |
| search_iiq(query, max_results) | GitLab blob search across every in-scope project, in parallel |
| fetch_iiq_file(project, path, ref) | Fetch one file’s full contents from GitLab |
| get_iiq_patterns() | Return the hand-curated patterns reference |
| get_iiq_schema(table?) | IIQ database schema: table inventory, or full column detail for one table |
| get_iiq_api(name?) | IIQ + Instrumental Identity Javadoc: package index, a package’s classes, or one class |
| get_iiq_examples(key?) | SailPoint’s bundled config examples by rule type, workflow, form |
| get_iiq_dtd(element?) | The IIQ XML DTD: element index, or one element’s children and attributes |

The whole thing’s about 1,200 lines of Python plus the generated reference data. It’s small on purpose. The hard part was never the code. It was deciding what to feed the model and how to keep that data honest.

The shape of the problem

There are two different jobs hiding inside “help me write IIQ code,” and they want different tools.

The first job is find me real usage. When I’m writing against a helper, I want to see how it’s actually called in code that ships, not how the model imagines it’s called. That’s a search problem against private repos. It has to be live, because the libraries change on the order of days.

The second job is tell me the legal shape of this thing. What columns does spt_identity have on 8.5? What does SailPointContext.getObjects return? Which children is a <Workflow> allowed to have? What Rule types does the DTD actually permit? None of that changes between Tuesday and Thursday. It changes when SailPoint ships a new version. That’s a reference problem, and the right move is to precompute it from the source of truth and serve it fast.

I built both halves into one server because, from the agent’s point of view, they’re the same task: ground this artifact in reality before you generate it. The server’s own instructions even spell out the ordering. Load the patterns, pull the matching example, check the API signature, verify the XML shape against the DTD, then search the live code to confirm. Generate last.

Live search over GitLab

Scope resolution

You configure what the server can see with one environment variable, IID_MCP_IIQ_SCOPE, a comma-separated list. Each entry is either a project path (pub/iiqcommon, used as-is) or a group path (idw/idw-sailpoint/iiq-plugins, expanded to every non-archived project inside it, recursively into subgroups). The default scope resolves to roughly 38 projects.

Resolution caches in memory for the life of the process. The first search triggers it. Every search after that reuses it. Adding a new project to a configured group means restarting the server so the cache re-resolves, which is fine, because the process holds no other state and restarting it is free.

One detail I care about more than it probably deserves: the cold-resolve path is locked. If four search_iiq calls land at once on a fresh process, you don’t want all four fanning out the full scope-resolution traffic against GitLab. The first caller takes an asyncio.Lock, does the work, and populates the cache. The other three wait, then fall straight into the cache hit. The fast path, where the cache is already populated, does a single atomic pointer read and never touches the lock at all.
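A minimal sketch of that locked cold-resolve path, not the server's actual code. The class name ScopeCache, the injected resolver callable, and the resolve_count counter are all hypothetical and exist only to make the pattern visible: check the cache without the lock, take the lock on a miss, then check again before doing the expensive work.

```python
import asyncio


class ScopeCache:
    """Resolve the project scope once; concurrent cold callers share one resolve."""

    def __init__(self, resolver):
        self._resolver = resolver   # hypothetical async callable returning project paths
        self._projects = None       # None means "not resolved yet"
        self._lock = None           # created lazily, inside a running event loop
        self.resolve_count = 0      # illustration only: counts real resolves

    async def get(self):
        cached = self._projects
        if cached is not None:      # fast path: plain attribute read, no lock
            return cached
        if self._lock is None:      # no await between check and assignment, so no race
            self._lock = asyncio.Lock()
        async with self._lock:
            if self._projects is None:   # re-check: an earlier waiter may have filled it
                self.resolve_count += 1
                self._projects = await self._resolver()
            return self._projects
```

Four concurrent get() calls on a fresh instance should trigger exactly one call to the resolver. The other three wait on the lock and then return the cached list.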

Parallel fan-out, bounded twice

search_iiq runs a GitLab blob search against every project in scope at once. With 38 projects that’s a lot of concurrent HTTP, so it’s bounded in two places.

Within a single search call, a semaphore caps how many per-project requests are in flight (IID_MCP_SCOPE_MAX_CONCURRENT_SEARCHES, default 8). Across the whole process, a second semaphore caps how many search_iiq calls run at once (IID_MCP_MAX_CONCURRENT_SEARCH_CALLS, default 4). Callers over the limit queue on the semaphore. They don’t get rejected. At the ceiling that’s 8 times 4, so 32 in-flight GitLab requests, which is the number I tuned the timeout and the rate-limit headroom around. The second semaphore has to be built lazily, on first call, because asyncio primitives bind to the running event loop and there’s no loop at import time.
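The two bounds can be sketched roughly like this. It is an illustration under assumptions, not the real implementation: search_all and the injected search_one callable are hypothetical names, and the two constants simply mirror the documented defaults rather than reading the environment variables.

```python
import asyncio

PER_CALL_LIMIT = 8   # mirrors the IID_MCP_SCOPE_MAX_CONCURRENT_SEARCHES default
CALL_LIMIT = 4       # mirrors the IID_MCP_MAX_CONCURRENT_SEARCH_CALLS default

_call_semaphore = None   # process-wide; built on first use, not at import time


def _get_call_semaphore():
    global _call_semaphore
    if _call_semaphore is None:
        _call_semaphore = asyncio.Semaphore(CALL_LIMIT)
    return _call_semaphore


async def search_all(projects, search_one):
    """Run search_one(project) for every project, bounded at both levels."""
    async with _get_call_semaphore():            # callers over the limit queue here
        per_call = asyncio.Semaphore(PER_CALL_LIMIT)

        async def bounded(project):
            async with per_call:                  # at most 8 requests per call
                return await search_one(project)

        return await asyncio.gather(*(bounded(p) for p in projects))
```

With these numbers the worst case is 4 calls times 8 requests, so 32 requests in flight, no matter how many callers arrive at once.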

One slow repo shouldn’t sink the search

This is the part I got wrong first and then fixed. The fix is the whole philosophy of the project in miniature.

The original search_iiq did the obvious thing. Gather all the per-project searches, return the combined hits. Then production logs showed a pattern where three of four search calls would fail outright. The cause was iiq-common-library. Its blob search legitimately runs around 11 seconds (every other project comes back in under 3.4), and under load it would tip past the timeout. One project’s ReadTimeout, propagating up through asyncio.gather, took the entire search down with it. The 37 healthy projects had already returned useful hits, and the caller saw nothing but an exception.

So the rule now is: a per-project failure is data, not an exception. Each project’s search is wrapped to catch GitLabError and any httpx.HTTPError, which covers ReadTimeout, ConnectError, and friends. On failure it returns the exception instead of raising. The aggregator splits outcomes into hits and a structured errors list. You get the 37 projects that worked plus a note saying which one timed out and why. Partial results stay useful. The same instinct shows up again in the cache and in the transport layer: infrastructure trouble degrades the answer, it never destroys it.
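A minimal sketch of "failure is data," assuming nothing about the real code beyond what is described above. The names SearchResult, _guarded, and search_with_partial_results are hypothetical, and the built-in TimeoutError and ConnectionError stand in for GitLabError and httpx.HTTPError so the example runs without third-party packages.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SearchResult:
    hits: list = field(default_factory=list)
    errors: list = field(default_factory=list)


async def _guarded(project, search_one):
    """Return (project, hits) on success or (project, exception) on failure."""
    try:
        return project, await search_one(project)
    except (TimeoutError, ConnectionError) as exc:   # stand-ins for the real error types
        return project, exc


async def search_with_partial_results(projects, search_one):
    outcomes = await asyncio.gather(*(_guarded(p, search_one) for p in projects))
    result = SearchResult()
    for project, outcome in outcomes:
        if isinstance(outcome, Exception):
            result.errors.append({
                "project": project,
                "error_type": type(outcome).__name__,
                "message": str(outcome),
            })
        else:
            result.hits.extend(outcome)
    return result
```

Because each wrapper returns instead of raising, asyncio.gather never sees an exception, and one slow project costs one entry in errors rather than the whole response.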

A returned hit carries everything the agent needs to act: project, path, ref, line number, the matched snippet, and a web_url that deep-links straight to the line in GitLab. The usual loop is search, read the snippets, fetch_iiq_file the one or two that matter, write code grounded in what came back.

The reference tools

The five reference tools share one idea. SailPoint already ships the ground truth for most of what an agent needs to know. It just ships it in formats built for a human with a browser, not a model with a tool call. So I wrote build scripts that parse those artifacts into markdown shaped for retrieval, and tools that serve slices of it on demand.

Schema (get_iiq_schema). SailPoint bundles the database DDL with every release: create scripts and upgrade patch scripts for Oracle, SQL Server, MySQL, and PostgreSQL. A build script parses the base files, applies the patch files in order to get the effective schema, diffs consecutive versions, and generates a reference covering IIQ 8.4 and 8.5 including patch levels. get_iiq_schema() returns the table inventory and common query patterns. get_iiq_schema(table='spt_link') returns the full column list with per-database type differences and the 8.4-to-8.5 diff.

It also flags a real footgun. Oracle and PostgreSQL put function-based UPPER() indexes on string columns, so a query that filters on native_identity without wrapping both the column and the bind parameter in UPPER() silently bypasses the index and table-scans. Columns that need it are marked in the reference, so the agent writes the indexed form the first time instead of the slow form you discover in production.
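The effect is easy to reproduce in miniature. The sketch below uses SQLite from Python's standard library as a stand-in for Oracle or PostgreSQL, so the planner details differ, but the principle is the same: an index on upper(native_identity) only helps a predicate written against upper(native_identity). The table here is a two-column toy, and the index name is made up for the example.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for sql as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spt_link (id TEXT, native_identity TEXT)")
# Expression index, analogous to a function-based UPPER() index.
conn.execute("CREATE INDEX spt_link_native_ci ON spt_link (upper(native_identity))")
conn.execute("INSERT INTO spt_link VALUES ('1', 'CN=Jane Doe,OU=People')")

# Slow form: the bare column does not match the indexed expression.
slow = "SELECT id FROM spt_link WHERE native_identity = ?"
# Indexed form: column and bind parameter both wrapped in upper().
fast = "SELECT id FROM spt_link WHERE upper(native_identity) = upper(?)"

slow_plan = query_plan(conn, slow, ("cn=jane doe,ou=people",))
fast_plan = query_plan(conn, fast, ("cn=jane doe,ou=people",))
```

Printing the two plans shows the first scanning the table and the second searching through the expression index. The indexed form also matches case-insensitively, which the bare comparison does not.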

API (get_iiq_api). Javadoc, converted to per-package markdown, merged across three sources: SailPoint IIQ 8.5, pub/iiqcommon, and iiq-common-library. About 940 classes across 93 packages. A class lookup like get_iiq_api('SailPointContext') spans all three sources, so the agent doesn’t need to know which library a helper lives in. Per-class sections get extracted on demand at request time, which means a package file can hold 569 classes (sailpoint.object does) without ever returning all of it in one response. The build script handles both the legacy table-based Javadoc HTML the SailPoint distribution still ships and the modern flexbox layout current JDKs emit, because Instrumental Identity’s libraries get freshly generated docs on a newer toolchain.

Examples (get_iiq_examples). SailPoint ships example configs with IIQ: rules, workflows, forms, quicklinks, dynamic scopes, scoring, email templates. The 8.5 build covers 185 rule examples across 93 rule types, plus the rest. When the agent’s about to write a Correlation rule, get_iiq_examples('Correlation') hands it SailPoint’s own example for that exact type, so the input and output contract and the idiom come from the vendor rather than from the model’s imagination.

DTD (get_iiq_dtd). Every IIQ artifact is XML, and the importer validates it against sailpoint.dtd. That file’s generated dynamically by IIQ’s own DTDGenerator, so I regenerate it straight from the matching jar. The 8.5 DTD has 1,108 elements and 110 legal values in the Rule.type enum. get_iiq_dtd('Workflow') returns the legal children and attributes of a <Workflow> before the agent generates one. Checking here prevents the single most common authoring failure, which is XML that looks right and gets rejected at import time.

Patterns (get_iiq_patterns). The one tool that isn’t generated. It’s a hand-curated reference of the things that come up constantly when writing against these libraries: logging idioms, task base classes, plugin anatomy, the namespace gotcha where iiqcommon and iiq-common-library share a package prefix but hold different classes. The curation rules are strict and I enforce them on myself. Every entry is sourced (file path plus class or method), verified (someone actually ran it), concise (a paragraph and a short code block), and current. Stale entries get removed, not left to rot, because the whole point of the file is that the agent trusts it and skips re-verifying. A stale pattern is worse than no pattern. It turns “the model doesn’t know” into “the model is confidently wrong, and I told it to be.”

One shared trick across the generated references: the build scripts preserve hand-written prose. Anything I write inside a <!-- desc:KEY -->...<!-- /desc --> block survives every rebuild, while the mechanical tables get overwritten. So I can layer human context onto generated data and not lose it when the next IIQ version ships and I regenerate everything.
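One way to get that behavior, sketched with a regular expression. The real build scripts may well do it differently; the function names here are hypothetical, and the only thing taken from the text is the shape of the desc markers.

```python
import re

# Matches <!-- desc:KEY -->...<!-- /desc --> and captures the key and the body.
DESC_BLOCK = re.compile(
    r"<!-- desc:(?P<key>[^\s>]+) -->(?P<body>.*?)<!-- /desc -->",
    re.DOTALL,
)


def extract_descriptions(old_markdown):
    """Collect hand-written bodies from the previous build, keyed by desc key."""
    return {m.group("key"): m.group("body") for m in DESC_BLOCK.finditer(old_markdown)}


def carry_over_descriptions(old_markdown, fresh_markdown):
    """Re-insert saved bodies into freshly generated markdown."""
    saved = extract_descriptions(old_markdown)

    def restore(match):
        key = match.group("key")
        body = saved.get(key, match.group("body"))   # new keys keep their generated body
        return f"<!-- desc:{key} -->{body}<!-- /desc -->"

    return DESC_BLOCK.sub(restore, fresh_markdown)
```

The generator emits an empty desc block for every key, and this pass fills in whatever prose the previous file held for that key. Everything outside the markers comes from the fresh build.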

The cache, and why it’s allowed to fail

iiq-common-library taking 11 seconds isn’t just an error-handling problem. Because search fans out in parallel, the slowest project IS the floor of total search time. Caching that response collapses the warm cost of every repeated search.

So GitLab responses get cached in Redis when IID_MCP_REDIS_URL is set. Search results, plus the project and group lookups used during scope resolution. A few deliberate choices:

  • It degrades gracefully, always. No Redis configured: the cache is just a no-op singleton and every call passes straight through. Redis configured but unreachable mid-call: the exception is swallowed, a structured cache.error event is logged, and the code falls back to a live GitLab fetch. The server NEVER fails a tool call because the cache is unhappy. A cache is an optimization, and an optimization that can take down the thing it’s optimizing is a liability.
  • JSON values, not pickle. Cached payloads are pydantic model dumps, re-validated on read. redis-cli GET returns readable JSON, and there’s no pickle-version coupling to trip over on a redeploy.
  • Versioned, hashed keys. iid_mcp:v1:<namespace>:<sha256-of-args>. If a payload shape ever changes incompatibly, I bump the version prefix. The old entries become unreachable and expire on their TTL. No migration, no flush.
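The three choices above fit in a few lines. This is a synchronous sketch under assumptions, not the server's code: cache is any object with get and set methods (a hypothetical stand-in for a Redis client), TTL handling and pydantic re-validation are left out, and only the key format and the cache.error event name come from the description.

```python
import hashlib
import json

KEY_VERSION = "v1"   # bump when a payload shape changes incompatibly


def cache_key(namespace, **args):
    """Build iid_mcp:<version>:<namespace>:<sha256-of-args> from canonical JSON."""
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"iid_mcp:{KEY_VERSION}:{namespace}:{digest}"


def cached_fetch(cache, key, fetch_live, log=print):
    """Return the cached value, or fetch live on a miss or on any cache error."""
    try:
        raw = cache.get(key)
        if raw is not None:
            return json.loads(raw)            # JSON on the wire, never pickle
    except Exception as exc:                  # cache trouble must not fail the call
        log(json.dumps({"event": "cache.error", "op": "get", "error": type(exc).__name__}))
    value = fetch_live()
    try:
        cache.set(key, json.dumps(value))
    except Exception as exc:
        log(json.dumps({"event": "cache.error", "op": "set", "error": type(exc).__name__}))
    return value
```

Sorting the keys before hashing means the same arguments in a different order land on the same entry. A cache that raises on every call still returns the live value.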

Default TTL is four hours. Scoped repos change on the order of days, so the staleness window is bounded while the hit-ratio compounding is large. Because the cache speaks plain Redis, ElastiCache is a drop-in replacement the day this moves to AWS, with no code change.

Observability that survives a redeploy

Every meaningful thing the server does emits one JSON object on stdout: server.startup, search_iiq.start, search_iiq.project, search_iiq.complete, cache.hit, cache.miss, fetch_iiq_file.complete, and the rest. A timed_event context manager wraps an operation. On exit it logs the event with duration_ms filled in automatically, or on exception it logs at error level with the exception type and message attached. So “how long do searches take in real use” is a jq one-liner against the logs, with no separate metrics stack.
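A context manager along those lines might look like the following. Only the name timed_event and the duration_ms field come from the description. The signature, the stream parameter, and the other field names are assumptions made for the sketch.

```python
import contextlib
import json
import sys
import time


@contextlib.contextmanager
def timed_event(event, stream=None, **fields):
    """Emit one JSON line for the wrapped operation, with its duration attached."""
    out = stream if stream is not None else sys.stdout
    record = {"event": event, "level": "info", **fields}
    start = time.perf_counter()
    try:
        yield record                 # callers may add fields, e.g. record["hits"] = 12
    except Exception as exc:
        record.update(level="error", error_type=type(exc).__name__, error=str(exc))
        raise                        # log the failure, then let it propagate
    finally:
        record["duration_ms"] = round((time.perf_counter() - start) * 1000, 1)
        out.write(json.dumps(record) + "\n")
```

Used as `with timed_event("search_iiq.complete", query=q) as rec:`, the block writes exactly one line on exit whether it succeeded or raised, which is what keeps the jq queries simple.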

Two things I’m quietly proud of here. First, secrets never reach the logs. The startup event records gitlab_token_present as a boolean, not the token, and the Redis URL gets its credentials redacted to redis://***@host. You can answer “what config did this process boot with” months later without ever having logged a secret. Second, getting clean JSON out of an async web server is a fight, because uvicorn installs its own plain-text handlers that interleave INFO: Started server process lines into your structured stream and break every jq parse. Taming that took mutating uvicorn’s logging config dict in place (reassigning it is too late, the class default already captured the original object reference) and forcing its loggers to propagate up to the JSON handler. More annoying to track down than it should’ve been. 🙂
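The redaction half is simple enough to sketch with the standard library. The helper names are hypothetical, and GITLAB_TOKEN is an assumed variable name used only for illustration; IID_MCP_REDIS_URL and the gitlab_token_present field are the ones mentioned above.

```python
from urllib.parse import urlsplit


def redact_url(url):
    """Replace any user:password part of a URL with ***, leaving host and path."""
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        return url                                  # nothing secret to hide
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    rest = parts.path + (f"?{parts.query}" if parts.query else "")
    return f"{parts.scheme}://***@{host}{rest}"


def startup_fields(env):
    """Build the config portion of a startup event without logging any secret."""
    redis_url = env.get("IID_MCP_REDIS_URL")
    return {
        "gitlab_token_present": bool(env.get("GITLAB_TOKEN")),   # assumed variable name
        "redis_url": redact_url(redis_url) if redis_url else None,
    }
```

A URL such as redis://:s3cret@cache.internal:6379/0 comes out as redis://***@cache.internal:6379/0, and a URL with no credentials passes through untouched.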

In production the logs go to the host’s systemd journal via the journald Docker driver, not the default json-file. That’s deliberate. docker logs dies with the container, and I redeploy on every push to main. journald outlives the container, so a week of search-latency and error history survives a redeploy. You query it with journalctl CONTAINER_NAME=iid-mcp -o cat | jq -R 'fromjson? | select(...)', where fromjson? quietly drops any non-JSON startup banner that slipped through.

The transport migration: SSE to Streamable HTTP

The server originally spoke MCP over SSE, a long-lived server-sent-events stream at /sse. That was a mistake for this deployment, and the symptom taught me why. The production server sits behind a Cloudflare Tunnel, and long-lived SSE streams went stale through Cloudflare on container restart or after an idle stretch. The connection looked alive and was dead. Clients had to be restarted to recover.

Streamable HTTP fixes this by construction. It uses short-lived, per-call HTTP requests against a single /mcp endpoint instead of one stream held open for the session’s life. There’s nothing to keep alive, so there’s nothing to go stale. I cut production over to streamable-http only. SSE still exists in the codebase and can be switched back on by changing one line in the Dockerfile, but the deployed container runs the new transport exclusively.

The cutover surfaced a spec-versus-reality wrinkle worth recording. The Streamable HTTP spec says a client MUST send Accept: application/json, text/event-stream on every POST, and the reference SDK enforces it with a 406 Not Acceptable when either type is missing. Real clients don’t all comply. Claude Desktop’s mcp-remote bridge, among others, sends only one type, or a bare */*, and got bounced. The spec’s actual intent is “don’t hand a client a representation it won’t accept,” so I relaxed the guard to exactly that. Accept the request if it’ll take either representation we might return. Reject only a request that accepts neither and offers no wildcard. It’s a small monkeypatch over one SDK method that both code paths derive from, and it logs accept_header.patch_applied once at startup so the deviation from stock behavior is never a mystery.
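The relaxed rule itself is a few lines of header parsing. This sketch shows only the decision, not the SDK monkeypatch, and the function name is hypothetical. It ignores q-values, and treating a missing header as acceptable is an assumption based on general HTTP semantics, since the text does not say how the real patch handles that case.

```python
SUPPORTED = {"application/json", "text/event-stream"}
WILDCARDS = {"*/*", "application/*", "text/*"}


def accepts_any_supported(accept_header):
    """True if the client will take at least one representation the server may return."""
    if not accept_header or not accept_header.strip():
        return True          # assumption: an absent Accept header means "anything"
    for item in accept_header.split(","):
        media_type = item.split(";", 1)[0].strip().lower()   # drop parameters such as q=
        if media_type in SUPPORTED or media_type in WILDCARDS:
            return True
    return False             # accepts neither type and offers no wildcard
```

Under the stock check, a request with Accept: application/json alone gets a 406. Under this rule it connects, and only something like Accept: text/html is still rejected.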

Deploy and ops

The design rule is one sentence: state is externalized, the process is killable. There’s no local database, no session store, no on-disk cache. Everything lives in GitLab and Redis. That makes the container disposable, which is why redeploy-on-every-push and journald-for-logs both work, and it’s why moving from a Proxmox VM today to ECS or Fargate tomorrow is a deploy change, not a code change.

The current production setup:

  • A single-stage Docker image built on the official uv Python 3.12 base, running as a non-root user, with dependencies layered separately from source so they cache independently.
  • A Proxmox VM behind a Cloudflare Tunnel and Cloudflare Access. Browsers get gated by M365 Entra SSO. Programmatic clients use per-user Cloudflare Access service tokens, so revocation is per-user instead of all-or-nothing.
  • GitLab CI on every push to main: lint, test, build, deploy. A docs-only push skips the build and deploy stages.
  • 118 tests, every one of them mocking GitLab at the HTTP layer with respx and faking Redis with fakeredis, so the suite’s deterministic and never touches the network or a developer’s real .env.

Auth on the server itself is currently a pass-through stub. Cloudflare Access does the real gating at the edge. When this server eventually leaves the controlled network, Entra OAuth replaces the stub, and the seam for it is already there.

What I kept coming back to

A few principles ended up driving most of the decisions. They’re worth stating plainly, because they generalize past this one server.

  • Ground the model in real source. Never trust its memory of your private code. Every tool exists to replace a confident guess with a fact the model can cite.
  • Infrastructure degrades the answer, it never destroys it. A slow repo becomes one line in an error list. A dead cache becomes a live fetch. A single-type Accept header still connects. The tool call comes back.
  • Externalize all state. A killable process is a deployable process, a redeployable process, and a portable process.
  • Curated knowledge has to stay honest or it’s worse than nothing. The model trusts what you give it and skips re-checking. That trust is the whole value. It’s also the liability the moment the data goes stale, so the discipline around the references isn’t optional.

What’s next

IIQ is the first product, not the only one. The package layout already reserves space for sibling toolsets (tools/isc/, and later Evolveum midPoint and Fischer Identity) that plug into the same scope, cache, transport, and observability machinery. Each new product brings its own GitLab paths and its own reference data and reuses everything else.

Looking back

If I were starting over, I’d build the partial-failure handling on day one instead of bolting it on after the logs embarrassed me into it. 🙂 Everything good about this server is downstream of one decision: treat the model as something to ground, not something to trust, and never let a piece of infrastructure between it and the truth fail loudly enough to matter.

For now it’s doing the job it was built for. Claude Code sessions write IIQ code against real library source instead of a plausible hallucination of it, the cost of a wrong line shows up at authoring time instead of at import time, and I stopped being the human who has to catch the method that doesn’t exist. That last part is the whole point.

Building a Personal AI Assistant (FinkBot)

I work across too many contexts at once. Three email accounts. Two calendars. Work Teams. Personal Slack. A Gitea instance for side projects. Home Assistant watching over a thousand devices. Notes in Joplin. Tasks scattered across wherever I dumped them last week.

For years I tried to manage this with dashboards, integrations, and sheer willpower. None of it stuck. What I actually wanted was something that already knew what was going on and could just tell me, without me having to ask in exactly the right way or remember which system held which piece of information.

So I built FinkBot.


What It Does

FinkBot is a personal AI assistant that runs entirely on my home network. It continuously ingests data from every corner of my digital life: email, calendar, chat, code repos, smart home sensors, task lists, notes. It indexes everything into a searchable memory store and uses that context to:

  • Send a morning and evening briefing every day
  • Prep me for meetings 30 minutes before they start, pulling together context on who I’m meeting with and what we’ve been working on
  • Alert me to things that matter: security sensor trips, appliances left on, unusual comms gaps, infrastructure pressure
  • Answer ad-hoc questions in Discord (“what did I send Dave last week?”, “who is attending this meeting?”)
  • Suggest home automation actions and let me approve them with a single reaction
  • Announce context-aware briefs to my Echo Show when I walk into a room
  • Run a weekly self-reflection that evaluates its own output quality and proposes improvements

All without sending a single byte of personal data outside my LAN.


The Stack

Data Sources

FinkBot polls and ingests from:

  • Email: three accounts, IMAP for personal and university, Microsoft Graph API for work, Dovecot archive for historical backfill
  • Calendar: Microsoft 365 plus Nextcloud CalDAV for personal events
  • Chat: Microsoft Teams DMs and Slack channels
  • Code: Gitea commits, issues, and PRs across all my repos
  • Notes: Joplin (everything I have ever written down)
  • Tasks: Nextcloud Tasks via CalDAV
  • Smart home: Home Assistant with over 1,000 entities covering presence, appliances, energy, sensors, climate, plus dedicated monitors for TrueNAS pool health and Proxmox container pressure

Each source has its own Prefect flow with a schedule tuned to how often that source changes. Email checks every 5 to 10 minutes. Smart home every 5. Git repos every 2 hours. Joplin and Slack every 30 minutes.

Two-Tier Memory

FinkBot uses a two-tier memory architecture, and the distinction matters.

ChromaDB (chroma.crosscreek) is the raw recall layer. Over 257,000 documents, everything ingested from every source, embedded and stored. When FinkBot needs to answer “what did I send Dave last week?”, this is where it looks. Semantic search, no structure, just relevance.

MemU (memu.crosscreek) is the distilled long-term layer. It accumulates synthesized facts over time: things I’ve manually added with /context add, outputs approved from the weekly self-reflection, curated patterns. It is not a replacement for Chroma. It is the part of memory that has been thought about. Raw search and distilled knowledge serve different purposes and live in different stores.

A middleware layer in the memory client handles all query logging transparently. Individual flows don’t think about it. Queries from Discord get tagged separately from background automation queries so the self-reflection loop can distinguish what I’m actually asking about from what the system is doing on its own.

The Entity Graph

Layered on top of the vector stores is a structured knowledge graph backed by Neo4j Community Edition, running at neo4j.crosscreek. It tracks four node types: Person, Company, Project, and Topic. Relationships include EMAILED, WORKS_AT, INVOLVED_IN, COLLABORATES_WITH, and others.

The move to Neo4j from an earlier embedded graph database was driven by one practical problem: the embedded approach had a single-writer bottleneck. Multiple flows running concurrently would contend for the write lock. Neo4j’s MVCC gives concurrent readers and writers without coordination overhead, and the Bolt protocol means any flow or API endpoint can connect remotely without file locking concerns.

The graph gets populated automatically from email processing, Joplin notes, Slack analysis docs, calendar attendee extraction, and a deterministic upsert on every Gitea repo. A dedicated backfill flow processes the historical Dovecot email archive on an hourly schedule, steadily growing the graph from years of past correspondence. A separate curation flow runs weekly to merge duplicate nodes and filter noise.

The entity graph answers questions the vector store cannot. “Who am I meeting with today, and what do I know about them?” The daily briefing pulls today’s calendar attendees, looks each one up in the graph, and injects a “People you’ll meet today” section into the prompt before it ever touches Chroma. Structured facts first, semantic context second.
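The ordering can be sketched as plain prompt assembly. Nothing here is FinkBot's actual code: build_briefing_context, lookup_person, and semantic_search are hypothetical names, with the two lookups passed in as callables so the example runs without Neo4j or Chroma.

```python
def build_briefing_context(attendees, lookup_person, semantic_search, max_snippets=3):
    """Assemble prompt context: structured graph facts first, semantic snippets second."""
    sections = []
    if attendees:
        lines = []
        for name in attendees:
            facts = lookup_person(name)   # hypothetical graph lookup returning a list of strings
            lines.append(f"- {name}: {'; '.join(facts) if facts else 'no graph entry'}")
        sections.append("People you'll meet today\n" + "\n".join(lines))

        # Only after the structured section is built does the vector store get queried.
        snippets = semantic_search(" ".join(attendees), max_snippets)
        if snippets:
            sections.append("Related context\n" + "\n".join(f"- {s}" for s in snippets))
    return "\n\n".join(sections)
```

Putting the graph section first means the model reads the verified relationships before the looser, relevance-ranked snippets.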

The Flows Layer

All orchestration runs on Prefect, deployed to a 4GB/4-core LXC (prefect.crosscreek). Prefect replaced an earlier n8n-based setup. The GUI-drag-drop approach was fine until I needed version control, testability, and the ability to do something non-trivial in a node. Python and git won.

Active flows currently running:

| Flow | Schedule | Purpose |
| --- | --- | --- |
| ha_monitor | every 5 min | Security, appliances, presence, TrueNAS, Proxmox |
| meeting_prep | every 5 min | 30-min lookahead prep briefs |
| calendar_actions | every 5 min | Rule engine over upcoming meetings x HA state (pauses media, etc.) |
| ha_suggestions | every 5 min | Rule-based HA suggestions with one-tap Discord approval |
| proactive_voice | every 60 sec | Presence transitions trigger TTS brief on Echo Show |
| pattern_briefings | every 5 min | Fires focused briefs 15 min before scheduled pattern moments |
| email_monitor_* | 5 to 15 min | Three accounts, IMAP + Graph API |
| calendar_sync | every 30 min | M365 + Nextcloud to memory |
| slack_ingestor | every 30 min | Slack channels to memory |
| joplin_ingestor | every 30 min | Notes to memory |
| chat_ingestor | hourly | Teams DMs to memory |
| entity_backfill | hourly | Historical Dovecot emails to Neo4j entity graph |
| daily_briefing | 8am + 5pm ET | Morning and evening briefings |
| graph_curation | weekly | Neo4j merge suggestions, noise filtering |
| memu_curation | weekly | MemU near-duplicate cleanup |
| self_reflection | Sunday 8pm ET | Weekly synthesis, proposals, draft PRs |
| memory_defrag | Saturday 7pm ET | Expire stale entries, corpus stats |
| watchdog | continuous | Auto-cancel stuck runs, hard-kill past threshold |

A few things I have learned running these in production:

In-process concurrency guards beat deployment-level limits. High-frequency flows call check_self_concurrency() at startup and exit cleanly if another instance is already running. Relying on Prefect’s deployment-level concurrency limits alone left edge cases where crashed runs didn’t release their slots. Explicit guards are more reliable.

Startup reconciliation matters. After a crash or restart, the server can have zombie “running” states for flows that are no longer actually running. A startup reconciliation pass cleans these before new runs start, preventing phantom concurrency blocks.

CalDAV clients need timeouts. My NAS can be slow. A DAVClient without timeout=10 will hang indefinitely. Flows that run every 60 seconds cannot afford that.

Don’t alert on things you can’t fix. Meeting prep runs every 5 minutes. CalDAV errors go to print(), not the Discord error channel. Nobody needs a ping every 5 minutes because the CalDAV server hiccupped.

The API Bridge

A FastAPI service on port 8003 acts as the hub connecting flows, the Discord bot, and the kiosk. The bot never touches memory or the entity graph directly. It POSTs to the API and gets a response. Logic stays centralized.

Key endpoint groups: briefing and prep triggers, Home Assistant action execution (with Discord approval gating), entity graph CRUD, MemU memory management, pattern automation management, Prefect watchdog controls, kiosk announcement queue, and a transcription and TTS pipeline for the Echo Show.

There is also an HTTPS endpoint on port 8443 serving the kiosk dashboard. It required HTTPS because getUserMedia only works in a secure context.

The Echo Show Kiosk

The dashboard on my Echo Show (running vanilla Android) has become more than a status screen. It shows live calendar, unread message counts, current tasks, and a Home Assistant home status panel. More interestingly, it now listens for a wake word via openwakeword, streaming audio from the browser to a Python WebSocket server. When the wake word fires, it triggers a context-aware TTS brief. The proactive voice flow also pushes announcements to the kiosk when presence transitions happen, so walking into the room can trigger a summary of what’s coming up.

The Discord Bot

Discord is my primary interface. Current slash commands:

  • /brief – trigger a morning or evening briefing on demand
  • /prep – meeting prep for a specific event
  • /search and /search-email – semantic memory and Dovecot archive search
  • /remember – store a memory manually
  • /who – entity graph person lookup
  • /context – manage persistent knowledge file without SSH
  • /task and /quicktask – Nextcloud Tasks management
  • /cal – natural language calendar query
  • /memory – corpus management (stats, forget, curate)
  • /reflect – trigger self-reflection immediately
  • /status – system health check
  • /ark – ARK server management (yes, the game server lives here too)

Reaction handlers let me take action on messages without typing. Thumbs up and thumbs down on briefings feed the engagement feedback loop. Checkmark or X on HA suggestion messages triggers or dismisses the action. A book reaction on meeting prep saves notes to Joplin. A no-entry reaction blocks an email sender. These reactions are the primary interface for approving anything the system proposes.


Design Decisions

Local-First, Always

No personal data leaves 192.168.48.0/24. That is the constraint the architecture is built around. Email, calendar, chat, smart home state: none of it touches an external service.

For LLM inference, the system runs a tiered approach. A 16GB M4 Mac Mini runs Ollama with qwen2.5:14b for heavy inference (briefings, meeting prep, self-reflection) and qwen2.5:7b for lighter work. Anthropic’s API is configured as a fallback and is also used for high-value one-off tasks like code change proposals, where output quality matters more than token cost. But the default path for everything is local.

This split was a deliberate choice. High-volume extraction work like the entity backfill runs local exclusively. Running the full Dovecot email archive through the Anthropic API would cost real money and send personal email content to an external service. Neither is acceptable. With local Ollama it costs nothing and stays on the network.

The fallback to Anthropic exists for resilience, not as a cost optimization. If the Mac Mini is down, the system degrades gracefully rather than going silent.
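The routing rule reduces to a small function. This is a sketch with the models passed in as plain callables, not the real client code. The generate name and the allow_external flag are hypothetical; the flag is how the sketch expresses "local exclusively" for work like the entity backfill.

```python
def generate(prompt, local_model, fallback_model, allow_external=True):
    """Try the local model first; use the external fallback only if permitted."""
    try:
        return {"provider": "local", "text": local_model(prompt)}
    except Exception as exc:
        if not allow_external:
            raise                     # local-only work fails rather than leaving the LAN
        return {
            "provider": "fallback",
            "text": fallback_model(prompt),
            "local_error": type(exc).__name__,
        }
```

A briefing would call this with the default, so an outage on the Mac Mini degrades to the fallback. The backfill would pass allow_external=False, so the same outage surfaces as an error and no email content is sent out.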

Why Qwen2.5? Instruction following is strong, JSON output mode is reliable (critical for entity extraction), and the quantized models fit the hardware. The 14B at Q4_K_M runs comfortably within 16GB unified memory while leaving headroom for everything else.

The Entity Graph Upgrade

The move from an embedded graph database to Neo4j is the biggest architectural change in the last few months. The embedded approach worked fine for read-heavy lookups but fell apart when multiple flows needed to write simultaneously. Kuzu’s single-writer model meant contention, and flows running on tight schedules can’t afford to queue behind each other for graph writes.

Neo4j’s MVCC handles concurrent writers cleanly. The Bolt protocol means any process on the network can connect without worrying about file locking. And having a proper query interface makes ad-hoc exploration and curation much easier. The graph curation flow runs weekly, suggesting node merges and filtering noise via Cypher queries that would have been awkward to express in the embedded model.

Two-Tier Memory Is Not a Migration

An earlier version of the architecture treated MemU as a replacement for ChromaDB, something to migrate to once the hardware was ready. The current design treats them as doing different things.

Chroma is raw storage. Everything ingested lands there. Semantic search across 257,000 documents is fast and works well. The limitation is that it treats every document as equally relevant. There is no way to ask “what do I actually know about this person?” and get a distilled answer rather than a pile of email snippets.

MemU is for things that have been synthesized. Approved self-reflection outputs. Manually added context notes. Curated patterns. It is smaller, more intentional, and represents knowledge that has been validated rather than just observed. Briefings and prep queries can pull from both layers and get different things from each.

The Self-Learning Loop

Every Sunday at 8pm, self_reflection runs. It reads two weeks of feedback: briefing reaction rates, Discord query patterns, entity graph growth, MemU accumulation. It queries memory across all sources for a weekly snapshot. It passes everything to an LLM and asks what is working, what is not, and what should change.

The synthesis produces two actionable outputs.

Context proposals are facts or patterns that should become permanent knowledge. Each one appears in Discord as a bookmark message. I react with a checkmark to approve or X to reject. Approved items get appended to /opt/finkbot/finkbot_context.txt, which is injected into every future self-reflection prompt. The system accumulates knowledge from its own outputs over time.

Code change proposals are improvement ideas formatted as PR descriptions. Each one opens a draft PR in the FinkBot Gitea repo and posts the URL to #insights. I review through normal CI/CD, or close it if the idea isn’t worth pursuing.

The pattern automation system extends this further. self_reflection can also write to a patterns file, which pattern_briefings reads to fire focused context briefs on a schedule. If self-reflection notices that Monday mornings are always context-switching heavy, it can propose a pattern that fires a tailored brief every Monday at 7:45am. I approve it once and it runs every week.


Infrastructure

Everything runs on a Proxmox cluster on my home network.

Service | Host | Notes
Prefect flows + API | prefect.crosscreek (LXC 203) | 4GB RAM, 4 cores
ChromaDB | chroma.crosscreek | raw memory backend
MemU | memu.crosscreek (LXC 204) | distilled memory
Neo4j | neo4j.crosscreek | entity graph, Bolt protocol
Discord bot | prefect.crosscreek | thin bot, talks to API
Home Assistant | ha.crosscreek:8123 | 1000+ entities
Gitea | git.mystikos.org | source of truth + CI/CD
Ollama (14B + 7B) | 16GB M4 Mac Mini | primary inference

CI/CD runs through Gitea Actions. Pushing to main rsyncs the relevant files to each target host and restarts the appropriate systemd services. The servers are not git clones. They are deploy targets. Code changes go through the repo, not SSH sessions on the server.


What’s Next

Since original publication (May 2026)

A month later, several “What’s Next” items have shipped, and a few new ones have emerged. The headlines:

A/B prompt variants are live. A modifier-delta framework in flows/common/prompt_variants.py picks a deterministic variant per ET calendar day (sha256 of flow:iso-date, modulo variants), so morning and evening briefings on the same day always share a variant. Engagement is logged per-variant via the existing 👍/👎 pipeline; self_reflection surfaces a per-variant comparison block in its weekly LLM prompt only when more than one variant has fired. Wired into daily_briefing, meeting_prep, and pattern_briefings. Promotion is still manual: at two briefings per day, engagement rate is too noisy for auto-promotion.
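
The selection rule can be sketched in a few lines. This is an illustration of the hashing scheme described above, not the contents of prompt_variants.py; the function name pick_variant and the variant labels in the usage note are assumptions.

```python
import hashlib
from datetime import datetime
from zoneinfo import ZoneInfo

ET = ZoneInfo("America/New_York")


def pick_variant(flow: str, variants: list[str], now: datetime) -> str:
    """Pick one variant per flow per ET calendar day.

    `now` must be timezone-aware. Every call for the same flow on the same
    ET date hashes the same key, so it returns the same variant.
    """
    iso_date = now.astimezone(ET).date().isoformat()
    digest = hashlib.sha256(f"{flow}:{iso_date}".encode("utf-8")).hexdigest()
    # Interpret the digest as an integer and reduce it to a list index.
    return variants[int(digest, 16) % len(variants)]
```

Calling pick_variant("daily_briefing", ["control", "concise"], now) at 7am and again at 7pm ET on the same day yields the same label, with no state to store between runs.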

Time-scoped /ask. Queries that carry both a question phrase (“what was going on”, “recap”, “tell me about”) and a temporal marker (“last week”, “in March”, “Q1”) now route through api/temporal_intent.py: window parsing (deterministic fast path with an LLM fallback), MemU recall + by-ID fetch of pattern/anomaly detection docs from that period, and Anthropic synthesis. The pipeline also cross-references current Nextcloud tasks so historical task mentions get noted as resolved when they’re no longer open. Intent detection requires both signals — either alone produced too many false positives.
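
A minimal sketch of the two-signal gate, assuming simple substring and regex matching. The phrase list and pattern here are illustrative stand-ins; the real lists in api/temporal_intent.py are not shown in this post.

```python
import re

# Illustrative signals only, taken from the examples in the text.
QUESTION_PHRASES = ("what was going on", "recap", "tell me about")
MONTHS = (
    "january|february|march|april|may|june|july|august|"
    "september|october|november|december"
)
TEMPORAL_MARKER = re.compile(
    rf"\b(?:last (?:week|month|year)|in (?:{MONTHS})|q[1-4])\b"
)


def is_temporal_query(text: str) -> bool:
    """Route to the temporal pipeline only when BOTH signals are present."""
    lowered = text.lower()
    has_question = any(phrase in lowered for phrase in QUESTION_PHRASES)
    has_marker = TEMPORAL_MARKER.search(lowered) is not None
    return has_question and has_marker
```

With this gate, "tell me about Dave" and "I was travelling last week" both fall through to the normal /ask path, while "recap last week" is routed to the temporal pipeline.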

Pattern automations end-to-end. pattern_detector proposes ⏰ automations when it spots a recurring behaviour with a clear schedule. I tap ✅, the bot writes to /opt/finkbot/patterns.jsonl, and a new pattern_briefings flow polls every 5 minutes and fires a focused pre-briefing 15 minutes before each pattern’s next cron-scheduled time. /patterns list/add/remove Discord commands round it out.
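
The polling check can be sketched as follows, assuming the next scheduled time for each pattern has already been computed from its cron expression elsewhere. The function name due_patterns and the in-memory already_sent set are hypothetical; the real flow may track sent briefs differently.

```python
from datetime import datetime, timedelta

LEAD = timedelta(minutes=15)  # fire this long before the scheduled time


def due_patterns(
    now: datetime,
    next_runs: dict[str, datetime],
    already_sent: set[tuple[str, datetime]],
) -> list[str]:
    """Return patterns whose pre-briefing window is open and not yet served.

    Called on every poll. A pattern is due once `now` is within LEAD of its
    next run; the (name, run time) key stops later polls from re-firing it.
    """
    due = []
    for name, run_at in next_runs.items():
        key = (name, run_at)
        if run_at - LEAD <= now < run_at and key not in already_sent:
            already_sent.add(key)
            due.append(name)
    return due
```

Because the window is 15 minutes wide and the poll runs every 5, each occurrence is seen by several polls, so the dedup key is what keeps it to one brief per occurrence.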

Two new self-improvement loops.

Decision journal: every privileged reaction (task/context/graph-merge approvals, HA actions, watchdog kills, blocklist edits) writes a row to feedback_log.jsonl. self_reflection mines this weekly to propose suppression rules like “Mark rejected 4 task proposals from Client X this week: suppress them.”

Thumbs-down post-mortem: a 👎 on a briefing fires a fire-and-forget task that compares the disliked briefing to recent 👍’d baselines, asks the LLM for a single-sentence suppression rule, and routes it through the existing context-proposal approval flow. Approved rules append to finkbot_context.txt and feed every future reflection prompt.

Incident learning. Watchdog auto-cancels and startup-reconcile cleanups now log to feedback_log.jsonl with noise-filter thresholds (5 zombies, 20 backlog) so only anomalous resilience events surface. self_reflection mines them and proposes timeout/threshold tweaks as draft Gitea PRs. The thresholds are load-bearing: a clean restart after every deploy clears 1–3 zombies, and without the floor the weekly report would propose “fix deploy restart” every week and drown the real signal.
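
A sketch of that noise floor, using the two thresholds quoted above. The function name, the row fields, and the choice of "at or above the floor" as the cutoff are assumptions.

```python
from typing import Optional

ZOMBIE_FLOOR = 5    # routine deploy restarts clear 1-3 zombies, below this floor
BACKLOG_FLOOR = 20


def incident_row(zombies_cleared: int, backlog_size: int) -> Optional[dict]:
    """Build a feedback-log row only for anomalous resilience events.

    Returns None for routine cleanups so they never reach the weekly report.
    """
    if zombies_cleared < ZOMBIE_FLOOR and backlog_size < BACKLOG_FLOOR:
        return None
    return {
        "kind": "resilience_incident",
        "zombies_cleared": zombies_cleared,
        "backlog_size": backlog_size,
    }
```

Filtering at write time, rather than in the reflection prompt, keeps the log itself small and means the LLM never sees the routine restarts at all.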

Home Assistant action suggestions. Two new flows post 🏠 one-tap action suggestions to #alerts: ha_suggestions (rule engine over HA state, e.g. “everyone’s away and the front door is unlocked: lock it?”) and calendar_actions (rule engine over upcoming meetings × HA state, e.g. “pause the media player, standup starts in 5 min”). Both share a narrow allowlist keyed by domain/service so a misbehaving rule can at worst propose something the allowlist rejects.
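
The allowlist check might look like the sketch below. The Suggestion shape, the function name, and the two allowlist entries are illustrative assumptions chosen to match the examples above; the real allowlist is not listed in this post.

```python
from dataclasses import dataclass

# Hypothetical allowlist: (domain, service) pairs a suggestion may invoke.
ALLOWLIST = {
    ("lock", "lock"),
    ("media_player", "media_pause"),
}


@dataclass(frozen=True)
class Suggestion:
    domain: str
    service: str
    entity_id: str
    prompt: str


def filter_allowed(suggestions: list[Suggestion]) -> list[Suggestion]:
    """Drop any suggestion whose domain/service pair is not allowlisted."""
    return [s for s in suggestions if (s.domain, s.service) in ALLOWLIST]
```

Because both flows pass through the same filter before anything is posted, a buggy rule cannot widen what a one-tap reaction is able to do.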

Chroma HNSW rebuild (2026-04-25). Long-running where-filter 500s caused by orphaned IDs in the original collection are gone. scripts/rebuild_chroma_hnsw.py copied 260,022 docs into a fresh finkbot_memory_v2 collection: 255,835 with preserved embeddings and about 4,200 (slack/gitea orphans) re-embedded via the proxy’s default embedding function. The 500-retry-without-where fallback in the client stays as a canary; if it fires again, something regressed.

Mac Mini hardening. Intermittent 1–2 hour Ollama outages through April were traced to two causes. First: launchctl setenv is per-launchd-session and lost on reboot, so an earlier OLLAMA_NUM_PARALLEL=2 quietly dropped to 1 after the next reboot, serializing every caller behind whichever flow was currently running and starving live flows during backfills. Fixed by baking the env var into a LaunchDaemon plist. Second: macOS’s manual Wi-Fi mode installs only interface-scoped default routes, which Go-based clients (Ollama included) ignore for outbound TCP. A second LaunchDaemon now installs a global default route at boot. A 2026-04-27 outage was a third issue: Ollama.app autostarted from Login Items, won the port-11434 race, and the LaunchDaemon failed to bind 78 times in a row. Removing the Login Item closed it. The new ollama_health_check flow catches any future variant in <10 minutes.

Qwen3 rollback (2026-04-21). Tried upgrading to qwen3:14b/qwen3:8b. Qwen3 ships with thinking-mode on by default, adding 60–180 s of internal monologue per call. Within two hours, three flows had blown their timeouts. Rolled back the same day. Models stay on disk pending an /api/chat-with-think:false switch or the 48GB Mac Mini, where the thinking budget will be affordable.

Neo4j round-trip optimization. get_person_context() collapsed from six Cypher queries to one with CALL {} subqueries. /who warm latency dropped from ~40ms to ~12ms.

What’s still next

  • Multi-step tool-use loop in chat: bounded agent loop with a tool manifest (calendar, memory, HA read/write, task create), hard step cap, and an approval reaction before any external-effect tool fires. The hard problem isn’t the loop, it’s the UX. A 30-second synchronous reply is unacceptable; this needs a “working on it…” ack with async completion.
  • Raw Chroma in temporal queries: the corpus has no indexed timestamp metadata, so post-filtering 260k docs by parsing bodies is too slow. Either backfill a date field or wait for the Chroma proxy to gain a date-range query.
  • 48GB Mac Mini arriving ~early June: Ollama with qwen2.5:72b, IDBot’s primary inference host, second Neo4j and Chroma instances for company namespace.
  • IDBot ↔ FinkBot cross-pollination: deferred until IDBot has ≥1 month of real data. Transport is solved by namespacing; the hard problem is the summary taxonomy: what’s safe for one bot to surface to the other. Design cold and you invite leaks.

Closing Thoughts

The thing that surprised me most building this was how much of the value comes from the plumbing, not the LLM. The dedup tracker. The concurrency guards. The two-tier memory split. The entity graph that knows Dave works at the same company I do and we’ve emailed 47 times. The feedback log that quietly records every reaction and query without any individual flow caring about it.

The LLM is the tip of the iceberg. Everything below it is data engineering and operational discipline.

If I had to do it over I would have started with Prefect instead of n8n. The version control alone was worth the migration cost. I would have put the dedup tracker in on day one. And I would have moved to Neo4j earlier. The embedded graph was fine until it wasn’t, and the migration was more work than switching from the start would have been.

The Band Anna – Gig On Main 2023

In 2023 The Band Anna played its last show, and I feel we went out on top. Opening Irwin’s annual “Gig On Main” show, which was headlined by the Clarks, we played a full set IN THE POURING RAIN to an increasingly large and enthusiastic crowd. The Band Anna was:
– Brianna Acalotto – vocals
– Jesse Bergman – guitar, vocals, keyboard
– Mark Earnest – guitar, vocals
– Aaron McConnell – bass, guitar, vocals
– Matt Omler – drums

I’ve pulled out a few of the songs to post on YouTube and add some comments. Note that between the wind and rain, the audio (and for that matter the video) is not ideal, but I believe it still represents how we sounded in general at this time.


Boys of Summer has been a staple of ours from the very beginning. I feel like this one went a little fast (I can always tell when we get to the breakdown part and I find I have to scramble to get the syncopated part in) but still came off well. Also note that we are really shooting for more of The Ataris version and less Don Henley’s version, and as such I’m taking several overlapping guitars and trying to do them with one guitar, especially in the solo.


Truth be told, Under Pressure wasn’t even on our setlist. This was just something we tossed in as a soundcheck, but it came out pretty well and I wanted to preserve it. Jesse and I had a number of cues between each other in this song to verify we were where we thought we were (this is literally the only song I ever count bars in, especially at the end where chord changes come at really non-intuitive places), so you will see us signaling behind Brianna and Aaron a few times. I am also very much a boring stationary performer, but for some reason I kind of get into this one and almost move a bit (awkwardly).


Your Love is a fun song to perform, and Aaron and I have a little Easter Egg at the end we both love to do. If you listen in the outro we both play various parts from Fleetwood Mac’s Go Your Own Way (bass and guitar solo respectively). This is also one of the few songs I actually sing loud on rather than just background harmonies.


Flagpole Sitta is a rare Jesse-led vocal song, and this one is just pure fun. High energy and a little silly. The guitar part is fairly simple, so I try to mix up how I am attacking each section.


This is Fall Out Boy’s version of I Wanna Dance with Somebody. I don’t remember exactly why we didn’t do the key change at the end, so it feels a little flat there to me, but still a jam and usually a crowd-pleaser.


Brooklyn was a late addition to our lineup, and I don’t have a lot of recordings of it. Also, my guitar was a bit quiet on this version, which annoys me, but it is what it is. This is another Jesse vocal performance: Jesse introduced us to this song, and to Patrick Droney in general, which was a good find. I like to think with more time we would have really done this song justice (I’m playing a much more basic guitar part than I would have liked).


LOOK AT HOW HARD IT WAS RAINING DURING THIS! Also just a fun song to jam on.


For anyone curious about my gear during this, the guitar is a Rick Turner Model 1, amp is a Peavey Classic 30 and the pedalboard was exactly set up like this (although much wetter by the end of the show).

A Better Human: Book Review

I’ve recently finished A Better Human, the debut novel from J Donald (if you’ve watched any of the videos on my music page, he’s the other guitarist), and I was blown away by how good this is. When I read novels, I like to mentally slot them into genre tropes, and this one subverted my expectations throughout. At various times I saw influences from sources as disparate as Invincible, The Walking Dead, Horizon Zero Dawn, Stranger Things, and even Dune (which the author hadn’t read, but somehow managed to channel in his fight scenes) yet it wasn’t really any of those. This book charts its own course and keeps you engaged from page one.

Two things I really appreciated plot-wise were the world building and character development. J doesn’t infodump or rely on the classic “as you know…” conversations; instead, he drops hints and builds the world organically in a way that feels natural. On the character side, books with large casts usually make it easy to lose track of minor characters or disregard them entirely. Not here. Everyone has an arc and plot relevance, and some of the most gut-wrenching revelations are reserved for characters you’d initially write off as background filler.

From a structure standpoint, the chapter arrangement is particularly effective. I’m not a fan of books that end every chapter on a cliffhanger to keep you turning pages (cough Dan Brown), but this book takes a different approach: chapters often finish on a plot revelation, then sometimes the next jumps back in time to explain what just happened or set up what’s coming. It ends up feeling natural, allows for satisfying stopping points, and makes for an almost Memento-style storytelling experience, albeit much less confusing.

All in all, highly recommend.

WordPress Theme

I made a WordPress theme (you are soaking in it) to match the aesthetics of my Directory Master program, which in turn lovingly appropriated the idea from the old DOS Norton Utilities program (which, along with WordPerfect 5.1, represents the pinnacle of user interface design; it’s been downhill since).
Here is a screenshot of what it might look like:



In addition to the nice edge effect and animated menu buttons, it has some rudimentary visual editor blocks like:

cool 3d blocks

And even

cooler inverse 3d blocks

And tables. Can you tell I really like the 3d bar effect?

Program | My UI rating on a scale of 1-10
Norton Utilities 5 | 10
WordPerfect 5.1 | 9
Windows 3.11 | 2
Windows 95 | 1

Source code and an installable WordPress theme zip file can be downloaded from my GitHub.

Guitar Pedalboard Evolution

For a long time, I had an ART ECC 1 as my multi effects pedal. Around 2008 I decided I wanted to start using individual analog pedals. Over the decades (ugh) I have taken sporadic snapshots of the pedalboard before certain gigs or after major renovations and so I present here: the evolution of my pedalboard.

It’s 2008, I discovered General Guitar Gadgets kits (highly recommended) and built a germanium fuzz face, a triangle era Big Muff Pi, and a PT-80 delay. I also picked up an MXR Phase 90 and Dynacomp from the local music store, a Behringer Vintage Tube overdrive, and a sporadically functional wah pedal from Craigslist. What better way to organize these than on a panel from an IKEA desk with Lowe’s cabinet handles screwed on? I gigged with this setup for far too long before purchasing a proper board.

At this point my pedal building problem clearly comes into focus. Let’s take the signal path in order.

The Wah pedal has been gutted and replaced with GGG’s Mod Wah board including a rotary switch for 5 different tone caps (to adjust the Q position) and a three way selector for different inductors. I also have a switch to flip the input and output to achieve the loud “seagull” sound that can be heard on the middle part of Pink Floyd’s Echoes. Next is the GGG Germanium Fuzz Face, followed by a dual pedal setup consisting of an always on buffer and dual compressors: a Ross and a Dan Armstrong Orange Squeezer. Next is the GGG Big Muff Pi, then a BYOC (Build Your Own Clone, sadly out of business) Tube Screamer (with multiple tone stacks and clipping options). Finishing out my overdrive chain is a Bajaman designed Real Tube Overdrive I built from his schematics and housing a 12AX7 tube.
For modulation I’m uncharacteristically lacking in this era. I swapped out the Phase 90 for a GGG Phase 45 kit. This pedal will nearly always be on my board and is something I consider a bit of a secret weapon for my tone. Rounding it out is a Deluxe Mistress Flanger I built from a GGG PCB (and some scavenged SAD1024 chips that I still hoard jealously) and my PT-80 delay, now featuring a “delay doubler” switch.

Now clearly entering my “too big for the IKEA wood slab” phase. Everything above, plus the addition of a Neovibe built from R.G. Keen’s specs and an unlabeled pedal that I believe is a BYOC Analog Chorus.
This is the pedalboard used on the Shadows Of Eve album Nowhere But Here.

Not much to add here. No new pedals but I did finally get my power distribution a bit under control with a power strip. Notice that the Neovibe needs an 18v DC power adapter and the Bajaman Tube Overdrive requires an inconvenient 16v AC power adapter which will account for it occasionally dropping off my board when space is at a premium.

Many changes. Change #1: A proper pedalboard with power and audio jacks. Change #2: labels are convenient, lose them. Change #3: Put an official Deluxe Mistress Flanger on my board to replace the homemade one which was on the bench being re-aligned. Nothing much new here, signal path is GGG Fuzz Face, Bi-Compressor, BYOC Rat clone, BYOC Big Muff Pi, Bajaman Tube Overdrive, GGG Phase 45, BYOC Analog Chorus, GGG Tremolo, Flanger, Delay. This is the pedalboard used to record Broken World at Audible Images studio (this is a photo from that session).

My short lived attempt at being symmetrical. Also clearly my aesthetic of unlabeled pedals was giving way to at least trying to minimally label the knobs. A few new additions here are the excellent BYOC Stereo Flanger (mixed in with a Boss Line Selector) and Ditto looper. Also finally broke down and got a tuner for stage use. Otherwise my usual signal path of compressor, BYOC Big Muff Pi, Bajaman Tube Overdrive, Phase 45, Chorus, Tremolo, Flanger, Delay, Looper.

I briefly experimented with an Electro Harmonix POG2, but it definitely was not for me. I also built a BYOC programmable looper to make switching pedals on and off easier (and to program presets of groups of them). Otherwise nothing new here but a surprisingly minimal overdrive selection with only the Bajaman Tube Overdrive and Big Muff Pi.

Wah makes a reappearance and will very rarely leave my board from here out. I also have the #0001 serial number Sublime Pep-Pep Delay Pedal with custom graphics adding some rare color.

Clearly going for quantity over quality, I went through a period of time where I ditched nearly all of my homemade pedals for inexpensive Mooer clones. Included here are Big Muff Pi, Blues Driver, Boost, Blues Crab, Electro Flanger, and Analog Chorus along with my Ditto Looper, Pep Pep delay, a rare appearance by an EQ, and a BIG honking Univibe which I believe is why I switched to so many smaller pedals to try and fit it on the board. Mooer pedals are not the highest quality audio and they didn’t last long, but I will say the Flanger is very high quality and matches up against both my Deluxe Mistress Flanger clone and the official Electro Harmonix version.

Obviously neatness was not the goal here. This appears to be a random mix of quickly tossed together Mooer pedals with the Phase 45 and Pep Pep delay. The unfamiliar carpet and mess of wires lead me to believe this was taken at a show, so it likely was an impromptu setup.

At this point I was going back to my roots of all homemade analog pedals. And again also no labels, which makes identifying these kinda tricky many years later. I believe the signal path here is Wah, Bi-Compressor, Bajaman Tube Overdrive, Big Muff Pi, Rat, Phase 45, Chorus, Tremolo, Univibe, Flanger, Delay.

I had this new pedalboard for a while so I am surprised I only have one photo of it that I can find. A mix of previously seen homemade and Mooer pedals (including the Slow Engine to do Vertical Horizon’s Everything You Want), this is noteworthy because it was during a time where I was trying to use my Peavey Classic 30’s overdrive channel (thus the Peavey pedal on the left) and putting my modulation pedals in the amp’s effects loop. I didn’t maintain this setup long because it required 4 cables (three in a bundle going back and forth between the pedalboard and amp). This also marks the first appearance of a classic Boss CS-2 compressor to replace my large bi-compressor pedal and the MXR mini phaser pedal to replace my GGG Phase 45 (this pedal does both 90 and 45 but I only use the 45 setting).

New pedalboard with enough space to store everything I want, with power banks and adapters stored underneath. Wah and looper pedals return, as do the Mooer boost, Slow Engine, Flanger, and Chorus. A new addition is the Danelectro Big Spender spinning speaker simulator my wife got me, which stayed in my lineup for a number of gigs.

Two notable additions here are a BK Butler Tube Driver to replace the Bajaman Tube Overdrive (not my preference, I actually lost the Bajaman pedal and the bi-compressor at some point but I will be building replacements). Also new to the pedal board is a dual pedal I built to house two Cornish pedal clones, the G2 and SS2.

The original GGG fuzz face returns in a custom made (sandblasted) circular enclosure, the GGG PT-80 delay and Phase 45 return along with a BYOC RAT, Chorus, Tubescreamer, Big Muff Pi, and Flanger. This is the pedalboard (and actual settings) used for this show.