I’ve been running OpenClaw on a Raspberry Pi 4 for a while. It was a self-hosted AI assistant that lived in a terminal, had access to my email history and chat logs, and could answer questions about my own life with reasonable accuracy. It worked, mostly, but it was fragile. The Pi would fall over under load, context window management was a mess, and adding anything new meant fighting the architecture instead of building. And, to be honest, nearly every update broke something fundamental.
So I rebuilt it from scratch. The result is FinkBot, named after my old BBS and Slashdot username, Finkployd. Here’s how it’s built and what it actually does.
The Stack
Three LXCs on a Proxmox host (Bosgame ADB20, Ryzen 5 3550H, 16GB RAM), plus one box outside the cluster:
- n8n — workflow orchestration. All the actual logic lives here.
- Chroma + a query proxy — vector database for semantic memory. I wrote a small Flask proxy in front of it to handle embedding and expose a sane REST API.
- A Discord bot — the interface. All interaction happens in a private Discord server.
- A Mac Mini running Ollama — local AI models. No data leaves my network.
The interface choice was deliberate. I spend most of my day in Discord or Teams anyway. A dedicated server gives me structured channels for different types of output (#briefings, #alerts, #projects, #memory-log), persistent history, mobile notifications, and emoji reactions I can use to trigger actions. It’s a better interface than a chat window for something that pushes information proactively.
Privacy First
This was a hard requirement from the start. My emails, calendar events, and private Teams conversations aren’t going to third-party APIs. All LLM inference runs locally on the Mac Mini via Ollama. The only external calls are to Microsoft Graph (to fetch email, Teams, and calendar data) and my personal email server; both exist to retrieve data, not to process it.
Everything stays on my network.
Memory
FinkBot’s memory comes from two sources: a static corpus loaded at startup, and a growing corpus built from ongoing ingestion.
The static corpus is 1,638 markdown files extracted from roughly 31,000 personal emails and 846 Pidgin/Adium chat logs spanning 2006-2025. These were processed by OpenClaw’s memory extraction pipeline before I shut it down, so FinkBot started with a reasonably complete picture of my life, work history, relationships, and recurring topics.
The growing corpus gets new chunks from every email that comes in (across two accounts), every Teams chat message, and every calendar event. Everything goes through Chroma with a content hash as the ID, so ingestion is idempotent: I can re-run it without creating duplicates.
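The idempotency comes down to deterministic IDs. A minimal sketch, with a plain dict standing in for the Chroma collection’s upsert:

```python
import hashlib

def chunk_id(text: str) -> str:
    # Same content always hashes to the same ID, so re-ingesting a chunk
    # overwrites rather than duplicates.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def ingest(store: dict[str, str], chunks: list[str]) -> int:
    # Stand-in for collection.upsert(ids=[...], documents=[...]) against
    # Chroma. Returns how many chunks were actually new.
    new = 0
    for chunk in chunks:
        cid = chunk_id(chunk)
        if cid not in store:
            new += 1
        store[cid] = chunk
    return new
```

Running the same batch twice adds nothing the second time, which is exactly the property that makes re-runs safe.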
Semantic search is straightforward: when you ask FinkBot something, it queries Chroma for the top N relevant chunks, injects them into the prompt as context, and lets the model reason over them. No multi-stage RAG pipeline, no re-ranking. It works well enough for my needs.
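The retrieve-and-inject step reduces to string assembly once the chunks are back from Chroma. A sketch of the prompt construction; the instruction wording is illustrative:

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    # Retrieved chunks go in verbatim, separated so the model can tell
    # them apart; the leading instruction keeps answers grounded in them.
    context = "\n---\n".join(chunks)
    return (
        "Answer using only the context below. If the context doesn't "
        "cover it, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```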
What It Actually Does
Morning and evening briefings: At 8am and 5pm ET, FinkBot posts a briefing to #briefings. The morning brief covers today’s calendar (M365 + Nextcloud), any open action items, and things to be aware of. The evening brief reviews what happened today and previews tomorrow. Both are generated locally with strict anti-hallucination instructions. If there’s no data for a section, it says “nothing on record” rather than making something up.
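The “nothing on record” rule can be enforced mechanically when assembling sections, not just asked for in the prompt. A sketch (the formatting is illustrative):

```python
def render_section(title: str, items: list[str]) -> str:
    # An empty section says so explicitly instead of leaving the model
    # any room to invent entries for it.
    if not items:
        return f"{title}: nothing on record."
    lines = "\n".join(f"- {item}" for item in items)
    return f"{title}:\n{lines}"
```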
Meeting prep: A workflow runs every minute checking whether any M365 calendar event starts in the next 30 minutes. If it finds one that hasn’t been prepped yet, it queries Chroma for relevant context about the meeting topic and attendees, generates a prep brief, and posts it to #projects. The brief covers meeting overview, key attendees, relevant history from memory, talking points, and action items to complete beforehand. Context quality improves as more email and chat data accumulates.
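The scheduling check is simple set arithmetic over the calendar. A sketch, assuming events already parsed into dicts with `id` and `start` fields (names are illustrative):

```python
from datetime import datetime, timedelta

def due_for_prep(events, prepped_ids, now, lead=timedelta(minutes=30)):
    # Fires for events starting within the next `lead` window that haven't
    # been prepped; the prepped-ID set keeps the every-minute schedule from
    # generating the same brief twice.
    return [
        ev for ev in events
        if timedelta(0) <= ev["start"] - now <= lead
        and ev["id"] not in prepped_ids
    ]
```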
Email monitoring: Both my personal (mystikos.org, IMAP via MXRoute) and work (instrumentalid.com, M365 Graph API) inboxes are monitored. Each email is analyzed locally for action items and urgency. Genuinely actionable emails get posted to #alerts. Automated emails, newsletters, and marketing get auto-blocklisted at the sender address level so they stop consuming inference cycles.
Teams chat ingestion: This was the interesting one. Reading your own Teams DMs via the Graph API requires delegated auth; application permissions can’t access cross-tenant conversations, which is most of mine. So there’s an MSAL device flow to get a refresh token, which is stored and auto-renewed on every hourly run. The Chat Ingestor fetches all chats, processes messages through a Split In Batches node (one at a time, so de-duplication works correctly), and ingests new messages into Chroma.
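A sketch of the token-persistence side, assuming the Python `msal` package supplies the auth itself (`initiate_device_flow` on the one-time first run, `acquire_token_by_refresh_token` on each hourly run); only the store-and-reload half is shown concretely:

```python
import json
from pathlib import Path

def save_refresh_token(path: Path, token: str) -> None:
    # Persisted after the one-time device flow; each hourly run reads it,
    # exchanges it for a fresh access token, and writes back the rotated
    # refresh token MSAL returns.
    path.write_text(json.dumps({"refresh_token": token}))

def load_refresh_token(path: Path) -> str | None:
    # None means no token yet, i.e. the device flow has to run first.
    if not path.exists():
        return None
    return json.loads(path.read_text()).get("refresh_token")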
Manual memory: /remember [text] stores an explicit memory chunk. Useful for things like “remember I agreed to do X for Y by Z date.”
Blocklist: reacting 🚫 to any alert in #alerts blocks the sender’s domain. 📵 blocks the specific address. The blocklist gates both ingestion and analysis, so blocked senders disappear completely.
LLM Routing
Everything runs locally via Ollama on the Mac Mini. The model split is based on task complexity:
- Qwen2.5 32B handles the heavy lifting: briefings, meeting prep, email analysis. Long context, strong reasoning, handles complex synthesis tasks well.
- Qwen2.5 7B handles interactive chat where latency matters more than depth. Fast enough to feel responsive for back-and-forth questions.
No tokens leave my network. No API costs. No third party seeing my calendar or reading my email.
Things That Were Annoying
n8n caches OAuth2 tokens encrypted in SQLite. Restarting n8n doesn’t clear them. If you update a client secret, the old cached token is used until it expires (~1 hour). The fix is to wait, or delete and recreate the credential.
Graph API datetimes are local time, not UTC. The calendarView endpoint returns datetimes like 2026-04-03T14:00:00.0000000 with a separate timeZone: "America/New_York" field. That 14:00 is 2pm ET, not 2pm UTC. Appending ‘Z’ to treat it as UTC shifts everything by 4-5 hours depending on DST. Parse it using the timeZone field instead.
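In Python terms (my workflows are n8n, so this is a translation rather than the actual node code), the safe parse looks like:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def parse_graph_datetime(value: str, time_zone: str) -> datetime:
    # Graph's 7-digit fractional seconds exceed the microsecond precision
    # fromisoformat accepts on older Pythons, so trim to 26 characters
    # (YYYY-MM-DDTHH:MM:SS.ffffff), then attach the named zone from the
    # accompanying timeZone field instead of assuming UTC.
    trimmed = value[:26]
    return datetime.fromisoformat(trimmed).replace(tzinfo=ZoneInfo(time_zone))
```

With the zone attached, converting to UTC gives the correct 18:00 for a 2pm ET event in April, not 14:00.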
n8n IF nodes eat your data. When data passes through an IF node, the output only contains what the preceding node returned. Everything from earlier nodes is gone unless you explicitly carry it forward, e.g. with a Code node that merges the earlier fields back in. This bit me repeatedly.
Split In Batches is required for per-item HTTP chains. If you have a loop where each item needs to go through an HTTP Request → IF → HTTP Request chain, n8n’s default item processing only works for the first item. Wrap it in Split In Batches with batch size 1 and connect all branches (including the skip branch) back to the loop input.
Teams chat de-duplication. Teams message IDs are epoch millisecond timestamps, so the store of processed message IDs looks like a list of timestamps rather than IDs. This looked like a bug for a while. It’s not.
What’s Next
- Vikunja task integration: already half-built, wiring up the Discord slash commands now
- Full archive ingestion: the bulk of my email history hasn’t been processed yet
- TLS + reverse proxy on Tailscale
The code isn’t public yet (too many credentials baked into the workflow exports) but the architecture is straightforward enough that this post covers the interesting parts. If you’re building something similar and have questions, KB3LYB on the air or [email protected].