I work across too many contexts at once. Three email accounts. Two calendars. Work Teams. Personal Slack. A Gitea instance for side projects. Home Assistant watching over a thousand devices. Notes in Joplin. Tasks scattered across wherever I dumped them last week.
For years I tried to manage this with dashboards, integrations, and sheer willpower. None of it stuck. What I actually wanted was something that already knew what was going on and could just tell me, without me having to ask in exactly the right way or remember which system held which piece of information.
So I built FinkBot.
What It Does
FinkBot is a personal AI assistant that runs entirely on my home network. It continuously ingests data from every corner of my digital life: email, calendar, chat, code repos, smart home sensors, task lists, notes. It indexes everything into a searchable memory store and uses that context to:
- Send a morning and evening briefing every day
- Prep me for meetings 30 minutes before they start, pulling together context on who I’m meeting with and what we’ve been working on
- Alert me to things that matter: security sensor trips, appliances left on, unusual comms gaps, infrastructure pressure
- Answer ad-hoc questions in Discord (“what did I send Dave last week?”, “who is attending this meeting?”)
- Suggest home automation actions and let me approve them with a single reaction
- Announce context-aware briefs to my Echo Show when I walk into a room
- Run a weekly self-reflection that evaluates its own output quality and proposes improvements
All without sending a single byte of personal data outside my LAN.
The Stack
Data Sources
FinkBot polls and ingests from:
- Email: three accounts, IMAP for personal and university, Microsoft Graph API for work, Dovecot archive for historical backfill
- Calendar: Microsoft 365 plus Nextcloud CalDAV for personal events
- Chat: Microsoft Teams DMs and Slack channels
- Code: Gitea commits, issues, and PRs across all my repos
- Notes: Joplin (everything I have ever written down)
- Tasks: Nextcloud Tasks via CalDAV
- Smart home: Home Assistant with over 1,000 entities covering presence, appliances, energy, sensors, climate, plus dedicated monitors for TrueNAS pool health and Proxmox container pressure
Each source has its own Prefect flow with a schedule tuned to how often that source changes. Email checks every 5 to 10 minutes. Smart home every 5. Git repos every 2 hours. Joplin and Slack every 30 minutes.
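The cadence logic is simple enough to sketch with the standard library (interval values come from the text above; the table and function names are hypothetical, since the real schedules live in Prefect deployments):

```python
from datetime import datetime, timedelta

# Hypothetical poll intervals mirroring the cadences described above.
POLL_INTERVALS = {
    "email": timedelta(minutes=5),
    "smart_home": timedelta(minutes=5),
    "git": timedelta(hours=2),
    "joplin": timedelta(minutes=30),
    "slack": timedelta(minutes=30),
}

def is_due(source: str, last_run: datetime, now: datetime) -> bool:
    """Return True when a source's poll interval has elapsed."""
    return now - last_run >= POLL_INTERVALS[source]
```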
Two-Tier Memory
FinkBot uses a two-tier memory architecture, and the distinction matters.
ChromaDB (chroma.crosscreek) is the raw recall layer. Over 257,000 documents, everything ingested from every source, embedded and stored. When FinkBot needs to answer “what did I send Dave last week?”, this is where it looks. Semantic search, no structure, just relevance.
MemU (memu.crosscreek) is the distilled long-term layer. It accumulates synthesized facts over time: things I’ve manually added with /context add, outputs approved from the weekly self-reflection, curated patterns. It is not a replacement for Chroma. It is the part of memory that has been thought about. Raw search and distilled knowledge serve different purposes and live in different stores.
A middleware layer in the memory client handles all query logging transparently. Individual flows don’t think about it. Queries from Discord get tagged separately from background automation queries so the self-reflection loop can distinguish what I’m actually asking about from what the system is doing on its own.
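A minimal sketch of that middleware idea, with hypothetical names (`LoggingMemoryClient`, the JSONL path) standing in for FinkBot's actual memory client:

```python
import json
import time

class LoggingMemoryClient:
    """Wrap a memory backend so every query is logged with its origin tag.

    Hypothetical sketch: `backend` is anything with a .query() method;
    the source tag separates interactive Discord queries from
    background automation, so downstream analysis can tell them apart.
    """
    def __init__(self, backend, log_path="query_log.jsonl"):
        self.backend = backend
        self.log_path = log_path

    def query(self, text, source="automation"):
        # Log transparently; individual flows never think about this.
        with open(self.log_path, "a") as f:
            f.write(json.dumps({"ts": time.time(),
                                "source": source, "q": text}) + "\n")
        return self.backend.query(text)
```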
The Entity Graph
Layered on top of the vector stores is a structured knowledge graph backed by Neo4j Community Edition, running at neo4j.crosscreek. It tracks four node types: Person, Company, Project, and Topic. Relationships include EMAILED, WORKS_AT, INVOLVED_IN, COLLABORATES_WITH, and others.
The move to Neo4j from an earlier embedded graph database was driven by one practical problem: the embedded approach had a single-writer bottleneck. Multiple flows running concurrently would contend for the write lock. Neo4j’s MVCC gives concurrent readers and writers without coordination overhead, and the Bolt protocol means any flow or API endpoint can connect remotely without file locking concerns.
The graph gets populated automatically from email processing, Joplin notes, Slack analysis docs, calendar attendee extraction, and a deterministic upsert on every Gitea repo. A dedicated backfill flow processes the historical Dovecot email archive on an hourly schedule, steadily growing the graph from years of past correspondence. A separate curation flow runs weekly to merge duplicate nodes and filter noise.
The entity graph answers questions the vector store cannot. “Who am I meeting with today, and what do I know about them?” The daily briefing pulls today’s calendar attendees, looks each one up in the graph, and injects a “People you’ll meet today” section into the prompt before it ever touches Chroma. Structured facts first, semantic context second.
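The "structured facts first" step might look something like this sketch, where a plain dict stands in for the Neo4j lookup (the function name and fact fields are assumptions):

```python
def people_section(attendees, graph_facts):
    """Build the "People you'll meet today" prompt section.

    Hypothetical sketch: `graph_facts` stands in for per-person
    entity-graph lookups, mapping a name to facts like employer
    and shared projects.
    """
    lines = ["People you'll meet today:"]
    for name in attendees:
        facts = graph_facts.get(name)
        if facts:
            lines.append(f"- {name}: works at {facts['company']}, "
                         f"projects: {', '.join(facts['projects'])}")
        else:
            lines.append(f"- {name}: no graph entry yet")
    return "\n".join(lines)
```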
The Flows Layer
All orchestration runs on Prefect, deployed to a 4GB/4-core LXC (prefect.crosscreek). Prefect replaced an earlier n8n-based setup. The drag-and-drop GUI approach was fine until I needed version control, testability, and the ability to do something non-trivial in a node. Python and git won.
Active flows currently running:
| Flow | Schedule | Purpose |
|---|---|---|
| ha_monitor | every 5 min | Security, appliances, presence, TrueNAS, Proxmox |
| meeting_prep | every 5 min | 30-min lookahead prep briefs |
| calendar_actions | every 5 min | Rule engine over upcoming meetings × HA state (pauses media, etc.) |
| ha_suggestions | every 5 min | Rule-based HA suggestions with one-tap Discord approval |
| proactive_voice | every 60 sec | Presence transitions trigger TTS brief on Echo Show |
| pattern_briefings | every 5 min | Fires focused briefs 15 min before scheduled pattern moments |
| email_monitor_* | 5 to 15 min | Three accounts, IMAP + Graph API |
| calendar_sync | every 30 min | M365 + Nextcloud to memory |
| slack_ingestor | every 30 min | Slack channels to memory |
| joplin_ingestor | every 30 min | Notes to memory |
| chat_ingestor | hourly | Teams DMs to memory |
| entity_backfill | hourly | Historical Dovecot emails to Neo4j entity graph |
| daily_briefing | 8am + 5pm ET | Morning and evening briefings |
| graph_curation | weekly | Neo4j merge suggestions, noise filtering |
| memu_curation | weekly | MemU near-duplicate cleanup |
| self_reflection | Sunday 8pm ET | Weekly synthesis, proposals, draft PRs |
| memory_defrag | Saturday 7pm ET | Expire stale entries, corpus stats |
| watchdog | continuous | Auto-cancel stuck runs, hard-kill past threshold |
A few things I have learned running these in production:
In-process concurrency guards beat deployment-level limits. High-frequency flows call check_self_concurrency() at startup and exit cleanly if another instance is already running. Relying on Prefect’s deployment-level concurrency limits alone left edge cases where crashed runs didn’t release their slots. Explicit guards are more reliable.
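FinkBot's real guard queries Prefect, but the exit-early pattern can be illustrated with a stdlib `flock` sketch (hypothetical lock path; `flock` releases automatically when a crashed process's file descriptor closes, which is exactly the property that makes an explicit guard safer than a slot counter):

```python
import fcntl

def check_self_concurrency(lock_path):
    """Return a held lock file if we are the only instance, else None.

    Hedged sketch, not FinkBot's actual implementation: flock locks
    follow the open file description, so a crashed process releases
    its lock when the OS closes the fd -- no stale slots to leak.
    """
    f = open(lock_path, "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return f  # keep the handle alive for the flow's lifetime
    except BlockingIOError:
        f.close()
        return None

# Typical use at flow startup:
#     lock = check_self_concurrency("/tmp/ha_monitor.lock")
#     if lock is None:
#         raise SystemExit(0)  # another instance is running; exit cleanly
```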
Startup reconciliation matters. After a crash or restart, the server can have zombie “running” states for flows that are no longer actually running. A startup reconciliation pass cleans these before new runs start, preventing phantom concurrency blocks.
CalDAV clients need timeouts. My NAS can be slow. A DAVClient without timeout=10 will hang indefinitely. Flows that run every 60 seconds cannot afford that.
Don’t alert on things you can’t fix. Meeting prep runs every 5 minutes. CalDAV errors go to print(), not the Discord error channel. Nobody needs a ping every 5 minutes because the CalDAV server hiccupped.
The API Bridge
A FastAPI service on port 8003 acts as the hub connecting flows, the Discord bot, and the kiosk. The bot never touches memory or the entity graph directly. It POSTs to the API and gets a response. Logic stays centralized.
Key endpoint groups: briefing and prep triggers, Home Assistant action execution (with Discord approval gating), entity graph CRUD, MemU memory management, pattern automation management, Prefect watchdog controls, kiosk announcement queue, and a transcription and TTS pipeline for the Echo Show.
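A stripped-down sketch of the approval-gating idea, with hypothetical names; the real bridge does this over HTTP, with Discord reactions supplying the approvals:

```python
import uuid

class ApprovalGate:
    """Gate external-effect actions behind an explicit approval step.

    Hypothetical sketch of the Discord approval gating: actions are
    queued with an id, and only execute once an approval (a reaction,
    in FinkBot's case) references that id. Unknown or already-handled
    ids are ignored, so double-taps are harmless.
    """
    def __init__(self):
        self.pending = {}

    def propose(self, action):
        action_id = str(uuid.uuid4())
        self.pending[action_id] = action
        return action_id  # surfaced to Discord for a one-tap reaction

    def approve(self, action_id, execute):
        action = self.pending.pop(action_id, None)
        if action is None:
            return None  # unknown or already-handled id
        return execute(action)
```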
There is also an HTTPS endpoint on port 8443 serving the kiosk dashboard. It required HTTPS because getUserMedia only works in a secure context.
The Echo Show Kiosk
The dashboard on my Echo Show (running vanilla Android) has become more than a status screen. It shows live calendar, unread message counts, current tasks, and a Home Assistant home status panel. More interestingly, it now listens for a wake word via openwakeword, streaming audio from the browser to a Python WebSocket server. When the wake word fires, it triggers a context-aware TTS brief. The proactive voice flow also pushes announcements to the kiosk when presence transitions happen, so walking into the room can trigger a summary of what’s coming up.
The Discord Bot
Discord is my primary interface. Current slash commands:
- `/brief` — trigger a morning or evening briefing on demand
- `/prep` — meeting prep for a specific event
- `/search` and `/search-email` — semantic memory and Dovecot archive search
- `/remember` — store a memory manually
- `/who` — entity graph person lookup
- `/context` — manage persistent knowledge file without SSH
- `/task` and `/quicktask` — Nextcloud Tasks management
- `/cal` — natural language calendar query
- `/memory` — corpus management (stats, forget, curate)
- `/reflect` — trigger self-reflection immediately
- `/status` — system health check
- `/ark` — ARK server management (yes, the game server lives here too)
Reaction handlers let me take action on messages without typing. Thumbs up and thumbs down on briefings feed the engagement feedback loop. Checkmark or X on HA suggestion messages triggers or dismisses the action. A book reaction on meeting prep saves notes to Joplin. A no-entry reaction blocks an email sender. These reactions are the primary interface for approving anything the system proposes.
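Conceptually the reaction handlers reduce to a dispatch table; this sketch uses stub handlers and hypothetical names rather than the bot's real discord.py wiring:

```python
# Stub handlers standing in for the real actions described above.
def record_feedback(msg, positive):
    return f"feedback:{positive}"

def run_ha_action(msg, approved):
    return f"ha:{approved}"

def save_notes_to_joplin(msg):
    return "joplin:saved"

def block_sender(msg):
    return "sender:blocked"

# Emoji -> handler dispatch table (hypothetical emoji choices).
REACTION_HANDLERS = {
    "👍": lambda m: record_feedback(m, True),
    "👎": lambda m: record_feedback(m, False),
    "✅": lambda m: run_ha_action(m, True),
    "❌": lambda m: run_ha_action(m, False),
    "📖": save_notes_to_joplin,
    "⛔": block_sender,
}

def dispatch_reaction(emoji, message):
    """Route a reaction to its handler; unknown emoji are ignored."""
    handler = REACTION_HANDLERS.get(emoji)
    return handler(message) if handler else None
```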
Design Decisions
Local-First, Always
No personal data leaves 192.168.48.0/24. That is the constraint the architecture is built around. Email, calendar, chat, smart home state: none of it touches an external service.
For LLM inference, the system runs a tiered approach. A 16GB M4 Mac Mini runs Ollama with qwen2.5:14b for heavy inference (briefings, meeting prep, self-reflection) and qwen2.5:7b for lighter work. Anthropic’s API is configured as a fallback and is also used for high-value one-off tasks like code change proposals, where output quality matters more than token cost. But the default path for everything is local.
This split was a deliberate choice. High-volume extraction work like the entity backfill runs local exclusively. Running the full Dovecot email archive through the Anthropic API would cost real money and send personal email content to an external service. Neither is acceptable. With local Ollama it costs nothing and stays on the network.
The fallback to Anthropic exists for resilience, not as a cost optimization. If the Mac Mini is down, the system degrades gracefully rather than going silent.
Why Qwen2.5? Instruction following is strong, JSON output mode is reliable (critical for entity extraction), and the quantized models fit the hardware. The 14B at Q4_K_M runs comfortably within 16GB unified memory while leaving headroom for everything else.
The Entity Graph Upgrade
The move from an embedded graph database to Neo4j is the biggest architectural change in the last few months. The embedded approach worked fine for read-heavy lookups but fell apart when multiple flows needed to write simultaneously. Kuzu’s single-writer model meant contention, and flows running on tight schedules can’t afford to queue behind each other for graph writes.
Neo4j’s MVCC handles concurrent writers cleanly. The Bolt protocol means any process on the network can connect without worrying about file locking. And having a proper query interface makes ad-hoc exploration and curation much easier. The graph curation flow runs weekly, suggesting node merges and filtering noise via Cypher queries that would have been awkward to express in the embedded model.
Two-Tier Memory Is Not a Migration
An earlier version of the architecture treated MemU as a replacement for ChromaDB, something to migrate to once the hardware was ready. The current design treats them as doing different things.
Chroma is raw storage. Everything ingested lands there. Semantic search across 257,000 documents is fast and works well. The limitation is that it treats every document as equally relevant. There is no way to ask “what do I actually know about this person?” and get a distilled answer rather than a pile of email snippets.
MemU is for things that have been synthesized. Approved self-reflection outputs. Manually added context notes. Curated patterns. It is smaller, more intentional, and represents knowledge that has been validated rather than just observed. Briefings and prep queries can pull from both layers and get different things from each.
The Self-Learning Loop
Every Sunday at 8pm, self_reflection runs. It reads two weeks of feedback: briefing reaction rates, Discord query patterns, entity graph growth, MemU accumulation. It queries memory across all sources for a weekly snapshot. It passes everything to an LLM and asks what is working, what is not, and what should change.
The synthesis produces two actionable outputs.
Context proposals are facts or patterns that should become permanent knowledge. Each one appears in Discord as a bookmark message. I react with a checkmark to approve or X to reject. Approved items get appended to /opt/finkbot/finkbot_context.txt, which is injected into every future self-reflection prompt. The system accumulates knowledge from its own outputs over time.
Code change proposals are improvement ideas formatted as PR descriptions. Each one opens a draft PR in the FinkBot Gitea repo and posts the URL to #insights. I review through normal CI/CD, or close it if the idea isn’t worth pursuing.
The pattern automation system extends this further. self_reflection can also write to a patterns file, which pattern_briefings reads to fire focused context briefs on a schedule. If self-reflection notices that Monday mornings are always context-switching heavy, it can propose a pattern that fires a tailored brief every Monday at 7:45am. I approve it once and it runs every week.
Infrastructure
Everything runs on a Proxmox cluster on my home network.
| Service | Host | Notes |
|---|---|---|
| Prefect flows + API | prefect.crosscreek (LXC 203) | 4GB RAM, 4 cores |
| ChromaDB | chroma.crosscreek | raw memory backend |
| MemU | memu.crosscreek (LXC 204) | distilled memory |
| Neo4j | neo4j.crosscreek | entity graph, Bolt protocol |
| Discord bot | prefect.crosscreek | thin bot, talks to API |
| Home Assistant | ha.crosscreek:8123 | 1000+ entities |
| Gitea | git.mystikos.org | source of truth + CI/CD |
| Ollama (14B + 7B) | 16GB M4 Mac Mini | primary inference |
CI/CD runs through Gitea Actions. Pushing to main rsyncs the relevant files to each target host and restarts the appropriate systemd services. The servers are not git clones. They are deploy targets. Code changes go through the repo, not SSH sessions on the server.
What’s Next
Since original publication (May 2026)
A month later, several “What’s Next” items have shipped, and a few new ones have emerged. The headlines:
A/B prompt variants are live. A modifier-delta framework in flows/common/prompt_variants.py picks a deterministic variant per ET calendar day (sha256 of flow:iso-date, modulo variants), so morning and evening briefings on the same day always share a variant. Engagement is logged per-variant via the existing 👍/👎 pipeline; self_reflection surfaces a per-variant comparison block in its weekly LLM prompt only when more than one variant has fired. Wired into daily_briefing, meeting_prep, and pattern_briefings. Promotion is still manual — at two briefings per day, engagement rate is too noisy for auto-promotion.
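The deterministic picker is small enough to show in full; this sketch follows the stated recipe (sha256 of `flow:iso-date`, modulo the variant count), though the exact digest-to-integer step is my assumption:

```python
import hashlib

def pick_variant(flow: str, iso_date: str, variants: list[str]) -> str:
    """Deterministic per-day A/B variant selection.

    Sketch of the recipe described above: hash "flow:iso-date",
    take it modulo the variant count. Because the date (not the
    time) is hashed, morning and evening briefings on the same
    day always land on the same variant.
    """
    digest = hashlib.sha256(f"{flow}:{iso_date}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```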
Time-scoped /ask. Queries that carry both a question phrase (“what was going on”, “recap”, “tell me about”) and a temporal marker (“last week”, “in March”, “Q1”) now route through api/temporal_intent.py: window parsing (deterministic fast path with an LLM fallback), MemU recall + by-ID fetch of pattern/anomaly detection docs from that period, and Anthropic synthesis. The pipeline also cross-references current Nextcloud tasks so historical task mentions get noted as resolved when they’re no longer open. Intent detection requires both signals — either alone produced too many false positives.
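A deterministic sketch of the both-signals-required intent check (the marker lists are illustrative and far shorter than whatever `api/temporal_intent.py` actually matches, and the LLM fallback is omitted):

```python
import re

# Illustrative marker lists; the real detector is richer.
QUESTION_PHRASES = [
    r"what was going on",
    r"recap",
    r"tell me about",
]
TEMPORAL_MARKERS = [
    r"last week",
    r"in (january|february|march|april|may|june|july"
    r"|august|september|october|november|december)",
    r"q[1-4]",
]

def is_temporal_query(text: str) -> bool:
    """Require BOTH a question phrase and a temporal marker.

    Either signal alone produced too many false positives, per the
    text above, so only the conjunction routes to the temporal path.
    """
    t = text.lower()
    has_question = any(re.search(p, t) for p in QUESTION_PHRASES)
    has_time = any(re.search(p, t) for p in TEMPORAL_MARKERS)
    return has_question and has_time
```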
Pattern automations end-to-end. pattern_detector proposes ⏰ automations when it spots a recurring behaviour with a clear schedule. Mark taps ✅, the bot writes to /opt/finkbot/patterns.jsonl, and a new pattern_briefings flow polls every 5 minutes and fires a focused pre-briefing 15 minutes before each pattern’s next cron-scheduled time. /patterns list/add/remove Discord commands round it out.
Two new self-improvement loops.
Decision journal: every privileged reaction (task/context/graph-merge approvals, HA actions, watchdog kills, blocklist edits) writes a row to feedback_log.jsonl. self_reflection mines this weekly to propose suppression rules like “Mark rejected 4 task proposals from Client X this week — suppress them.”
Thumbs-down post-mortem: a 👎 on a briefing fires a fire-and-forget task that compares the disliked briefing to recent 👍’d baselines, asks the LLM for a single-sentence suppression rule, and routes it through the existing context-proposal approval flow. Approved rules append to finkbot_context.txt and feed every future reflection prompt.
Incident learning. Watchdog auto-cancels and startup-reconcile cleanups now log to feedback_log.jsonl with noise-filter thresholds (5 zombies, 20 backlog) so only anomalous resilience events surface. self_reflection mines them and proposes timeout/threshold tweaks as draft Gitea PRs. The thresholds are load-bearing: a clean restart after every deploy clears 1–3 zombies, and without the floor the weekly report would propose “fix deploy restart” every week and drown the real signal.
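The noise floor reduces to a couple of constants; whether the thresholds are inclusive is my assumption:

```python
# Thresholds from the text: surface only anomalous resilience events.
ZOMBIE_THRESHOLD = 5
BACKLOG_THRESHOLD = 20

def should_surface(zombies: int, backlog: int) -> bool:
    """A clean restart clears 1-3 zombies; only counts past the
    floor are worth a mention in the weekly report."""
    return zombies >= ZOMBIE_THRESHOLD or backlog >= BACKLOG_THRESHOLD
```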
Home Assistant action suggestions. Two new flows post 🏠 one-tap action suggestions to #alerts. ha_suggestions (rule engine over HA state, e.g. “everyone’s away and the front door is unlocked — lock it?”) and calendar_actions (rule engine over upcoming meetings × HA state, e.g. “pause the media player — standup starts in 5 min”). Both share a narrow allowlist keyed by domain/service so a misbehaving rule can at worst propose something the allowlist rejects.
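The allowlist check itself is tiny; these (domain, service) entries are illustrative, not FinkBot's actual list:

```python
# Hypothetical allowlist keyed by (domain, service), as described above.
HA_ALLOWLIST = {
    ("lock", "lock"),
    ("media_player", "media_pause"),
    ("light", "turn_off"),
}

def is_allowed(domain: str, service: str) -> bool:
    """A misbehaving rule can at worst propose something this rejects."""
    return (domain, service) in HA_ALLOWLIST
```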
Chroma HNSW rebuild (2026-04-25). Long-running where-filter 500s caused by orphaned IDs in the original collection are gone. scripts/rebuild_chroma_hnsw.py copied 260,022 docs into a fresh finkbot_memory_v2 collection — 255,835 with preserved embeddings, 4,200 (slack/gitea orphans) re-embedded via the proxy’s default embedding function. The 500-retry-without-where fallback in the client stays as a canary; if it fires again, something regressed.
Mac Mini hardening. Intermittent 1–2 hour Ollama outages through April were traced to two causes. First: launchctl setenv is per-launchd-session and lost on reboot, so an earlier OLLAMA_NUM_PARALLEL=2 quietly dropped to 1 after the next reboot, serializing every caller behind whichever flow was currently running and starving live flows during backfills. Fixed by baking the env var into a LaunchDaemon plist. Second: macOS’s manual Wi-Fi mode installs only interface-scoped default routes, which Go-based clients (Ollama included) ignore for outbound TCP. A second LaunchDaemon now installs a global default route at boot. A 2026-04-27 outage was a third issue: Ollama.app autostarted from Login Items, won the port-11434 race, and the LaunchDaemon failed to bind 78 times in a row. Removing the Login Item closed it. The new ollama_health_check flow catches any future variant in <10 minutes.
Qwen3 rollback (2026-04-21). Tried upgrading to qwen3:14b/qwen3:8b. Qwen3 ships with thinking-mode on by default, adding 60–180 s of internal monologue per call. Within two hours, three flows had blown their timeouts. Rolled back the same day. Models stay on disk pending an /api/chat-with-think:false switch — or the 48GB Mac Mini, where the thinking budget will be affordable.
Neo4j round-trip optimization. get_person_context() collapsed from six Cypher queries to one with CALL {} subqueries. /who warm latency dropped from ~40ms to ~12ms.
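For readers unfamiliar with `CALL {}` subqueries, a hedged sketch of the consolidated shape (node labels and relationship types come from the graph schema above; the property names and exact lookups are assumptions, not FinkBot's actual query):

```cypher
MATCH (p:Person {name: $name})
CALL {
  WITH p
  OPTIONAL MATCH (p)-[:WORKS_AT]->(c:Company)
  RETURN collect(DISTINCT c.name) AS companies
}
CALL {
  WITH p
  OPTIONAL MATCH (p)-[:INVOLVED_IN]->(pr:Project)
  RETURN collect(DISTINCT pr.name) AS projects
}
RETURN p, companies, projects
```

Each subquery runs against the already-matched person, so the driver pays one round trip instead of one per relationship type.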
IDBot bootstrapped. A sister project — same architecture, separate repo, separate everything — for Instrumental Identity. Slack instead of Discord, GitLab instead of Gitea, OneDrive instead of Joplin, qwen2.5:72b on the incoming 48GB Mac Mini, namespace "company" instead of "personal". Namespace isolation in Chroma and Neo4j is defence-in-depth; the primary isolation is physical (separate hosts, separate model weights).
What’s still next
- Multi-step tool-use loop in chat: bounded agent loop with a tool manifest (calendar, memory, HA read/write, task create), hard step cap, and an approval reaction before any external-effect tool fires. The hard problem isn’t the loop — it’s the UX. A 30-second synchronous reply is unacceptable; this needs a “working on it…” ack with async completion.
- Raw Chroma in temporal queries: the corpus has no indexed `timestamp` metadata, so post-filtering 260k docs by parsing bodies is too slow. Either backfill a `date` field or wait for the Chroma proxy to gain a date-range query.
- 48GB Mac Mini arriving ~early June: Ollama with `qwen2.5:72b`, IDBot’s primary inference host, second Neo4j and Chroma instances for the company namespace.
- IDBot ↔ FinkBot cross-pollination: deferred until IDBot has ≥1 month of real data. Transport is solved by namespacing; the hard problem is the summary taxonomy — what’s safe for one bot to surface to the other. Design cold and you invite leaks.
Closing Thoughts
The thing that surprised me most building this was how much of the value comes from the plumbing, not the LLM. The dedup tracker. The concurrency guards. The two-tier memory split. The entity graph that knows Dave works at the same company I do and we’ve emailed 47 times. The feedback log that quietly records every reaction and query without any individual flow caring about it.
The LLM is the tip of the iceberg. Everything below it is data engineering and operational discipline.
If I had to do it over I would have started with Prefect instead of n8n. The version control alone was worth the migration cost. I would have put the dedup tracker in on day one. And I would have moved to Neo4j earlier. The embedded graph was fine until it wasn’t, and the migration was more work than switching from the start would have been.
