[CAT] Homelab

Building a Personal AI Assistant (FinkBot)

I work across too many contexts at once. Three email accounts. Two calendars. Work Teams. Personal Slack. A Gitea instance for side projects. Home Assistant watching over a thousand devices. Notes in Joplin. Tasks scattered across wherever I dumped them last week.

For years I tried to manage this with dashboards, integrations, and sheer willpower. None of it stuck. What I actually wanted was something that already knew what was going on and could just tell me, without me having to ask in exactly the right way or remember which system held which piece of information.

So I built FinkBot.


What It Does

FinkBot is a personal AI assistant that runs entirely on my home network. It continuously ingests data from every corner of my digital life: email, calendar, chat, code repos, smart home sensors, task lists, notes. It indexes everything into a searchable memory store and uses that context to:

  • Send a morning and evening briefing every day
  • Prep me for meetings 30 minutes before they start, pulling together context on who I’m meeting with and what we’ve been working on
  • Alert me to things that matter: security sensor trips, appliances left on, unusual comms gaps, infrastructure pressure
  • Answer ad-hoc questions in Discord (“what did I send Dave last week?”, “who is attending this meeting?”)
  • Suggest home automation actions and let me approve them with a single reaction
  • Announce context-aware briefs to my Echo Show when I walk into a room
  • Run a weekly self-reflection that evaluates its own output quality and proposes improvements

All without sending a single byte of personal data outside my LAN.


The Stack

Data Sources

FinkBot polls and ingests from:

  • Email: three accounts, IMAP for personal and university, Microsoft Graph API for work, Dovecot archive for historical backfill
  • Calendar: Microsoft 365 plus Nextcloud CalDAV for personal events
  • Chat: Microsoft Teams DMs and Slack channels
  • Code: Gitea commits, issues, and PRs across all my repos
  • Notes: Joplin (everything I have ever written down)
  • Tasks: Nextcloud Tasks via CalDAV
  • Smart home: Home Assistant with over 1,000 entities covering presence, appliances, energy, sensors, climate, plus dedicated monitors for TrueNAS pool health and Proxmox container pressure

Each source has its own Prefect flow with a schedule tuned to how often that source changes. Email checks every 5 to 10 minutes. Smart home every 5. Git repos every 2 hours. Joplin and Slack every 30 minutes.
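The scheduling idea boils down to a per-source cadence. As a minimal sketch (intervals taken from above; the dict and helper names are mine, not FinkBot's actual code):

```python
from datetime import datetime, timedelta

# Hypothetical poll cadence per source, in minutes (values from the post).
POLL_INTERVALS = {
    "email": 5,
    "smart_home": 5,
    "slack": 30,
    "joplin": 30,
    "gitea": 120,
}

def is_due(source: str, last_run: datetime, now: datetime) -> bool:
    """Return True once a source's interval has elapsed since its last run."""
    return now - last_run >= timedelta(minutes=POLL_INTERVALS[source])
```

In practice Prefect handles this through per-deployment schedules rather than a hand-rolled loop; the point is just that each source gets its own cadence.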

Two-Tier Memory

FinkBot uses a two-tier memory architecture, and the distinction matters.

ChromaDB (chroma.crosscreek) is the raw recall layer. Over 257,000 documents, everything ingested from every source, embedded and stored. When FinkBot needs to answer “what did I send Dave last week?”, this is where it looks. Semantic search, no structure, just relevance.

MemU (memu.crosscreek) is the distilled long-term layer. It accumulates synthesized facts over time: things I’ve manually added with /context add, outputs approved from the weekly self-reflection, curated patterns. It is not a replacement for Chroma. It is the part of memory that has been thought about. Raw search and distilled knowledge serve different purposes and live in different stores.

A middleware layer in the memory client handles all query logging transparently. Individual flows don’t think about it. Queries from Discord get tagged separately from background automation queries so the self-reflection loop can distinguish what I’m actually asking about from what the system is doing on its own.

The Entity Graph

Layered on top of the vector stores is a structured knowledge graph backed by Neo4j Community Edition, running at neo4j.crosscreek. It tracks four node types: Person, Company, Project, and Topic. Relationships include EMAILED, WORKS_AT, INVOLVED_IN, COLLABORATES_WITH, and others.

The move to Neo4j from an earlier embedded graph database was driven by one practical problem: the embedded approach had a single-writer bottleneck. Multiple flows running concurrently would contend for the write lock. Neo4j’s MVCC gives concurrent readers and writers without coordination overhead, and the Bolt protocol means any flow or API endpoint can connect remotely without file locking concerns.

The graph gets populated automatically from email processing, Joplin notes, Slack analysis docs, calendar attendee extraction, and a deterministic upsert on every Gitea repo. A dedicated backfill flow processes the historical Dovecot email archive on an hourly schedule, steadily growing the graph from years of past correspondence. A separate curation flow runs weekly to merge duplicate nodes and filter noise.

The entity graph answers questions the vector store cannot. “Who am I meeting with today, and what do I know about them?” The daily briefing pulls today’s calendar attendees, looks each one up in the graph, and injects a “People you’ll meet today” section into the prompt before it ever touches Chroma. Structured facts first, semantic context second.
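The "structured facts first" step can be sketched as a small prompt-section builder. Here `graph` is a hypothetical dict keyed by attendee email; in the real system each lookup would be a Cypher query against Neo4j (something like `MATCH (p:Person {email: $email}) RETURN p`).

```python
def people_section(attendees, graph):
    """Build a 'People you'll meet today' prompt section from graph facts."""
    lines = ["People you'll meet today:"]
    for email in attendees:
        facts = graph.get(email)
        if facts:
            lines.append(f"- {facts['name']} ({facts['company']}): {facts['note']}")
        else:
            # Unknown attendee: say so rather than hallucinate context.
            lines.append(f"- {email}: no graph entry yet")
    return "\n".join(lines)
```

The resulting section is prepended to the briefing prompt before any semantic search results from Chroma are appended.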

The Flows Layer

All orchestration runs on Prefect, deployed to a 4GB/4-core LXC (prefect.crosscreek). Prefect replaced an earlier n8n-based setup. The drag-and-drop GUI approach was fine until I needed version control, testability, and the ability to do something non-trivial in a node. Python and git won.

Active flows currently running:

| Flow | Schedule | Purpose |
| --- | --- | --- |
| ha_monitor | every 5 min | Security, appliances, presence, TrueNAS, Proxmox |
| meeting_prep | every 5 min | 30-min lookahead prep briefs |
| calendar_actions | every 5 min | Rule engine over upcoming meetings × HA state (pauses media, etc.) |
| ha_suggestions | every 5 min | Rule-based HA suggestions with one-tap Discord approval |
| proactive_voice | every 60 sec | Presence transitions trigger TTS brief on Echo Show |
| pattern_briefings | every 5 min | Fires focused briefs 15 min before scheduled pattern moments |
| email_monitor_* | 5 to 15 min | Three accounts, IMAP + Graph API |
| calendar_sync | every 30 min | M365 + Nextcloud to memory |
| slack_ingestor | every 30 min | Slack channels to memory |
| joplin_ingestor | every 30 min | Notes to memory |
| chat_ingestor | hourly | Teams DMs to memory |
| entity_backfill | hourly | Historical Dovecot emails to Neo4j entity graph |
| daily_briefing | 8am + 5pm ET | Morning and evening briefings |
| graph_curation | weekly | Neo4j merge suggestions, noise filtering |
| memu_curation | weekly | MemU near-duplicate cleanup |
| self_reflection | Sunday 8pm ET | Weekly synthesis, proposals, draft PRs |
| memory_defrag | Saturday 7pm ET | Expire stale entries, corpus stats |
| watchdog | continuous | Auto-cancel stuck runs, hard-kill past threshold |

A few things I have learned running these in production:

In-process concurrency guards beat deployment-level limits. High-frequency flows call check_self_concurrency() at startup and exit cleanly if another instance is already running. Relying on Prefect’s deployment-level concurrency limits alone left edge cases where crashed runs didn’t release their slots. Explicit guards are more reliable.
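One common way to implement such a guard is a PID lock file plus a liveness probe, so a crashed run's stale lock never blocks the next run. The post doesn't show FinkBot's actual `check_self_concurrency()`; this is a sketch of the idea:

```python
import os

def check_self_concurrency(lock_path="/tmp/finkbot_flow.lock"):
    """Return True if we are the only live instance; False otherwise."""
    if os.path.exists(lock_path):
        try:
            pid = int(open(lock_path).read().strip())
            os.kill(pid, 0)   # signal 0 probes the process without killing it
            return False      # lock holder is alive: another instance runs
        except (ValueError, ProcessLookupError):
            pass              # stale or corrupt lock file: safe to reclaim
    with open(lock_path, "w") as f:
        f.write(str(os.getpid()))
    return True
```

A flow calls this at startup and exits cleanly on False; because staleness is detected at read time, a crashed run costs nothing.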

Startup reconciliation matters. After a crash or restart, the server can have zombie “running” states for flows that are no longer actually running. A startup reconciliation pass cleans these before new runs start, preventing phantom concurrency blocks.
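The reconciliation pass is conceptually simple: any "running" record whose heartbeat is too old gets marked crashed before new runs start. Sketch only, under assumptions: `runs` is a hypothetical list of state dicts, whereas real Prefect state lives on the server.

```python
from datetime import datetime, timedelta, timezone

def reconcile_runs(runs, now=None, max_age=timedelta(hours=2)):
    """Mark zombie 'running' records as crashed; return how many were cleaned."""
    now = now or datetime.now(timezone.utc)
    cleaned = 0
    for run in runs:
        if run["state"] == "running" and now - run["heartbeat"] > max_age:
            run["state"] = "crashed"   # frees the phantom concurrency slot
            cleaned += 1
    return cleaned
```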

CalDAV clients need timeouts. My NAS can be slow. A DAVClient without timeout=10 will hang indefinitely. Flows that run every 60 seconds cannot afford that.

Don’t alert on things you can’t fix. Meeting prep runs every 5 minutes. CalDAV errors go to print(), not the Discord error channel. Nobody needs a ping every 5 minutes because the CalDAV server hiccupped.

The API Bridge

A FastAPI service on port 8003 acts as the hub connecting flows, the Discord bot, and the kiosk. The bot never touches memory or the entity graph directly. It POSTs to the API and gets a response. Logic stays centralized.
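From the bot's side, everything reduces to one thin helper. The endpoint path below is hypothetical, and the `opener` parameter is injected purely so the function can be exercised without a live server:

```python
import json
import urllib.request

API_BASE = "http://prefect.crosscreek:8003"  # hub address from the post

def post_api(path, payload, opener=None):
    """POST a JSON payload to the API hub and return the parsed reply."""
    req = urllib.request.Request(
        API_BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    open_fn = opener or urllib.request.urlopen
    with open_fn(req) as resp:
        return json.loads(resp.read())
```

Keeping the bot this thin means a logic change never requires redeploying the bot, only the API service.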

Key endpoint groups: briefing and prep triggers, Home Assistant action execution (with Discord approval gating), entity graph CRUD, MemU memory management, pattern automation management, Prefect watchdog controls, kiosk announcement queue, and a transcription and TTS pipeline for the Echo Show.

There is also an HTTPS endpoint on port 8443 serving the kiosk dashboard. It required HTTPS because getUserMedia only works in a secure context.

The Echo Show Kiosk

The dashboard on my Echo Show (running vanilla Android) has become more than a status screen. It shows live calendar, unread message counts, current tasks, and a Home Assistant home status panel. More interestingly, it now listens for a wake word via openwakeword, streaming audio from the browser to a Python WebSocket server. When the wake word fires, it triggers a context-aware TTS brief. The proactive voice flow also pushes announcements to the kiosk when presence transitions happen, so walking into the room can trigger a summary of what's coming up.

The Discord Bot

Discord is my primary interface. Current slash commands:

  • /brief — trigger a morning or evening briefing on demand
  • /prep — meeting prep for a specific event
  • /search and /search-email — semantic memory and Dovecot archive search
  • /remember — store a memory manually
  • /who — entity graph person lookup
  • /context — manage persistent knowledge file without SSH
  • /task and /quicktask — Nextcloud Tasks management
  • /cal — natural language calendar query
  • /memory — corpus management (stats, forget, curate)
  • /reflect — trigger self-reflection immediately
  • /status — system health check
  • /ark — ARK server management (yes, the game server lives here too)

Reaction handlers let me take action on messages without typing. Thumbs up and thumbs down on briefings feed the engagement feedback loop. Checkmark or X on HA suggestion messages triggers or dismisses the action. A book reaction on meeting prep saves notes to Joplin. A no-entry reaction blocks an email sender. These reactions are the primary interface for approving anything the system proposes.
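A dispatch table over (message kind, emoji) pairs is one natural way to structure those handlers. The keys and handler names below are illustrative, not FinkBot's actual code:

```python
def handle_reaction(emoji, message_kind, handlers):
    """Dispatch a Discord reaction to the action registered for it."""
    handler = handlers.get((message_kind, emoji))
    if handler is None:
        return None          # this reaction means nothing on this message type
    return handler()

# Example wiring (hypothetical handlers):
HANDLERS = {
    ("ha_suggestion", "✅"): lambda: "execute",
    ("ha_suggestion", "❌"): lambda: "dismiss",
    ("meeting_prep", "📖"): lambda: "save_to_joplin",
    ("briefing", "👍"): lambda: "feedback_up",
}
```

The same emoji can mean different things on different message kinds, which is why the key is the pair rather than the emoji alone.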


Design Decisions

Local-First, Always

No personal data leaves 192.168.48.0/24. That is the constraint the architecture is built around. Email, calendar, chat, smart home state: none of it touches an external service.

For LLM inference, the system runs a tiered approach. A 16GB M4 Mac Mini runs Ollama with qwen2.5:14b for heavy inference (briefings, meeting prep, self-reflection) and qwen2.5:7b for lighter work. Anthropic’s API is configured as a fallback and is also used for high-value one-off tasks like code change proposals, where output quality matters more than token cost. But the default path for everything is local.

This split was a deliberate choice. High-volume extraction work like the entity backfill runs local exclusively. Running the full Dovecot email archive through the Anthropic API would cost real money and send personal email content to an external service. Neither is acceptable. With local Ollama it costs nothing and stays on the network.

The fallback to Anthropic exists for resilience, not as a cost optimization. If the Mac Mini is down, the system degrades gracefully rather than going silent.
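The routing policy fits in a few lines. This is a sketch with injected callables standing in for the Ollama and Anthropic clients; only the model names come from the post:

```python
def generate(prompt, local, fallback, heavy=False):
    """Route a prompt: local Ollama first, external API only on failure."""
    # Heavy work (briefings, prep, self-reflection) gets the 14B model.
    model = "qwen2.5:14b" if heavy else "qwen2.5:7b"
    try:
        return local(model, prompt)
    except Exception:
        # Mac Mini down: degrade gracefully instead of going silent.
        return fallback(prompt)
```

Note the asymmetry: the fallback is reached only on exception, never on a cost or latency heuristic, which matches the local-first constraint.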

Why Qwen2.5? Instruction following is strong, JSON output mode is reliable (critical for entity extraction), and the quantized models fit the hardware. The 14B at Q4_K_M runs comfortably within 16GB unified memory while leaving headroom for everything else.

The Entity Graph Upgrade

The move from an embedded graph database to Neo4j is the biggest architectural change in the last few months. The embedded approach worked fine for read-heavy lookups but fell apart when multiple flows needed to write simultaneously. Kuzu’s single-writer model meant contention, and flows running on tight schedules can’t afford to queue behind each other for graph writes.

Neo4j’s MVCC handles concurrent writers cleanly. The Bolt protocol means any process on the network can connect without worrying about file locking. And having a proper query interface makes ad-hoc exploration and curation much easier. The graph curation flow runs weekly, suggesting node merges and filtering noise via Cypher queries that would have been awkward to express in the embedded model.

Two-Tier Memory Is Not a Migration

An earlier version of the architecture treated MemU as a replacement for ChromaDB, something to migrate to once the hardware was ready. The current design treats them as doing different things.

Chroma is raw storage. Everything ingested lands there. Semantic search across 257,000 documents is fast and works well. The limitation is that it treats every document as equally relevant. There is no way to ask “what do I actually know about this person?” and get a distilled answer rather than a pile of email snippets.

MemU is for things that have been synthesized. Approved self-reflection outputs. Manually added context notes. Curated patterns. It is smaller, more intentional, and represents knowledge that has been validated rather than just observed. Briefings and prep queries can pull from both layers and get different things from each.

The Self-Learning Loop

Every Sunday at 8pm, self_reflection runs. It reads two weeks of feedback: briefing reaction rates, Discord query patterns, entity graph growth, MemU accumulation. It queries memory across all sources for a weekly snapshot. It passes everything to an LLM and asks what is working, what is not, and what should change.

The synthesis produces two actionable outputs.

Context proposals are facts or patterns that should become permanent knowledge. Each one appears in Discord as a bookmark message. I react with a checkmark to approve or X to reject. Approved items get appended to /opt/finkbot/finkbot_context.txt, which is injected into every future self-reflection prompt. The system accumulates knowledge from its own outputs over time.
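Mechanically, approval is just an append to the context file. A minimal sketch, assuming plain line-per-fact entries (the production path is /opt/finkbot/finkbot_context.txt; the exact entry format is my guess):

```python
from pathlib import Path

def approve_proposal(text, context_path):
    """Append an approved context proposal to the persistent knowledge file."""
    path = Path(context_path)
    with path.open("a", encoding="utf-8") as f:
        f.write(text.rstrip() + "\n")
    return path.read_text(encoding="utf-8")
```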

Code change proposals are improvement ideas formatted as PR descriptions. Each one opens a draft PR in the FinkBot Gitea repo and posts the URL to #insights. I review through normal CI/CD, or close it if the idea isn’t worth pursuing.

The pattern automation system extends this further. self_reflection can also write to a patterns file, which pattern_briefings reads to fire focused context briefs on a schedule. If self-reflection notices that Monday mornings are always context-switching heavy, it can propose a pattern that fires a tailored brief every Monday at 7:45am. I approve it once and it runs every week.
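The check that pattern_briefings runs can be sketched as a match against the current weekday and time. Here `patterns` is a hypothetical decoding of the patterns file; FinkBot's actual schema is not shown in the post:

```python
from datetime import datetime

def due_patterns(patterns, now):
    """Return patterns whose weekly slot matches the current minute.

    Each entry carries a weekday (0 = Monday) and an HH:MM firing time.
    """
    return [
        p for p in patterns
        if p["weekday"] == now.weekday()
        and now.strftime("%H:%M") == p["time"]
    ]
```

A Monday-7:45am pattern proposed by self-reflection would be one entry in that list, checked on every run of the flow.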


Infrastructure

Everything runs on a Proxmox cluster on my home network.

| Service | Host | Notes |
| --- | --- | --- |
| Prefect flows + API | prefect.crosscreek (LXC 203) | 4GB RAM, 4 cores |
| ChromaDB | chroma.crosscreek | raw memory backend |
| MemU | memu.crosscreek (LXC 204) | distilled memory |
| Neo4j | neo4j.crosscreek | entity graph, Bolt protocol |
| Discord bot | discord-bot.crosscreek | thin bot, talks to API |
| Home Assistant | ha.crosscreek:8123 | 1000+ entities |
| Gitea | git.mystikos.org | source of truth + CI/CD |
| Ollama (14B + 7B) | 16GB M4 Mac Mini | primary inference |

CI/CD runs through Gitea Actions. Pushing to main rsyncs the relevant files to each target host and restarts the appropriate systemd services. The servers are not git clones. They are deploy targets. Code changes go through the repo, not SSH sessions on the server.


What’s Next

The near-term focus is depth over breadth. The infrastructure is stable, the local inference is running, and the entity graph is growing. What I want to improve now is the quality of what the system produces with that foundation.

A/B prompt testing is the next concrete piece. Every briefing could log which prompt variant produced it, and self-reflection could compare engagement rates across variants over time. Right now self-reflection can observe that thumbs-up rates changed, but not why. Prompt variant tracking would close that loop.

Multi-step task execution is further out but interesting. FinkBot can currently answer questions and make suggestions. The next step is executing sequences: research something, draft a message, propose sending it. The approval reaction pattern already handles one-tap confirmation for HA actions. The same pattern extends to anything that needs a human in the loop before committing.

Context-triggered flows are a natural extension of the pattern automation system. Instead of firing on a schedule, a flow could fire when the entity graph crosses a threshold: a collaborator I haven’t talked to in 60 days, a project with no commits in three weeks. The graph already tracks the data. The plumbing for proposing and approving actions already exists. Wiring them together is mostly a matter of writing the detection logic.
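The detection logic for the first example is small once the data is in hand. Sketch only: `last_contact` stands in for what would really be a query over EMAILED edges in the Neo4j graph.

```python
from datetime import datetime, timedelta

def stale_collaborators(last_contact, now, threshold_days=60):
    """Find collaborators with no contact in the last `threshold_days` days."""
    cutoff = now - timedelta(days=threshold_days)
    return sorted(name for name, ts in last_contact.items() if ts < cutoff)
```

Each name this returns would become a proposed action in Discord, approved or dismissed with the same reaction pattern as everything else.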


Closing Thoughts

The thing that surprised me most building this was how much of the value comes from the plumbing, not the LLM. The dedup tracker. The concurrency guards. The two-tier memory split. The entity graph that knows Dave works at the same company I do and we’ve emailed 47 times. The feedback log that quietly records every reaction and query without any individual flow caring about it.

The LLM is the tip of the iceberg. Everything below it is data engineering and operational discipline.

If I had to do it over I would have started with Prefect instead of n8n. The version control alone was worth the migration cost. I would have put the dedup tracker in on day one. And I would have moved to Neo4j earlier. The embedded graph was fine until it wasn’t, and the migration was more work than switching from the start would have been.