
Live AI Agent Infrastructure

A production multi-agent AI system with semantic long-term memory, autonomous task monitoring, social automation, voice input, and proactive scheduling. Controlled through Telegram and running continuously on personal hardware.

AI · Multi-Agent · Docker · Linux · Claude · Whisper · Python · pgvector · Automation · Infrastructure

A production AI system I designed, built, and operate on personal hardware. Not a demo project: it runs every day, handles real work across code, marketing, research, and scheduling, and accepts voice commands through Telegram as the sole interface. Speaking a command is enough to start work.

Technical Explanation

The system runs inside a Docker container on Debian 12 with an AMD Ryzen 5 3600X and 23GB of RAM. Persistent volume mounts at /home/node/.openclaw/ preserve all credentials, memory, binaries, and state across container rebuilds. An auto-restart policy keeps the container online without manual intervention. Telegram serves as the primary interface for both text and voice input.

Problem

Most AI tools are stateless. Every new session starts from zero, specialist agents require constant manual handoffs, and any task running longer than a few minutes needs supervision. OAuth tokens expire silently, parallel workstreams are impossible in a single-agent loop, and there is no mechanism to detect a failed process without manually checking.

Technical Explanation

Stateless sessions force full context reconstruction on every call, wasting tokens and breaking continuity on multi-day tasks. Single-agent architectures serialize work that could run in parallel, blocking throughput. Without persistent semantic memory, each spawned agent must receive the full project brief at start time, pushing prompts past practical token limits. Without an autonomous monitoring layer, stalled or failed processes accumulate undetected until a human notices.

Approach

  • Built a local voice pipeline: Telegram delivers incoming audio as OGG files; ffmpeg converts each file to a 16kHz mono WAV; whisper.cpp runs inference on device using either the tiny or base model depending on accuracy requirements. The entire pipeline runs locally with zero external API calls and zero per-message transcription cost, completing in under 5 seconds per message.
  • Designed a 4-layer memory hierarchy: a curated MEMORY.md (hot context, always injected, 150-line cap) feeds every agent session; memory/INDEX.md holds a scan index of 3 lines per day for identifying relevant daily logs without loading them; daily markdown log files store full session detail and are never auto-injected; a pgvector database called Brain DB (port 5433) holds semantic long-term memory using OpenAI text-embedding-3-large embeddings at 3072 dimensions.
  • Implemented ASAA (Adaptive Semantic Agent Architecture): every memory query scores results by semantic similarity (60%), importance (30%), and recency (10%). Importance is assigned heuristically at insert time using a fixed scale: blocker=9, decision=8, insight=7, person=6, fact=5, task=4, with keyword boosts for anything marked critical or credential-related. A consolidation pipeline merges near-duplicate entries at a 0.92 cosine threshold to keep the database clean over time.
  • Built per-query memory injection: context_refresh.py rewrites a dedicated section of MEMORY.md on every heartbeat cycle with the top Brain DB hits for currently active task keywords. The Memory Preamble Protocol is baked into AGENT_TEMPLATE.md so every sub-agent issues a Brain DB query before starting work, pulling focused long-term context without loading the full database into the spawn prompt.
  • Designed a hub-and-spoke agent coordination model: sub-agents append results to a shared BLACKBOARD.md (append-only shared state); confirmed outcomes are written to DECISIONS.md by the main agent after explicit confirmation. Every agent is instantiated from a universal agent class with its role defined at spawn time. Spawn prompts follow an outcome-first structure covering goal, constraints, suggested approach, and deliverable. A specialist persona library covering build, design, marketing, security, career, and copywriting roles is referenced by path in spawn prompts.
  • Built an agent watcher system: agent_watcher.py registers active agents and tracks their start times; agent_watcher_daemon.py runs a 3-minute polling loop, compares agent timestamps against BLACKBOARD.md update times, and writes a file-based stuck-alert marker if no progress is detected within 6 minutes. The HEARTBEAT system monitors the daemon process itself and keeps it alive.
  • Implemented task routing and working-memory separation: task_router.py performs a semantic search against persona descriptions to match an incoming task description to the correct specialist identity and spawn configuration. agent_memory_slice.py generates lean, task-specific context snippets by combining top Brain DB hits with keyword-filtered MEMORY.md sections, keeping spawn prompts under 300 tokens at all times.
  • Built a structured inter-agent protocol: every agent completion message follows a JSON schema with six fields: summary, files_changed, what_works, what_doesnt, needs_alexander, and next. agent_message.py logs and tails all inter-agent messages to a persistent audit log at logs/agent_messages.log.
  • Integrated GitHub with two accounts using fine grained PATs scoped per repository. One account holds read access across all repositories; a second agent account holds collaborator write access for approved repositories where autonomous commits are permitted. All pushes are auditable through PAT scope records.
  • Built a social automation layer: Twitter/X engagement runs through Tweepy API v2 for all write operations including tweet, reply, and follow; YouTube engagement uses YouTube Data API v3 with OAuth2 for comments and subscriptions, scheduled on a 30-minute cycle; Reddit sessions run through Playwright with cookie-based authentication. Each platform maintains a persona library, per-platform velocity caps, and a deduplication log that persists across session boundaries to prevent repeat engagement.
  • Built an auto-reauth system: reauth.py monitors OAuth health; when a token failure occurs, a file-based marker is written at a known path. The HEARTBEAT system detects the marker, sends a Telegram alert containing the exact authorization URL and the one-tap exchange command, then deletes the marker so the alert fires exactly once per incident.
  • Configured a proactive heartbeat: on a fixed schedule the system checks OAuth token expiry for Gmail and YouTube, verifies stuck-agent markers, confirms daemon health, surfaces relevant emails, and reviews project status. Silence from the heartbeat means every check passed.
  • Published the StreakUp habit tracker Flutter app on Google Play as the primary product the system supports. The app integrates RevenueCat for subscription monetization and AdMob for ad delivery, includes 701 automated tests covering unit, widget, and integration scenarios, and is supported by a 12-member international beta team providing real-world feedback before each release.
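The ASAA weighting described above can be sketched in a few lines. This is an illustrative reconstruction, not the production code: the document specifies only the 60/30/10 split and the importance scale, so the exponential half-life form of the recency term is an assumption.

```python
from dataclasses import dataclass

@dataclass
class MemoryHit:
    text: str
    similarity: float   # cosine similarity from pgvector, in [0, 1]
    importance: int     # heuristic scale: blocker=9 ... task=4
    age_days: float     # days since the memory was inserted

def asaa_score(hit: MemoryHit, half_life_days: float = 30.0) -> float:
    """Blend semantic similarity (60%), importance (30%), recency (10%).
    Exponential half-life decay is an assumed form for the recency term."""
    recency = 0.5 ** (hit.age_days / half_life_days)
    return 0.6 * hit.similarity + 0.3 * (hit.importance / 9) + 0.1 * recency
```

Ranking is then just `sorted(hits, key=asaa_score, reverse=True)`: a fresh blocker outscores an equally similar but stale low-importance task, which matches the intent of the weighting.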

Technical Explanation

context_refresh.py runs on each heartbeat cycle, extracts the top 3 active task keywords from the last 10 BLACKBOARD entries, issues a pgvector similarity query, deduplicates results by memory ID, and overwrites the ## Current Context section of MEMORY.md in place. agent_memory_slice.py produces a combined context slice at spawn time: it issues a Brain DB search for the task description, applies keyword filtering over MEMORY.md sections, and formats the output under a Relevant Context heading. The social layer splits concerns by transport: Tweepy handles all Twitter writes (tweet, reply, follow) while Playwright handles search and read operations only, avoiding synthetic-event detection in Twitter’s React-rendered UI. The OAuth reauth marker protocol is stateless and idempotent: the scheduler writes a marker on token failure; the heartbeat reads it, dispatches the Telegram alert, and deletes it immediately, preventing duplicate notifications.
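The in-place section rewrite that context_refresh.py performs can be sketched with a single regex substitution over MEMORY.md. Only the ## Current Context heading comes from the document; the surrounding section layout and the helper name are illustrative.

```python
import re

def rewrite_current_context(memory_md: str, hits: list[str]) -> str:
    """Replace the body of the '## Current Context' section in place,
    leaving every other section of MEMORY.md untouched. Sketch only:
    the real context_refresh.py may delimit the section differently."""
    block = "## Current Context\n" + "\n".join(f"- {h}" for h in hits) + "\n"
    # Match the heading plus its lines, stopping at the next '## ' heading
    # or end of file, so sibling sections survive the rewrite.
    pattern = r"## Current Context\n(?:.*?\n)*?(?=## |\Z)"
    return re.sub(pattern, block, memory_md, count=1)
```

Because the rewrite is idempotent over the same hits, running it on every heartbeat cycle is safe: stale entries are simply replaced by the current top Brain DB results.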

System in Action

1) Voice message becomes an active task

A voice message arrives in Telegram as an OGG file. ffmpeg converts the audio to a 16kHz mono WAV file in a single command. whisper.cpp runs local inference and returns a text transcript in under 5 seconds. That transcript is passed to the agent as a text instruction with no cloud dependency and no per-message cost.
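The two-stage pipeline can be sketched as command construction plus subprocess calls. The ffmpeg flags match the 16kHz mono conversion described above; the whisper.cpp binary name (`main`) and the `-nt` no-timestamps flag vary between whisper.cpp versions, so treat those as assumptions.

```python
import subprocess
from pathlib import Path

def ffmpeg_cmd(ogg: Path, wav: Path) -> list[str]:
    # Resample Telegram's OGG voice note to the 16 kHz mono WAV whisper.cpp expects.
    return ["ffmpeg", "-y", "-i", str(ogg), "-ar", "16000", "-ac", "1", str(wav)]

def whisper_cmd(wav: Path, model: Path) -> list[str]:
    # whisper.cpp CLI; -nt suppresses timestamps so stdout is the bare transcript.
    return ["./main", "-m", str(model), "-f", str(wav), "-nt"]

def transcribe(ogg: Path, model: Path) -> str:
    """OGG -> WAV -> text, entirely on device: no external API calls."""
    wav = ogg.with_suffix(".wav")
    subprocess.run(ffmpeg_cmd(ogg, wav), check=True, capture_output=True)
    result = subprocess.run(whisper_cmd(wav, model), check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()
```

Choosing the tiny or base model is just a matter of which ggml model file is passed in; the tiny model trades some accuracy for the sub-5-second turnaround quoted above.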

2) Memory context assembled before work begins

Before any task starts, the system assembles a layered context bundle for the agent. The MEMORY.md hot context is always present. The INDEX.md scan layer identifies whether any daily log is directly relevant to the current task. agent_memory_slice.py issues a semantic query to the Brain DB and combines the top weighted hits with filtered MEMORY.md sections into a focused preamble under 300 tokens. The agent receives a precise, task scoped context window instead of a raw file dump.
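The combination step inside agent_memory_slice.py might look roughly like the sketch below. Only the 300-token budget, the Relevant Context heading, and the Brain-DB-plus-filtered-MEMORY.md sourcing come from the document; the whitespace token approximation and section format are simplifications.

```python
def build_slice(brain_hits: list[str], memory_sections: dict[str, str],
                keywords: list[str], budget_tokens: int = 300) -> str:
    """Assemble a task-scoped context preamble under a token budget.
    Word count approximates tokens; a real tool might use a tokenizer."""
    parts = ["## Relevant Context"]
    parts += [f"- {hit}" for hit in brain_hits]
    for title, body in memory_sections.items():
        # Keep only MEMORY.md sections that mention an active task keyword.
        if any(k.lower() in (title + " " + body).lower() for k in keywords):
            parts.append(f"### {title}\n{body.strip()}")
    out, used = [], 0
    for part in parts:
        cost = len(part.split())
        if used + cost > budget_tokens:
            break               # stop before exceeding the spawn prompt budget
        out.append(part)
        used += cost
    return "\n".join(out)
```

Brain DB hits come first so that if the budget truncates anything, it is the broader MEMORY.md material rather than the semantically matched entries.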

3) Task routed to the right specialist

task_router.py receives the task description and performs a semantic search against the persona library descriptions. The closest match returns a specialist identity and a prebuilt spawn configuration including persona file path, relevant memory slice, and structured reporting format. The main agent reviews the routing decision before spawning, maintaining human oversight at the dispatch point.
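A minimal sketch of the semantic match, with bag-of-words cosine similarity standing in for the embedding model (the real task_router.py presumably queries the same text-embedding-3-large vectors used by the Brain DB):

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def route(task: str, personas: dict[str, str]) -> str:
    """Pick the persona whose description is most similar to the task.
    Bag-of-words stands in for the embedding model used in production."""
    tv = Counter(task.lower().split())
    return max(personas,
               key=lambda p: cosine(tv, Counter(personas[p].lower().split())))
```

The production router returns the full spawn configuration rather than just a name, but the selection step reduces to exactly this kind of nearest-description match.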

4) Parallel specialists execute simultaneously

A single instruction can trigger multiple specialists at once: a build agent implements Flutter features, a marketing agent drafts copy and schedules social posts, and a career agent updates portfolio content, all running in parallel. Each agent appends a structured JSON completion message to BLACKBOARD.md when done. The main agent reads every BLACKBOARD diff before writing anything to DECISIONS.md, ensuring no outcome is committed without review.
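The six field names of the completion schema come straight from the protocol described earlier; the on-disk layout of a BLACKBOARD entry (a heading per agent plus a JSON body) is an assumption for illustration.

```python
import json

REQUIRED_FIELDS = ("summary", "files_changed", "what_works",
                   "what_doesnt", "needs_alexander", "next")

def append_completion(blackboard_path: str, agent: str, report: dict) -> None:
    """Append a structured completion record; agents never rewrite
    earlier entries, so the file stays an append-only audit trail."""
    missing = [f for f in REQUIRED_FIELDS if f not in report]
    if missing:
        raise ValueError(f"report missing fields: {missing}")
    entry = f"\n### {agent}\n{json.dumps(report, indent=2)}\n"
    with open(blackboard_path, "a", encoding="utf-8") as bb:
        bb.write(entry)
```

Validating the schema at write time means the main agent can parse every diff mechanically before deciding what, if anything, to promote into DECISIONS.md.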

5) Watcher detects a stall and alerts

The agent_watcher_daemon.py polling loop runs every 3 minutes. If a registered agent has not updated BLACKBOARD.md within 6 minutes of its start time, the daemon writes a stuck alert to a fixed file path on the container. The next heartbeat cycle reads the alert file, sends a Telegram notification with the agent label and last known state, then deletes the file. The alert fires exactly once per incident.
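The stall check at the heart of the daemon loop might look like this sketch. In reality the daemon presumably tracks per-agent progress rather than a single BLACKBOARD mtime, so treat the comparison as a simplification; the 3-minute poll, 6-minute window, and once-per-incident marker behavior come from the document.

```python
import os
import time

def check_stalled(agents: dict[str, float], blackboard: str,
                  alert_dir: str, stall_secs: int = 360) -> list[str]:
    """One pass of the polling loop: flag agents with no BLACKBOARD
    progress since they started, once the 6-minute window has elapsed,
    and drop a file-based marker exactly once per incident."""
    now = time.time()
    last_progress = os.path.getmtime(blackboard)
    stalled = []
    for name, started_at in agents.items():
        if now - started_at >= stall_secs and last_progress <= started_at:
            marker = os.path.join(alert_dir, f"stuck_{name}.alert")
            if not os.path.exists(marker):   # fire once per incident
                open(marker, "w").close()
                stalled.append(name)
    return stalled
```

Because the alert is a file rather than a message, the daemon and the heartbeat stay decoupled: the heartbeat simply sweeps the marker directory on its own schedule.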

6) Social automation runs on schedule

The Twitter persona posts content and replies via Tweepy API v2. Each reply mirrors the original tweet’s phrasing before introducing anything new, following the mirror phrase approach built into the persona copy rules. The YouTube scheduler runs every 30 minutes, selects videos by view count threshold, and submits comments through the YouTube Data API. The Reddit Playwright session builds community presence across relevant subreddits. All three channels check deduplication logs before any write to prevent repeat engagement.
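The cross-session deduplication gate can be as simple as a persistent one-line-per-target log. The check-before-write behavior comes from the document; the file format and function name are assumptions.

```python
def should_engage(target_id: str, dedup_log: str) -> bool:
    """Return True and record the target if it has never been engaged;
    return False if the persistent log already contains it."""
    try:
        with open(dedup_log, "r", encoding="utf-8") as f:
            if target_id in {line.strip() for line in f}:
                return False    # already engaged in some earlier session
    except FileNotFoundError:
        pass                    # first run: no log yet
    with open(dedup_log, "a", encoding="utf-8") as f:
        f.write(target_id + "\n")
    return True
```

Because the log lives on the persistent volume, the guarantee survives container rebuilds, which is exactly what prevents a restarted session from replying to the same tweet twice.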

7) OAuth failure surfaces and resolves in one step

When a scheduled YouTube or Gmail operation encounters an expired token, reauth.py writes a marker to a fixed path. On the next heartbeat, the system reads the marker, formats a recovery message containing the full authorization URL and the exact exchange command, sends it via Telegram, and removes the marker. The operator runs one command and the system resumes without a debugging session.
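The read-alert-delete cycle described above is what makes the alert fire exactly once. A sketch, with `send_alert` standing in for the Telegram call and the marker contents assumed to be the recovery message:

```python
import os

def heartbeat_check_reauth(marker_path: str, send_alert) -> bool:
    """Fire the recovery alert exactly once per incident. If send_alert
    raises, the marker survives and the next heartbeat retries."""
    try:
        with open(marker_path, "r", encoding="utf-8") as f:
            payload = f.read()          # auth URL + exchange command
    except FileNotFoundError:
        return False                    # no incident pending
    send_alert(payload)                 # e.g. the Telegram message
    os.remove(marker_path)              # delete so the alert never repeats
    return True
```

Deleting only after a successful send is the important ordering: a failed delivery leaves the marker in place, so the incident is retried rather than silently lost.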

Results

  • Continuous operation across container restarts, session compactions, and day boundaries with full context preserved at every restart.
  • Voice transcription completed on device in under 5 seconds per message with zero transcription API cost across all usage.
  • Semantic memory retrieval returns focused, weighted results from the Brain DB rather than injecting complete context files into every prompt.
  • Parallel specialist agents reduce total task time by executing implementation, research, content, and review work concurrently rather than serially.
  • Stalled agents detected and surfaced via Telegram within 6 minutes through the autonomous watcher daemon with no manual checking required.
  • OAuth token failures resolved through a single Telegram message and one-tap recovery command with no manual debugging session required.
  • Social automation operating across Twitter, YouTube, and Reddit with persona-consistent copy, velocity controls, and cross-session deduplication.
  • All GitHub commits and pushes executed within auditable fine-grained PAT scope, traceable per repository and per account.
  • StreakUp published on Google Play with active RevenueCat monetization, AdMob integration, 701 automated tests passing, and a 12-member international beta team providing real-world coverage before each release.

Tech Stack

  • Hardware: Debian 12 on AMD Ryzen 5 3600X with 23GB RAM, self-hosted on personal hardware
  • Container: Docker with persistent volume mounts and auto-restart policy
  • AI Reasoning: Claude Sonnet via Anthropic API
  • Voice Processing: whisper.cpp (local inference, tiny and base models) and ffmpeg (OGG to 16kHz WAV conversion)
  • Memory Architecture: 4-layer hierarchy including MEMORY.md (hot context), INDEX.md (scan layer), daily markdown logs, and pgvector Brain DB (port 5433)
  • Embeddings: OpenAI text-embedding-3-large at 3072 dimensions
  • Semantic Retrieval: Weighted scoring at 60% cosine similarity plus 30% importance plus 10% recency decay, heuristic importance scale (blocker=9 through task=4), consolidation pipeline at 0.92 cosine threshold
  • Agent Coordination: BLACKBOARD.md (shared append-only state), DECISIONS.md (confirmed outcomes, main agent only), JSON structured reporting protocol, agent_message.py audit log
  • Monitoring: agent_watcher_daemon.py (3-minute polling loop, file-based stuck alerts) and heartbeat scheduler (token expiry, daemon keep-alive, email, calendar checks)
  • Task Routing: task_router.py (semantic persona match) and agent_memory_slice.py (lean context slices under 300 tokens)
  • Social Automation: Tweepy API v2 (Twitter writes), YouTube Data API v3 with OAuth2 (comments and subscriptions), Playwright with cookie sessions (Reddit and Twitter search)
  • GitHub Integration: Two accounts with fine-grained PATs scoped per repository, read-only and collaborator-write modes
  • Interface: Telegram bot integration
  • Languages: Python, Bash, Dart (Flutter)

Technical Explanation

The hub-and-spoke coordination model enforces a single decision authority: sub-agents can only append to BLACKBOARD.md; the main agent is the sole writer to DECISIONS.md and only writes after reviewing the full BLACKBOARD diff. This prevents agents from committing outcomes without oversight. The 4-layer memory design separates retrieval cost from freshness: MEMORY.md holds the minimum viable context injected at nearly zero token cost on every session; INDEX.md adds a scan layer that identifies relevant daily logs without loading their full content; daily logs provide full session detail on demand; the Brain DB handles semantic retrieval across any time window. The social transport split between Tweepy for writes and Playwright for reads avoids platform detection issues that arise from automating write operations through a browser session in Twitter’s React-rendered UI. task_router.py and agent_memory_slice.py work as a pair at spawn time: the router selects the persona; the memory slice tool generates the focused context for that persona’s task, combining Brain DB hits with filtered MEMORY.md sections to stay under the 300-token spawn prompt budget.