Long-running side project

Telegram bot: from local chat utility to agentic system

The first commit in this repository was on July 16, 2015. It started as a collection of practical features for local chats and turned into an agentic production system: async AWS workers, a reply gate, model fallback, tool execution, memory, metrics, live stats and a separate UI.

Why it matters

The interesting part is not that it calls an LLM. It is the system around the model: routing, context construction, tool safety, memory, telemetry, UI feedback and production boundaries. Those are the patterns I keep running into when turning AI features into real product workflows.

  • Takes a messy real-world workflow and separates it into deployable production boundaries.
  • Balances user experience, cost, latency and reliability in an AI-heavy product surface.
  • Turns model behavior into concrete routing, fallback, telemetry and tool-execution decisions.
  • Keeps improving the system through operational feedback instead of treating the prototype as the finish line.

Architecture

The webhook is intentionally small. It classifies incoming Telegram updates and asynchronously invokes the worker Lambdas that own the real work, including replies, stats fanout, search and PNG rendering.
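To make the thin-ingress idea concrete, here is a minimal sketch of classify-then-dispatch. The worker names, the `Update` subset, and the `classify` and `handleWebhook` functions are assumptions for illustration, not the repository's actual code. The invoker is injected so the example runs without AWS. On Lambda it would typically wrap an Invoke call with `InvocationType: "Event"`, which queues the invocation rather than waiting for the worker's result.

```typescript
// Hypothetical worker names; the real repository may split work differently.
type Worker = "command" | "agentReply" | "activity";

// Minimal subset of a Telegram update needed for this sketch.
interface Update {
  message?: { text?: string; chat: { id: number } };
}

interface Dispatch {
  worker: Worker;
  payload: Update;
}

// Pure, synchronous classification: no network or database access here.
function classify(update: Update): Dispatch[] {
  if (!update.message) return [];
  // Assumption: every message feeds activity tracking.
  const dispatches: Dispatch[] = [{ worker: "activity", payload: update }];
  const text = update.message.text;
  if (text !== undefined) {
    const worker: Worker = text.startsWith("/") ? "command" : "agentReply";
    dispatches.push({ worker, payload: update });
  }
  return dispatches;
}

// Injected invoker: in production this would start an async Lambda invocation.
type Invoke = (worker: Worker, payload: string) => Promise<void>;

async function handleWebhook(update: Update, invoke: Invoke): Promise<number> {
  const dispatches = classify(update);
  // Only the hand-off is awaited, never the workers' results.
  await Promise.all(
    dispatches.map((d) => invoke(d.worker, JSON.stringify(d.payload))),
  );
  return dispatches.length; // number of workers that were triggered
}
```

Keeping `classify` pure means the routing rules can be unit-tested without mocking Telegram or AWS.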

Current async worker architecture, including the sharp-renderer lambda for Telegram PNG images.

Project facts

  • First committed on July 16, 2015
  • TypeScript monorepo running on Bun workspaces
  • grammY Telegram bot with AWS Lambda and Serverless Framework
  • Async workers for commands, agent replies, activity tracking and broadcasts
  • DynamoDB for chat statistics, chat events and WebSocket connections
  • Upstash Redis for AI chat history, memory and metrics
  • Multiple LLM providers behind provider-aware model helpers
  • Next.js/Vercel companion UI for search, rendered PNG images and live stats

Agent loop

The agent path is built as a product workflow, not as one giant prompt. Each stage has a clear responsibility and observable failure mode.

Stages, from reply decision through context and tools to final delivery:

  • Address checks
  • Reply gate
  • History + memory
  • Model routing
  • Tool execution
  • Telegram delivery
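One way to give each stage a clear responsibility and an observable failure mode is to run the stages through a small pipeline runner that records an outcome per stage. The following is a sketch under that assumption; `runPipeline`, `StageFn` and `StageOutcome` are hypothetical names, not the bot's real API.

```typescript
// Outcome recorded for every stage that ran, so failures are attributable.
interface StageOutcome {
  stage: string;
  status: "ok" | "stopped" | "failed";
  detail?: string;
}

// A stage returns the updated context, or null to end the run quietly
// (for example when the reply gate decides to stay silent).
type StageFn<C> = (ctx: C) => Promise<C | null>;

interface PipelineResult<C> {
  ctx: C;
  outcomes: StageOutcome[];
}

async function runPipeline<C>(
  initial: C,
  stages: Array<[string, StageFn<C>]>,
): Promise<PipelineResult<C>> {
  const outcomes: StageOutcome[] = [];
  let ctx = initial;
  for (const [stage, fn] of stages) {
    try {
      const next = await fn(ctx);
      if (next === null) {
        // A deliberate stop is not an error; later stages never run.
        outcomes.push({ stage, status: "stopped" });
        break;
      }
      ctx = next;
      outcomes.push({ stage, status: "ok" });
    } catch (err) {
      const detail = err instanceof Error ? err.message : String(err);
      outcomes.push({ stage, status: "failed", detail });
      break;
    }
  }
  return { ctx, outcomes };
}
```

Because stages run strictly in order, expensive steps such as loading history and memory never execute when an earlier stage stops the run.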

Available tools include web search, image search, image generation, voice generation, weather, code execution, history lookup, memory updates, randomization, GIF/video search and dynamic command creation.

How it evolved

01

2015: local chat utility

The repository started as a practical set of small features for local Telegram chats: helpers, jokes, commands, currency, weather and search-like utilities.

02

Command platform

The bot accumulated deterministic commands and integrations, then needed a cleaner command registry and shared helpers instead of one growing handler.

03

Production split

The webhook became a thin ingress lambda. Slow work moved into async workers so Telegram requests return quickly and each concern can scale independently.

04

Stats and UI

Chat events moved into DynamoDB. A WebSocket runtime, search endpoint and PNG renderer made statistics, activity charts and currency cards visible in Telegram and the companion UI.

05

Agentic layer

The bot gained a reply gate, model loop, tools, memory, recent history, multimodal context, dynamic commands and fallback behavior for model calls.

Legacy architecture diagram from an earlier version of the bot.

Engineering decisions

Thin webhook ingress

The Telegram-facing lambda only routes and invokes workers. It does not wait for statistics, model calls, image generation or WebSocket fanout.

Reply gate before the agent

Group chats need a default-ignore policy. Deterministic address checks run first, then a structured model call decides whether the bot should engage.
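A minimal sketch of such a gate follows. It assumes that a direct address (private chat, reply to the bot, or an @-mention) short-circuits to "reply" and that a failing model gate falls back to silence; both are illustrative choices, and `addressCheck`, `replyGate` and `ModelGate` are hypothetical names.

```typescript
interface IncomingMessage {
  text: string;
  isPrivateChat: boolean;
  isReplyToBot: boolean;
}

interface GateDecision {
  reply: boolean;
  reason: string;
}

// Hypothetical model-backed classifier returning a structured decision.
type ModelGate = (msg: IncomingMessage) => Promise<GateDecision>;

// Cheap deterministic checks. Returns null when the message is not
// clearly addressed to the bot and the model should decide.
function addressCheck(msg: IncomingMessage, botUsername: string): GateDecision | null {
  if (msg.isPrivateChat) return { reply: true, reason: "private chat" };
  if (msg.isReplyToBot) return { reply: true, reason: "reply to bot" };
  const mention = `@${botUsername}`.toLowerCase();
  if (msg.text.toLowerCase().includes(mention)) {
    return { reply: true, reason: "mention" };
  }
  return null;
}

async function replyGate(
  msg: IncomingMessage,
  botUsername: string,
  modelGate: ModelGate,
): Promise<GateDecision> {
  const direct = addressCheck(msg, botUsername);
  if (direct) return direct;
  try {
    return await modelGate(msg);
  } catch {
    // Default-ignore: if the gate itself fails, stay silent.
    return { reply: false, reason: "gate error" };
  }
}
```

Running the deterministic checks first avoids a model call for the common cases where the answer is already obvious.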

Tool loop with guardrails

Tools are exposed through a typed registry. Rate-limited tools can run sequentially, content tools are deferred until data-gathering tools finish, and every tool call has a timeout.
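The guardrails described here can be sketched as a two-phase scheduler. This is an illustration under stated assumptions, not the bot's implementation: the `ToolCall` shape, the `kind` and `rateLimited` flags, and the 10-second default timeout are all hypothetical.

```typescript
type ToolKind = "data" | "content";

interface ToolCall {
  name: string;
  kind: ToolKind;       // data-gathering tools run before content tools
  rateLimited: boolean; // rate-limited tools run one at a time
  run: () => Promise<string>;
}

interface ToolResult { name: string; ok: boolean; output: string }

interface Phase { kind: ToolKind; parallel: ToolCall[]; sequential: ToolCall[] }

// Pure planning step: group calls into ordered phases.
function planPhases(calls: ToolCall[]): Phase[] {
  const order: ToolKind[] = ["data", "content"];
  return order.map((kind) => {
    const inPhase = calls.filter((c) => c.kind === kind);
    return {
      kind,
      parallel: inPhase.filter((c) => !c.rateLimited),
      sequential: inPhase.filter((c) => c.rateLimited),
    };
  });
}

// Rejects if `work` does not settle within `ms`. The underlying work is
// not cancelled; its late result is simply ignored.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// A failing or slow tool becomes a result, never an unhandled exception.
async function runOne(call: ToolCall, timeoutMs: number): Promise<ToolResult> {
  try {
    const output = await withTimeout(call.run(), timeoutMs);
    return { name: call.name, ok: true, output };
  } catch (err) {
    const output = err instanceof Error ? err.message : String(err);
    return { name: call.name, ok: false, output };
  }
}

async function runTools(calls: ToolCall[], timeoutMs = 10_000): Promise<ToolResult[]> {
  const results: ToolResult[] = [];
  for (const phase of planPhases(calls)) {
    const parallel = await Promise.all(phase.parallel.map((c) => runOne(c, timeoutMs)));
    results.push(...parallel);
    for (const call of phase.sequential) {
      results.push(await runOne(call, timeoutMs));
    }
  }
  return results;
}
```

Converting every failure into a `ToolResult` lets the model see that a tool failed and decide how to respond, instead of aborting the whole reply.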

Provider-aware model fallback

Model calls record success, timeout and error states. The main chat model can fall back to another provider/model when the primary path fails.
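A minimal sketch of ordered fallback with status recording. The `ModelTarget`, `CallRecord` and `completeWithFallback` names are hypothetical, and treating an error named `TimeoutError` as a timeout is an assumption of this example rather than a documented convention of the bot.

```typescript
type CallStatus = "success" | "timeout" | "error";

interface ModelTarget { provider: string; model: string }

interface CallRecord extends ModelTarget {
  status: CallStatus;
  latencyMs: number;
  fallbackFrom?: string; // primary model this attempt replaced, if any
}

// Injected provider call so the sketch stays provider-agnostic.
type CallModel = (target: ModelTarget, prompt: string) => Promise<string>;

// Assumption: timeouts surface as errors whose name is "TimeoutError".
function statusOf(err: unknown): CallStatus {
  return err instanceof Error && err.name === "TimeoutError" ? "timeout" : "error";
}

// Tries each target in order, recording every attempt, and returns the
// first successful completion. Rethrows the last error if all fail.
async function completeWithFallback(
  targets: ModelTarget[],
  prompt: string,
  call: CallModel,
  record: (r: CallRecord) => void,
): Promise<string> {
  let lastError: unknown = new Error("no model targets configured");
  const primary = targets[0];
  for (const target of targets) {
    const started = Date.now();
    const fallbackFrom =
      primary !== undefined && target !== primary ? primary.model : undefined;
    try {
      const text = await call(target, prompt);
      record({ ...target, status: "success", latencyMs: Date.now() - started, fallbackFrom });
      return text;
    } catch (err) {
      record({ ...target, status: statusOf(err), latencyMs: Date.now() - started, fallbackFrom });
      lastError = err;
    }
  }
  throw lastError;
}
```

Recording the failed primary attempt as well as the successful fallback keeps the degraded path visible in metrics even when the user still gets an answer.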

Memory and history as product primitives

Recent chat history, media attachments and scoped memory are loaded only after the reply gate decides that a reply is warranted.

Metrics over guesses

Model calls and tool calls write time-series metrics with status, latency, model name and fallback source. That makes failure modes visible.
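As an illustration of what such a metric point could look like and how it might be aggregated, here is a small sketch. The field names, the key layout and the one-minute bucket are assumptions. Storage is deliberately left out so that no particular Redis client API is implied.

```typescript
type Status = "success" | "timeout" | "error";

interface MetricPoint {
  kind: "model" | "tool";
  name: string;          // model or tool name
  status: Status;
  latencyMs: number;
  timestampMs: number;
  fallbackFrom?: string; // set when this call replaced a failed primary
}

// Key for a fixed-width time bucket, usable with any key-value store.
function bucketKey(p: MetricPoint, bucketMs = 60_000): string {
  const bucketStart = Math.floor(p.timestampMs / bucketMs) * bucketMs;
  return `metrics:${p.kind}:${p.name}:${bucketStart}`;
}

interface Summary {
  count: number;
  failureRate: number;  // share of calls that timed out or errored
  fallbackRate: number; // share of calls served by a fallback
  p95LatencyMs: number;
}

function summarize(points: MetricPoint[]): Summary {
  const count = points.length;
  if (count === 0) {
    return { count: 0, failureRate: 0, fallbackRate: 0, p95LatencyMs: 0 };
  }
  const failures = points.filter((p) => p.status !== "success").length;
  const fallbacks = points.filter((p) => p.fallbackFrom !== undefined).length;
  const sorted = points.map((p) => p.latencyMs).sort((a, b) => a - b);
  // Nearest-rank percentile: the smallest sample with at least 95% of
  // all samples at or below it.
  const rank = Math.ceil(0.95 * count);
  return {
    count,
    failureRate: failures / count,
    fallbackRate: fallbacks / count,
    p95LatencyMs: sorted[rank - 1] ?? 0,
  };
}
```

Separating timeouts from other errors in `status` makes it possible to tell a slow provider apart from a broken one when reading the charts.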

What I would improve next

The system already has tests and runtime metrics. The next useful step is stronger evals: replay known conversations, preserve the failure modes, and make model or prompt changes measurable before they reach users.

  • Build a replay/evaluation harness from real conversations with redacted fixtures.
  • Track answer quality, refusal quality, tool success and latency by chat and feature.
  • Add admin review flows for dynamic tools and memory changes.
  • Promote the strongest patterns into reusable templates for other bots or agent systems.