Recursive Inference.
Engineered.
Stop stuffing everything into a single context window. ai-rlm decomposes long or complex tasks into recursive sub-LLM calls, cutting cost, reducing hallucination on long inputs, and outperforming naive RAG.
LLMs break down on long, complex contexts
Every production AI team hits the same ceiling. The naive approach — one big context window — stops working under load.
Context window cost spiral
Passing entire documents or conversation histories into GPT-4o burns tokens fast. At scale, it becomes the dominant cost line.
Hallucination on long inputs
Models lose track of detail in large contexts. The longer the input, the less reliable the output — especially on multi-step reasoning tasks.
RAG falls short at scale
Retrieval-augmented generation narrows the context to a handful of retrieved passages before the model ever reads it. Complex, multi-hop queries need more than a nearest-neighbour lookup.
Recursive Language Model inference
Based on the Recursive Language Models paper by Zhang, Kraska & Khattab (2025). The RLM strategy treats inference as a recursive program: a model writes code to coordinate other models.
Decompose
The RLM writes JavaScript to break your task into bounded sub-problems. No manual chunking required.
Delegate
Each sub-task runs in an isolated sandboxed LLM call. Parallelism where possible, sequencing where required.
Synthesize
Sub-results are merged recursively until a single coherent answer emerges. The model checks its own work.
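The three steps above can be sketched in a few lines of TypeScript. This is a simplified illustration, not the ai-rlm implementation: a fixed character budget stands in for the model-written decomposition, and `Model`, `RlmOptions`, `splitIntoChunks`, and `recursiveAnswer` are hypothetical names chosen for this example.

```typescript
// Hypothetical sketch: none of these names come from ai-rlm.
// A "model" here is any async function from prompt to completion.
type Model = (prompt: string) => Promise<string>;

interface RlmOptions {
  maxChars: number; // largest context a single call should receive
  maxDepth: number; // recursion guard so the loop always terminates
}

// Decompose: a naive fixed-size split (assumption; the real strategy
// described above lets the model write the decomposition itself).
function splitIntoChunks(text: string, maxChars: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}

async function recursiveAnswer(
  question: string,
  context: string,
  model: Model,
  opts: RlmOptions,
  depth = 0,
): Promise<string> {
  // Base case: the context fits the budget, or the depth limit is reached
  // (in which case the call may receive an over-budget context).
  if (context.length <= opts.maxChars || depth >= opts.maxDepth) {
    return model(`${question}\n\n${context}`);
  }

  // Delegate: chunks are independent, so they run in parallel.
  const chunks = splitIntoChunks(context, opts.maxChars);
  const partials = await Promise.all(
    chunks.map((chunk) => recursiveAnswer(question, chunk, model, opts, depth + 1)),
  );

  // Synthesize: the joined partial answers may still be too long,
  // so the merge step re-enters the same function.
  return recursiveAnswer(question, partials.join("\n"), model, opts, depth + 1);
}
```

Because the merge step is itself a recursive call, an oversized set of partial answers is decomposed again until it fits the budget or the depth limit is hit.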
import { recursiveExecution } from "ai-rlm";
import { openai } from "@ai-sdk/openai"; // assumed provider: the AI SDK's OpenAI package
const result = await recursiveExecution({
  task: longDocument, // any size input
  model: openai("gpt-4o"),
  strategy: "decompose", // chunk → delegate → merge
});
// result is fully synthesized, with no truncation
Infrastructure for production RLM
The open-source library gets you to proof of concept. The platform gets you to production.
Managed execution
Run RLM pipelines in isolated, scalable sandboxes. No infra to manage, no cold-start debugging.
Cost & iteration analytics
See token spend per task, per model, per run. Identify expensive sub-tasks and tune them down.
Full observability
Trace every recursive call, sub-task, and merge step. Debug inference the way you debug code.
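One way to picture such a trace is as a tree of spans, one per recursive call, with token counts rolled up from the leaves. The shape below is an assumption for illustration only, not the platform's actual schema; `TraceSpan`, `totalTokens`, and `renderTrace` are hypothetical names.

```typescript
// Hypothetical trace shape: not the platform's schema.
interface TraceSpan {
  name: string;
  kind: "decompose" | "sub-task" | "merge";
  tokens: number; // tokens spent by this call alone
  children: TraceSpan[]; // calls spawned by this call
}

// Roll up token spend for a span and everything beneath it.
function totalTokens(span: TraceSpan): number {
  let sum = span.tokens;
  for (const child of span.children) {
    sum += totalTokens(child);
  }
  return sum;
}

// Render the tree as indented lines, one per span, with rolled-up totals.
function renderTrace(span: TraceSpan, depth = 0): string[] {
  const lines = [
    `${"  ".repeat(depth)}${span.kind} ${span.name} (${totalTokens(span)} tokens)`,
  ];
  for (const child of span.children) {
    lines.push(...renderTrace(child, depth + 1));
  }
  return lines;
}
```

Sorting the children of any span by `totalTokens` is then enough to surface the most expensive sub-task at each level.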
Team management
Shared projects, audit logs, and role-based access. Built for the team, not just the individual.
API-first
REST API and TypeScript SDK. Integrates with your existing stack in minutes.
OSS core — always free
The ai-rlm library is MIT licensed. The platform monetizes hosted execution, not the algorithm.
Be first on the platform
We're onboarding design partners now. Get early access, shape the roadmap, and lock in launch pricing.
Or start now: npm install ai-rlm