Open source · 83 commits · MIT license

Recursive Inference.
Engineered.

Stop stuffing everything into a single context window. ai-rlm decomposes long or complex tasks into recursive sub-LLM calls, cutting cost, reducing hallucination on long inputs, and outperforming naive RAG on complex, multi-hop queries.

Join the platform waitlist · View on GitHub

$ npm install ai-rlm

The problem

LLMs break down on long, complex contexts

Every production AI team hits the same ceiling. The naive approach — one big context window — stops working under load.

Context window cost spiral

Passing entire documents or conversation histories into GPT-4o burns tokens fast. At scale, it becomes the dominant cost line.

Hallucination on long inputs

Models lose track of detail in large contexts. The longer the input, the less reliable the output — especially on multi-step reasoning tasks.

RAG falls short at scale

Retrieval-augmented generation clips context before the model can use it. Complex, multi-hop queries need more than a nearest-neighbour lookup.
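To make "nearest-neighbour lookup" concrete, here is a minimal sketch of naive top-k retrieval by cosine similarity, assuming embeddings are already computed. The `topK` helper, the three-dimensional toy vectors, and the chunk ids are illustrative and not part of ai-rlm.

```typescript
// Naive RAG retriever sketch: score every chunk against the query embedding,
// keep only the k most similar, and drop the rest. Vectors are toy values.
type Chunk = { id: string; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the ids of the k chunks most similar to the query, best first.
function topK(query: number[], chunks: Chunk[], k: number): string[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.id);
}

// Toy corpus: the query resembles "a" and "b"; "c" is unlike the query.
const corpus: Chunk[] = [
  { id: "a", embedding: [1, 0, 0] },
  { id: "b", embedding: [0.9, 0.1, 0] },
  { id: "c", embedding: [0, 0, 1] },
];

const retrieved = topK([1, 0, 0], corpus, 2);
console.log(retrieved.join(", ")); // a, b
```

Anything outside the top k never reaches the model. If chunk "c" holds a fact that only a second reasoning hop would ask for, the original query's embedding gives the retriever no reason to return it.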

How it works

Recursive Language Model inference

Based on the Recursive Language Models paper by Zhang, Kraska & Khattab (2025). The RLM strategy treats inference as a recursive program: a model writes code to coordinate other models.

Decompose

The RLM writes JavaScript to break your task into bounded sub-problems. No manual chunking required.
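As a rough illustration of what "bounded sub-problems" means, the sketch below shows the sort of splitting code a model might generate for a long text input. It is a hand-written stand-in, not output captured from ai-rlm; the `decompose` name, the paragraph-packing rule, and the 40-character budget are assumptions chosen for the example (written in TypeScript; strip the types and it is plain JavaScript).

```typescript
// Hypothetical decomposition step: pack paragraphs into chunks that each
// stay under a fixed character budget, so every sub-call gets bounded input.
function decompose(input: string, maxChars: number): string[] {
  const paragraphs = input.split(/\n\s*\n/).filter((p) => p.trim() !== "");
  const chunks: string[] = [];
  let current = "";

  for (const para of paragraphs) {
    // A single paragraph larger than the budget is hard-split into slices.
    if (para.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < para.length; i += maxChars) {
        chunks.push(para.slice(i, i + maxChars));
      }
      continue;
    }
    // Close the current chunk if adding this paragraph would exceed the budget.
    const candidate = current ? current + "\n\n" + para : para;
    if (candidate.length > maxChars) {
      chunks.push(current);
      current = para;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const doc = "Intro paragraph.\n\nSection one body.\n\nSection two body.";
const parts = decompose(doc, 40);
console.log(parts.length); // 2
```

Packing on paragraph boundaries keeps related sentences together; the hard split is only a fallback so that no chunk can exceed the budget.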

Delegate

Each sub-task runs in an isolated, sandboxed LLM call. Parallelism where possible, sequencing where required.
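The parallel-versus-sequential split can be sketched with plain promises. `SubCall` is a hypothetical stand-in for one isolated LLM call (in a real setup it would wrap a model request); the prompt wording and the stub model are illustrative, not ai-rlm's API.

```typescript
// Hypothetical sub-call signature: one prompt in, one completion out.
type SubCall = (prompt: string) => Promise<string>;

// Independent sub-tasks: start every call at once, then wait for all results.
async function delegateParallel(chunks: string[], call: SubCall): Promise<string[]> {
  return Promise.all(chunks.map((c) => call(`Summarize:\n${c}`)));
}

// Dependent sub-tasks: each call sees the previous answer, so they run in order.
async function delegateSequential(steps: string[], call: SubCall): Promise<string> {
  let carry = "";
  for (const step of steps) {
    carry = await call(`Context so far:\n${carry}\n\nNext step:\n${step}`);
  }
  return carry;
}

// Stub model for local testing: upper-cases the last line of the prompt.
const stub: SubCall = async (prompt) => {
  const lines = prompt.split("\n");
  return lines[lines.length - 1].toUpperCase();
};

delegateParallel(["alpha", "beta"], stub).then((r) => console.log(r.join(","))); // ALPHA,BETA
```

`Promise.all` starts every call before awaiting any of them and returns results in input order, which keeps the later merge step deterministic even when calls finish out of order.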

Synthesize

Sub-results are merged recursively until a single coherent answer emerges. The model checks its own work.
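Recursive merging is essentially a tree reduction. A minimal sketch, assuming a hypothetical `merge` call that asks a model to combine a handful of partial answers; `fanIn` (how many results one merge call receives) is an illustrative parameter, not a documented ai-rlm option. The bracketing stub makes the shape of the merge tree visible.

```typescript
// Hypothetical merge call: combines a few partial answers into one.
type MergeCall = (parts: string[]) => Promise<string>;

// Merge results in groups of `fanIn`, then merge the merged results,
// recursing until exactly one answer remains.
async function synthesize(results: string[], merge: MergeCall, fanIn = 2): Promise<string> {
  if (results.length === 0) throw new Error("nothing to synthesize");
  if (results.length === 1) return results[0];
  if (fanIn < 2) throw new Error("fanIn must be at least 2");

  const groups: string[][] = [];
  for (let i = 0; i < results.length; i += fanIn) {
    groups.push(results.slice(i, i + fanIn));
  }
  // Groups at one level are independent, so they are merged in parallel.
  // A lone leftover result is passed up unchanged instead of being merged.
  const merged = await Promise.all(
    groups.map((g) => (g.length === 1 ? Promise.resolve(g[0]) : merge(g))),
  );
  return synthesize(merged, merge, fanIn);
}

// Stub merge for testing: wraps its inputs in brackets to expose the tree.
const bracketMerge: MergeCall = async (parts) => `[${parts.join("+")}]`;

synthesize(["a", "b", "c", "d", "e"], bracketMerge).then((out) => console.log(out)); // [[[a+b]+[c+d]]+e]
```

Each level shrinks the list to roughly `1/fanIn` of its size, so the number of merge levels grows logarithmically with the number of sub-results.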

example.ts

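import { readFile } from "node:fs/promises";
import { openai } from "@ai-sdk/openai"; // assumes the Vercel AI SDK OpenAI provider
import { recursiveExecution } from "ai-rlm";

// Load a long input; "./report.txt" is a placeholder path
const longDocument = await readFile("./report.txt", "utf8");

const result = await recursiveExecution({
  task: longDocument,        // any size input; decomposed, not truncated
  model: openai("gpt-4o"),
  strategy: "decompose",     // chunk → delegate → merge
});
// result is the merged answer synthesized from every sub-result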

Platform

Infrastructure for production RLM

The open-source library gets you to a proof of concept. The platform gets you to production.

Managed execution

Run RLM pipelines in isolated, scalable sandboxes. No infra to manage, no cold-start debugging.

Cost & iteration analytics

See token spend per task, per model, per run. Identify expensive sub-tasks and tune them down.
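Per-task and per-model spend is a group-and-sum over per-call usage records. The sketch below is a local illustration of that roll-up, not the platform's API; the `UsageRecord` shape, the task ids, and the token counts are all made up for the example.

```typescript
// Hypothetical usage record, one per LLM call in a run (toy numbers below).
type UsageRecord = {
  runId: string;
  taskId: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
};

// Total tokens per key (e.g. per task or per model), highest spend first.
function rollUp(records: UsageRecord[], key: (r: UsageRecord) => string): [string, number][] {
  const totals = new Map<string, number>();
  for (const r of records) {
    const k = key(r);
    totals.set(k, (totals.get(k) ?? 0) + r.inputTokens + r.outputTokens);
  }
  return Array.from(totals.entries()).sort((a, b) => b[1] - a[1]);
}

const usage: UsageRecord[] = [
  { runId: "r1", taskId: "summarize-ch1", model: "gpt-4o", inputTokens: 100, outputTokens: 20 },
  { runId: "r1", taskId: "summarize-ch2", model: "gpt-4o", inputTokens: 500, outputTokens: 50 },
  { runId: "r1", taskId: "merge", model: "gpt-4o-mini", inputTokens: 80, outputTokens: 30 },
];

const byTask = rollUp(usage, (r) => r.taskId);
console.log(byTask[0].join(": ")); // summarize-ch2: 550
```

Sorting by total puts the most expensive sub-task first, which is the natural candidate for a smaller model or a tighter chunk budget.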

Full observability

Trace every recursive call, sub-task, and merge step. Debug inference the way you debug code.
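A trace of a recursive run is naturally a tree. As an illustration only (the `TraceNode` shape and the labels are assumptions, not the platform's trace format), the sketch below renders one as an indented outline, including a sub-task that recursed into its own call.

```typescript
// Hypothetical trace node: one recursive call, sub-task, or merge step.
type TraceNode = {
  kind: "call" | "subtask" | "merge";
  label: string;
  children: TraceNode[];
};

// Depth-first render: each node on its own line, indented by depth,
// parents before children.
function renderTrace(node: TraceNode, depth = 0, out: string[] = []): string[] {
  out.push(`${"  ".repeat(depth)}${node.kind}: ${node.label}`);
  for (const child of node.children) renderTrace(child, depth + 1, out);
  return out;
}

const trace: TraceNode = {
  kind: "call",
  label: "root task",
  children: [
    { kind: "subtask", label: "chunk 1", children: [] },
    {
      kind: "subtask",
      label: "chunk 2",
      children: [{ kind: "call", label: "re-split chunk 2", children: [] }],
    },
    { kind: "merge", label: "combine chunks", children: [] },
  ],
};

console.log(renderTrace(trace).join("\n"));
// call: root task
//   subtask: chunk 1
//   subtask: chunk 2
//     call: re-split chunk 2
//   merge: combine chunks
```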

Team management

Shared projects, audit logs, and role-based access. Built for the team, not just the individual.

API-first

REST API and TypeScript SDK. Integrates with your existing stack in minutes.

OSS core — always free

The ai-rlm library is MIT licensed. The platform monetizes hosted execution, not the algorithm.

Early access

Be first on the platform

We're onboarding design partners now. Get early access, shape the roadmap, and lock in launch pricing.

Or start now: npm install ai-rlm
