RTSM: Real-Time Spatio-Semantic Memory

Object-centric queryable memory for spatial AI and robotics.

RTSM builds a persistent, searchable memory of objects in 3D space from RGB-D camera streams. Ask natural-language questions such as "Where is the red mug?" and get answers grounded in real-world coordinates.

Why RTSM

Vision models can detect objects. SLAM systems can map geometry. Language models can reason abstractly. But none of them remember where things are.

RTSM is the missing layer between perception and reasoning:

SLAM provides geometry and poses
Vision models provide object masks and semantics
RTSM fuses them into a persistent, queryable world state

This makes spatial state inspectable, queryable, and reusable across robots, agents, and applications — regardless of which segmentation model or SLAM system you use.
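
To make the fusion step concrete, here is a minimal sketch of how a segmentation mask, a depth image, and a SLAM pose can be combined into a single world-frame position. It uses standard pinhole back-projection and is not taken from the RTSM codebase. The names `Intrinsics`, `backproject`, `camera_to_world`, and `mask_centroid_world` are hypothetical, and the pose is assumed to be a 4x4 row-major camera-to-world matrix.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical container)."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u, v, depth_m, K):
    """Map pixel (u, v) with metric depth to a point in the camera frame."""
    x = (u - K.cx) * depth_m / K.fx
    y = (v - K.cy) * depth_m / K.fy
    return (x, y, depth_m)


def camera_to_world(p_cam, T_wc):
    """Apply a 4x4 row-major camera-to-world pose to a camera-frame point."""
    x, y, z = p_cam
    return tuple(
        T_wc[r][0] * x + T_wc[r][1] * y + T_wc[r][2] * z + T_wc[r][3]
        for r in range(3)
    )


def mask_centroid_world(mask_pixels, depth, K, T_wc):
    """Average the world-frame points of all masked pixels with valid depth.

    mask_pixels: iterable of (u, v) pixel coordinates inside the object mask.
    depth: 2D list indexed as depth[v][u], in metres; 0 marks missing depth.
    Returns None when no masked pixel has valid depth.
    """
    pts = [
        camera_to_world(backproject(u, v, depth[v][u], K), T_wc)
        for (u, v) in mask_pixels
        if depth[v][u] > 0
    ]
    if not pts:
        return None
    n = len(pts)
    return tuple(sum(p[i] for p in pts) / n for i in range(3))
```

A real pipeline would more likely work on NumPy arrays and keep a full point cloud or bounding volume per object; the centroid is only the simplest quantity that yields a queryable `xyz`.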

Features

Model-agnostic — Swappable segmentation backends (CNN or transformer, permissive or AGPL)
Real-time — 210 ms mean pipeline latency (dual backend, RTX 5090)
Persistent memory — Objects tracked across views with stable IDs, promoted from proto to confirmed
Semantic search — Find objects by natural language via CLIP embeddings + FAISS
Spatial search — Find objects near 3D world coordinates or relative to other objects
MCP integration — AI agents (Claude, Cursor, LangGraph) can query spatial memory via Model Context Protocol
Record & replay — Capture live sessions for offline benchmarking and reproducible testing
Runtime analytics — Per-stage latency, segmentation rates, and throughput dashboards
Queryable API — REST endpoints for objects, search, stats, and analytics
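
The proto-to-confirmed lifecycle mentioned under persistent memory can be pictured with a small sketch. The promotion rule below (a fixed number of sightings) is an assumption made for illustration; the criteria RTSM actually uses are not described here. `TrackedObject`, `observe`, and `MIN_HITS` are hypothetical names.

```python
from dataclasses import dataclass, field
import uuid

MIN_HITS = 3  # hypothetical threshold: sightings needed before promotion


@dataclass
class TrackedObject:
    """One remembered object; the ID is assigned once and never changes."""
    xyz: tuple
    id: str = field(default_factory=lambda: uuid.uuid4().hex[:6])
    hits: int = 1
    state: str = "proto"


def observe(obj, xyz):
    """Fold a new sighting into the object and promote it if seen often enough."""
    n = obj.hits
    # Running mean of the observed positions.
    obj.xyz = tuple((p * n + q) / (n + 1) for p, q in zip(obj.xyz, xyz))
    obj.hits = n + 1
    if obj.state == "proto" and obj.hits >= MIN_HITS:
        obj.state = "confirmed"
    return obj
```

Keeping unconfirmed objects in a separate `proto` state lets a memory layer discard one-off false detections without ever exposing them to queries.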

// "Where is the red backpack?"
{ "id": "a3f2c1", "xyz": [1.2, 0.4, 2.1], "confidence": 0.87 }
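
As a rough illustration of how semantic and spatial queries could produce a result like the one above, the sketch below ranks objects by cosine similarity and filters them by Euclidean distance. It is a stand-in, not RTSM's implementation: brute-force search in plain Python replaces the FAISS index, and three-dimensional toy vectors replace real CLIP embeddings. The function names, the `score` field, and the second object (`b7e410`) are made up for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def semantic_search(query_vec, objects, k=1):
    """Return the k objects whose embeddings are most similar to the query."""
    scored = [(cosine(query_vec, o["embedding"]), o) for o in objects]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [
        {"id": o["id"], "xyz": o["xyz"], "score": round(s, 3)}
        for s, o in scored[:k]
    ]


def spatial_search(center, objects, radius_m):
    """Return IDs of objects within radius_m of center, nearest first."""
    hits = [(math.dist(center, o["xyz"]), o) for o in objects]
    hits.sort(key=lambda t: t[0])
    return [o["id"] for d, o in hits if d <= radius_m]


# Toy memory: in practice the embeddings would come from a CLIP encoder.
memory = [
    {"id": "a3f2c1", "xyz": [1.2, 0.4, 2.1], "embedding": [0.9, 0.1, 0.0]},
    {"id": "b7e410", "xyz": [3.0, 0.0, 0.5], "embedding": [0.0, 0.2, 0.9]},
]

best = semantic_search([1.0, 0.0, 0.0], memory, k=1)
nearby = spatial_search([1.0, 0.5, 2.0], memory, radius_m=0.5)
```

With a text query, the query vector would be the CLIP text embedding of the question, so that "red backpack" lands near the image embeddings of matching objects.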

Quick Links

- :material-download: **[Installation](getting-started/installation.md)**: Get RTSM running
- :material-rocket-launch: **[Quick Start](getting-started/quick-start.md)**: Your first query in 5 minutes
- :material-cog: **[Configuration](getting-started/configuration.md)**: Tune for your setup
- :material-api: **[REST API](api/rest-api.md)**: API reference
- :material-chart-bar: **[Benchmarks](benchmarks.md)**: Performance data
- :material-robot: **[MCP (AI Agents)](api/mcp.md)**: Connect AI agents to spatial memory

Performance at a Glance

Measured on an RTX 5090 with an iPhone ARKit recording (162 frames, 458 s indoor scene). See Benchmarks for the full results.

| Metric | dual (FastSAM + YOLOE) | grounded_sam2 (GDINO + SAM2) |
| --- | --- | --- |
| Mean latency | 210 ms | 510 ms |
| P95 latency | 509 ms | 721 ms |
| Masks/frame | 28.8 | 13.4 |
| Objects confirmed | 60 | 35 |
| License | AGPL-3.0 | Apache-2.0 |

RTSM's 10-stage pipeline is backend-agnostic: switching between CNN and transformer segmenters takes a single config change, while the memory layer and the API stay the same.
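
One common way to get this kind of swappability is a shared interface plus a name-to-class registry, sketched below. The backend names `dual` and `grounded_sam2` come from the table above; everything else is an assumption for illustration, including the `Segmenter` protocol, the class names, and the `segmentation_backend` config key. The placeholder backends return no masks, since the point is the wiring and not the models.

```python
from typing import Dict, List, Protocol, Type


class Segmenter(Protocol):
    """Interface every segmentation backend is assumed to satisfy."""

    def segment(self, frame) -> List[dict]:
        ...


class DualBackend:
    def segment(self, frame) -> List[dict]:
        return []  # placeholder: a real backend would run FastSAM + YOLOE


class GroundedSam2Backend:
    def segment(self, frame) -> List[dict]:
        return []  # placeholder: a real backend would run GDINO + SAM2


BACKENDS: Dict[str, Type] = {
    "dual": DualBackend,
    "grounded_sam2": GroundedSam2Backend,
}


def build_segmenter(config: dict) -> Segmenter:
    """Instantiate the backend named in the config (hypothetical key)."""
    name = config["segmentation_backend"]
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(
            f"unknown backend {name!r}; expected one of {sorted(BACKENDS)}"
        ) from None


def run_pipeline(frame, segmenter: Segmenter) -> dict:
    """Downstream stages see only the Segmenter interface, never the model."""
    masks = segmenter.segment(frame)
    return {"n_masks": len(masks)}
```

Because `run_pipeline` depends only on the interface, changing the one config value swaps the model without touching the memory layer or the API handlers.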

License

Apache-2.0 — See GitHub for details.