Record & Replay
RTSM can record live WebSocket sessions to disk and replay them through the full pipeline at the original rate. This enables reproducible benchmarking and offline testing without camera hardware.
Recording a Session
Record while running the pipeline
Stream frames from a camera while running the full perception pipeline:
This writes raw binary frames and metadata to the specified directory while processing them normally.
Record without GPU (raw capture)
Capture raw frames without running segmentation or CLIP — useful for quick capture on a laptop without a GPU:
Tip: `--record-only` mode requires no GPU dependencies, so the core package alone (`pip install rtsm`) is enough.
What gets recorded
Each recording session creates a directory with:
| File | Contents |
|---|---|
| `messages.bin` | Raw binary WebSocket frames (RGB + depth + pose) |
| `index.jsonl` | Per-frame metadata (timestamps, offsets, sizes) |
| `meta.json` | Session metadata (config snapshot, device info) |
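Storing byte offsets and sizes in `index.jsonl` is what allows a reader to jump straight to any frame in `messages.bin`. The following is a minimal sketch of such an append-only layout, not RTSM's recorder: the index field names (`t`, `offset`, `size`) and the helpers `append_frame` and `read_frames` are hypothetical, and `meta.json` is left out.

```python
# Hypothetical sketch of an append-only recording: raw frames go into
# messages.bin, one JSON line per frame goes into index.jsonl.
# The index field names ("t", "offset", "size") are assumptions, not RTSM's schema.
import json
import os
import time
from pathlib import Path
from typing import Iterator, Optional, Tuple


def append_frame(session_dir: Path, payload: bytes, t: Optional[float] = None) -> dict:
    """Append one raw frame to messages.bin and index where it landed."""
    session_dir.mkdir(parents=True, exist_ok=True)
    with open(session_dir / "messages.bin", "ab") as bin_f:
        offset = bin_f.seek(0, os.SEEK_END)  # byte position where this frame starts
        bin_f.write(payload)
    entry = {"t": time.time() if t is None else t, "offset": offset, "size": len(payload)}
    # The index line is written after the payload, so a crash in between leaves
    # an unindexed tail rather than an entry pointing at missing bytes.
    with open(session_dir / "index.jsonl", "a", encoding="utf-8") as idx_f:
        idx_f.write(json.dumps(entry) + "\n")
    return entry


def read_frames(session_dir: Path) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, payload) pairs by seeking to each indexed offset."""
    with open(session_dir / "index.jsonl", encoding="utf-8") as idx_f, \
            open(session_dir / "messages.bin", "rb") as bin_f:
        for line in idx_f:
            if not line.strip():
                continue  # tolerate a trailing blank line
            entry = json.loads(line)
            bin_f.seek(entry["offset"])
            payload = bin_f.read(entry["size"])
            if len(payload) != entry["size"]:
                raise ValueError(f"truncated frame at offset {entry['offset']}")
            yield entry["t"], payload
```

Because each index entry is self-describing, a reader can also skip or sample frames without scanning the whole binary file.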
Replaying a Session
Feed a recorded session through the full pipeline by passing its directory to `--replay`, for example `python -m rtsm --replay recordings/session1`.
This launches the complete RTSM stack (segmentation, CLIP, association, memory, API, visualization) and replays frames at the original recording rate.
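Replaying "at the original recording rate" amounts to reproducing the gaps between recorded timestamps. Below is a minimal sketch, assuming each frame carries a recording timestamp in seconds; `replay_at_original_rate` and its `handle` callback are hypothetical names, not RTSM functions. The clock and sleep functions are injectable so the pacing can be tested without waiting.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def replay_at_original_rate(
    frames: Iterable[Tuple[float, bytes]],
    handle: Callable[[bytes], None],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call handle(payload) for each frame, spaced like the recorded timestamps.

    Returns the number of frames delivered.
    """
    wall_start: Optional[float] = None
    rec_start = 0.0
    count = 0
    for t_rec, payload in frames:
        if wall_start is None:
            wall_start, rec_start = clock(), t_rec
        # Deadline for this frame, measured from the first frame on both clocks.
        due = wall_start + (t_rec - rec_start)
        delay = due - clock()
        if delay > 0:
            sleep(delay)
        # If delay <= 0 the consumer is behind: deliver immediately, drop nothing.
        handle(payload)
        count += 1
    return count
```

Scheduling against absolute deadlines, rather than sleeping for each inter-frame delta, keeps slow `handle` calls from accumulating drift over a long session.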
Included test dataset
RTSM ships with a test recording for quick verification:
```bash
# 162-frame bedroom scan (~458 seconds, iPhone ARKit via Calabi Lens)
python -m rtsm --replay recordings/session1
```
The recording is stored with Git LFS. If the binary files were not downloaded when you cloned (each one is then a small text pointer file instead of real data), run `git lfs pull`.
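An LFS file that has not been fetched is a short text pointer whose first line is `version https://git-lfs.github.com/spec/v1`. The helper below is a hypothetical pre-flight check, not part of RTSM, that lists such files in a recording directory before a replay is attempted.

```python
from pathlib import Path
from typing import List

# First line of every Git LFS pointer file (LFS pointer spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True if the file still holds an LFS pointer instead of the real bytes."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def missing_lfs_files(session_dir: Path) -> List[str]:
    """Names of files in a recording directory whose LFS content is not fetched."""
    return sorted(
        p.name for p in session_dir.iterdir() if p.is_file() and is_lfs_pointer(p)
    )
```

For example, a non-empty result from `missing_lfs_files(Path("recordings/session1"))` means `git lfs pull` still needs to be run.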
Use Cases
- Benchmarking: compare segmentation backends on the same input frames
- Debugging: reproduce pipeline issues without needing the camera
- CI/CD: run automated pipeline tests with deterministic input
- A/B testing: use `scripts/debug_segmentation.py` to generate side-by-side comparisons
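For CI, deterministic input also means the recording's bytes must not change between runs. One possible approach, not part of RTSM, is to pin a SHA-256 fingerprint of every file in the session directory and fail the job if any file drifts. The function names below are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large recordings need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def recording_fingerprint(session_dir: Path) -> Dict[str, str]:
    """Map each file name in the session directory to its SHA-256."""
    return {p.name: file_sha256(p) for p in sorted(session_dir.iterdir()) if p.is_file()}


def check_fingerprint(session_dir: Path, expected: Dict[str, str]) -> List[str]:
    """Return names of files that were added, removed, or changed."""
    actual = recording_fingerprint(session_dir)
    return sorted(n for n in set(expected) | set(actual) if expected.get(n) != actual.get(n))
```

The expected dictionary could be committed as JSON next to the recording and checked before the replay job starts; an empty list from `check_fingerprint` means the input is unchanged.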
A/B segmentation comparison
This generates an HTML viewer at `debug/session1/compare.html` that shows FastSAM and YOLOE overlays side by side for each frame.
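A per-frame side-by-side viewer can be as simple as one static HTML table with a row per frame. The sketch below is hypothetical: it is not the output format of `scripts/debug_segmentation.py`, and it assumes the overlay images for both backends already exist on disk.

```python
import html
from typing import Sequence, Tuple


def render_compare_html(
    rows: Sequence[Tuple[str, str, str]],
    labels: Tuple[str, str] = ("FastSAM", "YOLOE"),
) -> str:
    """Return a static HTML page with one table row per frame.

    Each row is (frame_id, left_overlay_path, right_overlay_path); the paths are
    written as-is into <img src>, so they should be relative to the HTML file.
    """
    esc = html.escape
    header = "<tr><th>frame</th>" + "".join(f"<th>{esc(label)}</th>" for label in labels) + "</tr>"
    body = []
    for frame_id, left_img, right_img in rows:
        cells = "".join(
            f'<td><img src="{esc(src)}" width="480" loading="lazy"></td>'
            for src in (left_img, right_img)
        )
        body.append(f"<tr><td>{esc(frame_id)}</td>{cells}</tr>")
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        "<title>Segmentation A/B</title></head><body>\n"
        "<table>\n" + header + "\n" + "\n".join(body) + "\n</table>\n</body></html>\n"
    )
```

The returned string could then be saved with `Path(...).write_text(page, encoding="utf-8")` next to the overlay images; lazy image loading keeps long sessions responsive in the browser.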
Next Steps
- Quick Start — Run your first session
- Benchmarks — Full performance comparison
- Configuration — Tune backends and thresholds