
Quick Start

This guide walks you through running RTSM and making your first semantic query.


1. Start RTSM

RTSM expects an RGB-D stream with accompanying camera poses, delivered over ZeroMQ. Start the main service:

python -m rtsm.run

This launches two services:

Service                      Address
REST API                     http://localhost:8000
WebSocket (visualization)    ws://localhost:8081

2. Verify It's Running

curl http://localhost:8000/stats/detailed

You should see system stats including frame count, object count, and memory usage.
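If you start RTSM from a script (for example in CI), you may want to block until the REST API answers before querying it. The following is a minimal sketch using only the Python standard library. It assumes `/stats/detailed` responds with a JSON object; `wait_for_rtsm` and its parameters are hypothetical names, not part of RTSM.

```python
import json
import time
import urllib.request
from typing import Any, Dict, Optional

# RTSM runs locally, so skip any HTTP proxy configured in the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_rtsm(base_url: str = "http://localhost:8000",
                  timeout_s: float = 30.0,
                  interval_s: float = 0.5,
                  request_timeout_s: float = 2.0) -> Optional[Dict[str, Any]]:
    """Poll GET <base_url>/stats/detailed until it answers or timeout_s elapses.

    Hypothetical helper: returns the decoded JSON stats on success,
    or None if the service never answered in time.
    """
    url = f"{base_url.rstrip('/')}/stats/detailed"
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with _OPENER.open(url, timeout=request_timeout_s) as resp:
                return json.load(resp)
        except OSError:
            # URLError, HTTPError, refused connections and socket timeouts
            # are all OSError subclasses: treat them as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)
```

For example, `wait_for_rtsm(timeout_s=60)` returns `None` if nothing answers on port 8000 within a minute; a malformed (non-JSON) body is not retried and raises instead.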


3. List Detected Objects

Once frames are streaming, objects will appear in memory:

curl http://localhost:8000/objects

Response:

[
  {
    "id": "a3f2c1",
    "label": "backpack",
    "xyz": [1.2, 0.4, 2.1],
    "confidence": 0.87
  },
  ...
]
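To consume this list from Python instead of curl, a small parser is enough. This sketch assumes the response is a JSON array whose items carry at least the `id`, `label`, `xyz`, and `confidence` fields shown in the example. `DetectedObject`, `parse_objects`, and `fetch_objects` are hypothetical names, and the confidence threshold is applied client-side, not by the API.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DetectedObject:
    # Mirrors the fields in the example /objects response; extra fields are ignored.
    id: str
    label: str
    xyz: Tuple[float, ...]
    confidence: float


def parse_objects(payload: str, min_confidence: float = 0.0) -> List[DetectedObject]:
    """Decode an /objects JSON array, keeping objects with confidence >= min_confidence."""
    objects = [
        DetectedObject(
            id=item["id"],
            label=item["label"],
            xyz=tuple(float(c) for c in item["xyz"]),
            confidence=float(item["confidence"]),
        )
        for item in json.loads(payload)
    ]
    return [obj for obj in objects if obj.confidence >= min_confidence]


def fetch_objects(base_url: str = "http://localhost:8000",
                  min_confidence: float = 0.0) -> List[DetectedObject]:
    """GET <base_url>/objects and parse the result (filtering happens client-side)."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/objects", timeout=5) as resp:
        return parse_objects(resp.read().decode("utf-8"), min_confidence)
```

With RTSM running, `for obj in fetch_objects(min_confidence=0.5): print(obj.label, obj.xyz)` prints only the more confident detections.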

4. Semantic Search

Ask natural-language queries:

curl "http://localhost:8000/search/semantic?query=red%20mug&top_k=5"

Response:

{
  "query": "red mug",
  "results": [
    {
      "id": "b7d4e2",
      "label": "mug",
      "xyz": [0.8, 0.2, 1.5],
      "score": 0.82
    }
  ]
}
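The same query can be issued from Python. The query string must be URL-encoded, so the space in "red mug" becomes `%20` as in the curl command. This sketch assumes the response shape shown above: a `results` array whose items carry `id`, `label`, `xyz`, and `score`. `SearchHit`, `semantic_search_url`, `parse_hits`, and `semantic_search` are hypothetical names.

```python
import json
import urllib.request
from typing import List, NamedTuple, Tuple
from urllib.parse import quote, urlencode


class SearchHit(NamedTuple):
    id: str
    label: str
    xyz: Tuple[float, ...]
    score: float


def semantic_search_url(query: str, top_k: int = 5,
                        base_url: str = "http://localhost:8000") -> str:
    """Build the /search/semantic URL; quote_via=quote encodes spaces as %20."""
    params = urlencode({"query": query, "top_k": top_k}, quote_via=quote)
    return f"{base_url.rstrip('/')}/search/semantic?{params}"


def parse_hits(payload: str) -> List[SearchHit]:
    """Turn a search response into SearchHit tuples, highest score first.

    Sorting here is defensive: it does not rely on the server's result order.
    """
    hits = [
        SearchHit(r["id"], r["label"], tuple(float(c) for c in r["xyz"]), float(r["score"]))
        for r in json.loads(payload)["results"]
    ]
    return sorted(hits, key=lambda h: h.score, reverse=True)


def semantic_search(query: str, top_k: int = 5,
                    base_url: str = "http://localhost:8000") -> List[SearchHit]:
    """Run a semantic query against a local RTSM instance."""
    with urllib.request.urlopen(semantic_search_url(query, top_k, base_url), timeout=5) as resp:
        return parse_hits(resp.read().decode("utf-8"))
```

Assuming at least one result comes back, `semantic_search("red mug")[0].xyz` gives the position of the best match.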

5. View in 3D (Optional)

Open the visualization demo in your browser:

http://localhost:8081

This shows a Three.js point cloud with detected objects overlaid.


Next Steps