LeRobot is Hugging Face’s open-source robotics stack for collecting data, training policies, running simulations, and sharing robotics datasets and models on the Hub. LeRobotDataset v3.0 standardizes robot learning data across sensorimotor time series, actions, multi-camera video, and task metadata. Its v3 layout stores high-frequency tabular signals in Parquet, visual streams as MP4 shards, and metadata that reconstructs episode-level views from larger files.
Lance pairs well with LeRobot when you need high-performance random access, lazy multimodal blob reads, and a single table interface for curation, search, and training data preparation. The lerobot-lancedb package ships Lance-backed LeRobotDataset subclasses, and LanceDB can open Lance-formatted LeRobot datasets on the Hub directly through hf:// URIs.
Install
Use Lance-backed LeRobotDataset loaders
LeRobotLanceDataset is useful when your Lance-backed dataset stores decoded image observations. It’s a drop-in replacement for LeRobotDataset, so existing policy training code keeps working with the usual PyTorch dataset and dataloader patterns.
For datasets that store camera observations as MP4 video segments, use LeRobotLanceVideoDataset instead.
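The choice between the two loaders can be sketched as a small helper. This is only an illustration under stated assumptions: the module name `lerobot_lancedb`, the top-level export of both classes, and a constructor that takes a repo id as its first argument (mirroring LeRobotDataset) are guesses, not confirmed API; `loader_class_name` and `load_lance_dataset` are hypothetical helper names.

```python
import importlib
from typing import Any

# Maps how camera observations are stored to the loader class named on this page.
LOADER_BY_STORAGE = {
    "image": "LeRobotLanceDataset",       # decoded image observations
    "video": "LeRobotLanceVideoDataset",  # MP4 video segments
}


def loader_class_name(storage: str) -> str:
    """Return the loader class name for a storage kind ("image" or "video")."""
    try:
        return LOADER_BY_STORAGE[storage]
    except KeyError:
        raise ValueError(f"unknown storage kind: {storage!r}") from None


def load_lance_dataset(repo_id: str, storage: str,
                       module_name: str = "lerobot_lancedb") -> Any:
    """Instantiate the matching loader.

    ASSUMPTION: the module name, the top-level class exports, and the
    repo-id-first constructor are not confirmed; check the package docs.
    """
    module = importlib.import_module(module_name)  # imported lazily
    loader_cls = getattr(module, loader_class_name(storage))
    return loader_cls(repo_id)
```

If those assumptions hold, the returned object can be passed to a standard PyTorch DataLoader in the same way as a LeRobotDataset.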
Use the image loader for Lance-backed repos that store image frames. Use the video loader for MP4-backed LeRobot datasets such as lance-format/lerobot-pusht-lance.
Open LeRobot Lance tables with LanceDB
Lance-formatted LeRobot datasets published by lance-format expose each .lance file under data/ as a LanceDB table. The PushT dataset, for example, has frames, episodes, and videos tables.
Opening the tables directly is handy for inspecting schemas, counting rows, sampling metadata, or building curation workflows before any data reaches the training loop.
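A minimal sketch of that inspection step might look like the following. The URI layout `hf://datasets/<repo_id>/data` is an assumption inferred from the description above, and `hf_data_uri` and `inspect_tables` are hypothetical helper names; the LanceDB calls used (`connect`, `open_table`, `count_rows`, `schema`) are part of its Python API.

```python
from typing import Dict, Sequence


def hf_data_uri(repo_id: str) -> str:
    """Build the URI of a dataset repo's data/ directory.

    ASSUMPTION: Lance tables live under data/ and are reachable at this path.
    """
    return f"hf://datasets/{repo_id}/data"


def inspect_tables(uri: str,
                   names: Sequence[str] = ("frames", "episodes", "videos")) -> Dict[str, dict]:
    """Return the row count and column names of each named table."""
    import lancedb  # imported lazily so the helper above works without it

    db = lancedb.connect(uri)
    summary = {}
    for name in names:
        table = db.open_table(name)
        summary[name] = {
            "rows": table.count_rows(),
            "columns": table.schema.names,  # pyarrow schema field names
        }
    return summary
```

For the PushT example, `inspect_tables(hf_data_uri("lance-format/lerobot-pusht-lance"))` would summarize the three tables, assuming network access and that the URI assumption holds.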
Filter a frame window
Most robotics workflows want a deterministic slice by episode_index, frame_index, or task metadata long before training begins. LanceDB filters those rows without touching the video blobs.
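As a sketch, a frame window can be expressed as a SQL-style filter and run through a LanceDB query without a vector. The column names `episode_index` and `frame_index` come from the text above, but their presence in a given frames table is an assumption; `frame_window_filter` and `select_frame_window` are hypothetical helper names.

```python
from typing import Any, Optional, Sequence


def frame_window_filter(episode_index: int, start: int, stop: int) -> str:
    """Build a filter for frames [start, stop) of a single episode."""
    if start < 0 or stop <= start:
        raise ValueError("expected 0 <= start < stop")
    return (
        f"episode_index = {int(episode_index)} "
        f"AND frame_index >= {int(start)} AND frame_index < {int(stop)}"
    )


def select_frame_window(table: Any, episode_index: int, start: int, stop: int,
                        columns: Optional[Sequence[str]] = None) -> Any:
    """Run the filter on a LanceDB table and return a pandas DataFrame."""
    query = (
        table.search()  # no query vector: a plain filtered scan
        .where(frame_window_filter(episode_index, start, stop))
        .limit(stop - start)  # at most one row per frame in the window
    )
    if columns is not None:
        # Project only the listed columns so large blob columns are not read.
        query = query.select(list(columns))
    return query.to_pandas()
```

Passing a `columns` list that leaves out image or video columns keeps the read limited to lightweight metadata.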
With the filtered set in hand, you can materialize a smaller local LanceDB database, add derived columns, attach embeddings, or build vector and scalar indexes for faster repeated access.
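One way to sketch the materialization step, assuming the filtered rows are available as a list of dicts: add a derived column in plain Python, write the rows to a local LanceDB database, and build a scalar index. The table name `frames_subset`, the derived column `window_offset`, and both helper names are hypothetical; `create_table` and `create_scalar_index` are existing LanceDB calls.

```python
from typing import Any, Dict, List


def add_window_offset(rows: List[Dict[str, Any]], start: int) -> List[Dict[str, Any]]:
    """Return copies of rows with a derived column: frame offset within the window."""
    return [{**row, "window_offset": row["frame_index"] - start} for row in rows]


def materialize_subset(rows: List[Dict[str, Any]], db_path: str,
                       table_name: str = "frames_subset") -> Any:
    """Write rows to a local LanceDB database and index episode_index."""
    import lancedb  # imported lazily so the helper above works without it

    db = lancedb.connect(db_path)  # a local directory path
    table = db.create_table(table_name, data=rows, mode="overwrite")
    # Scalar index to speed up repeated filters on episode_index.
    table.create_scalar_index("episode_index")
    return table
```

Embeddings and vector indexes would follow the same pattern with an added vector column, which is left out here to keep the sketch small.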
Example Lance-formatted LeRobot datasets
LeRobot PushT
A Lance-formatted version of lerobot/pusht with frame, episode, and video tables.
LeRobot X-VLA Soft-Fold
A multi-camera robotics dataset packaged as Lance tables for frame-level and episode-level access.
More resources
LeRobotDataset v3.0
Hugging Face’s guide to the v3 dataset layout, streaming, transforms, and migration.
lerobot-lancedb
API documentation for the Lance-backed LeRobotDataset implementations.
When to use each interface
| Interface | Best for |
|---|---|
| LeRobotDataset | Standard LeRobot training loops and policy code |
| LeRobotLanceDataset | Drop-in training on Lance-backed image datasets |
| LeRobotLanceVideoDataset | Drop-in training on Lance-backed video datasets |
| LanceDB | Interactive inspection, filtering, curation, search, indexing, and materializing subsets |
| lance.dataset(...) | Lower-level schema, fragment, index, and blob access |