Glossary

Key terms in A-Z order. `(Ch N)` marks the chapter where the term is introduced or discussed in depth.

Definitions are kept in sync with the monorepo master at `glossary/master_en.md`.

A

ACT (Action Chunking with Transformers): Transformer-based imitation learning policy that predicts a chunk of future actions at each step instead of a single action, which reduces compounding errors on long-horizon, fine-grained manipulation tasks. (Ch8)

Agentic Coding: Autonomous agent pattern where the model writes, executes, receives error feedback, and iterates on code. (Ch1, Ch2, Ch3, Ch4, Ch5, Ch6, Ch7, Ch8, Ch9, Ch10, Ch11)

Agentic Robotics: The same autonomous perceive-act-reflect loop extended to physical robots. (Ch1, Ch3, Ch4, Ch7, Ch8, Ch9, Ch10, Ch11)

AutoRT: Google's large-scale autonomous robot data collection system combining an LLM planner with safety guardrails. (Ch1, Ch2, Ch4, Ch8, Ch9, Ch10, Ch11)

AutoTAMP: Automated translation from natural language to TAMP (Task and Motion Planning) problems. (Ch2, Ch3, Ch5, Ch7, Ch10)

B

BUMBLE: Closed-loop robot planner that reflects on failures and replans. (Ch1, Ch2, Ch4, Ch8, Ch10, Ch11)

C

Closed-loop: Architecture that feeds execution results back to update plans. (Ch1, Ch2, Ch8, Ch10)

Code as Policies (CaP): LLMs directly generate Python code (calls to perception and control functions) that serves as the robot control policy. (Ch1, Ch3)

Co-training: Strategy of jointly training on human and robot data for complementary representations. (Ch4)

D

Dexterous manipulation: Precise object manipulation with multi-fingered hands — in-hand rotation, assembly, etc. (Ch4, Ch6)

Diffusion Policy: Policy learning via conditional denoising diffusion over action distributions. (Ch1, Ch3, Ch4, Ch6, Ch10)

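Domain Randomization: Randomizes simulation parameters (e.g., textures, lighting, friction, mass) during training so that the learned policy stays robust across the gap between simulation and the real world. (Ch9)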
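
A minimal sketch of the idea, assuming a simulator whose physical and visual parameters can be reset per episode. The parameter names and ranges below are hypothetical and chosen only for illustration; they are not taken from any particular simulator.

```python
import random

# Hypothetical parameter ranges (assumptions for illustration only).
PARAM_RANGES = {
    "friction": (0.5, 1.5),
    "mass_kg": (0.1, 2.0),
    "light_intensity": (0.3, 1.0),
}

def sample_sim_params(rng):
    """Draw one random simulator configuration, uniformly within each range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

# A fresh configuration is sampled for every training episode.
rng = random.Random(0)
episodes = [sample_sim_params(rng) for _ in range(3)]
for params in episodes:
    print(params)
```

A policy trained across many such sampled configurations sees the real world as just one more variation.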

DROID: Large-scale manipulation dataset collected via multi-institution collaboration. (Ch4, Ch6, Ch10)

E

Embodied-RAG: Retrieval-augmented generation applied to embodied memory for long-horizon planning. (Ch2, Ch7, Ch10, Ch11)

F

Flow Matching: Learns action distributions via continuous normalizing flows — core technique of pi0. (Ch4, Ch6)

Foundation Model: Large-scale pretrained general-purpose model — e.g., Sparsh (tactile), pi0 (VLA). (Ch2, Ch3, Ch4, Ch5, Ch6, Ch8, Ch9, Ch10)

G

Grounding: Connecting an LLM's abstract language to feasible actions, objects, and states in the environment. (Ch1, Ch2, Ch3, Ch7, Ch8, Ch10)

H

HAMSTER: Hierarchical VLA separating high-level subgoals from low-level policies. (Ch4, Ch5, Ch6, Ch10)

Hi Robot: Robot planner that decomposes instructions and plans hierarchically. (Ch5, Ch10)

I

IL (Imitation Learning): Learns policies by directly imitating human demonstrations. (Ch3)

K

KARMA: Robot agent with persistent spatial-temporal memory. (Ch1, Ch2, Ch7, Ch8, Ch10, Ch11)

L

LLM Planner: LLM-based planner that decomposes natural-language instructions into high-level step sequences. (Ch2)

O

Open X-Embodiment: Largest open-source robot dataset, aggregating 1M+ trajectories from 34 labs. (Ch4, Ch6, Ch9, Ch10)

OpenVLA: Open-source VLA foundation model (7B params, trained on Open X-Embodiment). (Ch1, Ch4, Ch5, Ch6, Ch10)

P

PaLM-E: Google's embodied multimodal language model unifying image, state, and language into one token space. (Ch1, Ch2, Ch4)

pi0 (π₀): Physical Intelligence's flow-based VLA foundation model. (Ch1, Ch4, Ch6, Ch10)

Point cloud: Set of 3D points (x, y, z coordinates) representing geometry captured by visual or tactile sensing. (Ch6)

Policy transfer: Transferring a policy learned in one domain (e.g., human demonstrations) to another (e.g., robot). (Ch9)

PragmaBot: Robot planner that uses pragmatic dialogue to clarify instructions. (Ch1, Ch3, Ch8, Ch9, Ch10)

R

REFLECT: Closed-loop pattern of reflecting on failures and replanning. (Ch1, Ch2, Ch8, Ch10, Ch11)

RL (Reinforcement Learning): Learns policies by trial and error, maximizing cumulative reward. (Ch3)

RT-2: Google DeepMind's VLA model jointly trained on web VQA and robot manipulation. (Ch1, Ch2, Ch4)

RT-H: Hierarchical action representation combining language motion with low-level control. (Ch5, Ch9, Ch10)

S

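SayCan: Grounded planning that combines an LLM's estimate of which skill is useful for the instruction ("say") with learned affordances that estimate whether the skill is feasible in the current state ("can"). (Ch1, Ch2, Ch4, Ch8, Ch11)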
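
As a rough illustration of the SayCan entry above, the sketch below multiplies a language-model usefulness score by an affordance (feasibility) score for each candidate skill and selects the highest product. The skill names and numeric scores are made up for illustration; in a real system they would come from an LLM and a learned value function.

```python
def select_skill(llm_scores, affordances):
    """Return the skill with the highest (LLM score x affordance) product."""
    combined = {
        skill: llm_scores[skill] * affordances.get(skill, 0.0)
        for skill in llm_scores
    }
    best = max(combined, key=combined.get)
    return best, combined

# Hypothetical scores for the instruction "clean up the spill".
llm_scores = {"pick up sponge": 0.6, "go to sink": 0.3, "pick up apple": 0.1}
# Hypothetical feasibility: the sponge is not within reach right now.
affordances = {"pick up sponge": 0.1, "go to sink": 0.9, "pick up apple": 0.8}

best_skill, scores = select_skill(llm_scores, affordances)
print(best_skill)
```

The LLM alone would pick the sponge, but the low affordance score steers the planner toward a skill the robot can actually execute from its current state.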

SayPlan: Hierarchical semantic search over 3D scene graphs to shorten LLM planning horizon. (Ch1, Ch2, Ch7, Ch10)

Scene Graph: Structured world representation with objects as nodes and relations as edges. (Ch2, Ch7, Ch8, Ch10, Ch11)
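
A minimal sketch of such a structure, assuming relations are stored as (subject, relation, object) triples. The class and method names are hypothetical and not drawn from any specific system discussed in the chapters.

```python
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    nodes: set = field(default_factory=set)   # object and place names
    edges: set = field(default_factory=set)   # (subject, relation, object) triples

    def add_relation(self, subject, relation, obj):
        """Register both endpoints as nodes and store the relation as an edge."""
        self.nodes.update([subject, obj])
        self.edges.add((subject, relation, obj))

    def related(self, subject, relation):
        """Return, sorted, every object linked to `subject` by `relation`."""
        return sorted(o for s, r, o in self.edges if s == subject and r == relation)

graph = SceneGraph()
graph.add_relation("mug", "on", "table")
graph.add_relation("spoon", "on", "table")
graph.add_relation("table", "in", "kitchen")
print(graph.related("mug", "on"))
```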

SIMPLER: Benchmark that aligns simulation evaluation with real-world performance. (Ch1, Ch3, Ch9, Ch10)

Sim-to-Real: Methods for transferring policies trained in simulation to the real world. (Ch1, Ch4)

T

TAMP (Task and Motion Planning): Classical framework that jointly solves symbolic task planning (which actions, in what order) and continuous motion planning (how to move to carry them out). (Ch3, Ch4, Ch5, Ch10)

V

VLA (Vision-Language-Action): Unified model that directly outputs robot actions from vision and language input. (Ch1, Ch2, Ch3, Ch4, Ch6, Ch10)

Z

Zero-shot planning: An LLM's ability to generate plans for new tasks without task-specific training or demonstrations. (Ch2)