Consolidated References

[1] Huang, W. et al., "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents," arXiv:2201.07207, 2022.
[2] Ahn, M. et al., "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances," arXiv:2204.01691, 2022.
[3] Liang, J. et al., "Code as Policies: Language Model Programs for Embodied Control," arXiv:2209.07753, 2022.
[4] Driess, D. et al., "PaLM-E: An Embodied Multimodal Language Model," arXiv:2303.03378, 2023.
[5] Brohan, A. et al., "RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control," arXiv:2307.15818, 2023.
[6] Chi, C. et al., "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion," arXiv:2303.04137, 2023.
[7] Liu, Z. et al., "REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction," arXiv:2306.15724, 2023.
[8] Kim, M. J. et al., "OpenVLA: An Open-Source Vision-Language-Action Model," arXiv:2406.09246, 2024.
[9] Ghosh, D. et al., "Octo: An Open-Source Generalist Robot Policy," arXiv:2405.12213, 2024.
[10] Black, K. et al., "π0: A Vision-Language-Action Flow Model for General Robot Control," arXiv:2410.24164, 2024.
[11] Li, X. et al., "Evaluating Real-World Robot Manipulation Policies in Simulation (SIMPLER)," arXiv:2405.05941, 2024.
[12] Brohan, A. et al., "AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents," arXiv:2401.12963, 2024.
[13] Shah, M. et al., "BUMBLE: Unifying Reasoning and Acting with Vision-Language Models for Building-wide Mobile Manipulation," arXiv:2410.06237, 2024.
[14] Wang, Z. et al., "KARMA: Augmenting Embodied AI Agents with Long-and-short Term Memory Systems," arXiv:2409.14908, 2024.
[15] Rana, K. et al., "SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning," arXiv:2307.06135, 2023.
[16] Fu, M. et al., "CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation," arXiv:2603.22435, 2026.
[17] Xie, Q. et al., "Embodied-RAG: General Non-parametric Embodied Memory for Retrieval and Generation," arXiv:2409.18313, 2024.
[18] Ekpo, D. et al., "VeriGraph: Scene Graphs for Execution Verifiable Robot Planning," arXiv:2411.10446, 2024.
[19] Chen, Y. et al., "AutoTAMP: Autoregressive Task and Motion Planning with LLMs as Translators and Checkers," arXiv:2306.06531, 2023.
[20] "A Survey on Large Language Models for Automated Planning," arXiv:2502.12435, 2025.
[21] Chen, Y. et al., "Code-as-Symbolic-Planner: Foundation Model-Based Robot Planning via Symbolic Code Generation," arXiv:2503.01700, 2025.
[22] "RL-GPT: Integrating Reinforcement Learning and Code-as-Policy," arXiv:2402.19299, 2024.
[23] Mikami, Y. et al., "Natural Language as Policies: Reasoning for Coordinate-Level Embodied Control with LLMs," arXiv:2403.13801, 2024.
[24] Lang4Sim2Real, "Natural Language Can Help Bridge the Sim2Real Gap," arXiv:2405.10020, 2024.
[25] Huang, W. et al., "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents," arXiv:2201.07207, 2022.
[26] Open X-Embodiment Collaboration, "Open X-Embodiment: Robotic Learning Datasets and RT-X Models," arXiv:2310.08864, 2023.
[27] Black, K. et al., "π0: A Vision-Language-Action Flow Model for General Robot Control," arXiv:2410.24164, 2024.
[28] Physical Intelligence, "π0.5: A Vision-Language-Action Model with Open-World Generalization," arXiv:2504.16054, 2025.
[29] NVIDIA, "GR00T N1: An Open Foundation Model for Generalist Humanoid Robots," arXiv:2503.14734, 2025.
[30] "FAST: Efficient Action Tokenization for Vision-Language-Action Models," arXiv:2501.09747, 2025.
[31] "TinyVLA: Towards Fast and Data-Efficient Vision-Language-Action Models," arXiv:2409.12514, 2024.
[32] Khazatsky, A. et al., "DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset," arXiv:2403.12945, 2024.
[33] "What Matters in Building Vision-Language-Action Models," arXiv:2412.14058, 2024.
[34] Belkhale, S. et al., "RT-H: Action Hierarchies Using Language," arXiv:2403.01823, 2024.
[35] Shi, L. X. et al., "Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models," arXiv:2502.19417, 2025.
[36] Li, J. et al., "HAMSTER: Hierarchical Action Models for Open-World Robot Manipulation," arXiv:2502.05485, 2025.
[37] Ke, T. et al., "3D Diffuser Actor: Policy Diffusion with 3D Scene Representations," arXiv:2402.10885, 2024.
[38] Jiang, H. et al., "RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation," arXiv:2402.15487, 2024.
[39] MoMa-LLM, "Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation," arXiv:2403.08605, 2024.
[40] "3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning," arXiv:2411.17735, 2024.
[41] "RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Interactive Environmental Learning in Physical Embodied Systems," arXiv:2508.01415, 2025.
[42] PragmaBot, "A Pragmatist Robot: Learning to Plan Tasks by Experiencing the Real World," arXiv:2507.16713, 2025.
[43] Yardi, Y. et al., "Bridging the Sim2Real Gap: Vision Encoder Pre-Training for Visuomotor Policy Transfer," 2025.
[44] Chen, Y. et al., "Code-as-Symbolic-Planner: Foundation Model-Based Robot Planning via Symbolic Code Generation," arXiv:2503.01700, 2025.
[45] Shah, M. et al., "BUMBLE: Unifying Reasoning and Acting with Vision-Language Models for Building-wide Mobile Manipulation," arXiv:2410.06237, 2024.
[46] The Register, "Claude Code's innards revealed as source code leaked online," theregister.com, April 2026.
[47] MindStudio, "Claude Code Source Leak: The Three-Layer Memory Architecture and What It Means for Builders," mindstudio.ai/blog, 2026.
[48] Rajiv Pant, "How Claude's Memory Actually Works (And Why CLAUDE.md Matters)," rajiv.com/blog, December 2025.
[49] Penligent, "Inside Claude Code: The Architecture Behind Tools, Memory, Hooks, and MCP," penligent.ai, 2025.
[50] VentureBeat, "Claude Code's source code appears to have leaked: here's what we know," venturebeat.com, 2026.
[51] Anthropic, "Claude Code Best Practices," anthropic.com/engineering, 2025.
[52] OpenAI, "Introducing Codex," openai.com/index/introducing-codex, May 2025.
[53] OpenAI, "Introducing the Codex App," openai.com/index/introducing-the-codex-app, February 2026.
[54] OpenAI, "Introducing upgrades to Codex," openai.com/index/introducing-upgrades-to-codex, 2026.
[55] Wikipedia, "OpenAI Codex (AI agent)," en.wikipedia.org, 2026.
[56] Morphllm, "Claude Code as Orchestrator: Inter-Agent Communication Protocols," morphllm.com, 2026.
[57] Morphllm, "Claude Code Subagents: How They Work, What They See & When to Use Them," morphllm.com, 2026.
[58] Paddo.dev, "Claude Code Auto-Fix: The PR That Fixes Itself," paddo.dev/blog, 2026.
[59] Springer, "Agentic AI: A Comprehensive Survey of Architectures, Applications, and Future Directions," Artificial Intelligence Review, 2025.
[60] Anthropic, "2026 Agentic Coding Trends Report," resources.anthropic.com, 2026.
[61] Claude Code Docs, "Create custom subagents," code.claude.com/docs/en/sub-agents, 2026.
[62] Claude Code Docs, "How Claude remembers your project," code.claude.com/docs/en/memory, 2026.
[63] Dbreunig, "How Claude Code Builds a System Prompt," dbreunig.com, April 2026.

Acknowledgment

This book traces the research evolution from LLM-based robot planning to agentic robotics, and analyzes the fundamental differences between agentic coding and agentic robotics to chart the future of Physical AI.

Special thanks to Prof. Sungjoon Choi and Chanwoo Kim (PhD candidate) at Korea University. This survey was inspired by Chanwoo Kim's seminar presentation, and the reference papers from his seminar formed the foundation of this work.

This project was built using the Harness skill by Minho Hwang.

AI tools were used in the production of this work: Claude (Opus 4.6) for literature survey, content generation, and manuscript preparation.