Embodied-AI

This repository briefly summarizes essential papers and blogs on embodied AI. It also provides a categorized collection of papers on 3D scene representation and LLM agents, plus useful code repositories for starting your own project.

Topic 1: Learning about 3D reconstruction and scene rendering
Topic 2: Learning about 3D scene representation
Topic 3: Learning about LLM agents
Topic 4: Learning about Text-to-image/video
Topic 5: Learning about Auto driving
Topic 6: Diffusion for Robotics and RL
Topic 7: Benchmarks: simulators, environments, datasets

Topic 1: Learning about 3D reconstruction and scene rendering

(ICRA'24 Oral) Kashu Yamazaki, et al. Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation. 📚 🌍
(arxiv) Yuqi Zhang, et al. Efficient Large-scale Scene Representation with a Hybrid of High-resolution Grid and Plane Feature. 📚 🌍
(ICLR'24) Francis Engelmann, et al. OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views. 📚 🌍

Topic 2: Learning about 3D scene representation

(CVPR'24) Alexandros Delitzas, et al. SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes. 📚 🌍
(CVPR'23) Songyou Peng, et al. OpenScene: 3D Scene Understanding with Open Vocabularies. 📚 🌍
(NeurIPS'23) Yining Hong, et al. 3D-LLM: Injecting the 3D World into Large Language Models. 📚 🌍
(ICCV'23) Yicong Hong, et al. Learning Navigational Visual Representations with Semantic Map Supervision. 📚 🌍
(NeurIPS'23) Ayça Takmaz, et al. OpenMask3D: Open-Vocabulary 3D Instance Segmentation. 📚 🌍
(ICCV'23 Oral) Justin Kerr, et al. LERF: Language Embedded Radiance Fields. 📚 🌍

Topic 3: Learning about LLM agents

(TMLR'22) Scott Reed, et al. A Generalist Agent. 📚
(arxiv) Michael S. Ryoo, et al. xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs. 📚 🌍
(COLM'24) Tianhua Tao, et al. CRYSTAL: Illuminating LLM Abilities on Language and Code. 📚 🌍
(COLM'24) Qingyun Wu, et al. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversations. 📚 🌍
(ECCV'24) Runsen Xu, et al. PointLLM: Empowering Large Language Models to Understand Point Clouds. 📚 🌍
(ICML'24 Oral) Ziniu Hu, et al. SceneCraft: An LLM Agent for Synthesizing 3D Scenes as Blender Code. 📚

Topic 4: Learning about Text-to-image/video

(COLM'24) Abhay Zala, et al. DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning. 📚 🌍
(COLM'24) Han Lin, et al. VideoDirectorGPT: Consistent Multi-Scene Video Generation via LLM-Guided Planning. 📚 🌍

Topic 5: Learning about Auto driving

(COLM'24) Jiageng Mao, et al. A Language Agent for Autonomous Driving. 📚 🌍
(ICLR'24) Licheng Wen, et al. DiLu🐴: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models. 📚 🌍

Topic 6: Diffusion for Robotics and RL

(arxiv) Carmelo Sferrazza, et al. Body Transformer: Leveraging Robot Embodiment for Policy Learning. 📚 🌍
(SIGGRAPH Asia'24) Agon Serifi, et al. Robot Motion Diffusion Model: Motion Generation for Robotic Characters. 📚
(NeurIPS'23) Biao Jiang, et al. MotionGPT: Human Motion as a Foreign Language. 📚 🌍

Topic 7: Benchmarks: simulators, environments, datasets

(NeurIPS'24) Tianbao Xie, et al. OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments. 📚 🌍

Other Resources

Useful Tools

Open3D

tingchihc/embodied-AI-reading-list
