This repository offers brief summaries of essential papers and blog posts on embodied AI, along with a categorized collection of papers on 3D scene representation and LLM agents, and useful code repositories for starting your own project.
Topic 1: Learning about 3D reconstruction and scene rendering
Topic 2: Learning about 3D scene representation
Topic 3: Learning about LLM agents
Topic 4: Learning about Text-to-image/video
Topic 5: Learning about Autonomous driving
Topic 6: Diffusion for Robotics and RL
Topic 7: Benchmarks: simulators, environments, datasets
Topic 1: Learning about 3D reconstruction and scene rendering
- (ICRA'24 Oral) Kashu Yamazaki, et al. Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation. 📚 🌍
- (arXiv) Yuqi Zhang, et al. Efficient Large-scale Scene Representation with a Hybrid of High-resolution Grid and Plane Feature. 📚 🌍
- (ICLR'24) Francis Engelmann, et al. OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views. 📚 🌍
Topic 2: Learning about 3D scene representation
- (CVPR'24) Alexandros Delitzas, et al. SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes. 📚 🌍
- (CVPR'23) Songyou Peng, et al. OpenScene: 3D Scene Understanding with Open Vocabularies. 📚 🌍
- (NeurIPS'23) Yining Hong, et al. 3D-LLM: Injecting the 3D World into Large Language Models. 📚 🌍
- (ICCV'23) Yicong Hong, et al. Learning Navigational Visual Representations with Semantic Map Supervision. 📚 🌍
- (NeurIPS'23) Ayça Takmaz, et al. OpenMask3D: Open-Vocabulary 3D Instance Segmentation. 📚 🌍
- (ICCV'23 Oral) Justin Kerr, et al. LERF: Language Embedded Radiance Fields. 📚 🌍
Topic 3: Learning about LLM agents
- (TMLR'22) Scott Reed, et al. A Generalist Agent. 📚
- (arXiv) Michael S. Ryoo, et al. xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs. 📚 🌍
- (COLM'24) Tianhua Tao, et al. CRYSTAL: Illuminating LLM Abilities on Language and Code. 📚 🌍
- (COLM'24) Qingyun Wu, et al. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversations. 📚 🌍
- (ECCV'24) Runsen Xu, et al. PointLLM: Empowering Large Language Models to Understand Point Clouds. 📚 🌍
- (ICML'24 Oral) Ziniu Hu, et al. SceneCraft: An LLM Agent for Synthesizing 3D Scenes as Blender Code. 📚