tingchihc/embodied-AI-reading-list

Embodied-AI

This repository offers a brief summary of essential papers and blogs on embodied AI, along with a categorized collection of papers on 3D scene representation and LLM agents, and useful code repositories for starting your own project.

Table of Contents

Topic 1: Learning about 3D reconstruction and scene rendering
Topic 2: Learning about 3D scene representation
Topic 3: Learning about LLM agents
Topic 4: Learning about Text-to-image/video
Topic 5: Learning about Auto driving
Topic 6: Diffusion for Robotics and RL
Topic 7: Benchmarks: simulators, environments, datasets

Topic 1: Learning about 3D reconstruction and scene rendering
  • (ICRA'24 Oral) Kashu Yamazaki, et al. Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation. 📚 🌍
  • (arXiv) Yuqi Zhang, et al. Efficient Large-scale Scene Representation with a Hybrid of High-resolution Grid and Plane Feature. 📚 🌍
  • (ICLR'24) Francis Engelmann, et al. OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views. 📚 🌍
Topic 2: Learning about 3D scene representation
  • (CVPR'24) Alexandros Delitzas, et al. SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes. 📚 🌍
  • (CVPR'23) Songyou Peng, et al. OpenScene: 3D Scene Understanding with Open Vocabularies. 📚 🌍
  • (NeurIPS'23) Yining Hong, et al. 3D-LLM: Injecting the 3D World into Large Language Models. 📚 🌍
  • (ICCV'23) Yicong Hong, et al. Learning Navigational Visual Representations with Semantic Map Supervision. 📚 🌍
  • (NeurIPS'23) Ayça Takmaz, et al. OpenMask3D: Open-Vocabulary 3D Instance Segmentation. 📚 🌍
  • (ICCV'23 Oral) Justin Kerr, et al. LERF: Language Embedded Radiance Fields. 📚 🌍
Topic 3: Learning about LLM agents
  • (TMLR'22) Scott Reed, et al. A Generalist Agent. 📚
  • (arXiv) Michael S. Ryoo, et al. xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs. 📚 🌍
  • (COLM'24) Tianhua Tao, et al. CRYSTAL: Illuminating LLM Abilities on Language and Code. 📚 🌍
  • (COLM'24) Qingyun Wu, et al. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversations. 📚 🌍
  • (ECCV'24) Runsen Xu, et al. PointLLM: Empowering Large Language Models to Understand Point Clouds. 📚 🌍
  • (ICML'24 Oral) Ziniu Hu, et al. SceneCraft: An LLM Agent for Synthesizing 3D Scenes as Blender Code. 📚
Topic 4: Learning about Text-to-image/video
  • (COLM'24) Abhay Zala, et al. DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning. 📚 🌍
  • (COLM'24) Han Lin, et al. VideoDirectorGPT: Consistent Multi-Scene Video Generation via LLM-Guided Planning. 📚 🌍
Topic 5: Learning about Auto driving
  • (COLM'24) Jiageng Mao, et al. A Language Agent for Autonomous Driving. 📚 🌍
  • (ICLR'24) Licheng Wen, et al. DiLu🐴: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models. 📚 🌍
Topic 6: Diffusion for Robotics and RL
  • (arXiv) Carmelo Sferrazza, et al. Body Transformer: Leveraging Robot Embodiment for Policy Learning. 📚 🌍
  • (SIGGRAPH Asia'24) Agon Serifi, et al. Robot Motion Diffusion Model: Motion Generation for Robotic Characters. 📚
  • (NeurIPS'23) Biao Jiang, et al. MotionGPT: Human Motion as a Foreign Language. 📚 🌍
Topic 7: Benchmarks: simulators, environments, datasets
  • (NeurIPS'24) Tianbao Xie, et al. OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments. 📚 🌍

Other Resources

  1. Awesome-LLM-3D
  2. Awesome-LLM
  3. Diffusion-Literature-for-Robotics
  4. Awesome Diffusion Model in RL
  5. Awesome-LLM4AD
  6. Awesome-LLM-Robotics

Useful Tools

  1. Open3D

About

A reading list for 3D scene representation, NeRF, and more.