GenAI_Papers

This project curates a list of notable research papers in the field of GenAI.

Topics

  1. Overview
  2. Goals
  3. Scope and Context
  4. Research Papers
  5. Learning Logs

Overview

This repository is dedicated to the aggregation and discussion of groundbreaking research in the field of Generative AI.

Generative AI, or GenAI, refers to the subset of artificial intelligence focused on creating new content, ranging from text and images to code and beyond. The collection of papers included herein spans a variety of topics within GenAI, such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Transformer-based models.

This compendium serves as a resource for scholars, practitioners, and enthusiasts seeking to advance the state of the art in AI-driven content generation.

Goals

The primary goals of this repository are:

  1. Knowledge Consolidation: To centralize seminal and cutting-edge research papers that define and advance the GenAI field.
  2. Community Collaboration: To foster a collaborative environment where ideas and findings can be shared, discussed, and critiqued by the GenAI research community.
  3. Innovation Promotion: To inspire and guide new research initiatives and practical applications of GenAI technologies.
  4. Interdisciplinary Integration: To encourage the cross-pollination of ideas from diverse fields such as computer science, cognitive psychology, and digital arts to enrich GenAI research.

Scope and Context

Scope

This repository encompasses a wide array of research within GenAI, including but not limited to:

  • Theoretical foundations of generative models
  • Technical advancements in algorithm design
  • Applications of GenAI in various domains (e.g., art, healthcare, software development)
  • Ethical considerations and societal impacts of GenAI

Context

The GenAI field is situated at the intersection of multiple disciplines. It leverages deep learning, statistical modeling, and computational creativity to generate novel outputs that can mimic or even surpass human-level creativity in certain aspects. With the rapid pace of advancement in AI, it is crucial to maintain a clear and organized overview of the progress in this area, which this repository aims to provide.

Research Papers

📝 Note: The papers are not listed in any particular order.

Classification

| Category | Papers | Description |
| --- | --- | --- |
| Language Models & General AI | 1, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 31, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44, 45, 48, 54, 56, 58, 60, 66, 69, 74, 76, 79, 80, 82, 84, 86, 87, 89, 90, 92, 93, 95, 98, 99, 101, 103, 104 | Papers related to language models, their applications, ethical considerations, and improvements in training or functionality. |
| Vision & Language Integration | 3, 4, 29, 30, 33, 64 | Focusing on the integration of visual data with language models, including vision transformers and text-to-image personalization. |
| Attention Mechanisms & Transformers | 8, 9, 25, 28, 73 | Discussing the theory of attention in deep learning and optimization of transformer models. |
| Music & Creative AI | 5 | A unique paper on music generation using AI. |
| High-Resolution Image Synthesis | 6, 7, 63 | Discussing high-resolution image synthesis using diffusion models and vision transformers. |
| Efficiency & Scaling in AI | 2, 25, 26, 27, 28, 59, 61, 71, 72, 83, 88, 97 | Covering AI efficiency in terms of memory, inference, and scaling. |
| Environmental Impact of AI | 12 | A unique paper focusing on the environmental impact of AI systems. |
| Dialog & Interaction-Focused AI | 13, 24, 34, 35, 36, 37, 39, 53, 67, 81, 91 | Involving dialogue applications and platforms for interactive language agents. |
| AI Enhancement & Meta-Learning | 27, 31, 32, 37, 46, 47, 49, 55, 57, 62, 65, 68, 70, 75, 78, 96 | On improving AI capabilities through self-improvement, preference optimization, and distillation. |
| Miscellaneous AI Applications | 29, 30, 33, 50, 52, 77, 85, 94, 100, 102 | Discussing niche AI applications like commonsense norms and visual instruction tuning. |

Complete List

  1. Enhancing CLIP with GPT-4: Harnessing Visual Descriptions as Prompts
  2. EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention
  3. Key-Locked Rank One Editing for Text-to-Image Personalization
  4. ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders
  5. Simple and Controllable Music Generation
  6. High-Resolution Image Synthesis with Latent Diffusion Models
  7. All are Worth Words: A ViT Backbone for Diffusion Models
  8. Attention Is All You Need
  9. A Mathematical View of Attention Models in Deep Learning
  10. Improving Language Understanding by Generative Pre-Training
  11. Large Language Models and the Reverse Turing Test
  12. Estimating the Carbon Footprint of Bloom, a 176b Parameter Language Model
  13. LaMDA: Language Models for Dialog Applications
  14. Gorilla: Large Language Model Connected with Massive APIs
  15. Foundation Models for Decision Making: Problems, Methods, and Opportunities
  16. Continual Pre-training of Language Models
  17. How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
  18. AlpaGasus: Training a Better Alpaca with Fewer Data
  19. Ethical and social risks of harm from Language Models
  20. Holistic Evaluation of Language Models
  21. On the Risk of Misinformation Pollution with Large Language Models
  22. The Capacity for Moral Self-Correction in Large Language Models
  23. HONEST: Measuring Hurtful Sentence Completion in Language Models
  24. ReAct: Synergizing Reasoning and Acting in Language Models
  25. Efficiently Scaling Transformer Inference
  26. Hungry Hungry Hippos: Towards Language Modeling with State Space Models
  27. Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution
  28. Efficient Streaming Language Models with Attention Sinks
  29. Visual Instruction Tuning
  30. Improved Baselines with Visual Instruction Tuning
  31. Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  32. Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
  33. Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms
  34. TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
  35. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
  36. InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
  37. OpenAgents: An Open Platform for Language Agents in the Wild
  38. Large Language Models Understand and Can be Enhanced by Emotional Stimuli
  39. Communicative Agents for Software Development
  40. Large Language Models Are Human-Level Prompt Engineers
  41. Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
  42. Self-Consistency Improves Chain of Thought Reasoning in Language Models
  43. Language Models can be Logical Solvers
  44. Lost in the Middle: How Language Models Use Long Contexts
  45. Contrastive Chain-of-Thought Prompting
  46. RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!
  47. LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  48. PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
  49. Human Centered Loss Functions (HALOs)
  50. A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise
  51. Distributed Inference and Fine-tuning of Large Language Models Over The Internet
  52. GAIA: Zero-shot Talking Avatar Generation
  53. Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
  54. LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models
  55. Foundations of Vector Retrieval
  56. Self-Rewarding Language Models
  57. BloombergGPT: A Large Language Model for Finance
  58. Mistral 7B
  59. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
  60. MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
  61. Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models
  62. Orca 2: Teaching Small Language Models How to Reason
  63. ConvNets Match Vision Transformers at Scale
  64. Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
  65. Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
  66. Llama 2: Open Foundation and Fine-Tuned Chat Models
  67. QLoRA: Efficient Finetuning of Quantized LLMs
  68. RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
  69. Training language models to follow instructions with human feedback
  70. Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
  71. Sparse Networks from Scratch: Faster Training without Losing Performance
  72. ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
  73. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
  74. DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  75. MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
  76. Code Llama: Open Foundation Models for Code
  77. LLaMA Pro: Progressive LLaMA with Block Expansion
  78. Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
  79. Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
  80. Large Language Model based Multi-Agents: A Survey of Progress and Challenges
  81. Retrieval-Augmented Generation for Large Language Models: A Survey
  82. ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models
  83. The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  84. Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
  85. RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
  86. Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
  87. Datasets for Large Language Models: A Comprehensive Survey
  88. An LLM Compiler for Parallel Function Calling
  89. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  90. Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
  91. ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models
  92. StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
  93. A Critical Evaluation of AI Feedback for Aligning Large Language Models
  94. Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems
  95. Are Emergent Abilities of Large Language Models a Mirage?
  96. Yi: Open Foundation Models by 01.AI
  97. ORPO: Monolithic Preference Optimization without Reference Model
  98. Do Large Language Models Understand Logic or Just Mimick Context?
  99. Evaluating Large Language Models Trained on Code
  100. Self-Refine: Iterative Refinement with Self-Feedback
  101. Reflexion: Language Agents with Verbal Reinforcement Learning
  102. MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
  103. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
  104. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
  105. Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
  106. A Survey of Generative AI Applications
  107. MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL
  108. Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
  109. MetaGPT: Meta Programming For A Multi-Agent Collaborative Framework
  110. Understanding Transformer Reasoning Capabilities via Graph Algorithms
  111. Banishing LLM Hallucinations Requires Rethinking Generalization
  112. Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers
  113. LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples
  114. Memory^3: Language Modeling with Explicit Memory
  115. NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints
  116. LOTUS: Enabling Semantic Queries with LLMs Over Tables of Unstructured and Structured Data
  117. Text2SQL is Not Enough: Unifying AI and Databases with TAG
  118. Chain-of-Thought Reasoning Without Prompting
  119. Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
  120. Premise Order Matters in Reasoning with Large Language Models
  121. Teaching Large Language Models to Self-Debug
  122. SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning
  123. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  124. Agentic Retrieval-Augmented Generation for Time Series Analysis
  125. Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning
  126. OLMoE: Open Mixture-of-Experts Language Models
  127. Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
  128. Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
  129. Let's Verify Step by Step
  130. Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning
  131. V-STaR: Training Verifiers for Self-Taught Reasoners
  132. Agent Workflow Memory

Learning Logs

| Date | Learning |
| --- | --- |