GenAI_Papers

This project curates a list of notable research papers in the field of GenAI.

Topics

  1. Overview
  2. Goals
  3. Scope and Context
  4. Research Papers
  5. Learning Logs

Overview

This repository is dedicated to the aggregation and discussion of groundbreaking research in the field of Generative AI.

Generative AI, or GenAI, refers to the subset of artificial intelligence focused on creating new content, ranging from text and images to code and beyond. The collection of papers included herein spans a variety of topics within GenAI, such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Transformer-based models.

This compendium serves as a resource for scholars, practitioners, and enthusiasts seeking to advance the state of the art in AI-driven content generation.

Goals

The primary goals of this repository are:

  1. Knowledge Consolidation: To centralize seminal and cutting-edge research papers that define and advance the GenAI field.
  2. Community Collaboration: To foster a collaborative environment where ideas and findings can be shared, discussed, and critiqued by the GenAI research community.
  3. Innovation Promotion: To inspire and guide new research initiatives and practical applications of GenAI technologies.
  4. Interdisciplinary Integration: To encourage the cross-pollination of ideas from diverse fields such as computer science, cognitive psychology, and digital arts to enrich GenAI research.

Scope and Context

Scope

This repository encompasses a wide array of research within GenAI, including but not limited to:

  • Theoretical foundations of generative models
  • Technical advancements in algorithm design
  • Applications of GenAI in various domains (e.g., art, healthcare, software development)
  • Ethical considerations and societal impacts of GenAI

Context

The GenAI field is situated at the intersection of multiple disciplines. It leverages deep learning, statistical modeling, and computational creativity to generate novel outputs that can mimic or even surpass human-level creativity in certain aspects. With the rapid pace of advancement in AI, it is crucial to maintain a clear and organized overview of the progress in this area, which this repository aims to provide.

Research Papers

📝 Note: The papers are not listed in any particular order.

Classification

| Category | Papers | Description |
| --- | --- | --- |
| Language Models & General AI | 1, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 31, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44, 45, 48, 54, 56, 58, 60, 66, 69, 74, 76, 79, 80, 82, 84, 86, 87, 89, 90, 92, 93, 95, 98, 99, 101, 103, 104 | Papers related to language models, their applications, ethical considerations, and improvements in training or functionality. |
| Vision & Language Integration | 3, 4, 29, 30, 33, 64 | Focusing on the integration of visual data with language models, including vision transformers and text-to-image personalization. |
| Attention Mechanisms & Transformers | 8, 9, 25, 28, 73 | Discussing the theory of attention in deep learning and optimization of transformer models. |
| Music & Creative AI | 5 | A unique paper on music generation using AI. |
| High-Resolution Image Synthesis | 6, 7, 63 | Discussing high-resolution image synthesis using diffusion models and vision transformers. |
| Efficiency & Scaling in AI | 2, 25, 26, 27, 28, 59, 61, 71, 72, 83, 88, 97 | Covering AI efficiency in terms of memory, inference, and scaling. |
| Environmental Impact of AI | 12 | A unique paper focusing on the environmental impact of AI systems. |
| Dialog & Interaction-Focused AI | 13, 24, 34, 35, 36, 37, 39, 53, 67, 81, 91 | Involving dialogue applications and platforms for interactive language agents. |
| AI Enhancement & Meta-Learning | 27, 31, 32, 37, 46, 47, 49, 55, 57, 62, 65, 68, 70, 75, 78, 96 | On improving AI capabilities through self-improvement, preference optimization, and distillation. |
| Miscellaneous AI Applications | 29, 30, 33, 50, 52, 77, 85, 94, 100, 102 | Discussing niche AI applications like commonsense norms and visual instruction tuning. |

Complete List

  1. Enhancing CLIP with GPT-4: Harnessing Visual Descriptions as Prompts
  2. EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention
  3. Key-Locked Rank One Editing for Text-to-Image Personalization
  4. ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders
  5. Simple and Controllable Music Generation
  6. High-Resolution Image Synthesis with Latent Diffusion Models
  7. All are Worth Words: A ViT Backbone for Diffusion Models
  8. Attention Is All You Need
  9. A Mathematical View of Attention Models in Deep Learning
  10. Improving Language Understanding by Generative Pre-Training
  11. Large Language Models and the Reverse Turing Test
  12. Estimating the Carbon Footprint of Bloom, a 176b Parameter Language Model
  13. LaMDA: Language Models for Dialog Applications
  14. Gorilla: Large Language Model Connected with Massive APIs
  15. Foundation Models for Decision Making: Problems, Methods, and Opportunities
  16. Continual Pre-training of Language Models
  17. How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
  18. AlpaGasus: Training a Better Alpaca with Fewer Data
  19. Ethical and social risks of harm from Language Models
  20. Holistic Evaluation of Language Models
  21. On the Risk of Misinformation Pollution with Large Language Models
  22. The Capacity for Moral Self-Correction in Large Language Models
  23. HONEST: Measuring Hurtful Sentence Completion in Language Models
  24. ReAct: Synergizing Reasoning and Acting in Language Models
  25. Efficiently Scaling Transformer Inference
  26. Hungry Hungry Hippos: Towards Language Modeling with State Space Models
  27. Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution
  28. Efficient Streaming Language Models with Attention Sinks
  29. Visual Instruction Tuning
  30. Improved Baselines with Visual Instruction Tuning
  31. Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  32. Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
  33. Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms
  34. TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
  35. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
  36. InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
  37. OpenAgents: An Open Platform for Language Agents in the Wild
  38. Large Language Models Understand and Can be Enhanced by Emotional Stimuli
  39. Communicative Agents for Software Development
  40. Large Language Models Are Human-Level Prompt Engineers
  41. Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
  42. Self-Consistency Improves Chain of Thought Reasoning in Language Models
  43. Language Models can be Logical Solvers
  44. Lost in the Middle: How Language Models Use Long Contexts
  45. Contrastive Chain-of-Thought Prompting
  46. RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!
  47. LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  48. PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
  49. Human Centered Loss Functions (HALOs)
  50. A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise
  51. Distributed Inference and Fine-tuning of Large Language Models Over The Internet
  52. GAIA: Zero-shot Talking Avatar Generation
  53. Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
  54. LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models
  55. Foundations of Vector Retrieval
  56. Self-Rewarding Language Models
  57. BloombergGPT: A Large Language Model for Finance
  58. Mistral 7B
  59. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
  60. MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
  61. Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models
  62. Orca 2: Teaching Small Language Models How to Reason
  63. ConvNets Match Vision Transformers at Scale
  64. Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
  65. Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
  66. Llama 2: Open Foundation and Fine-Tuned Chat Models
  67. QLoRA: Efficient Finetuning of Quantized LLMs
  68. RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
  69. Training language models to follow instructions with human feedback
  70. Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
  71. Sparse Networks from Scratch: Faster Training without Losing Performance
  72. ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
  73. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
  74. DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  75. MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
  76. Code Llama: Open Foundation Models for Code
  77. LLaMA Pro: Progressive LLaMA with Block Expansion
  78. Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
  79. Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
  80. Large Language Model based Multi-Agents: A Survey of Progress and Challenges
  81. Retrieval-Augmented Generation for Large Language Models: A Survey
  82. ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models
  83. The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  84. Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
  85. RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
  86. Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
  87. Datasets for Large Language Models: A Comprehensive Survey
  88. An LLM Compiler for Parallel Function Calling
  89. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  90. Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
  91. ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models
  92. StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
  93. A Critical Evaluation of AI Feedback for Aligning Large Language Models
  94. Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems
  95. Are Emergent Abilities of Large Language Models a Mirage?
  96. Yi: Open Foundation Models by 01.AI
  97. ORPO: Monolithic Preference Optimization without Reference Model
  98. Do Large Language Models Understand Logic or Just Mimick Context?
  99. Evaluating Large Language Models Trained on Code
  100. Self-Refine: Iterative Refinement with Self-Feedback
  101. Reflexion: Language Agents with Verbal Reinforcement Learning
  102. MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
  103. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
  104. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
  105. Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
  106. A Survey of Generative AI Applications
  107. MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL
  108. Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
  109. MetaGPT: Meta Programming For A Multi-Agent Collaborative Framework
  110. Understanding Transformer Reasoning Capabilities via Graph Algorithms
  111. Banishing LLM Hallucinations Requires Rethinking Generalization
  112. Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers
  113. LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples
  114. Memory^3: Language Modeling with Explicit Memory
  115. NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints
  116. LOTUS: Enabling Semantic Queries with LLMs Over Tables of Unstructured and Structured Data
  117. Text2SQL is Not Enough: Unifying AI and Databases with TAG
  118. Chain-of-Thought Reasoning Without Prompting
  119. Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
  120. Premise Order Matters in Reasoning with Large Language Models
  121. Teaching Large Language Models to Self-Debug
  122. SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning
  123. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  124. Agentic Retrieval-Augmented Generation for Time Series Analysis
  125. Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning
  126. OLMoE: Open Mixture-of-Experts Language Models
  127. Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
  128. Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
  129. Let's Verify Step by Step
  130. Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning
  131. V-STaR: Training Verifiers for Self-Taught Reasoners
  132. Agent Workflow Memory

Learning Logs

| Date | Learning |
| --- | --- |