Papers and Resources on Adding Conditional Controls to Diffusion Models in the Era of AIGC.
- IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation. 🔥 [project] [paper] [code]
  Yinwei Wu, Xianpan Zhou, Bing Ma, Xuefeng Su, Kai Ma, Xinchao Wang. Preprint 2024.
- CSGO: Content-Style Composition in Text-to-Image Generation. [project] [paper] [code]
  Peng Xing, Haofan Wang, Yanpeng Sun, Qixun Wang, Xu Bai, Hao Ai, Renyuan Huang, Zechao Li. Preprint 2024.
- Generative Photomontage. [project] [paper] [code]
  Sean J. Liu, Nupur Kumari, Ariel Shamir, Jun-Yan Zhu. Preprint 2024.
- Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches. [project] [paper]
  Yongzhi Xu, Yonhon Ng, Yifu Wang, Inkyu Sa, Yunfei Duan, Yang Li, Pan Ji, Hongdong Li. Preprint 2024.
- IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts. [project] [paper] [code]
  Ciara Rowles, Shimon Vainer, Dante De Nigris, Slava Elizarov, Konstantin Kutsy, Simon Donné. Preprint 2024.
- ViPer: Visual Personalization of Generative Models via Individual Preference Learning. [project] [paper] [code]
  Sogand Salehi, Mahdi Shafiei, Teresa Yeo, Roman Bachmann, Amir Zamir. ECCV'24.
- Training-free Composite Scene Generation for Layout-to-Image Synthesis. [paper] [code]
  Jiaqi Liu, Tao Huang, Chang Xu. ECCV'24.
- SEED-Story: Multimodal Long Story Generation with Large Language Model. [paper] [code]
  Shuai Yang, Yuying Ge, Yang Li, Yukang Chen, Yixiao Ge, Ying Shan, Yingcong Chen. Preprint 2024.
- Sketch-Guided Scene Image Generation. [paper]
  Tianyu Zhang, Xiaoxuan Xie, Xusheng Du, Haoran Xie. Preprint 2024.
- Instant 3D Human Avatar Generation using Image Diffusion Models. [project] [paper]
  Nikos Kolotouros, Thiemo Alldieck, Enric Corona, Eduard Gabriel Bazavan, Cristian Sminchisescu. ECCV'24.
- Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance. 🔥 [project] [paper] [code]
  Kuan Heng Lin, Sicheng Mo, Ben Klingher, Fangzhou Mu, Bolei Zhou. Preprint 2024.
- Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis. [paper] [code]
  Marianna Ohanyan, Hayk Manukyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi. CVPR'24.
- pOps: Photo-Inspired Diffusion Operators. 🔥 [project] [paper] [code]
  Elad Richardson, Yuval Alaluf, Ali Mahdavi-Amiri, Daniel Cohen-Or. Preprint 2024.
- RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control. 🔥 [project] [paper] [code]
  Litu Rout, Yujia Chen, Nataniel Ruiz, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, Wen-Sheng Chu. Preprint 2024.
- FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition. [project] [paper] [code]
  Ganggui Ding, Canyu Zhao, Wen Wang, Zhen Yang, Zide Liu, Hao Chen, Chunhua Shen. CVPR'24.
- Personalized Residuals for Concept-Driven Text-to-Image Generation. [project] [paper]
  Cusuh Ham, Matthew Fisher, James Hays, Nicholas Kolkin, Yuchen Liu, Richard Zhang, Tobias Hinz. CVPR'24.
- Compositional Text-to-Image Generation with Dense Blob Representations. 🔥 [project] [paper]
  Weili Nie, Sifei Liu, Morteza Mardani, Chao Liu, Benjamin Eckart, Arash Vahdat. ICML'24.
- Customizing Text-to-Image Models with a Single Image Pair. [project] [paper] [code]
  Maxwell Jones, Sheng-Yu Wang, Nupur Kumari, David Bau, Jun-Yan Zhu. Preprint 2024.
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation. [paper]
  Yupeng Zhou, Daquan Zhou, Ming-Ming Cheng, Jiashi Feng, Qibin Hou. Preprint 2024.
- InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation. [paper]
  Chanran Kim, Jeongin Lee, Shichang Joung, Bongmo Kim, Yeul-Min Baek. Preprint 2024.
- PuLID: Pure and Lightning ID Customization via Contrastive Alignment. [paper] [code]
  Zinan Guo, Yanze Wu, Zhuowei Chen, Lang Chen, Qian He. Tech Report 2024.
- MultiBooth: Towards Generating All Your Concepts in an Image from Text. [project] [paper] [code]
  Chenyang Zhu, Kai Li, Yue Ma, Chunming He, Xiu Li. Preprint 2024.
- StyleBooth: Image Style Editing with Multimodal Instruction. [project] [paper] [code]
  Zhen Han, Chaojie Mao, Zeyinzi Jiang, Yulin Pan, Jingfeng Zhang. Preprint 2024.
- MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation. 🔥 [project] [paper] [code]
  Kunpeng Song, Yizhe Zhu, Bingchen Liu, Qing Yan, Ahmed Elgammal, Xiao Yang. ECCV'24.
- Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding. [paper]
  Zezhong Fan, Xiaohan Li, Chenhao Fang, Topojoy Biswas, Kaushiki Nag, Jianpeng Xu, Kannan Achan. WWW'24.
- MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation. [project] [paper] [code]
  Kuan-Chieh Wang, Daniil Ostashev, Yuwei Fang, Sergey Tulyakov, Kfir Aberman. Preprint 2024.
- MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models. [project] [paper] [code]
  Nithin Gopalakrishnan Nair, Jeya Maria Jose Valanarasu, Vishal M Patel. ECCV'24.
- Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model. [project] [paper] [code]
  Han Lin, Jaemin Cho, Abhay Zala, Mohit Bansal. Preprint 2024.
- ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback. [project] [paper] [code]
  Ming Li, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, Chen Chen. ECCV'24.
- Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models. [project] [paper]
  Sangwon Jang, Jaehyeong Jo, Kimin Lee, Sung Ju Hwang. Preprint 2024.
- Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models. [paper]
  Gihyun Kwon, Simon Jenni, Dingzeyu Li, Joon-Young Lee, Jong Chul Ye, Fabian Caba Heilbron. CVPR'24.
- FlashFace: Human Image Personalization with High-fidelity Identity Preservation. [project] [paper] [code]
  Shilong Zhang, Lianghua Huang, Xi Chen, Yifei Zhang, Zhi-Fan Wu, Yutong Feng, Wei Wang, Yujun Shen, Yu Liu, Ping Luo. Preprint 2024.
- Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation. [project] [paper] [code]
  Omer Dahary, Or Patashnik, Kfir Aberman, Daniel Cohen-Or. ECCV'24.
- Continuous Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions. [project] [paper] [code]
  Stefan Andreas Baumann, Felix Krause, Michael Neumayr, Nick Stracke, Vincent Tao Hu, Björn Ommer. Preprint 2024.
- Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation. [project] [paper] [code]
  Fangfu Liu, Hanyang Wang, Weiliang Chen, Haowen Sun, Yueqi Duan. ECCV'24.
- FeedFace: Efficient Inference-based Face Personalization via Diffusion Models. 🔥 [paper] [code]
  Chendong Xiang, Armando Fortes, Khang Hui Chua, Hang Su, Jun Zhu. Tiny Papers @ ICLR'24.
- Multi-LoRA Composition for Image Generation. [project] [paper] [code]
  Ming Zhong, Yelong Shen, Shuohang Wang, Yadong Lu, Yizhu Jiao, Siru Ouyang, Donghan Yu, Jiawei Han, Weizhu Chen. Preprint 2024.
- Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition. [project] [paper] [code]
  Chun-Hsiao Yeh, Ta-Ying Cheng, He-Yen Hsieh, Chuan-En Lin, Yi Ma, Andrew Markham, Niki Trigoni, H.T. Kung, Yubei Chen. Tech Report 2024.
- Visual Style Prompting with Swapping Self-Attention. [project] [paper] [code]
  Jaeseok Jeong, Junho Kim, Yunjey Choi, Gayoung Lee, Youngjung Uh. Preprint 2024.
- RealCompo: Dynamic Equilibrium between Realism and Composition Improves Text-to-Image Diffusion Models. [paper] [code]
  Xinchen Zhang, Ling Yang, Yaqi Cai, Zhaochen Yu, Jiake Xie, Ye Tian, Minkai Xu, Yong Tang, Yujiu Yang, Bin Cui. Preprint 2024.
- Direct Consistency Optimization for Compositional Text-to-Image Personalization. [project] [paper] [code]
  Kyungmin Lee, Sangkyung Kwak, Kihyuk Sohn, Jinwoo Shin. Preprint 2024.
- InstanceDiffusion: Instance-level Control for Image Generation. [project] [paper] [code]
  Xudong Wang, Trevor Darrell, Sai Saketh Rambhatla, Rohit Girdhar, Ishan Misra. CVPR'24.
- Training-Free Consistent Text-to-Image Generation. [project] [paper]
  Yoad Tewel, Omri Kaduri, Rinon Gal, Yoni Kasten, Lior Wolf, Gal Chechik, Yuval Atzmon. SIGGRAPH'24.
- UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion. 🔥 [project] [paper]
  Wei Li, Xue Xu, Jiachen Liu, Xinyan Xiao. ACL'24.
- Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs. 🔥 [paper] [code]
  Ling Yang, Zhaochen Yu, Chenlin Meng, Minkai Xu, Stefano Ermon, Bin Cui. ICML'24.
- InstantID: Zero-shot Identity-Preserving Generation in Seconds. 🔥 [project] [paper] [code]
  Qixun Wang, Xu Bai, Haofan Wang, Zekui Qin, Anthony Chen, Huaxia Li, Xu Tang, Yao Hu. Tech Report 2024.
- PALP: Prompt Aligned Personalization of Text-to-Image Models. [project] [paper]
  Moab Arar, Andrey Voynov, Amir Hertz, Omri Avrahami, Shlomi Fruchter, Yael Pritch, Daniel Cohen-Or, Ariel Shamir. Preprint 2024.
- SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing. [project] [paper] [code]
  Zeyinzi Jiang, Chaojie Mao, Yulin Pan, Zhen Han, Jingfeng Zhang. CVPR'24.
- PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding. 🔥 [project] [paper] [code]
  Zhen Li, Mingdeng Cao, Xintao Wang, Zhongang Qi, Ming-Ming Cheng, Ying Shan. CVPR'24.
- Context Diffusion: In-Context Aware Image Generation. [project] [paper]
  Ivona Najdenkoska, Animesh Sinha, Abhimanyu Dubey, Dhruv Mahajan, Vignesh Ramanathan, Filip Radenovic. ECCV'24.
- Style Aligned Image Generation via Shared Attention. 🔥 [project] [paper] [code]
  Amir Hertz, Andrey Voynov, Shlomi Fruchter, Daniel Cohen-Or. CVPR'24.
- Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models. [project] [paper] [code]
  Daniel Geng, Inbum Park, Andrew Owens. CVPR'24.
- MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion. [project] [paper] [code]
  Di Chang, Yichun Shi, Quankai Gao, Jessica Fu, Hongyi Xu, Guoxian Song, Qing Yan, Xiao Yang, Mohammad Soleymani. ICML'24.
- The Chosen One: Consistent Characters in Text-to-Image Diffusion Models. [project] [paper] [code]
  Omri Avrahami, Amir Hertz, Yael Vinker, Moab Arar, Shlomi Fruchter, Ohad Fried, Daniel Cohen-Or, Dani Lischinski. SIGGRAPH'24.
- Cross-Image Attention for Zero-Shot Appearance Transfer. [project] [paper] [code]
  Yuval Alaluf, Daniel Garibi, Or Patashnik, Hadar Averbuch-Elor, Daniel Cohen-Or. SIGGRAPH'24.
- Kosmos-G: Generating Images in Context with Multimodal Large Language Models. 🔥 [project] [paper] [code]
  Xichen Pan, Li Dong, Shaohan Huang, Zhiliang Peng, Wenhu Chen, Furu Wei. ICLR'24.
- InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning. [paper]
  Jing Shi, Wei Xiong, Zhe Lin, Hyun Joon Jung. CVPR'24.
- ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs. [project] [paper]
  Viraj Shah, Nataniel Ruiz, Forrester Cole, Erika Lu, Svetlana Lazebnik, Yuanzhen Li, Varun Jampani. Preprint 2023.
- IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models. 🔥 [project] [paper] [code]
  Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, Wei Yang. Tech Report 2023.
- Zero-Shot Spatial Layout Conditioning for Text-to-Image Diffusion Models.
  Guillaume Couairon, Marlène Careil, Matthieu Cord, Stéphane Lathuilière, Jakob Verbeek. ICCV'23.
- Controlling Text-to-Image Diffusion by Orthogonal Finetuning. [project] [paper] [code]
  Zeju Qiu, Weiyang Liu, Haiwen Feng, Yuxuan Xue, Yao Feng, Zhen Liu, Dan Zhang, Adrian Weller, Bernhard Schölkopf. NeurIPS'23.
- Face0: Instantaneously Conditioning a Text-to-Image Model on a Face. [paper]
  Dani Valevski, Danny Wasserman, Yossi Matias, Yaniv Leviathan. SIGGRAPH Asia'23.
- StyleDrop: Text-to-Image Generation in Any Style. 🔥 [project] [paper]
  Kihyuk Sohn, Nataniel Ruiz, Kimin Lee, Daniel Castro Chin, Irina Blok, Huiwen Chang, Jarred Barber, Lu Jiang, Glenn Entis, Yuanzhen Li, Yuan Hao, Irfan Essa, Michael Rubinstein, Dilip Krishnan. NeurIPS'23.
- BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing. 🔥 [project] [paper] [code]
  Dongxu Li, Junnan Li, Steven C.H. Hoi. NeurIPS'23.
- Subject-driven Text-to-Image Generation via Apprenticeship Learning. [paper]
  Wenhu Chen, Hexiang Hu, Yandong Li, Nataniel Ruiz, Xuhui Jia, Ming-Wei Chang, William W. Cohen. NeurIPS'23.
- T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models. 🔥 [paper] [code]
  Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, Ying Shan, Xiaohu Qie. Tech Report 2023.
- Adding Conditional Control to Text-to-Image Diffusion Models. 🔥 [paper] [code]
  Lvmin Zhang, Anyi Rao, Maneesh Agrawala. ICCV'23. (A minimal usage sketch appears after this list.)
- GLIGEN: Open-Set Grounded Text-to-Image Generation. 🔥 [project] [paper] [code]
  Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, Yong Jae Lee. CVPR'23.
- Multi-Concept Customization of Text-to-Image Diffusion. [project] [paper] [code]
  Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu. CVPR'23.
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation. 🔥 [project] [paper]
  Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman. CVPR'23.
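
Many of the adapter-style methods above ship with ready-made pipelines. As a concrete reference point, here is a minimal sketch of ControlNet-style conditioning using the Hugging Face diffusers library; the Hub checkpoint names and the local edge-map file are illustrative assumptions, not prescribed by the papers.

```python
# Minimal sketch of adapter-style conditional control with diffusers.
# Assumes `diffusers` and `torch` are installed, a CUDA GPU is available,
# and the Hub checkpoints below are reachable; the edge map is hypothetical.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# A Canny-edge ControlNet attached to a Stable Diffusion 1.5 backbone.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The edge map constrains spatial structure; the text prompt controls content.
edges = load_image("canny_edges.png")  # hypothetical precomputed edge map
image = pipe("a red sports car on a mountain road", image=edges).images[0]
image.save("controlled.png")
```

Image-prompt adapters such as IP-Adapter follow the same pattern via `pipe.load_ip_adapter(...)` and the `ip_adapter_image` argument, and T2I-Adapter via `T2IAdapter` with its corresponding adapter pipeline.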
- Regional Prompter: Set a prompt to a divided region.
- Awesome-LLM-Reasoning: Collection of papers and resources on reasoning in Large Language Models.
- Awesome-Controllable-T2I-Diffusion-Models: A collection of resources on controllable generation with text-to-image diffusion models.
- Add a new paper or update an existing paper, thinking about which category the work should belong to.
- Use the same format as existing entries to describe the work (see the example below).
- Add the abstract link of the paper (the `/abs/` format if it is an arXiv publication).
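
For reference, a new entry might look like the following sketch (the title, authors, and arXiv ID are placeholders):

```
- Paper Title Goes Here. [project] [paper](https://arxiv.org/abs/XXXX.XXXXX) [code]
  First Author, Second Author, Third Author. Preprint 2024.
```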
Don't worry if you do something wrong; it will be fixed for you!