[1] Localization Distillation for Dense Object Detection
keywords: Bounding Box Regression, Localization Quality Estimation, Knowledge Distillation
paper | code
Commentary: Ming-Ming Cheng's team at Nankai University and Tianjin University propose LD, localization distillation for object detection

Video Object Detection

[1] Unsupervised Activity Segmentation by Joint Representation Learning and Online Clustering
paper | video

3D Object Detection

[2] A Versatile Multi-View Framework for LiDAR-based 3D Object Detection with Guidance from Panoptic Segmentation
keywords: 3D Object Detection with Point-based Methods, 3D Object Detection with Grid-based Methods, Cluster-free 3D Panoptic Segmentation, CenterPoint 3D Object Detection
paper

[1] Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving
keywords: Autonomous Driving, Monocular 3D Object Detection
paper | code

Human-Object Interaction (HOI) Detection

Camouflaged Object Detection

Rotation Object Detection

Saliency Object Detection

Anomaly Detection in Image

Keypoint Detection

Lane Detection

[1] Rethinking Efficient Lane Detection via Curve Modeling
keywords: Segmentation-based Lane Detection, Point Detection-based Lane Detection, Curve-based Lane Detection, autonomous driving
paper | code

Segmentation

Image Segmentation

Panoptic Segmentation

[1] Bending Reality: Distortion-aware Transformers for Adapting to Panoramic Semantic Segmentation
keywords: Semantic- and panoramic segmentation, Unsupervised domain adaptation, Transformer
paper | code

Semantic Segmentation

[2] ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation
keywords: Semi-supervised learning, Semantic segmentation, Uncertainty estimation
paper | code

[1] Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
paper | code

Instance Segmentation

[2] Efficient Video Instance Segmentation via Tracklet Query and Proposal
paper

[1] SoftGroup for 3D Instance Segmentation on Point Clouds
keywords: 3D Vision, Point Clouds, Instance Segmentation
paper | code

Superpixel

Video Object Segmentation

Matting

Dense Prediction

Estimation

Human Pose Estimation

[2] Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation
keywords: Top-Down Pose Estimation, Limb-based Grouping, Direct Regression
paper

[1] MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
keywords: 3D Human Pose Estimation, Transformer
paper

Gesture Estimation

Optical Flow/Pose/Motion Estimation

Depth Estimation

[5] ITSA: An Information-Theoretic Approach to Automatic Shortcut Avoidance and Domain Generalization in Stereo Matching Networks
keywords: Learning-based Stereo Matching Networks, Single Domain Generalization, Shortcut Learning
paper

[4] Attention Concatenation Volume for Accurate and Efficient Stereo Matching
keywords: Stereo Matching, cost volume construction, cost aggregation
paper | code

[3] Occlusion-Aware Cost Constructor for Light Field Depth Estimation
paper | [code](https://github.com/YingqianWang/OACC-Net)

[2] NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation
keywords: Neural CRFs for Monocular Depth
paper

[1] OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion
keywords: monocular depth estimation, transformer
paper

Image Processing

Super Resolution

[1] HDNet: High-resolution Dual-domain Learning for Spectral Compressive Imaging
keywords: HSI Reconstruction, Self-Attention Mechanism, Image Frequency Spectrum Analysis
paper

Image Restoration/Image Enhancement/Image Reconstruction

[1] Event-based Video Reconstruction via Potential-assisted Spiking Neural Network
paper

Image Shadow Removal/Image Reflection Removal

Image Denoising/Deblurring/Deraining/Dehazing

[1] E-CIR: Event-Enhanced Continuous Intensity Recovery
keywords: Event-Enhanced Deblurring, Video Representation
paper | code

Image Edit/Inpainting

[2] HairCLIP: Design Your Hair by Text and Reference Image
keywords: Language-Image Pre-Training (CLIP), Generative Adversarial Networks
paper | project

[1] Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
keywords: Image Inpainting, Transformer, Image Generation
paper | code

Image Translation

[1] Exploring Patch-wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks
keywords: image translation, knowledge transfer, Contrastive learning
paper

Image Quality Assessment

Style Transfer

[1] CLIPstyler: Image Style Transfer with a Single Text Condition
keywords: Style Transfer, Text-guided synthesis, Language-Image Pre-Training (CLIP)
paper

Face

Facial Recognition/Detection

[1] An Efficient Training Approach for Very Large Scale Face Recognition
paper | code

Face Generation/Face Synthesis/Face Reconstruction/Face Editing

[1] Sparse to Dense Dynamic 3D Facial Expression Generation
keywords: Facial expression generation, 4D face generation, 3D face modeling
paper

Face Forgery/Face Anti-Spoofing

[2] Voice-Face Homogeneity Tells Deepfake
paper | code

[1] Protecting Celebrities with Identity Consistency Transformer
paper

Object Tracking

[3] TCTrack: Temporal Contexts for Aerial Tracking
paper | code

[2] Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
keywords: Single Object Tracking, 3D Multi-object Tracking / Detection, Spatial-temporal Learning on Point Clouds
paper

[1] Correlation-Aware Deep Tracking
paper

Image&Video Retrieval/Video Understanding

[1] BEVT: BERT Pretraining of Video Transformers
keywords: Video understanding, Vision transformers, Self-supervised representation learning, BERT pretraining
paper | code

Action/Activity Recognition, Detection, Segmentation, and Localization

[1] Colar: Effective and Efficient Online Action Detection by Consulting Exemplars
keywords: Online action detection
paper

Person Re-Identification/Detection

Image/Video Caption

[1] X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
keywords: Image Captioning and Dense Captioning, Knowledge Distillation, Transformer, 3D Vision
paper

Medical Imaging

[1] Temporal Context Matters: Enhancing Single Image Prediction with Disease Progression Representations
keywords: Self-supervised Transformer, Temporal modeling of disease progression
paper

Text Detection/Recognition

Remote Sensing Image

GAN/Generative/Adversarial

[1] Label-Only Model Inversion Attacks via Boundary Repulsion
paper

Image Generation/Image Synthesis

[4] 3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces
paper | code

[3] Interactive Image Synthesis with Panoptic Layout Generation
[paper](https://arxiv.org/abs/2203.02104)

[2] Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values
paper | demo

[1] Autoregressive Image Generation using Residual Quantization
paper | code

View Synthesis

3D Vision

[1] X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
keywords: Image Captioning and Dense Captioning, Knowledge Distillation, Transformer, 3D Vision
paper

Point Cloud

[2] A Unified Query-based Paradigm for Point Cloud Understanding
paper

[1] CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
keywords: Self-Supervised Learning, Contrastive Learning, 3D Point Cloud, Representation Learning, Cross-Modal Learning
paper | code

3D Reconstruction

[1] H4D: Human 4D Modeling by Learning Neural Compositional Representation
keywords: 4D Representation, Human Body Estimation, Fine-grained Human Reconstruction
paper

Scene Reconstruction/Novel View Synthesis

[2] CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields
keywords: NeRF, Image Generation and Manipulation, Language-Image Pre-Training (CLIP)
paper | code

[1] Point-NeRF: Point-based Neural Radiance Fields
paper | code | project

Model Compression

Knowledge Distillation

Pruning

Quantization

Neural Network Structure Design

[1] BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning
keywords: sample relationship, data scarcity learning, Contrastive Self-Supervised Learning, long-tailed recognition, zero-shot learning, domain generalization, self-supervised learning
paper | code

CNN

[1] A ConvNet for the 2020s
paper | code
Commentary: a ConvNet "renaissance": ConvNets make a comeback and outperform Transformers as FAIR redesigns a pure-convolution architecture

Transformer

[1] Mobile-Former: Bridging MobileNet and Transformer
keywords: Light-weight convolutional neural networks, Combination of CNN and ViT
paper

Graph Neural Network (GNN)

Neural Architecture Search (NAS)

[1] β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search
paper

MLP

[1] An Image Patch is a Wave: Quantum Inspired Vision MLP
paper | code | code

Data Processing

Data Augmentation

[1] 3D Common Corruptions and Data Augmentation
keywords: Data Augmentation, Image restoration, Photorealistic image synthesis
paper | project

Representation Learning

Normalization/Regularization (Batch Normalization)

Image Clustering

Image Compression

Anomaly Detection

[1] Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection (paper not yet uploaded)
paper | code

Model Training/Generalization

[3] CAFE: Learning to Condense Dataset by Aligning Features
keywords: dataset condensation, coreset selection, generative models
paper | code

[2] The Devil is in the Margin: Margin-based Label Smoothing for Network Calibration
paper | code

[1] DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
keywords: Detection Transformer
paper | code

Noisy Label

Long-Tailed Distribution

[1] Targeted Supervised Contrastive Learning for Long-Tailed Recognition
keywords: Long-Tailed Recognition, Contrastive Learning
paper

Model Evaluation

Multi-Modal Learning

Audio-visual Learning

Vision-language Representation Learning

[3] HairCLIP: Design Your Hair by Text and Reference Image
keywords: Language-Image Pre-Training (CLIP), Generative Adversarial Networks
paper | project

[2] CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields
keywords: NeRF, Image Generation and Manipulation, Language-Image Pre-Training (CLIP)
paper | code

[1] Vision-Language Pre-Training with Triple Contrastive Learning
keywords: Vision-language representation learning, Contrastive Learning
paper | code

Vision-based Prediction

Dataset

Active Learning

Few-shot Learning/Zero-shot Learning

Continual Learning/Life-long Learning

Scene Graph

Scene Graph Generation

[1] Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs
keywords: Video Scene Graph Generation, Transformer, Video Grounding
paper | code

Scene Graph Prediction

Scene Graph Understanding

Visual Localization

Visual Reasoning/VQA

Image Classification

Transfer Learning/Domain Adaptation

[1] Weakly Supervised Object Localization as Domain Adaption
keywords: Weakly Supervised Object Localization(WSOL), Multi-instance learning based WSOL, Separated-structure based WSOL, Domain Adaption
paper | code

Metric Learning

[1] Enhancing Adversarial Robustness for Deep Metric Learning
keywords: Adversarial Attack, Adversarial Defense, Deep Metric Learning
paper

Contrastive Learning

[2] HCSC: Hierarchical Contrastive Selective Coding
keywords: Self-supervised Representation Learning, Deep Clustering, Contrastive Learning
paper | code

[1] Crafting Better Contrastive Views for Siamese Representation Learning
paper | code

Incremental Learning

Reinforcement Learning

Meta Learning

Robotics

[1] IFOR: Iterative Flow Minimization for Robotic Object Rearrangement
paper | project

Self-supervised Learning/Semi-supervised Learning

[2] Class-Aware Contrastive Semi-Supervised Learning
keywords: Semi-Supervised Learning, Self-Supervised Learning, Real-World Unlabeled Data Learning
paper

[1] A study on the distribution of social biases in self-supervised learning visual models
paper

Uncategorized

[2] Do Explanations Explain? Model Knows Best
paper

[1] PINA: Learning a Personalized Implicit Neural Avatar from a Single RGB-D Video Sequence
paper | video | project
