Deep Learning for Video Anomaly Detection: A Review

This is the official repository for the paper entitled "Deep Learning for Video Anomaly Detection: A Review".

📖 Table of contents

Existing Reviews
Our Taxonomy
Performance Comparison
Citation

Reviews

Reference	Year	Venue	Main Focus	Main Categorization	UVAD	WVAD	SVAD	FVAD	OVAD	LVAD	IVAD
Ramachandra et al.	2020	IEEE TPAMI	Semi-supervised single-scene VAD	Methodology	×	×	√	×	×	×	×
Santhosh et al.	2020	ACM CSUR	VAD applied on road traffic	Methodology	√	×	√	√	×	×	×
Nayak et al.	2021	IMAVIS	Deep learning driven semi-supervised VAD	Methodology	×	×	√	×	×	×	×
Tran et al.	2022	ACM CSUR	Semi&weakly supervised VAD	Architecture	×	×	√	×	×	×	×
Chandrakala et al.	2023	Artif. Intell. Rev.	Deep model-based one&two-class VAD	Methodology&Architecture	×	√	√	√	×	×	×
Liu et al.	2023	ACM CSUR	Deep models for semi&weakly supervised VAD	Model Input	√	√	√	√	×	×	×
Our survey	2024	-	Comprehensive VAD taxonomy and deep models	Methodology, Architecture, Refinement, Model Input, Model Output	√	√	√	√	√	√	√

UVAD=Unsupervised VAD, WVAD=Weakly supervised VAD, SVAD=Semi-supervised VAD, FVAD=Fully supervised VAD, OVAD=Open-set supervised VAD, LVAD=Large-model based VAD, IVAD=Interpretable VAD

Taxonomy

1. Semi-Supervised Video Anomaly Detection

1.1 Model Input

1.1.1 RGB

Frame-Level RGB

🗓️ 2016

📄 ConvAE:Learning temporal regularity in video sequences, 📰 CVPR code homepage

🗓️ 2017

📄 ConvLSTM-AE:Remembering history with convolutional LSTM for anomaly detection, 📰 ICCV code
📄 STAE: Spatio-temporal autoencoder for video anomaly detection, 📰 ACM MM
📄 AnomalyGAN: Abnormal event detection in videos using generative adversarial nets, 📰 ICIP

🗓️ 2019

📄 AMC: Anomaly detection in video sequence with appearance-motion correspondence, 📰 ICCV code

Patch-Level RGB

🗓️ 2015

📄 AMDN:Learning deep representations of appearance and motion for anomalous event detection, 📰 BMVC

🗓️ 2017

📄 AMDN2:Detecting anomalous events in videos by learning deep representations of appearance and motion, 📰 CVIU
📄 Deep-cascade:Deep-cascade: Cascading 3d deep neural networks for fast anomaly detection and localization in crowded scenes, 📰 TIP

🗓️ 2018

📄 S$^2$-VAE:Generative neural networks for anomaly detection in crowded scenes, 📰 TIFS

🗓️ 2019

📄 DeepOC:A deep one-class neural network for anomalous event detection in complex scenes, 📰 TNNLS

🗓️ 2020

📄 GM-VAE:Video anomaly detection and localization via gaussian mixture fully convolutional variational autoencoder, 📰 CVIU

Object-Level RGB

🗓️ 2017

📄 FRCN:Joint detection and recounting of abnormal events by learning deep generic knowledge, 📰 ICCV

🗓️ 2019

📄 ObjectAE:Object-centric auto-encoders and dummy anomalies for abnormal event detection in video, 📰 CVPR code

🗓️ 2021

📄 HF$^2$-VAD:A hybrid video anomaly detection framework via memory-augmented flow reconstruction and flow-guided frame prediction, 📰 ICCV code

🗓️ 2022

📄 HSNBM:Hierarchical scene normality-binding modeling for anomaly detection in surveillance videos, 📰 ACM MM code
📄 BDPN:Comprehensive regularization in a bi-directional predictive network for video anomaly detection, 📰 AAAI
📄 ER-VAD:Evidential reasoning for video anomaly detection, 📰 ACM MM

🗓️ 2023

📄 HSC:Hierarchical semantic contrast for scene-aware video anomaly detection, 📰 CVPR code

1.1.2 Optical Flow

Frame Level

🗓️ 2018

📄 FuturePred:Future frame prediction for anomaly detection–a new baseline, 📰 CVPR code

🗓️ 2020

📄 FSCN:Fast sparse coding networks for anomaly detection in videos, 📰 PR code

🗓️ 2021

📄 F$^2$PN:Future frame prediction network for video anomaly detection, 📰 TPAMI code
📄 AMMC-Net:Appearance-motion memory consistency network for video anomaly detection, 📰 AAAI code

🗓️ 2022

📄 STA-Net:Learning task-specific representation for video anomaly detection with spatial-temporal attention, 📰 ICASSP

🗓️ 2023

📄 AMSRC:A video anomaly detection framework based on appearance-motion semantics representation consistency, 📰 ICASSP

Patch Level

🗓️ 2019

📄 DeepOC:A deep one-class neural network for anomalous event detection in complex scenes, 📰 TNNLS

🗓️ 2020

📄 ST-CaAE:Spatial-temporal cascade autoencoder for video anomaly detection in crowded scenes, 📰 TMM
📄 Siamese-Net:Learning a distance function with a siamese network to localize anomalies in videos, 📰 WACV

Object Level

🗓️ 2021

📄 HF$^2$-VAD:A hybrid video anomaly detection framework via memory-augmented flow reconstruction and flow-guided frame prediction, 📰 ICCV code

🗓️ 2022

📄 ER-VAD:Evidential reasoning for video anomaly detection, 📰 ACM MM
📄 Accurate-Interpretable-VAD:Attribute-based representations for accurate and interpretable video anomaly detection, 📰 Arxiv code

🗓️ 2023

📄 AMSRC:A video anomaly detection framework based on appearance-motion semantics representation consistency, 📰 ICASSP

1.1.3 Skeleton

🗓️ 2019

📄 MPED-RNN:Learning regularity in skeleton trajectories for anomaly detection in videos, 📰 CVPR code

🗓️ 2020

📄 GEPC:Graph embedded pose clustering for anomaly detection, 📰 CVPR code
📄 MTTP:Multi-timescale trajectory prediction for abnormal human activity detection, 📰 WACV homepage

🗓️ 2021

📄 NormalGraph:Normal graph: Spatial temporal graph convolutional networks based prediction network for skeleton based video anomaly detection, 📰 Neurocomputing
📄 HSTGCNN:A hierarchical spatio-temporal graph convolutional neural network for anomaly detection in videos, 📰 TCSVT code

🗓️ 2022

📄 TSIF:A two-stream information fusion approach to abnormal event detection in video, 📰 ICASSP
📄 STGCAE-LSTM:Human-related anomalous event detection via spatial-temporal graph convolutional autoencoder with embedded long short-term memory network, 📰 Neurocomputing
📄 STGformer:Hierarchical graph embedded pose regularity learning via spatiotemporal transformer for abnormal behavior detection, 📰 ACM MM

🗓️ 2023

📄 STG-NF:Normalizing flows for human pose anomaly detection, 📰 ICCV code
📄 MoPRL:Regularity learning via explicit distribution modeling for skeletal video anomaly detection, 📰 TCSVT
📄 MoCoDAD:Multimodal motion conditioned diffusion model for skeleton-based video anomaly detection, 📰 ICCV code

🗓️ 2024

📄 TrajREC:Holistic representation learning for multitask trajectory anomaly detection, 📰 WACV

1.1.4 Hybrid

🗓️ 2018

📄 FuturePred:Future frame prediction for anomaly detection–a new baseline, 📰 CVPR code

🗓️ 2019

📄 DeepOC:A deep one-class neural network for anomalous event detection in complex scenes, 📰 TNNLS

🗓️ 2021

📄 HF$^2$-VAD:A hybrid video anomaly detection framework via memory-augmented flow reconstruction and flow-guided frame prediction, 📰 ICCV code

🗓️ 2024

📄 EOGT:Eogt: Video anomaly detection with enhanced object information and global temporal dependency, 📰 TOMM

1.2 Methodology

1.2.1 Self-Supervised Learning

Reconstruction

🗓️ 2016

📄 ConvAE:Learning temporal regularity in video sequences, 📰 CVPR code homepage

🗓️ 2017

📄 ConvLSTM-AE:Remembering history with convolutional LSTM for anomaly detection, 📰 ICCV code

🗓️ 2018

📄 S$^2$-VAE:Generative neural networks for anomaly detection in crowded scenes, 📰 TIFS

🗓️ 2019

📄 AMC: Anomaly detection in video sequence with appearance-motion correspondence, 📰 ICCV code

🗓️ 2020

📄 ClusterAE:Clustering driven deep autoencoder for video anomaly detection, 📰 ECCV
📄 SIGnet:Anomaly detection with bidirectional consistency in videos, 📰 TNNLS

🗓️ 2021

📄 SSR-AE:Self-supervision-augmented deep autoencoder for unsupervised visual anomaly detection, 📰 TCYB

🗓️ 2023

📄 MoPRL:Regularity learning via explicit distribution modeling for skeletal video anomaly detection, 📰 TCSVT

Prediction

🗓️ 2018

📄 FuturePred:Future frame prediction for anomaly detection–a new baseline, 📰 CVPR code

🗓️ 2019

📄 Attention-driven-loss:Attention-driven loss for anomaly detection in video surveillance, 📰 TCSVT code

🗓️ 2020

📄 Multispace:Normality learning in multispace for video anomaly detection, 📰 TCSVT

🗓️ 2021

📄 HF$^2$-VAD:A hybrid video anomaly detection framework via memory-augmented flow reconstruction and flow-guided frame prediction, 📰 ICCV code
📄 AMMC-Net:Appearance-motion memory consistency network for video anomaly detection, 📰 AAAI code
📄 ROADMAP:Robust unsupervised video anomaly detection by multipath frame prediction, 📰 TNNLS
📄 AEP:Abnormal event detection and localization via adversarial event prediction, 📰 TNNLS

🗓️ 2022

📄 STGformer:Hierarchical graph embedded pose regularity learning via spatiotemporal transformer for abnormal behavior detection, 📰 ACM MM
📄 OGMRA:Object-guided and motion-refined attention network for video anomaly detection, 📰 ICME

🗓️ 2023

📄 STGCN:Spatial-temporal graph convolutional network boosted flow-frame prediction for video anomaly detection, 📰 ICASSP
📄 AMP-NET:Amp-net: Appearance-motion prototype network assisted automatic video anomaly detection system, 📰 TII

Visual Cloze Test

🗓️ 2020

📄 VEC:Cloze test helps: Effective video anomaly detection via learning to complete video events, 📰 ACM MM code

🗓️ 2023

📄 USTN-DSC:Video event restoration based on keyframes for video anomaly detection, 📰 CVPR
📄 VCC:Video anomaly detection via visual cloze tests, 📰 TIFS

Jigsaw Puzzles

🗓️ 2022

📄 STJP:Video anomaly detection by solving decoupled spatio-temporal jigsaw puzzles, 📰 ECCV code

🗓️ 2023

📄 MPT:Video anomaly detection via sequentially learning multiple pretext tasks, 📰 ICCV
📄 SSMTL++:Ssmtl++: Revisiting self-supervised multi-task learning for video anomaly detection, 📰 CVIU

Contrastive Learning

🗓️ 2020

📄 CAC:Cluster attention contrast for video anomaly detection, 📰 ACM MM

🗓️ 2021

📄 TAC-Net:Abnormal event detection using deep contrastive learning for intelligent video surveillance system, 📰 TII

🗓️ 2022

📄 LSH:Learnable locality-sensitive hashing for video anomaly detection, 📰 TCSVT

Denoising

🗓️ 2020

📄 Adv-AE:Adversarial 3d convolutional autoencoder for abnormal event detection in videos, 📰 TMM

🗓️ 2021

📄 NM-GAN:Nm-gan: Noise-modulated generative adversarial network for video anomaly detection, 📰 PR

Deep Sparse Coding

🗓️ 2017

📄 Stacked-RNN:A revisit of sparse coding based anomaly detection in stacked RNN framework, 📰 ICCV code

🗓️ 2019

📄 Anomalynet:Anomalynet: An anomaly detection network for video surveillance, 📰 TIFS code
📄 sRNN-AE:Video anomaly detection with sparse coding inspired deep neural networks, 📰 TPAMI code

🗓️ 2020

📄 FSCN:Fast sparse coding networks for anomaly detection in videos, 📰 PR code

Patch Inpainting

🗓️ 2021

📄 RIAD:Reconstruction by inpainting for visual anomaly detection, 📰 PR code

🗓️ 2022

📄 SSPCAB:Self-supervised predictive convolutional attentive block for anomaly detection, 📰 CVPR code

🗓️ 2023

📄 SSMCTB:Self-supervised masked convolutional transformer block for anomaly detection, 📰 TPAMI code

🗓️ 2024

📄 AED-MAE:Self-distilled masked auto-encoders are efficient video anomaly detectors, 📰 CVPR code

Multiple Task

🗓️ 2017

📄 STAE: Spatio-temporal autoencoder for video anomaly detection, 📰 ACM MM

🗓️ 2019

📄 MPED-RNN:Learning regularity in skeleton trajectories for anomaly detection in videos, 📰 CVPR
📄 AnoPCN:Anopcn: Video anomaly detection via deep predictive coding network, 📰 ACM MM

🗓️ 2021

📄 Multitask:Anomaly detection in video via self-supervised and multi-task learning, 📰 CVPR homepage

🗓️ 2022

📄 HSNBM:Hierarchical scene normality-binding modeling for anomaly detection in surveillance videos, 📰 ACM MM code
📄 LSH:Learnable locality-sensitive hashing for video anomaly detection, 📰 TCSVT
📄 AMAE:Appearance-motion united auto-encoder framework for video anomaly detection, 📰 TCAS-II
📄 STM-AE:Learning appearance-motion normality for video anomaly detection, 📰 ICME
📄 SSAGAN:Self-supervised attentive generative adversarial networks for video anomaly detection, 📰 TNNLS

🗓️ 2023

📄 MPT:Video anomaly detection via sequentially learning multiple pretext tasks, 📰 ICCV
📄 SSMTL++:Ssmtl++: Revisiting self-supervised multi-task learning for video anomaly detection, 📰 CVIU

🗓️ 2024

📄 MGSTRL:Multi-scale video anomaly detection by multi-grained spatiotemporal representation learning, 📰 CVPR

1.2.2 One-Class Learning

One-Class Classifier

🗓️ 2015

📄 AMDN:Learning deep representations of appearance and motion for anomalous event detection, 📰 BMVC

🗓️ 2018

📄 Deep SVDD:Deep one-class classification, 📰 PMLR code

🗓️ 2019

📄 DeepOC:A deep one-class neural network for anomalous event detection in complex scenes, 📰 TNNLS
📄 GODS:Gods: Generalized one-class discriminative subspaces for anomaly detection, 📰 ICCV

🗓️ 2021

📄 FCDD:Explainable deep one-class classification, 📰 ICLR code

Gaussian Classifier

🗓️ 2017

📄 Deep-cascade:Deep-cascade: Cascading 3d deep neural networks for fast anomaly detection and localization in crowded scenes, 📰 TIP

🗓️ 2018

📄 Deep-anomaly:Deep-anomaly: Fully convolutional neural network for fast anomaly detection in crowded scenes, 📰 CVIU

🗓️ 2020

📄 GM-VAE:Video anomaly detection and localization via gaussian mixture fully convolutional variational autoencoder, 📰 CVIU

Adversarial Classifier

🗓️ 2018

📄 ALOCC:Adversarially learned one-class classifier for novelty detection, 📰 CVPR code
📄 AVID:Avid: Adversarial visual irregularity detection, 📰 ACCV code

🗓️ 2020

📄 ALOCC2:Deep end-to-end one-class classifier, 📰 TNNLS
📄 OGNet:Old is gold: Redefining the adversarially learned one-class classifier training paradigm, 📰 CVPR code

🗓️ 2022

📄 OGNet+:Stabilizing adversarially learned one-class novelty detection using pseudo anomalies, 📰 TIP

1.2.3 Interpretable Learning

🗓️ 2017

📄 FRCN:Joint detection and recounting of abnormal events by learning deep generic knowledge, 📰 ICCV

🗓️ 2022

📄 Accurate-Interpretable-VAD:Attribute-based representations for accurate and interpretable video anomaly detection, 📰 Arxiv code

🗓️ 2023

📄 InterVAD:Towards interpretable video anomaly detection, 📰 WACV
📄 EVAL:Eval: Explainable video anomaly localization, 📰 CVPR

🗓️ 2024

📄 AnomalyRuler:Follow the rules: Reasoning for video anomaly detection with large language models, 📰 ECCV code

1.3 Network Architecture

1.3.1 Auto-Encoder

🗓️ 2016

📄 Conv-LSTM:Anomaly detection in video using predictive convolutional long short-term memory networks, 📰 Arxiv

🗓️ 2017

📄 STAE: Spatio-temporal autoencoder for video anomaly detection, 📰 ACM MM
📄 ConvLSTM-AE:Remembering history with convolutional LSTM for anomaly detection, 📰 ICCV code

🗓️ 2019

📄 DeepOC:A deep one-class neural network for anomalous event detection in complex scenes, 📰 TNNLS
📄 sRNN-AE:Video anomaly detection with sparse coding inspired deep neural networks, 📰 TPAMI
📄 MPED-RNN:Learning regularity in skeleton trajectories for anomaly detection in videos, 📰 CVPR

🗓️ 2021

📄 NormalGraph:Normal graph: Spatial temporal graph convolutional networks based prediction network for skeleton based video anomaly detection, 📰 Neurocomputing

🗓️ 2022

📄 STGCAE-LSTM:Human-related anomalous event detection via spatial-temporal graph convolutional autoencoder with embedded long short-term memory network, 📰 Neurocomputing

🗓️ 2023

📄 USTN-DSC:Video event restoration based on keyframes for video anomaly detection, 📰 CVPR

🗓️ 2024

📄 AED-MAE:Self-distilled masked auto-encoders are efficient video anomaly detectors, 📰 CVPR code

1.3.2 GAN

🗓️ 2018

📄 FuturePred:Future frame prediction for anomaly detection–a new baseline, 📰 CVPR code
📄 ALOCC:Adversarially learned one-class classifier for novelty detection, 📰 CVPR code

🗓️ 2019

📄 AD-VAD:Training adversarial discriminators for cross-channel abnormal event detection in crowds, 📰 WACV
📄 VAD-GAN:Robust anomaly detection in videos using multilevel representations, 📰 AAAI code
📄 Ada-Net:Learning normal patterns via adversarial attention-based autoencoder for abnormal event detection in videos, 📰 TMM

🗓️ 2020

📄 OGNet:Old is gold: Redefining the adversarially learned one-class classifier training paradigm, 📰 CVPR code

🗓️ 2021

📄 CT-D2GAN:Convolutional transformer based dual discriminator generative adversarial networks for video anomaly detection, 📰 ACM MM

1.3.3 Diffusion

🗓️ 2023

📄 FPDM:Feature prediction diffusion model for video anomaly detection, 📰 ICCV
📄 MoCoDAD:Multimodal motion conditioned diffusion model for skeleton-based video anomaly detection, 📰 ICCV code

1.4 Model Refinement

1.4.1 Pseudo Anomalies

🗓️ 2021

📄 LNRA:Learning not to reconstruct anomalies, 📰 BMVC code
📄 G2D:G2d: Generate to detect anomaly, 📰 WACV code
📄 BAF:A background-agnostic framework with adversarial training for abnormal event detection in video, 📰 TPAMI code

🗓️ 2022

📄 OGNet+:Stabilizing adversarially learned one-class novelty detection using pseudo anomalies, 📰 TIP
📄 MBPA:Limiting reconstruction capability of autoencoders using moving backward pseudo anomalies, 📰 UR

🗓️ 2023

📄 DSS-NET:Dss-net: Dynamic self-supervised network for video anomaly detection, 📰 TMM
📄 PseudoBound:Pseudobound: Limiting the anomaly reconstruction capability of one-class classifiers using pseudo anomalies, 📰 Neurocomputing
📄 PFMF:Generating anomalies for video anomaly detection with prompt-based feature mapping, 📰 CVPR

1.4.2 Memory Bank

🗓️ 2019

📄 MemAE: Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection, 📰 ICCV code

🗓️ 2020

📄 MNAD:Learning memory-guided normality for anomaly detection, 📰 CVPR code homepage

🗓️ 2021

📄 MPN:Learning normal dynamics in videos with meta prototype network, 📰 CVPR code

🗓️ 2022

📄 EPAP-Net:Anomaly warning: Learning and memorizing future semantic patterns for unsupervised ex-ante potential anomaly prediction, 📰 ACM MM
📄 CAFE:Effective video abnormal event detection by learning a consistency-aware high-level feature extractor, 📰 ACM MM
📄 DLAN-AC:Dynamic local aggregation network with adaptive clusterer for anomaly detection, 📰 ECCV code

🗓️ 2023

📄 DMAD:Diversity-measurable anomaly detection, 📰 CVPR code
📄 SVN:Stochastic video normality network for abnormal event detection in surveillance videos, 📰 KBS
📄 LERF:Learning event-relevant factors for video anomaly detection, 📰 AAAI
📄 MAAM-Net:Memory-augmented appearance-motion network for video anomaly detection, 📰 PR

🗓️ 2024

📄 STU-Net:Context recovery and knowledge retrieval: A novel two-stream framework for video anomaly detection, 📰 TIP homepage

1.5 Model Output

1.5.1 Frame Level

1.5.2 Pixel Level

🗓️ 2022

📄 UPformer:Pixel-level anomaly detection via uncertainty-aware prototypical transformer, 📰 ACM MM

2. Weakly Supervised Video Anomaly Detection

🗓️ 2018

📄 DeepMIL: Real-world anomaly detection in surveillance videos, 📰 CVPR code homepage

2.1 Model Input

2.1.1 RGB

🗓️ 2018

📄 DeepMIL: Real-world anomaly detection in surveillance videos, 📰 CVPR code1 code2 homepage

🗓️ 2019

📄 GCN:Graph convolutional label noise cleaner: Train a plug-and-play action classifier for anomaly detection, 📰 CVPR code

🗓️ 2020

📄 CLAWS: Claws: Clustering assisted weakly supervised learning with normalcy suppression for anomalous event detection, 📰 ECCV code
📄 HLNet:Not only look, but also listen: Learning multimodal violence detection under weak supervision, 📰 ECCV code homepage

🗓️ 2022

📄 S3R:Self-supervised sparse representation for video anomaly detection, 📰 ECCV code
📄 GCN+:Weakly-supervised anomaly detection in video surveillance via graph convolutional label noise cleaning, 📰 Neurocomputing
📄 MSL:Self-training multi-sequence learning with transformer for weakly supervised video anomaly detection, 📰 AAAI

🗓️ 2023

📄 BN-WVAD:Batchnorm-based weakly supervised video anomaly detection, 📰 Arxiv code
📄 LSTC:Long-short temporal co-teaching for weakly supervised video anomaly detection, 📰 ICME code

🗓️ 2024

📄 AlMarri Salem et al.: A multi-head approach with shuffled segments for weakly-supervised video anomaly detection, 📰 WACV
📄 OVVAD:Open-vocabulary video anomaly detection, 📰 CVPR

2.1.2 Optical Flow

🗓️ 2019

📄 GCN:Graph convolutional label noise cleaner: Train a plug-and-play action classifier for anomaly detection, 📰 CVPR code

🗓️ 2020

📄 AR-NET:Weakly supervised video anomaly detection via center-guided discriminative learning, 📰 ICME code

2.1.3 Audio

🗓️ 2021

📄 FVAL:Violence detection in videos based on fusing visual and audio information, 📰 ICASSP

🗓️ 2023

📄 HyperVD:Learning weakly supervised audio-visual violence detection in hyperbolic space, 📰 Arxiv code

2.1.4 Text

🗓️ 2023

📄 PEL4VAD:Learning prompt-enhanced context features for weakly-supervised video anomaly detection, 📰 Arxiv code
📄 TEVAD:Tevad: Improved video anomaly detection with captions, 📰 CVPRW code

🗓️ 2024

📄 LAP:Learn suspected anomalies from event prompts for video anomaly detection, 📰 Arxiv
📄 ALAN:Toward video anomaly retrieval from video anomaly detection: New benchmarks and model, 📰 TIP

2.1.5 Hybrid

🗓️ 2020

📄 AR-NET:Weakly supervised video anomaly detection via center-guided discriminative learning, 📰 ICME code

🗓️ 2022

📄 ACF_MMVD:Look, listen and pay more attention: Fusing multi-modal information for video violence detection, 📰 ICASSP code
📄 MSAF:Msaf: Multimodal supervise-attention enhanced fusion for video anomaly detection, 📰 SPL homepage
📄 MACIL-SD:Modality-aware contrastive instance learning with self-distillation for weakly-supervised audio-visual violence detection, 📰 ACM MM code
📄 HL-Net+:Weakly supervised audio-visual violence detection, 📰 TMM

🗓️ 2024

📄 UCA:Towards surveillance video-and-language understanding: New dataset, baselines, and challenges, 📰 CVPR homepage

2.2 Methodology

2.2.1 One-Stage MIL

🗓️ 2018

📄 DeepMIL: Real-world anomaly detection in surveillance videos, 📰 CVPR code1 code2 homepage

🗓️ 2019

📄 MAF:Motion-aware feature for improved video anomaly detection, 📰 BMVC
📄 TCN-IBL:Temporal convolutional network with complementary inner bag loss for weakly supervised anomaly detection, 📰 ICIP

🗓️ 2020

📄 HLNet:Not only look, but also listen: Learning multimodal violence detection under weak supervision, 📰 ECCV code

🗓️ 2022

📄 CNL:Collaborative normality learning framework for weakly supervised video anomaly detection, 📰 TCAS-II

2.2.2 Two-Stage Self-Training

🗓️ 2019

📄 GCN:Graph convolutional label noise cleaner: Train a plug-and-play action classifier for anomaly detection, 📰 CVPR

🗓️ 2021

📄 MIST:Mist: Multiple instance self-training framework for video anomaly detection, 📰 CVPR code homepage

🗓️ 2022

📄 MSL:Self-training multi-sequence learning with transformer for weakly supervised video anomaly detection, 📰 AAAI

🗓️ 2023

📄 CUPL:Exploiting completeness and uncertainty of pseudo labels for weakly supervised video anomaly detection, 📰 CVPR code

🗓️ 2024

📄 TPWNG:Text prompt with normality guidance for weakly supervised video anomaly detection, 📰 CVPR

2.3 Refinement Strategy

2.3.1 Temporal Modeling

🗓️ 2020

📄 HLNet:Not only look, but also listen: Learning multimodal violence detection under weak supervision, 📰 ECCV code

🗓️ 2021

📄 CTR:Learning causal temporal relation and feature discrimination for anomaly detection, 📰 TIP
📄 RTFM:Weakly-supervised video anomaly detection with robust temporal feature magnitude learning, 📰 ICCV code
📄 CA-Net:Contrastive attention for video anomaly detection, 📰 TMM code
📄 CRF:Dance with self-attention: A new look of conditional random fields on anomaly detection in videos, 📰 ICCV

🗓️ 2022

📄 MSL:Self-training multi-sequence learning with transformer for weakly supervised video anomaly detection, 📰 AAAI
📄 DAR:Decouple and resolve: transformer-based models for online anomaly detection from weakly labeled videos, 📰 TIFS
📄 WAGCN:Adaptive graph convolutional networks for weakly supervised anomaly detection in videos, 📰 SPL
📄 SGTDT:Weakly supervised video anomaly detection via self-guided temporal discriminative transformer, 📰 TCYB
📄 MLAD:Weakly supervised anomaly detection in videos considering the openness of events, 📰 TITS

🗓️ 2023

📄 CMRL: Look around for anomalies: weakly-supervised anomaly detection via context-motion relational learning, 📰 CVPR
📄 CBCG:Weakly supervised video anomaly detection based on cross-batch clustering guidance, 📰 ICME
📄 DMU:Dual memory units with uncertainty regulation for weakly supervised video anomaly detection, 📰 AAAI code

2.3.2 Spatio-Temporal Modeling

🗓️ 2022

📄 STA-Net:Learning task-specific representation for video anomaly detection with spatial-temporal attention, 📰 ICASSP
📄 SSRL:Scale-aware spatio-temporal relation learning for video anomaly detection, 📰 ECCV

🗓️ 2023

📄 LSTC:Long-short temporal co-teaching for weakly supervised video anomaly detection, 📰 ICME code

🗓️ 2024

📄 MSIP: Learning spatio-temporal relations with multi-scale integrated perception for video anomaly detection, 📰 ICASSP

2.3.3 MIL-Based Refinement

🗓️ 2019

📄 Social-MIL:Social mil: Interaction-aware for crowd anomaly detection, 📰 AVSS

🗓️ 2022

📄 MCR:Multiscale continuity-aware refinement network for weakly supervised video anomaly detection, 📰 ICME
📄 BN-SVP:Bayesian nonparametric submodular video partition for robust anomaly detection, 📰 CVPR code

🗓️ 2023

📄 NGMIL:Normality guided multiple instance learning for weakly supervised video anomaly detection, 📰 WACV
📄 UMIL:Unbiased multiple instance learning for weakly supervised video anomaly detection, 📰 CVPR code
📄 MGFN:Mgfn: Magnitude-contrastive glance-and-focus network for weakly-supervised video anomaly detection, 📰 AAAI code

🗓️ 2024

📄 LAP:Learn suspected anomalies from event prompts for video anomaly detection, 📰 Arxiv
📄 PE-MIL: Prompt-enhanced multiple instance learning for weakly supervised video anomaly detection, 📰 CVPR

2.3.4 Feature Metric Learning

🗓️ 2019

📄 TCN-IBL:Temporal convolutional network with complementary inner bag loss for weakly supervised anomaly detection, 📰 ICIP

🗓️ 2021

📄 CTR:Learning causal temporal relation and feature discrimination for anomaly detection, 📰 TIP

🗓️ 2022

📄 SGTDT:Weakly supervised video anomaly detection via self-guided temporal discriminative transformer, 📰 TCYB

🗓️ 2023

📄 BN-WVAD:Batchnorm-based weakly supervised video anomaly detection, 📰 Arxiv code
📄 PEL4VAD:Learning prompt-enhanced context features for weakly-supervised video anomaly detection, 📰 Arxiv code
📄 TeD-SPAD:Ted-spad: Temporal distinctiveness for self-supervised privacy-preservation for video anomaly detection, 📰 ICCV code
📄 CLAWS+:Clustering aided weakly supervised training to detect anomalous events in surveillance videos, 📰 TNNLS

🗓️ 2024

📄 LAP:Learn suspected anomalies from event prompts for video anomaly detection, 📰 Arxiv

2.3.5 Knowledge Distillation

🗓️ 2022

📄 MACIL-SD:Modality-aware contrastive instance learning with self-distillation for weakly-supervised audio-visual violence detection, 📰 ACM MM code

🗓️ 2023

📄 DPK:Distilling privileged knowledge for anomalous event detection from weakly labeled videos, 📰 TNNLS

2.3.6 Leveraging Large Models

🗓️ 2023

📄 TEVAD:Tevad: Improved video anomaly detection with captions, 📰 CVPRW
📄 CLIP-TSA:Clip-tsa: Clip-assisted temporal self-attention for weakly-supervised video anomaly detection, 📰 ICIP code

🗓️ 2024

📄 UCA:Towards surveillance video-and-language understanding: New dataset, baselines, and challenges, 📰 CVPR homepage
📄 VadCLIP:Vadclip: Adapting vision-language models for weakly supervised video anomaly detection, 📰 AAAI code
📄 Holmes-VAD:Holmes-vad: Towards unbiased and explainable video anomaly detection via multi-modal llm, 📰 Arxiv code homepage
📄 VADor w LSTC:Video anomaly detection and explanation via large language models, 📰 Arxiv
📄 LAVAD: Harnessing large language models for training-free video anomaly detection, 📰 CVPR code homepage
📄 STPrompt:Weakly supervised video anomaly detection and localization with spatio-temporal prompts, 📰 ACM MM

2.4 Model Output

2.4.1 Frame Level

2.4.2 Pixel Level

🗓️ 2019

📄 Background-bias:Exploring background-bias for anomaly detection in surveillance videos, 📰 ACM MM code

🗓️ 2021

📄 WSSTAD:Weakly-supervised spatio-temporal anomaly detection in surveillance video, 📰 IJCAI

3. Fully Supervised Video Anomaly Detection

3.1 Appearance Input

🗓️ 2016

📄 TS-LSTM:Multi-stream deep networks for person to person violence detection in videos, 📰 CCPR

🗓️ 2017

📄 FightNet:Violent interaction detection in video based on deep learning, 📰 JPCS

🗓️ 2019

📄 Sub-Vio:Toward subjective violence detection in videos, 📰 ICASSP
📄 CCTV-Fights:Detection of real-world fights in surveillance videos, 📰 ICASSP homepage

3.2 Motion Input

🗓️ 2016

📄 TS-LSTM:Multi-stream deep networks for person to person violence detection in videos, 📰 CCPR

🗓️ 2017

📄 ConvLSTM:Learning to detect violent videos using convolutional long short-term memory, 📰 AVSS code

🗓️ 2018

📄 BiConvLSTM:Bidirectional convolutional lstm for the detection of violence in videos, 📰 ECCVW

🗓️ 2020

📄 MM-VD:Multimodal violence detection in videos, 📰 ICASSP

3.3 Skeleton Input

🗓️ 2018

📄 DSS:Eye in the sky: Real-time drone surveillance system for violent individuals identification using scatternet hybrid deep learning network, 📰 CVPRW

🗓️ 2020

📄 SPIL:Human interaction learning on 3d skeleton point clouds for video violence recognition, 📰 ECCV

3.4 Audio Input

🗓️ 2020

📄 MM-VD:Multimodal violence detection in videos, 📰 ICASSP

3.5 Hybrid Input

🗓️ 2021

📄 FlowGatedNet:Rwf-2000: an open large scale video database for violence detection, 📰 ICPR code

🗓️ 2022

📄 MutualDis:Multimodal violent video recognition based on mutual distillation, 📰 PRCV

🗓️ 2023

📄 HSCD: Human skeletons and change detection for efficient violence detection in surveillance videos, 📰 CVIU code

4. Unsupervised Video Anomaly Detection

4.1 Pseudo Label Based Paradigm

🗓️ 2018

📄 DAW:Detecting abnormality without knowing normality: A two-stage approach for unsupervised video abnormal event detection, 📰 ACM MM

🗓️ 2020

📄 STDOR:Self-trained deep ordinal regression for end-to-end video anomaly detection, 📰 CVPR

🗓️ 2022

📄 GCL:Generative cooperative learning for unsupervised video anomaly detection, 📰 CVPR

🗓️ 2024

📄 C2FPL:A coarse-to-fine pseudo-labeling (c2fpl) framework for unsupervised video anomaly detection, 📰 WACV code

4.2 Change Detection Based Paradigm

🗓️ 2016

📄 ADF:A discriminative framework for anomaly detection in large videos, 📰 ECCV code

🗓️ 2017

📄 Unmasking:Unmasking the abnormal events in video, 📰 ICCV

🗓️ 2018

📄 MC2ST:Classifier two sample test for video anomaly detections, 📰 BMVC code

🗓️ 2022

📄 TMAE:Detecting anomalous events from unlabeled videos via temporal masked autoencoding, 📰 ICME

4.3 Others

🗓️ 2021

📄 DUAD:Deep unsupervised anomaly detection, 📰 WACV

🗓️ 2022

📄 CIL:A causal inference look at unsupervised video anomaly detection, 📰 AAAI
📄 LBR-SPR:Deep anomaly discovery from unlabeled videos via normality advantage and self-paced refinement, 📰 CVPR code

5. Open Set Supervised Video Anomaly Detection

5.1 Open-Set VAD

🗓️ 2019

📄 MLEP:Margin learning embedded prediction for video anomaly detection with a few anomalies, 📰 IJCAI code

🗓️ 2022

📄 UBnormal:Ubnormal: New benchmark for supervised open-set video anomaly detection, 📰 CVPR code
📄 OSVAD:Towards open set video anomaly detection, 📰 ECCV

🗓️ 2024

📄 OVVAD:Open-vocabulary video anomaly detection, 📰 CVPR

5.2 Few-Shot VAD

🗓️ 2020

📄 FSSA:Few-shot scene-adaptive anomaly detection, 📰 ECCV code

🗓️ 2021

📄 AADNet:Adaptive anomaly detection network for unseen scene without fine-tuning, 📰 PRCV

🗓️ 2022

📄 VADNet:Boosting variational inference with margin learning for few-shot scene-adaptive anomaly detection, 📰 TCSVT code

🗓️ 2023

📄 zxVAD:Cross-domain video anomaly detection without target domain adaptation, 📰 WACV

Performance Comparison

The following tables compare the performance of semi-supervised, weakly supervised, fully supervised, and unsupervised VAD methods, as reported in the literature. For semi-supervised, weakly supervised, and unsupervised VAD methods, the evaluation metric is AUC (%), except on XD-Violence, where it is AP (%); for fully supervised VAD methods, the metric is Accuracy (%).
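
As a reading aid for the metrics named above, here is a minimal pure-Python sketch of AUC, AP, and accuracy. It assumes per-frame binary labels (1 = anomalous) and per-frame anomaly scores; the function names and the toy data are illustrative and do not come from any listed paper. Individual papers may follow different protocols (for example, how frames are pooled across videos or how tied scores are handled), so this is not a reference implementation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (anomalous, normal) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one frame of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision values measured at the
    rank of each anomalous frame (assumes no tied scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("AP needs at least one anomalous frame")
    return sum(precisions) / len(precisions)


def accuracy(labels, predictions):
    """Fraction of items whose predicted class equals the label."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)


# Toy example (hypothetical data): six frames, three of them anomalous.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
predictions = [int(s >= 0.5) for s in scores]  # threshold 0.5 is an assumption

print(f"AUC: {100 * roc_auc(labels, scores):.2f}")            # AUC: 88.89
print(f"AP:  {100 * average_precision(labels, scores):.2f}")  # AP:  91.67
print(f"Acc: {100 * accuracy(labels, predictions):.2f}")      # Acc: 83.33
```

AUC and AP are threshold-free ranking metrics, which suits methods that output a continuous anomaly score. Accuracy needs hard class predictions, which matches the classification-style output of the fully supervised methods.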

Quantitative Performance Comparison of Semi-supervised Methods on Public Datasets.

Method	Publication	Methodology	Ped1	Ped2	Avenue	ShanghaiTech	UBnormal
AMDN	BMVC 2015	One-class classifier	92.1	90.8	-	-	-
ConvAE	CVPR 2016	Reconstruction	81.0	90.0	72.0	-	-
STAE	ACMMM 2017	Hybrid	92.3	91.2	80.9	-	-
StackRNN	ICCV 2017	Sparse coding	-	92.2	81.7	68.0	-
FuturePred	CVPR 2018	Prediction	83.1	95.4	85.1	72.8	-
DeepOC	TNNLS 2019	One-class classifier	83.5	96.9	86.6	-	-
MemAE	ICCV 2019	Reconstruction	-	94.1	83.3	71.2	-
AnoPCN	ACMMM 2019	Prediction	-	96.8	86.2	73.6	-
ObjectAE	CVPR 2019	One-class classifier	-	97.8	90.4	84.9	-
BMAN	TIP 2019	Prediction	-	96.6	90.0	76.2	-
sRNN-AE	TPAMI 2019	Sparse coding	-	92.2	83.5	69.6	-
ClusterAE	ECCV 2020	Reconstruction	-	96.5	86.0	73.3	-
MNAD	CVPR 2020	Reconstruction	-	97.0	88.5	70.5	-
VEC	ACMMM 2020	Cloze test	-	97.3	90.2	74.8	-
AMMC-Net	AAAI 2021	Prediction	-	96.6	86.6	73.7	-
MPN	CVPR 2021	Prediction	85.1	96.9	89.5	73.8	-
HF$^2$-VAD	ICCV 2021	Hybrid	-	99.3	91.1	76.2	-
BAF	TPAMI 2021	One-class classifier	-	98.7	92.3	82.7	59.3
Multitask	CVPR 2021	Multiple tasks	-	99.8	92.8	90.2	-
F$^2$PN	TPAMI 2022	Prediction	84.3	96.2	85.7	73.0	-
DLAN-AC	ECCV 2022	Reconstruction	-	97.6	89.9	74.7	-
BDPN	AAAI 2022	Prediction	-	98.3	90.3	78.1	-
CAFÉ	ACMMM 2022	Prediction	-	98.4	92.6	77.0	-
STJP	ECCV 2022	Jigsaw puzzle	-	99.0	92.2	84.3	56.4
MPT	ICCV 2023	Multiple tasks	-	97.6	90.9	78.8	-
HSC	CVPR 2023	Hybrid	-	98.1	93.7	83.4	-
LERF	AAAI 2023	Prediction	-	99.4	91.5	78.6	-
DMAD	CVPR 2023	Reconstruction	-	99.7	92.8	78.8	-
EVAL	CVPR 2023	Interpretable learning	-	-	86.0	76.6	-
FBSC-AE	CVPR 2023	Prediction	-	-	86.8	79.2	-
FPDM	ICCV 2023	Prediction	-	-	90.1	78.6	62.7
PFMF	CVPR 2023	Multiple tasks	-	-	93.6	85.0	-
STG-NF	ICCV 2023	Gaussian classifier	-	-	-	85.9	71.8
AED-MAE	CVPR 2024	Patch inpainting	-	95.4	91.3	79.1	58.5
SSMCTB	TPAMI 2024	Patch inpainting	-	-	91.6	83.7	-

Quantitative Performance Comparison of Weakly Supervised Methods on Public Datasets.

Method	Publication	Feature	UCF-Crime	XD-Violence	ShanghaiTech	TAD
DeepMIL	CVPR 2018	C3D(RGB)	75.40	-	-	-
GCN	CVPR 2019	TSN(RGB)	82.12	-	84.44	-
HLNet	ECCV 2020	I3D(RGB)	82.44	75.41	-	-
CLAWS	ECCV 2020	C3D(RGB)	83.03	-	89.67	-
MIST	CVPR 2021	I3D(RGB)	82.30	-	94.83	-
RTFM	ICCV 2021	I3D(RGB)	84.30	77.81	97.21	-
CTR	TIP 2021	I3D(RGB)	84.89	75.90	97.48	-
MSL	AAAI 2022	VideoSwin(RGB)	85.62	78.59	97.32	-
S3R	ECCV 2022	I3D(RGB)	85.99	80.26	97.48	-
SSRL	ECCV 2022	I3D(RGB)	87.43	-	97.98	-
CMRL	CVPR 2023	I3D(RGB)	86.10	81.30	97.60	-
CUPL	CVPR 2023	I3D(RGB)	86.22	81.43	-	91.66
MGFN	AAAI 2023	VideoSwin(RGB)	86.67	80.11	-	-
UMIL	CVPR 2023	CLIP	86.75	-	-	92.93
DMU	AAAI 2023	I3D(RGB)	86.97	81.66	-	-
PE-MIL	CVPR 2024	I3D(RGB)	86.83	88.05	98.35	-
TPWNG	CVPR 2024	CLIP	87.79	83.68	-	-
VadCLIP	AAAI 2024	CLIP	88.02	84.51	-	-
STPrompt	ACMMM 2024	CLIP	88.08	-	97.81	-

Quantitative Performance Comparison of Fully Supervised Methods on Public Datasets.

Method	Publication	Model Input	Hockey Fights	Violent-Flows	RWF-2000	Crowd Violence
TS-LSTM	PR 2016	RGB+Flow	93.9	-	-	-
FightNet	JPCS 2017	RGB+Flow	97.0	-	-	-
ConvLSTM	AVSS 2017	Frame Difference	97.1	94.6	-	-
BiConvLSTM	ECCVW 2018	Frame Difference	98.1	96.3	-	-
SPIL	ECCV 2020	Skeleton	96.8	-	89.3	94.5
FlowGatedNet	ICPR 2020	RGB+Flow	98.0	-	87.3	88.9
X3D	AVSS 2022	RGB	-	98.0	94.0	-
HSCD	CVIU 2023	Skeleton+Frame Difference	94.5	-	90.3	94.3

Quantitative Performance Comparison of Unsupervised Methods on Public Datasets.

Method	Publication	Methodology	Avenue	Subway Exit	Ped1	Ped2	ShanghaiTech	UMN
ADF	ECCV 2016	Change detection	78.3	82.4	-	-	-	91.0
Unmasking	ICCV 2017	Change detection	80.6	86.3	68.4	82.2	-	95.1
MC2ST	BMVC 2018	Change detection	84.4	93.1	71.8	87.5	-	-
DAW	ACMMM 2018	Pseudo label	85.3	84.5	77.8	96.4	-	-
STDOR	CVPR 2020	Pseudo label	-	92.7	71.7	83.2	-	97.4
TMAE	ICME 2022	Change detection	89.8	-	75.7	94.1	71.4	-
CIL	AAAI 2022	Others	90.3	97.6	84.9	99.4	-	100
LBR-SPR	CVPR 2022	Others	92.8	-	81.1	97.2	72.6	-

Citation

If you find our work useful, please cite our paper:

@article{wu2024deep,
  title={Deep Learning for Video Anomaly Detection: A Review},
  author={Wu, Peng and Pan, Chengyu and Yan, Yuting and Pang, Guansong and Wang, Peng and Zhang, Yanning},
  journal={arXiv preprint arXiv:xxxxx},
  year={2024}
}
