If you find our code or paper useful, please cite as:
@article{seenivasan2022global,
  title={Global-Reasoned Multi-Task Learning Model for Surgical Scene Understanding},
  author={Seenivasan, Lalithkumar and Mitheran, Sai and Islam, Mobarakol and Ren, Hongliang},
  journal={IEEE Robotics and Automation Letters},
  year={2022},
  publisher={IEEE}
}
Global and local relational reasoning enable scene understanding models to perform human-like scene analysis and understanding. Scene understanding enables better semantic segmentation and object-to-object interaction detection. In the medical domain, a robust surgical scene understanding model allows the automation of surgical skill evaluation, real-time monitoring of the surgeon's performance and post-surgical analysis. This paper introduces a globally-reasoned multi-task surgical scene understanding model capable of performing instrument segmentation and tool-tissue interaction detection. Here, we incorporate global relational reasoning in the latent interaction space and introduce multi-scale local (neighborhood) reasoning in the coordinate space to improve segmentation. Utilizing the multi-task model setup, the performance of the visual-semantic graph attention network in interaction detection is further enhanced through global reasoning. The global interaction space features from the segmentation module are introduced into the graph network, allowing it to detect interactions based on both node-to-node and global interaction reasoning. By sharing common modules, our model reduces the computation cost compared to running two independent single-task models, which is indispensable for practical applications. Using a sequential optimization technique, the proposed multi-task model outperforms other state-of-the-art single-task models on the MICCAI Endoscopic Vision Challenge 2018 dataset. Additionally, we observe the performance of the multi-task model when trained using the knowledge distillation technique.
The proposed network architecture. The proposed globally-reasoned multi-task scene understanding model consists of a shared feature extractor, a segmentation module and a scene graph (tool-tissue interaction detection) module. The segmentation module performs latent global reasoning (GloRe unit [2]) and multi-scale local reasoning to segment instruments. To detect tool-tissue interactions, the scene graph module incorporates the global interaction space features to further improve the performance of the visual-semantic graph attention network [1].
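For intuition, the latent global reasoning step works by projecting coordinate-space features onto a small set of latent interaction nodes, reasoning over those nodes with lightweight graph convolutions, and projecting the result back with a residual connection. The sketch below is a minimal, illustrative PyTorch version of this idea; the class name, layer choices and sizes are assumptions, not the exact module used in this repository.

```python
import torch
import torch.nn as nn

class GloReUnit(nn.Module):
    """Minimal sketch of a GloRe-style global reasoning unit: project features
    to a few latent interaction nodes, reason over them, and project back."""
    def __init__(self, in_channels, mid_channels=64, num_nodes=32):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, mid_channels, 1)     # state projection
        self.assign = nn.Conv2d(in_channels, num_nodes, 1)        # projection weights
        self.gcn_node = nn.Conv1d(num_nodes, num_nodes, 1)        # mix across latent nodes
        self.gcn_chan = nn.Conv1d(mid_channels, mid_channels, 1)  # mix across channels
        self.expand = nn.Conv2d(mid_channels, in_channels, 1)     # reverse projection

    def forward(self, x):
        b, c, h, w = x.shape
        v = self.reduce(x).view(b, -1, h * w)         # B x C' x HW
        a = self.assign(x).view(b, -1, h * w)         # B x N x HW
        nodes = torch.bmm(a, v.transpose(1, 2))       # B x N x C' (latent interaction nodes)
        nodes = nodes + self.gcn_node(nodes)          # reasoning across nodes
        nodes = self.gcn_chan(nodes.transpose(1, 2))  # B x C' x N
        out = torch.bmm(nodes, a).view(b, -1, h, w)   # back to coordinate space
        return x + self.expand(out)                   # residual fusion

# Example: GloReUnit(256)(torch.randn(2, 256, 32, 32)) -> shape (2, 256, 32, 32)
```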
Variants of feature sharing between the segmentation and scene graph modules in the multi-task setting to improve single-task performance.
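One of these sharing variants passes the segmentation module's global interaction-space features into the scene graph model, as described above. The snippet below is a purely hypothetical sketch of such a fusion; the class name, shapes and the mean-pooling choice are illustrative assumptions and may differ from the repository's exact implementation.

```python
import torch
import torch.nn as nn

class GlobalFeatureFusion(nn.Module):
    """Hypothetical fusion of global interaction-space features from the
    segmentation branch into every scene-graph node before graph attention."""
    def __init__(self, node_dim, global_dim, out_dim):
        super().__init__()
        self.project = nn.Linear(node_dim + global_dim, out_dim)

    def forward(self, node_feats, global_feats):
        # node_feats:   (num_nodes, node_dim)     visual-semantic node features
        # global_feats: (global_dim, num_latent)  latent interaction-space features
        g = global_feats.mean(dim=-1)                       # pool over latent nodes
        g = g.unsqueeze(0).expand(node_feats.size(0), -1)   # broadcast to each graph node
        return torch.relu(self.project(torch.cat([node_feats, g], dim=-1)))
```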
In this project, we implement our method using the PyTorch and DGL libraries. The repository is structured as follows:
dataset/ : Contains the data needed to train the network.
checkpoints/ : Contains trained weights.
models/ : Contains network models.
utils/ : Contains utility tools used for training and evaluation.
DGL is a Python package dedicated to deep learning on graphs, built atop existing tensor DL frameworks (e.g., PyTorch, MXNet) to simplify the implementation of graph-based neural networks.
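As a quick, generic illustration of the DGL API (not code from this repository), a graph with tensor node features and one round of message passing can be built like this:

```python
import dgl
import dgl.function as fn
import torch

# Toy graph: two tool nodes (0, 1) and one tissue node (2), with directed edges.
g = dgl.DGLGraph()
g.add_nodes(3)
g.add_edges([0, 1, 2, 2], [2, 2, 0, 1])
g.ndata['h'] = torch.randn(3, 16)          # per-node feature vectors

# One round of message passing: sum incoming neighbour features into each node.
g.update_all(fn.copy_src('h', 'm'), fn.sum('m', 'h_agg'))
print(g.ndata['h_agg'].shape)              # torch.Size([3, 16])
```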
- Python 3.6
- Pytorch 1.7.1
- DGL 0.4.2
- CUDA 10.2
- Ubuntu 16.04
We have provided an environment file for installation using conda:
conda env create -f environment.yml
- Frames - Left camera images from the 2018 robotic scene segmentation challenge are used in this work.
- Instrument label - To be released!
- BBox and tool-tissue interaction annotations - Our annotations (please cite this paper / our previous work when using these annotations).
- Download the word2vec model pretrained on GoogleNews and put it into dataset/word2vec (a loading sketch is given after this list).
- To be released!
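For reference, the GoogleNews word2vec vectors mentioned above can be inspected with gensim once downloaded. Both gensim and the file name below (the standard release name) are assumptions about what is placed in dataset/word2vec; this repository may consume the vectors differently.

```python
# Illustration only: gensim and the standard GoogleNews file name are assumptions.
from gensim.models import KeyedVectors

w2v = KeyedVectors.load_word2vec_format(
    'dataset/word2vec/GoogleNews-vectors-negative300.bin', binary=True)
print(w2v['scissors'].shape)   # (300,) -- semantic embedding for a class-label word
```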
- Set the model_type and version for the model to be trained according to the instructions given in the train file.
python3 model_train.py
For the direct sequence of commands to be followed, refer to this link.
Download from [Checkpoints Link], place it inside the repository root and unzip.
Download from [Dataset Link], place it inside the repository root and unzip.
To reproduce the results, set the model_type, ver, seg_mode and checkpoint_dir based on the table given here:
- model_type
- ver
- seg_mode
- checkpoint_dir
python3 evaluation.py
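For example, these settings might look like the following inside the evaluation script before running the command above. Every value shown here is a hypothetical placeholder; the real combinations must be taken from the table linked above.

```python
# Hypothetical placeholders only -- substitute the combinations from the linked table.
model_type = 'amtl'                       # assumed name of a multi-task variant
ver = 'v1'                                # assumed version tag
seg_mode = 'v2gc'                         # assumed identifier for the GloRe segmentation mode
checkpoint_dir = 'checkpoints/amtl_v1/'   # assumed path to the unzipped checkpoint
```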
Code adapted and modified from:
- Visual-Semantic Graph Attention Network for Human-Object Interaction Detection
- Paper: Visual-Semantic Graph Attention Network for Human-Object Interaction Detection.
- Official PyTorch implementation code.
- Graph-Based Global Reasoning Networks
- Paper: Graph-Based Global Reasoning Networks.
- Official implementation code.
- Learning and Reasoning with the Graph Structure Representation in Robotic Surgery | [arXiv] | [Paper]
For any queries, please contact Lalithkumar or Sai Mitheran.