Title: Data-Efficient Vision Transformers for Multi-Label Disease Classification on Chest Radiographs.
Written by: F. Behrendt and D. Bhattacharya and J. Krüger and R. Opfer and A. Schlaefer
in: Current Directions in Biomedical Engineering (2022).
Volume: 8. Number: 1.
on pages: 34--37
DOI: 10.1515/cdbme-2022-0009
URL: https://doi.org/10.1515/cdbme-2022-0009

Abstract: Radiographs are a versatile diagnostic tool for the detection and assessment of pathologies, for treatment planning, or for navigation and localization purposes in clinical interventions. However, their interpretation and assessment by radiologists can be tedious and error-prone. Thus, a wide variety of deep learning methods have been proposed to support radiologists in interpreting radiographs. Mostly, these approaches rely on convolutional neural networks (CNNs) to extract features from images. Especially for the multi-label classification of pathologies on chest radiographs (Chest X-Rays, CXR), CNNs have proven to be well suited. In contrast, Vision Transformers (ViTs) have not been applied to this task despite their high classification performance on generic images and interpretable local saliency maps, which could add value to clinical interventions. ViTs do not rely on convolutions but on patch-based self-attention, and in contrast to CNNs, no prior knowledge of local connectivity is present. While this leads to increased capacity, ViTs typically require an excessive amount of training data, which represents a hurdle in the medical domain, as high costs are associated with collecting large medical data sets. In this work, we systematically compare the classification performance of ViTs and CNNs for different data set sizes and evaluate more data-efficient ViT variants (DeiT). Our results show that while the performance between ViTs and CNNs is on par, with a small benefit for ViTs, DeiTs outperform the former if a reasonably large data set is available for training.
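The architectural point in the abstract is that ViTs replace convolutions with patch-based self-attention, with a sigmoid head producing independent per-pathology predictions for the multi-label setting. Below is a minimal, illustrative PyTorch sketch of that idea; it is not the authors' model, and the class name, patch size, depth, embedding width, and the choice of 8 pathology labels are assumptions made for the example.

# Minimal sketch (not the paper's implementation): a ViT-style model that
# splits a grayscale chest radiograph into patches, applies patch-based
# self-attention, and emits one logit per pathology for multi-label output.
import torch
import torch.nn as nn

class TinyViTForCXR(nn.Module):
    def __init__(self, img_size=224, patch_size=16, dim=192, depth=4,
                 heads=3, num_labels=8):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # Patch embedding: a strided convolution that maps each
        # patch_size x patch_size patch to a dim-dimensional token.
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=patch_size,
                                     stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(dim, num_labels)  # one logit per pathology

    def forward(self, x):  # x: (B, 1, H, W) grayscale CXR
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.encoder(tokens)   # patch-based self-attention
        return self.head(tokens[:, 0])  # classify from the [CLS] token

model = TinyViTForCXR()
logits = model(torch.randn(2, 1, 224, 224))
# Multi-label training treats each pathology independently via a sigmoid:
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (2, 8)).float())

The independent sigmoid per label (rather than a softmax) reflects the multi-label setting: several pathologies can be present on the same radiograph.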
