Title: Data-Efficient Vision Transformers for Multi-Label Disease Classification on Chest Radiographs.
Written by: F. Behrendt and D. Bhattacharya and J. Krüger and R. Opfer and A. Schlaefer
in: Current Directions in Biomedical Engineering (2022).
Volume: 8. Number: 1.
on pages: 34--37
DOI: 10.1515/cdbme-2022-0009
URL: https://doi.org/10.1515/cdbme-2022-0009

Abstract: Radiographs are a versatile diagnostic tool for the detection and assessment of pathologies, for treatment planning, or for navigation and localization purposes in clinical interventions. However, their interpretation and assessment by radiologists can be tedious and error-prone. Thus, a wide variety of deep learning methods have been proposed to support radiologists in interpreting radiographs. Mostly, these approaches rely on convolutional neural networks (CNNs) to extract features from images. Especially for the multi-label classification of pathologies on chest radiographs (Chest X-Rays, CXR), CNNs have proven to be well suited. In contrast, Vision Transformers (ViTs) have not been applied to this task despite their high classification performance on generic images and interpretable local saliency maps, which could add value to clinical interventions. ViTs do not rely on convolutions but on patch-based self-attention, and in contrast to CNNs, no prior knowledge of local connectivity is present. While this leads to increased capacity, ViTs typically require an excessive amount of training data, which represents a hurdle in the medical domain, as high costs are associated with collecting large medical data sets. In this work, we systematically compare the classification performance of ViTs and CNNs for different data set sizes and evaluate more data-efficient ViT variants (DeiT). Our results show that while the performance between ViTs and CNNs is on par, with a small benefit for ViTs, DeiTs outperform the former if a reasonably large data set is available for training.
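The architectural point in the abstract is that ViTs replace convolutions with patch-based self-attention, with a sigmoid head producing independent per-pathology predictions for the multi-label setting. Below is a minimal, illustrative PyTorch sketch of that idea; it is not the authors' model, and the class name, patch size, depth, embedding width, and the choice of 8 pathology labels are assumptions made for the example.

# Minimal sketch (not the paper's implementation): a ViT-style model that
# splits a grayscale chest radiograph into patches, applies patch-based
# self-attention, and emits one logit per pathology for multi-label output.
import torch
import torch.nn as nn

class TinyViTForCXR(nn.Module):
    def __init__(self, img_size=224, patch_size=16, dim=192, depth=4,
                 heads=3, num_labels=8):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # Patch embedding: a strided convolution that maps each
        # patch_size x patch_size patch to a dim-dimensional token.
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=patch_size,
                                     stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(dim, num_labels)  # one logit per pathology

    def forward(self, x):  # x: (B, 1, H, W) grayscale CXR
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.encoder(tokens)   # patch-based self-attention
        return self.head(tokens[:, 0])  # classify from the [CLS] token

model = TinyViTForCXR()
logits = model(torch.randn(2, 1, 224, 224))
# Multi-label training treats each pathology independently via a sigmoid:
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (2, 8)).float())

The independent sigmoid per label (rather than a softmax) reflects the multi-label setting: several pathologies can be present on the same radiograph.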
