21.02.2026

ThinkingViT at CVPR!

"ThinkingViT: Matryoshka Thinking Vision Transformer for Elastic Inference" has just been accepted to CVPR!

ThinkingViT brings thinking-style adaptive computation to Vision Transformers by letting the model “think more” only when an image is hard, and exit early when it is confident. Concretely, ThinkingViT activates a small subset of the most important attention heads for an initial prediction, and progressively expands to larger subsets only if the prediction entropy indicates that more computation is needed. To make each rethink more powerful, we introduce Token Recycling, which fuses the previous stage’s embeddings back into the input so the model refines its decision instead of starting from scratch.
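The control flow described above can be sketched roughly as follows. This is an illustrative toy, not the paper's implementation: the stage callables, the `threshold` value, and the way embeddings are passed between stages are all hypothetical stand-ins.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(probs):
    # Shannon entropy of a probability distribution (nats).
    return -sum(p * math.log(p) for p in probs if p > 0)

def elastic_predict(stages, tokens, threshold=0.5):
    """Toy sketch of ThinkingViT-style elastic inference.

    `stages` is a list of callables, ordered from fewest to most
    attention heads; each maps (tokens, prev_embedding) ->
    (logits, embedding). We exit as soon as the prediction entropy
    drops below `threshold` (an illustrative cutoff, not the paper's).
    """
    prev_embedding = None
    probs, stage = None, None
    for stage in stages:
        logits, embedding = stage(tokens, prev_embedding)
        probs = softmax(logits)
        if entropy(probs) < threshold:
            # Confident enough: stop thinking and return early.
            return probs, stage
        # Token Recycling (sketch): hand this stage's embedding to the
        # next, larger stage so it refines rather than restarts.
        prev_embedding = embedding
    return probs, stage
```

A hard image with near-uniform logits at the small stage would fail the entropy check and trigger the larger stage, which also receives the recycled embedding; an easy image would exit after the first stage.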

Congratulations and thanks to the team: Ali Hojjat, Janek Haberer, Soren Pirk, Olaf Landsiedel

Preprint: https://lnkd.in/eGriuYXp
GitHub: https://lnkd.in/eDd84SSc