RollingQ: Reviving the Cooperation Dynamics in Multimodal Transformer

Accepted to ICML 2025

Overview

This project investigates the cooperation dynamics between modalities in multimodal transformers and proposes Rolling Query (RollingQ), a method that balances attention allocation across modalities to restore the cooperation dynamics that diminish during training.

Key Contributions

  • Discovered that the dynamic adaptability of the widely used self-attention mechanism diminishes during training
  • Identified a self-reinforcing cycle in which attention progressively overemphasizes the favored modality
  • Proposed Rolling Query (RollingQ), which rotates the query to balance attention allocation and revive cooperation dynamics
  • Validated the effectiveness of RollingQ across diverse multimodal scenarios
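To illustrate the balancing intuition only (this is not the paper's implementation), the sketch below nudges a query toward the keys of the under-attended modality so attention mass is shared more evenly. The function names, the centroid-based update, and the `step` rate are all hypothetical choices for demonstration:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_shares(q, keys_a, keys_b):
    """Fraction of attention mass that each modality's keys receive
    for a single query vector q (scaled dot-product attention)."""
    keys = np.concatenate([keys_a, keys_b], axis=0)
    w = softmax(keys @ q / np.sqrt(q.shape[0]))
    return w[: len(keys_a)].sum(), w[len(keys_a):].sum()

def roll_query(q, keys_a, keys_b, step=0.3):
    """Move the query toward the centroid of the under-attended
    modality's keys, reducing the attention imbalance.
    `step` is a hypothetical balancing rate, not from the paper."""
    share_a, share_b = attention_shares(q, keys_a, keys_b)
    under = keys_a if share_a < share_b else keys_b
    return (1.0 - step) * q + step * under.mean(axis=0)
```

The actual RollingQ operates inside the transformer's self-attention during training; this standalone toy only shows how adjusting the query can counteract the self-reinforcing preference for one modality.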

Authors

Haotian Ni, Yake Wei, Hang Liu, Gong Chen, Chong Peng, Hao Lin, Di Hu

Publication

Accepted to International Conference on Machine Learning (ICML), 2025