RollingQ: Reviving the Cooperation Dynamics in Multimodal Transformer
Accepted to ICML 2025
Overview
This project investigates the cooperation dynamics between modalities in multimodal transformers and proposes RollingQ, a method that rebalances attention allocation across modalities to restore cross-modal cooperation.
Key Contributions
- Discovered that the dynamic adaptability of widely used self-attention mechanisms diminishes during training
- Identified a self-reinforcing cycle that progressively overemphasizes the favored modality
- Proposed Rolling Query (RollingQ) to balance attention allocation and restore cooperation dynamics
- Validated effectiveness across various multimodal scenarios
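To make the attention-imbalance problem concrete, below is a minimal toy sketch (not the authors' implementation) of the idea named in the contributions: when softmax attention concentrates on a favored modality, nudging the query toward the neglected modality's keys rebalances the attention mass. The key matrices, the `rolling_nudge` helper, and the blending coefficient `alpha` are all illustrative assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def rolling_nudge(q, target, alpha):
    """Blend the query toward a target direction, preserving its norm.
    (Illustrative stand-in for query rebalancing; not the paper's exact update.)"""
    v = (1 - alpha) * q + alpha * target
    return v * (np.linalg.norm(q) / np.linalg.norm(v))

d = 2
q = np.array([1.0, 0.0])                      # query aligned with the favored modality
k_fav = np.array([[1.0, 0.0], [0.9, 0.0]])    # keys of the favored modality
k_neg = np.array([[0.0, 1.0], [0.0, 0.9]])    # keys of the neglected modality
keys = np.vstack([k_fav, k_neg])

# Before: attention mass concentrates on the favored modality.
attn = softmax(keys @ q / np.sqrt(d))
mass_fav, mass_neg = attn[:2].sum(), attn[2:].sum()

# After: nudge the query toward the neglected modality's mean key.
q_bal = rolling_nudge(q, k_neg.mean(axis=0), alpha=0.5)
attn_bal = softmax(keys @ q_bal / np.sqrt(d))

gap_before = mass_fav - mass_neg
gap_after = abs(attn_bal[:2].sum() - attn_bal[2:].sum())
print(f"attention gap before: {gap_before:.3f}, after: {gap_after:.3f}")
```

In this toy setup the gap between the two modalities' attention mass shrinks after the nudge, illustrating (in spirit) how rebalancing the query can break the self-reinforcing preference for one modality.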
Links
- 📄 arXiv Paper
- 💻 Code (Coming soon)
- 📊 Project Page (Coming soon)
Authors
Haotian Ni, Yake Wei, Hang Liu, Gong Chen, Chong Peng, Hao Lin, Di Hu
Publication
Accepted to International Conference on Machine Learning (ICML), 2025