Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models • Paper 2310.05863 • Published Oct 9, 2023