Multimodal (text + image + video + audio) embedding models aligned with jina-embeddings-v5-text-*. Two sizes, four task variants each.
- jina-embeddings-v5-omni: Text-Geometry-Preserving Multimodal Embeddings via Frozen-Tower Composition
  Paper • 2605.08384 • Published • 11
- jinaai/jina-embeddings-v5-omni-small
  Feature Extraction • 2B • Updated • 180k • 67
- jinaai/jina-embeddings-v5-omni-nano
  Feature Extraction • 1.0B • Updated • 84.2k • 27
- jinaai/jina-embeddings-v5-omni-nano-text-matching
  Feature Extraction • 0.9B • Updated • 926 • 3
