Marqo-FashionCLIP and Marqo-FashionSigLIP (Collection): SOTA multimodal models for fashion product embeddings -> https://github.com/marqo-ai/marqo-FashionCLIP/ • 11 items • Updated 27 days ago • 6
SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners • Paper • 2408.16768 • Published 27 days ago • 26
CogVLM2: Visual Language Models for Image and Video Understanding • Paper • 2408.16500 • Published 27 days ago • 56
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation • Paper • 2408.15881 • Published 28 days ago • 20
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models • Paper • 2408.08872 • Published Aug 16 • 96