Research Code for Multimodal-Cognition Team in Ant Group
☆179Oct 14, 2025Updated 8 months ago
Alternatives and similar repositories for Ant-Multi-Modal-Framework
Users that are interested in Ant-Multi-Modal-Framework are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning☆48Jul 17, 2025Updated 11 months ago
- This repository contains the dataset, codebase, and benchmarks for our paper: <CNVid-3.5M: Build, Filter, and Pre-train the Large-scale P…☆26Nov 28, 2023Updated 2 years ago
- A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios☆13Jan 24, 2024Updated 2 years ago
- [AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval.☆49Oct 14, 2024Updated last year
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding☆49Jan 14, 2025Updated last year
- Deploy on Railway without the complexity - Free Credits Offer • AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.☆175Jan 30, 2025Updated last year
- [ACL 2023] VSTAR is a multimodal dialogue dataset with scene and topic transition information☆16Oct 27, 2024Updated last year
- ☆17Oct 15, 2023Updated 2 years ago
- ☆22Aug 8, 2024Updated last year
- [ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"☆901Aug 13, 2024Updated last year
- ☆13Jun 11, 2026Updated 2 weeks ago
- 【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?☆256Nov 29, 2024Updated last year
- [AAAI 2024] GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval☆20May 10, 2024Updated 2 years ago
- [ICCV 2023] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption☆105Sep 18, 2023Updated 2 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- ☆30Aug 14, 2023Updated 2 years ago
- Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.☆5,945Mar 31, 2026Updated 2 months ago
- ☆26Aug 4, 2020Updated 5 years ago
- An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"☆1,028Apr 12, 2024Updated 2 years ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆28Sep 25, 2024Updated last year
- [ICME 2025 Oral] Knowledge Transfer and Domain Adaptation for Fine-Grained Remote Sensing Image Segmentation☆14Dec 23, 2025Updated 6 months ago
- Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning☆21Feb 19, 2025Updated last year
- Official implementation of CVPR 2024 paper "vid-TLDR: Training Free Token merging for Light-weight Video Transformer".☆55Oct 21, 2025Updated 8 months ago
- Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)☆66Jun 7, 2024Updated 2 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. 接近GPT-4o表现的开源多模态对话模型☆10,074Sep 22, 2025Updated 9 months ago
- This repo is used for downloading the videos for SVD dataset.☆18Aug 16, 2020Updated 5 years ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions☆138May 8, 2025Updated last year
- ☆26Mar 29, 2025Updated last year
- Official pytorch repository for "TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection" (AAAI 2024 Pape…☆57Feb 22, 2025Updated last year
- ☆60Aug 10, 2023Updated 2 years ago
- Papers about the ultra high resolution tasks.☆13Jul 12, 2024Updated last year
- Chain-of-Frames [CVPR 2026]☆40Jul 2, 2025Updated 11 months ago
- LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning☆78May 23, 2025Updated last year
- Serverless GPU API endpoints on Runpod - Get Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- EVA Series: Visual Representation Fantasies from BAAI☆2,680Aug 1, 2024Updated last year
- InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions☆2,924May 26, 2025Updated last year
- [ICCV 2025] Explore the Limits of Omni-modal Pretraining at Scale☆124Sep 2, 2024Updated last year
- mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)☆227Jul 21, 2023Updated 2 years ago
- Emu Series: Generative Multimodal Models from BAAI☆1,775Jan 12, 2026Updated 5 months ago
- Source code of our MM'22 paper Partially Relevant Video Retrieval☆57Nov 4, 2024Updated last year
- A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team.☆73Oct 14, 2024Updated last year