Research Code for Multimodal-Cognition Team in Ant Group
★178 · Oct 14, 2025 · Updated 7 months ago
Alternatives and similar repositories for Ant-Multi-Modal-Framework
Users that are interested in Ant-Multi-Modal-Framework are comparing it to the libraries listed below.
- M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning ★47 · Jul 17, 2025 · Updated 10 months ago
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners" ★155 · Sep 10, 2024 · Updated last year
- This repository contains the dataset, codebase, and benchmarks for our paper: <CNVid-3.5M: Build, Filter, and Pre-train the Large-scale P… ★26 · Nov 28, 2023 · Updated 2 years ago
- A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios ★13 · Jan 24, 2024 · Updated 2 years ago
- [AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval. ★48 · Oct 14, 2024 · Updated last year
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding ★50 · Jan 14, 2025 · Updated last year
- ★12 · Jan 10, 2025 · Updated last year
- A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions. ★174 · Jan 30, 2025 · Updated last year
- [ACL 2023] VSTAR is a multimodal dialogue dataset with scene and topic transition information ★16 · Oct 27, 2024 · Updated last year
- ★17 · Oct 15, 2023 · Updated 2 years ago
- ★22 · Aug 8, 2024 · Updated last year
- [ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP" ★897 · Aug 13, 2024 · Updated last year
- [AAAI 2024] GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval ★20 · May 10, 2024 · Updated 2 years ago
- Open-Vocabulary High-Resolution Remote Sensing Image Semantic Segmentation ★29 · Sep 19, 2025 · Updated 8 months ago
- ★28 · Oct 14, 2024 · Updated last year
- [ICCV 2023] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption ★105 · Sep 18, 2023 · Updated 2 years ago
- ★30 · Aug 14, 2023 · Updated 2 years ago
- Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation. ★5,916 · Mar 31, 2026 · Updated last month
- ★26 · Aug 4, 2020 · Updated 5 years ago
- An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval" ★1,027 · Apr 12, 2024 · Updated 2 years ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ★28 · Sep 25, 2024 · Updated last year
- [ICME 2025 Oral] Knowledge Transfer and Domain Adaptation for Fine-Grained Remote Sensing Image Segmentation ★14 · Dec 23, 2025 · Updated 5 months ago
- Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning ★21 · Feb 19, 2025 · Updated last year
- [CVPR2023] All in One: Exploring Unified Video-Language Pre-training ★281 · Mar 25, 2023 · Updated 3 years ago
- Official implementation of CVPR 2024 paper "vid-TLDR: Training Free Token merging for Light-weight Video Transformer". ★55 · Oct 21, 2025 · Updated 7 months ago
- Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023) ★66 · Jun 7, 2024 · Updated last year
- [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model with performance approaching GPT-4o. ★10,038 · Sep 22, 2025 · Updated 8 months ago
- This repo is used for downloading the videos for SVD dataset. ★18 · Aug 16, 2020 · Updated 5 years ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ★139 · May 8, 2025 · Updated last year
- Chinese-native image generation while compatible with SD eco-system, 1st-gen, AAAI2025 ★13 · Jun 25, 2024 · Updated last year
- Official pytorch repository for "TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection" (AAAI 2024 Pape… ★58 · Feb 22, 2025 · Updated last year
- ★60 · Aug 10, 2023 · Updated 2 years ago
- Papers about the ultra high resolution tasks. ★13 · Jul 12, 2024 · Updated last year
- Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight) ★23 · Aug 1, 2025 · Updated 9 months ago
- Chain-of-Frames [CVPR 2026] ★40 · Jul 2, 2025 · Updated 10 months ago
- LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning ★77 · May 23, 2025 · Updated last year
- EVA Series: Visual Representation Fantasies from BAAI ★2,677 · Aug 1, 2024 · Updated last year
- Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval ★38 · Aug 4, 2025 · Updated 9 months ago
- InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions ★2,925 · May 26, 2025 · Updated last year