Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Grounding"
☆53Dec 30, 2023Updated 2 years ago
Alternatives and similar repositories for ADPN-MM
Users that are interested in ADPN-MM are comparing it to the libraries listed below
Sorting:
- Official repository of NeurIPS D&B Track 2024 paper "VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understan…☆40Jan 20, 2025Updated last year
- Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning☆21Feb 19, 2025Updated last year
- [WACV 2025] Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection☆16Mar 23, 2025Updated 11 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Sep 25, 2024Updated last year
- Chinese CLIP models with SOTA performance.☆60Aug 28, 2023Updated 2 years ago
- SenseVoice-python: A enterprise-grade open source multi-language asr system from funasr opensource with onnxruntime☆109Oct 6, 2025Updated 5 months ago
- Multi-armed bandit algorithm with tensorflow and 11 policies☆16Dec 27, 2022Updated 3 years ago
- Official implement of MIA-DPO☆71Jan 23, 2025Updated last year
- 阅读顺序、Layoutreader☆19May 8, 2025Updated 10 months ago
- [CVPR 2025, Highlight] The official implementation of the paper "Unleashing In-context Learning of Autoregressive Models for Few-shot Ima…☆27Jun 6, 2025Updated 9 months ago
- Open deep learning compiler stack for cpu, gpu and specialized accelerators☆19Feb 24, 2026Updated last week
- Precision Search through Multi-Style Inputs☆73Jul 30, 2025Updated 7 months ago
- 甲木常用的 Agent Skills 集合,包含日常工作流中积累的高效技能扩展。 / A curated collection of Agent Skills for daily productivity workflows.☆64Jan 25, 2026Updated last month
- [NAACL 2025] Representing Rule-based Chatbots with Transformers☆23Feb 9, 2025Updated last year
- LMM solved catastrophic forgetting, AAAI2025☆46Apr 15, 2025Updated 10 months ago
- Code and data of "Controllable Unsupervised Event-based Video Generation" (accepted as ICIP oral and invited by WACV workshop)☆19Nov 5, 2024Updated last year
- [WIP@Oct 13] 质衡-基准测试 (Q-Bench in Chinese),包含中文版【底层视觉问答】和【底层视觉描述】数据集,以及中文提示下的图片质量评价。 We will release Q-Bench in more languages in the futu…☆24Jan 7, 2024Updated 2 years ago
- [AAAI 2024] GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval☆20May 10, 2024Updated last year
- [CVPR 2024] Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training☆43Apr 13, 2024Updated last year
- A light-weight and high-efficient training framework for accelerating diffusion tasks.☆51Sep 14, 2024Updated last year
- Qwen-WisdomVast is a large model trained on 1 million high-quality Chinese multi-turn SFT data, 200,000 English multi-turn SFT data, and …☆18Apr 12, 2024Updated last year
- A Dead Simple and Modularized Multi-Modal Training and Finetune Framework. Compatible to any LLaVA/Flamingo/QwenVL/MiniGemini etc series …☆19Apr 24, 2024Updated last year
- Official implementation of the paper "Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Rou…☆34Sep 25, 2025Updated 5 months ago
- [COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models☆24Oct 5, 2024Updated last year
- LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)☆860Jul 29, 2024Updated last year
- 🔥🔥First-ever hour scale video understanding models☆614Jul 14, 2025Updated 7 months ago
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.☆104May 30, 2025Updated 9 months ago
- Extension of ChatTTS, 3x Faster on Windows, Support Voice Cloning and Mobile Deployment☆172Feb 9, 2025Updated last year
- Trying to implement https://arxiv.org/abs/2305.08891☆34Jun 10, 2023Updated 2 years ago
- NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing☆579Oct 20, 2024Updated last year
- Official implementation for "Diffusion Instruction Tuning"☆31Jun 10, 2025Updated 8 months ago
- Official Pytorch Implementation of 'BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos'☆35Feb 26, 2025Updated last year
- Unified layout planning and image generation, ICCV2025☆41Jan 19, 2026Updated last month
- Our 2nd-gen LMM☆34May 22, 2024Updated last year
- Exploration of the multi modal fuyu-8b model of Adept. 🤓 🔍☆27Nov 7, 2023Updated 2 years ago
- A Toolkit for Running On-device Large Language Models (LLMs) in APP☆82Jul 4, 2024Updated last year
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines☆130Nov 6, 2024Updated last year
- A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team.☆74Oct 14, 2024Updated last year
- [ICCVW 25] LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning☆160Aug 8, 2025Updated 7 months ago