MAC-AutoML/QuoTA

✨✨[AAAI 2026] This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension"

☆79

Alternatives and similar repositories for QuoTA

Users that are interested in QuoTA are comparing it to the libraries listed below.

VITA-MLLM / Sparrow
Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation
☆32 · Mar 28, 2025 · Updated last year
zhourax / VEGA
☆38 · Jul 9, 2024 · Updated 2 years ago
ggg0919 / cantor
☆90 · May 10, 2024 · Updated 2 years ago
Kwai-YuanQi / MM-RLHF
The Next Step Forward in Multimodal LLM Alignment
☆198 · May 1, 2025 · Updated last year
xjtupanda / Sparrow
Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"
☆48 · Sep 3, 2025 · Updated 10 months ago
MME-Benchmarks / MME-Unify
✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
☆42 · Apr 10, 2025 · Updated last year
MAC-AutoML / Awesome-Efficient-Large-Models
A list of awesome papers on compression and acceleration of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs).
☆16 · May 12, 2026 · Updated 2 months ago
KejiaZhang-Robust / AI-Agent-papers
Collection of recent works on AI Agents.
☆17 · Jun 5, 2025 · Updated last year
VITA-MLLM / Long-VITA
✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy
☆305 · May 14, 2025 · Updated last year
MAC-AutoML / ITPruner
☆28 · Dec 7, 2021 · Updated 4 years ago
MME-Benchmarks / Video-MME-v2
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
☆369 · May 24, 2026 · Updated last month
Tencent / VITA
The official implementation of VITA, VITA15, LongVITA, VITA-Audio, VITA-VLA, and VITA-E.
☆162 · Oct 28, 2025 · Updated 8 months ago
Northern-byte-bit / SpeechParaling-Bench
☆30 · May 21, 2026 · Updated 2 months ago
MME-Benchmarks / Video-MME
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
☆787 · Dec 8, 2025 · Updated 7 months ago
MAC-AutoML / SpecEyes
[ECCV 2026🔥] This is the official implementation of our paper "SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception…
☆63 · Apr 2, 2026 · Updated 3 months ago
yangruoliu / VideoDetective
VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding
☆58 · May 1, 2026 · Updated 2 months ago
MAC-AutoML / WFS-SB
[CVPR 2026] Wavelet-based Frame Selection by Detecting Semantic Boundary for Long Video Understanding
☆31 · Apr 12, 2026 · Updated 3 months ago
YouHuang67 / mamba-code-explained
☆19 · Jan 7, 2026 · Updated 6 months ago
yfzhang114 / r1_reward
✨✨ [ICLR 2026] R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
☆291 · May 9, 2025 · Updated last year
Share14 / ShareGemini
☆32 · Jul 29, 2024 · Updated last year
MAC-AutoML / SocialOmni
Benchmarking Audio-Visual Social Interactivity in Omni Models
☆46 · May 7, 2026 · Updated 2 months ago
jongwoopark7978 / LVNet
[Main Conference @ EACL'26] [Workshop @ NeurIPS'24] 🎞️ LVNet.
☆44 · Feb 10, 2026 · Updated 5 months ago
VITA-MLLM / Omni-Diffusion
✨✨[ICML 2026] Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion
☆151 · Mar 12, 2026 · Updated 4 months ago
Hon-Wong / ByteVideoLLM
[ICCV 2025] Dynamic-VLM
☆28 · Dec 16, 2024 · Updated last year
MiG-NJU / PersonaVLM
[CVPR 2026 Highlight] PersonaVLM: Long-Term Personalized Multimodal LLMs
☆110 · Apr 16, 2026 · Updated 3 months ago
sosppxo / 3D-STMN
[AAAI 2024] The official implementation of the paper "3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Refer…
☆46 · Dec 20, 2023 · Updated 2 years ago
KejiaZhang-Robust / VAP
[NeurIPS 2025] Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs
☆38 · Sep 21, 2025 · Updated 9 months ago
wzhuang-xmu / LoSA
[ICLR 2025] Official implementation of paper "Dynamic Low-Rank Sparse Adaptation for Large Language Models".
☆25 · Mar 16, 2025 · Updated last year
CeeZh / SILVR
Official Implementation for "SiLVR : A Simple Language-based Video Reasoning Framework"
☆19 · Jan 18, 2026 · Updated 6 months ago
Vanixxz / BackMix
[TPAMI2025] BackMix: Regularizing Open Set Recognition by Removing Underlying Fore-Background Priors
☆16 · Apr 23, 2025 · Updated last year
ziqipang / MR-Video
MR. Video: MapReduce is the Principle for Long Video Understanding
☆31 · Jun 18, 2026 · Updated last month
RoyZhao926 / InstructBrush
Official repository of the paper InstructBrush: Learning Attention-based Instruction Optimization for Image Editing
☆16 · Apr 14, 2024 · Updated 2 years ago
zihuixue / MKE
[ICCV 2021] Multimodal Knowledge Expansion
☆10 · Aug 28, 2021 · Updated 4 years ago
VITA-MLLM / VITA-QinYu
VITA-QINYU: Expressive Spoken Language Model for Role-Playing and Singing
☆121 · Jul 14, 2026 · Updated last week
qhfan / UniPrefill
Implementation of "UniPrefill: Universal Long-Context Prefill Acceleration via Block-wise Dynamic Sparsification"
☆41 · May 8, 2026 · Updated 2 months ago
FeiElysia / Tempo
Tempo: Small Vision-Language Models are Smart Compressors for Long Video Understanding (ECCV 2026)
☆76 · Jun 29, 2026 · Updated 3 weeks ago
sosppxo / RG-SAN
[NeurIPS 2024 Oral] RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation
☆20 · Dec 22, 2024 · Updated last year
sunggo / APIB
Code for ICCV23 paper "Automatic network pruning via Hilbert Schmidt independence criterion lasso under information bottleneck principle"
☆18 · Aug 8, 2023 · Updated 2 years ago
yuanc3 / DATE
Use 2 lines to empower absolute time awareness for Qwen2.5VL's MRoPE
☆29 · Sep 20, 2025 · Updated 10 months ago