AVoCaDO-Captioner/AVoCaDO

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/AVoCaDO-Captioner/AVoCaDO)

AVoCaDO-Captioner / AVoCaDO

https://avocado-captioner.github.io/

☆37

Alternatives and similar repositories for AVoCaDO

Users that are interested in AVoCaDO are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

WPR001 / UGC_VideoCaptioner
View on GitHub
☆16Jun 23, 2026Updated last month
Lliar-liar / Daily-Omni
View on GitHub
This is the official repository of Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities
☆42Apr 28, 2026Updated 2 months ago
yaolinli / TimeChat-Captioner
View on GitHub
[ICML 2026] Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions
☆49Jun 29, 2026Updated 3 weeks ago
WeChatCV / D-ORCA
View on GitHub
D-ORCA: Dialogue-Centric Optimization for Robust Audio-Visual Captioning
☆15Feb 11, 2026Updated 5 months ago
NJU-LINK / OmniVideoBench
View on GitHub
The Source Code for OmniVideoBench @ICLR 2026
☆77Feb 12, 2026Updated 5 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
HVision-NKU / ASID-Caption
View on GitHub
ASID-Caption: Attribute-Structured and Quality-Verified Audiovisual Instruction Dataset and Training Pipeline for Fine-Grained Video Unde…
☆68Mar 3, 2026Updated 4 months ago
JaaackHongggg / WorldSense
View on GitHub
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
☆50Jul 12, 2026Updated last week
bytedance / video-SALMONN-2
View on GitHub
video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is d…
☆204Feb 23, 2026Updated 5 months ago
ddlBoJack / Omni-Captioner
View on GitHub
[ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.
☆142Apr 7, 2026Updated 3 months ago
dingyue772 / OmniSIFT
View on GitHub
[ICML2026] OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models
☆25May 21, 2026Updated 2 months ago
64327069 / LVAgent
View on GitHub
Code of LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
☆39Nov 24, 2025Updated 8 months ago
Dorniwang / UniVerse-1-code
View on GitHub
The official UniVerse-1 code.
☆129Oct 13, 2025Updated 9 months ago
kaist-ami / AVHBench
View on GitHub
[ICLR'25] Official repository for "AVHBench: A Cross-Modal Hallucination Evaluation for Audio-Visual Large Language Models"
☆25Mar 8, 2026Updated 4 months ago
JavisVerse / Awesome-AVI
View on GitHub
Awesome Audio-Visual Intelligence, Survey of Audio-Visual Intelligence
☆84May 8, 2026Updated 2 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
JaeYeonKang / ECFVI
View on GitHub
Error Compensation Framework for Flow-Guided Video Inpainting
☆15Aug 9, 2022Updated 3 years ago
TencentARC / Video-Holmes
View on GitHub
[ECCV 2026] Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆95Jul 13, 2025Updated last year
j0seo / lookahead-anchoring
View on GitHub
☆15Oct 27, 2025Updated 8 months ago
swagshaw / WildDESED
View on GitHub
WildDESED: A LLM-Powered Dataset for Wild Domestic Environment Sound Event Detection
☆18Nov 19, 2024Updated last year
klingfoley / Kling-Foley
View on GitHub
Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation
☆62Jun 26, 2025Updated last year
W-Wu / DEER
View on GitHub
☆12Aug 25, 2023Updated 2 years ago
GXYM / VCapsBench
View on GitHub
VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation
☆20Jun 2, 2025Updated last year
ShareLab-SII / FluxMem
View on GitHub
[CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding
☆73Mar 16, 2026Updated 4 months ago
xuyang-liu16 / GlobalCom2
View on GitHub
[AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
☆42Jan 27, 2026Updated 5 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
multimodal-art-projection / OmniBench
View on GitHub
A project for tri-modal LLM benchmarking and instruction tuning.
☆61Mar 27, 2025Updated last year
schowdhury671 / meerkat
View on GitHub
☆35Jul 9, 2025Updated last year
ddlBoJack / MT4SSL
View on GitHub
[INTERSPEECH 2023 Best Paper Shortlist] Official implementation for MT4SSL: Boosting Self-Supervised Speech Representation Learning by In…
☆45Mar 25, 2024Updated 2 years ago
NJU-LINK / T2AV-Compass
View on GitHub
The Source Code for T2AV-Compass @ ICML 2026
☆20Jun 21, 2026Updated last month
wenhaochai / aurora
View on GitHub
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
☆147Jun 4, 2025Updated last year
zhengdian1 / AIA
View on GitHub
☆45Jan 4, 2026Updated 6 months ago
V-STaR-Bench / V-STaR
View on GitHub
Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
☆45Mar 2, 2026Updated 4 months ago
bebr2 / RACE
View on GitHub
Code for RACE.
☆15Nov 12, 2025Updated 8 months ago
fansunqi / VideoTool
View on GitHub
Official Repository for NeurIPS'25 Paper "Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task"
☆23May 18, 2026Updated 2 months ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
aiha-lab / InfiniPot-V
View on GitHub
[NeurIPS 25] InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
☆20Jan 25, 2026Updated 6 months ago
KD-TAO / OmniZip
View on GitHub
[CVPR 2026] OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
☆100Apr 20, 2026Updated 3 months ago
Jialuo-Li / DIG
View on GitHub
[CVPR 2026] Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
☆21Feb 21, 2026Updated 5 months ago
GLJS / AudioToolAgent
View on GitHub
GitHub repository for AudioToolAgent
☆20Feb 13, 2026Updated 5 months ago
ziqipang / MR-Video
View on GitHub
MR. Video: MapReduce is the Principle for Long Video Understanding
☆31Jun 18, 2026Updated last month
xuyang-liu16 / hermes-code-bridge
View on GitHub
Use Hermes Agent as the control plane for local coding agents like Codex, Kimi Code, Claude Code, OpenCode, and Gemini CLI.
☆24Updated this week
xuyang-liu16 / VidCom2
View on GitHub
[EMNLP 2025 Main] Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models
☆127May 14, 2026Updated 2 months ago