JavisVerse/Awesome-AVI

Awesome Audio-Visual Intelligence, Survey of Audio-Visual Intelligence

☆84

Alternatives and similar repositories for Awesome-AVI

Users who are interested in Awesome-AVI are comparing it to the libraries listed below.

Lliar-liar / Daily-Omni
This is the official repository of Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities
☆42 · Apr 28, 2026 · Updated 2 months ago
yaolinli / TimeChat-Captioner
[ICML 2026] Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions
☆49 · Jun 29, 2026 · Updated 3 weeks ago
WPR001 / UGC_VideoCaptioner
☆16 · Jun 23, 2026 · Updated last month
AVoCaDO-Captioner / AVoCaDO
https://avocado-captioner.github.io/
☆37 · Oct 16, 2025 · Updated 9 months ago
HVision-NKU / ASID-Caption
ASID-Caption: Attribute-Structured and Quality-Verified Audiovisual Instruction Dataset and Training Pipeline for Fine-Grained Video Unde…
☆68 · Mar 3, 2026 · Updated 4 months ago
kaist-ami / AVHBench
[ICLR'25] Official repository for "AVHBench: A Cross-Modal Hallucination Evaluation for Audio-Visual Large Language Models"
☆25 · Mar 8, 2026 · Updated 4 months ago
yichenzeng24 / SAVN-CE
[CVPR'26] Semantic Audio-Visual Navigation in Continuous Environments
☆29 · Jun 23, 2026 · Updated last month
NJU-LINK / OmniVideoBench
The Source Code for OmniVideoBench @ICLR 2026
☆77 · Feb 12, 2026 · Updated 5 months ago
shlizee / savvy
Repository for the SAVVY (Spatial Awareness via Audio-Visual LLMs through Seeing and Hearing) benchmark and SAVVY model
☆25 · May 30, 2026 · Updated last month
JavisVerse / JavisDiT
[ICLR 2026] Official implementation of JavisDiT and JavisDiT++ series.
☆376 · Mar 29, 2026 · Updated 3 months ago
BayLing-Models / BayLing-Duplex
Native full-duplex speech dialogue inference for BayLing-Duplex.
☆63 · Jun 22, 2026 · Updated last month
ddlBoJack / Omni-Captioner
[ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.
☆142 · Apr 7, 2026 · Updated 3 months ago
zhousheng97 / EgoTextVQA
[CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
☆52 · Jun 19, 2025 · Updated last year
SCIR-SC-Qiaoban-Team / FreeEvalLM
[AAAI26] Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilitie…
☆11 · Feb 7, 2026 · Updated 5 months ago
LAION-AI / scaled-echo-tts
Scaled diffusion transformer for text-to-speech synthesis (DiT + T5Gemma2 conditioning, TorchTitan & Megatron backends, tested up to 1024…
☆24 · Mar 29, 2026 · Updated 3 months ago
JaaackHongggg / WorldSense
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
☆50 · Jul 12, 2026 · Updated last week
hyzhang24 / DuplexSLA
DuplexSLA: A Full-Duplex Spoken Language Model with Synchronized Speech, Language, and Action
☆105 · May 20, 2026 · Updated 2 months ago
NJU-LINK / T2AV-Compass
The Source Code for T2AV-Compass @ ICML 2026
☆20 · Jun 21, 2026 · Updated last month
yangdongchao / ALMTokenizer
The demo page for ALMTokenizer
☆59 · Apr 14, 2025 · Updated last year
Aiden0526 / MuSLR
Code and data for the NeurIPS 2025 paper "MuSLR: Multimodal Symbolic Logical Reasoning"
☆16 · Oct 5, 2025 · Updated 9 months ago
HarryHsing / EchoInk
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning (🔥The Exploration of R1 for General Audio-Vis…
☆78 · Jun 3, 2026 · Updated last month
bytedance / UniVR
☆28 · Updated this week
KD-TAO / OmniZip
[CVPR 2026] OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
☆100 · Apr 20, 2026 · Updated 3 months ago
xiaomi-research / acavcaps
☆31 · Mar 27, 2026 · Updated 3 months ago
klingfoley / Kling-Foley
Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation
☆62 · Jun 26, 2025 · Updated last year
multimodal-art-projection / OmniBench
A project for tri-modal LLM benchmarking and instruction tuning.
☆61 · Mar 27, 2025 · Updated last year
UniX-AI-Lab / WorldReasonBench
WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors
☆22 · May 19, 2026 · Updated 2 months ago
Dorniwang / UniVerse-1-code
The official UniVerse-1 code.
☆129 · Oct 13, 2025 · Updated 9 months ago
ku-vai / TPoS
This repository is for The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion (ICCV 2023)
☆25 · Dec 7, 2023 · Updated 2 years ago
NJU-LINK / DRIFT
Design for Error Detection in Deep-Research Agents Trajectories.
☆22 · Jun 4, 2026 · Updated last month
jinbae-s / ACVIS
[ICASSP 2026] The official PyTorch implementation of ACVIS
☆15 · Jan 19, 2026 · Updated 6 months ago
Junchao-cs / Edit360
[ICCV 2025 Highlight] "Edit360: 2D Image Edits to 3D Assets from Any Angle"
☆21 · Feb 4, 2026 · Updated 5 months ago
microsoft / AVGen-Bench
[ICML26] AVGen-Bench is a task-driven benchmark for multi-granular evaluation of Text-to-Audio-Video (T2AV) generation.
☆22 · Jul 2, 2026 · Updated 3 weeks ago
AudenAI / Auden
☆71 · Apr 2, 2026 · Updated 3 months ago
ASLP-lab / M7-TTS
M7-TTS: A Mini-Scale Multilingual and Multi-Dialect Text-to-Speech Language Model with Mimi codec and Multi Token Prediction
☆20 · Mar 19, 2026 · Updated 4 months ago
yl3800 / EIGV
☆15 · Aug 12, 2022 · Updated 3 years ago
ddlBoJack / MMAR
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
☆214 · Feb 25, 2026 · Updated 5 months ago
zhaoyx239 / X-Translator
☆25 · Updated this week
XiaomiMiMo / MiMo-Audio-Tokenizer
A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.
☆145 · Sep 19, 2025 · Updated 10 months ago