BriansIDP/AudioVisualLLM

☆19

Alternatives and similar repositories for AudioVisualLLM

Users who are interested in AudioVisualLLM compare it to the repositories listed below.

stoneMo / OneAVM
Official codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023)
☆12 | Jun 1, 2023 | Updated 3 years ago
LoieSun / Auto-ACD
Code for "A Large-scale Dataset for Audio-Language Representation Learning"
☆14 | Sep 18, 2024 | Updated last year
SAGNIKMJR / ego-AV-spatial-correspondence
[CVPR 2024] Code and datasets for "Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos"
☆14 | Jun 16, 2024 | Updated 2 years ago
jack1yang / image-paragraph-captioning
A Hierarchical Approach for Generating Descriptive Image Paragraphs
☆10 | Mar 27, 2020 | Updated 6 years ago
InnerPeace-Wu / im2p-tensorflow
Implementation of the CVPR 2017 paper "A Hierarchical Approach for Generating Descriptive Image Paragraphs" in TensorFlow (in progress)
☆13 | Jan 27, 2018 | Updated 8 years ago
Skyline-9 / Visionary-Vids
Multi-modal transformer approach for natural-language-query-based joint video summarization and highlight detection
☆17 | May 23, 2024 | Updated 2 years ago
jsoft88 / cptr-vision-transformer
Implementation of the CPTR model from https://arxiv.org/pdf/2101.10804.pdf
☆10 | Mar 27, 2022 | Updated 4 years ago
GeWu-Lab / LFAV
Towards Long Form Audio-visual Video Understanding
☆15 | Jan 16, 2026 | Updated 6 months ago
stoneMo / CIGN
Official implementation for CIGN
☆17 | Sep 11, 2023 | Updated 2 years ago
the-anonymous-bs / av-SALMONN
av-SALMONN: Speech-Enhanced Audio-Visual Large Language Models
☆13 | May 8, 2024 | Updated 2 years ago
YUCHEN005 / GILA
Code for the paper "Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition"
☆18 | Jun 21, 2023 | Updated 3 years ago
hxixixh / mix-and-localize
☆23 | Mar 20, 2024 | Updated 2 years ago
zhiheLu / Ensemble_VLM
Official code for the paper "Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models" (ICML 2024)
☆28 | Feb 2, 2025 | Updated last year
MrZilinXiao / AutoVER
[ECCV'24] Official implementation of Autoregressive Visual Entity Recognizer
☆14 | Mar 2, 2024 | Updated 2 years ago
seaweiqing / image2story
A project for telling stories based on images, in a particular style
☆17 | Dec 16, 2018 | Updated 7 years ago
rishabhjain16 / whisper_child_asr
☆12 | May 23, 2023 | Updated 3 years ago
BaoBaoGitHub / Hungyi_Lee_Machine_Learning_2021
Notes on Hung-yi Lee's Machine Learning 2021 course
☆14 | Nov 27, 2022 | Updated 3 years ago
GuangyanS / Sys2-LLaVA
☆31 | Feb 10, 2025 | Updated last year
GeWu-Lab / MMCosine_ICASSP23
The code repo for the ICASSP 2023 paper "MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning"
☆26 | May 18, 2023 | Updated 3 years ago
h-munakata / Lighthouse-Wrapper-for-Audio-Moment-Retrieval
☆13 | Mar 23, 2026 | Updated 3 months ago
allenai / sso
Repository for Skill Set Optimization
☆14 | Jul 26, 2024 | Updated last year
rikeilong / Bay-CAT
[ECCV'24] Official implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario…
☆59 | Sep 4, 2024 | Updated last year
GX77 / LCVSL
☆14 | Sep 28, 2023 | Updated 2 years ago
wutong8023 / SpeechRE
☆11 | Nov 11, 2022 | Updated 3 years ago
JeongHun0716 / zero-avsr
Official PyTorch implementation for "Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech …
☆36 | May 11, 2025 | Updated last year
uzh-dqbm-cmi / ARGON
Progressive Transformer-Based Generation of Radiology Reports
☆25 | Jan 5, 2025 | Updated last year
TencentARC / FLM
Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
☆31 | May 15, 2023 | Updated 3 years ago
ewwink / wikipedia-wordlists-extractor
Extract unique word lists from the Wikipedia database
☆13 | May 27, 2020 | Updated 6 years ago
JeongHun0716 / MMS-LLaMA
Official PyTorch implementation for "MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens…
☆48 | Jun 12, 2025 | Updated last year
LandyGuo / Download_HowTo100M
Code for downloading videos from the HowTo100M dataset
☆18 | May 13, 2021 | Updated 5 years ago
XL2248 / SOV-MAS
The code and data for "Summary-Oriented Vision Modeling for Multimodal Abstractive Summarization"
☆11 | May 16, 2023 | Updated 3 years ago
zafstojano / wordgamebench
Evaluating language models on word puzzle games
☆10 | Oct 25, 2024 | Updated last year
MengboLi / MS-SENet
☆11 | Jul 16, 2024 | Updated 2 years ago
mengshiY / RCSF
Code for the paper "Cross-Domain Slot Filling as Machine Reading Comprehension" (IJCAI 2021)
☆11 | Aug 24, 2021 | Updated 4 years ago
zchoi / GLSCL
[TIP25] Code for "Text-Video Retrieval with Global-Local Semantic Consistent Learning"
☆16 | May 12, 2025 | Updated last year
Serega6678 / NuNER
NuNER is a family of SOTA foundation and zero-shot models for entity recognition
☆15 | Jun 11, 2024 | Updated 2 years ago
mesolitica / multimodal-LLM
Multi-modal language modeling with image, audio, and text integration, including multiple images and multiple audio clips in a single multi-turn conversation
☆18 | Feb 20, 2024 | Updated 2 years ago
Sreyan88 / RECAP
Code for the ICASSP 2024 paper "RECAP: Retrieval-Augmented Audio Captioning"
☆16 | Jun 23, 2024 | Updated 2 years ago
yangjingyuan / ConstDecoder
☆11 | Oct 24, 2022 | Updated 3 years ago