the-anonymous-bs/av-SALMONN

av-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

☆13

Alternatives and similar repositories for av-SALMONN

Users interested in av-SALMONN are comparing it to the repositories listed below.

BriansIDP / WhisperBiasing
☆88 · Jul 31, 2025 · Updated 11 months ago
BriansIDP / RTLM
☆12 · Oct 19, 2020 · Updated 5 years ago
Mashiro009 / slidespeech_dl
☆24 · Sep 20, 2024 · Updated last year
SMILE-data / SMILE
SMILE: A Multimodal Dataset for Understanding Laughter
☆13 · Jun 15, 2023 · Updated 3 years ago
ciodar / UniversalAttribution
[ECCVW/TWYN 2024 - Best Workshop Paper] Are CLIP features all you need for Universal Synthetic Image Origin Attribution?
☆14 · Mar 27, 2026 · Updated 3 months ago
zds-potato / multilingual-phonetic-sv
☆10 · Dec 22, 2023 · Updated 2 years ago
ReXTime / ReXTime
☆18 · Jan 26, 2026 · Updated 5 months ago
yl4467 / singer
☆15 · Feb 22, 2025 · Updated last year
chu0802 / SnD
This is an official implementation of our work, Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on V…
☆17 · Sep 24, 2025 · Updated 9 months ago
WeChatCV / D-ORCA
D-ORCA: Dialogue-Centric Optimization for Robust Audio-Visual Captioning
☆15 · Feb 11, 2026 · Updated 5 months ago
Franklin905 / VALOR
Research code for NeurIPS 2023 paper "Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser"
☆17 · Jul 13, 2025 · Updated last year
Gorilla-Lab-SCUT / TTAC2
[TPAMI 2024] The official implementation of "Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clu…
☆13 · Mar 19, 2024 · Updated 2 years ago
B06901052 / DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
☆13 · Oct 11, 2022 · Updated 3 years ago
pratyushmaini / ssft
[NeurIPS'22] Official Repository for Characterizing Datapoints via Second-Split Forgetting
☆16 · Aug 11, 2023 · Updated 2 years ago
h-munakata / Lighthouse-Wrapper-for-Audio-Moment-Retrieval
☆13 · Mar 23, 2026 · Updated 3 months ago
BriansIDP / video-SALMONN-o1
☆40 · Aug 26, 2025 · Updated 10 months ago
stoneMo / OneAVM
Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023)
☆12 · Jun 1, 2023 · Updated 3 years ago
wutong8023 / SpeechRE
☆11 · Nov 11, 2022 · Updated 3 years ago
kjw11 / CSEnet-ASR
Cross-Speaker Encoding Network for Multi-talker Speech Recognition
☆12 · Mar 14, 2025 · Updated last year
Yaojie-Shen / CoCap
[ICCV 2023] Accurate and Fast Compressed Video Captioning
☆52 · Jul 28, 2025 · Updated 11 months ago
MatthewTamYT / Breakout
Breakout is a game created with Python 3, using the module PyGame. It is a ball game where you bounce the ball by moving the paddle. Elim…
☆18 · Jul 24, 2021 · Updated 4 years ago
aiming-lab / MJ-Video
[NeurIPS'25 Spotlight] MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation
☆20 · Feb 23, 2025 · Updated last year
XL2248 / SOV-MAS
The code and data for "Summary-Oriented Vision Modeling for Multimodal Abstractive Summarization"
☆11 · May 16, 2023 · Updated 3 years ago
HappyColor / Vesper
A Compact and Effective Pretrained Model for Speech Emotion Recognition
☆54 · Apr 10, 2026 · Updated 3 months ago
MengboLi / MS-SENet
☆11 · Jul 16, 2024 · Updated 2 years ago
raining-dev / AVT2-DWF
AVT2-DWF: Improving Deepfake Detection with Audio-Visual Fusion and Dynamic Weighting Strategies
☆23 · Mar 26, 2024 · Updated 2 years ago
DanielMengLiu / DeepLip
Deep-learning-based audio-visual lip biometrics
☆15 · May 9, 2023 · Updated 3 years ago
URRealHero / JudgeAnything
☆17 · Jun 1, 2025 · Updated last year
yevvonlim / kai-presentation
Claude Code skill for KAI presentation design in HTML
☆15 · Mar 20, 2026 · Updated 4 months ago
mlvlab / ProMetaR
Official implementation of CVPR 2024 paper "Prompt Learning via Meta-Regularization".
☆31 · Mar 10, 2025 · Updated last year
ahaliassos / usr2
PyTorch implementation of USR 2.0 (ICLR 2026)
☆15 · Apr 3, 2026 · Updated 3 months ago
yangjingyuan / ConstDecoder
☆11 · Oct 24, 2022 · Updated 3 years ago
alexpovel / betterletter
Substitute alternative spellings of special characters (e.g. German umlauts [ae, oe, ue] and [ss]) with their correct versions (ä, ö, ü, …
☆11 · Nov 24, 2024 · Updated last year
dksanyal / SpERT.PL
Joint Neural Model for Entity & Relation Extraction
☆16 · Oct 18, 2021 · Updated 4 years ago
Sreyan88 / LipGER
Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
☆19 · Jul 16, 2024 · Updated 2 years ago
pritamqu / OOD-VSSL
[NeurIPS 2023 (Spotlight)] Uncovering the Hidden Dynamics of Video Self-supervised Learning under Distribution Shifts
☆13 · Jan 30, 2024 · Updated 2 years ago
DianboWork / M3T-CNERTA
☆11 · Aug 10, 2022 · Updated 3 years ago
Speech-Lab-IITM / data2vec-aqc
Repository having the code and models from the paper: data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student traini…
☆13 · Mar 18, 2024 · Updated 2 years ago
rpeloff / multimodal_one_shot_learning
Code recipe for "Multimodal One-Shot Learning of Speech and Images"
☆11 · Nov 21, 2018 · Updated 7 years ago