IVY-LVLM/Video-MA2MBA

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/IVY-LVLM/Video-MA2MBA)

IVY-LVLM / Video-MA2MBA

Official Implementation of Video-MA2MBA

☆12

Alternatives and similar repositories for Video-MA2MBA

Users that are interested in Video-MA2MBA are comparing it to the libraries listed below

Sorting:

JeongHun0716 / e-mvsr
View on GitHub
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation (ACM MM 2024)
☆20Mar 17, 2025Updated 11 months ago
JeongHun0716 / VoxLRS-SA
View on GitHub
This repository contains the speaker labeled information of VoxCeleb2 and LRS3 audio-visual datasets. (AAAI 2025)
☆13Sep 6, 2024Updated last year
ms-dot-k / Image-to-Speech
View on GitHub
Pytorch implementation of "Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal T…
☆12Mar 9, 2024Updated last year
JeongHun0716 / vsr-low
View on GitHub
Visual Speech Recognition For Low-Resource Languages with Automatic Labels (ICASSP 2024)
☆16Mar 17, 2025Updated 11 months ago
983632847 / All-in-One
View on GitHub
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment
☆19Feb 11, 2025Updated last year
JeongHun0716 / MMS-LLaMA
View on GitHub
Official PyTorch implementation for "MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens…
☆46Jun 12, 2025Updated 8 months ago
yliu-cs / PiTe
View on GitHub
[ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model
☆17Feb 13, 2025Updated last year
XenoZLH / Shuffle-R1
View on GitHub
Official code repository of Shuffle-R1
☆25Feb 23, 2026Updated last week
KarlesZheng / FERMT
View on GitHub
☆14Jul 15, 2024Updated last year
JeongHun0716 / Personalized-Lip-Reading
View on GitHub
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language (AAAI 2025)
☆22Mar 17, 2025Updated 11 months ago
Andy-Cheng / TEMPURA
View on GitHub
TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u…
☆25Jun 4, 2025Updated 9 months ago
dragonlzm / PAVE
View on GitHub
This repo holds the implementation of PAVE: Patching and Adapting Video Large Language Models (CVPR2025)
☆26Sep 6, 2025Updated 6 months ago
mlvlab / DeepVideoR1
View on GitHub
[NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1"
☆31Feb 22, 2026Updated last week
ms-dot-k / TMT
View on GitHub
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
☆18May 23, 2024Updated last year
lzw-lzw / UnifiedMLLM
View on GitHub
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model
☆22Aug 5, 2024Updated last year
MacavityT / REF-VLM
View on GitHub
☆30Jan 18, 2026Updated last month
ByungKwanLee / YOLO-Dyanmic-ROS
View on GitHub
Modification to YOLO for improving Dynamic Real-Time Processing on Robotics Operating Systems for Autonomous Vehicle System
☆21Feb 16, 2022Updated 4 years ago
SHI-Labs / Slow-Fast-Video-Multimodal-LLM
View on GitHub
☆28Apr 8, 2025Updated 10 months ago
ByungKwanLee / Hierarchical-Bayesian-Defense
View on GitHub
[OpenReview] Official PyTorch Implementation for "Towards Adversarial Robustness of Bayesian Neural Network through Hierarchical Variatio…
☆23Feb 15, 2022Updated 4 years ago
ByungKwanLee / CoLLaVO
View on GitHub
[ACL 2024 Findings] Official PyTorch Implementation code for realizing the technical part of CoLLaVO: Crayon Large Language and Vision mO…
☆99Jun 28, 2024Updated last year
ByungKwanLee / Robust-TopView
View on GitHub
Robustly Converting Camera View from Normal View to Top View for Autonomous Vehicle System on Robotics Operating System (ROS)
☆24Jan 29, 2020Updated 6 years ago
HusterYoung / MPLT
View on GitHub
☆28Oct 27, 2023Updated 2 years ago
LLMSQL / llmsql-benchmark
View on GitHub
A Text2SQL benchmark for evaluation of Large Language Models
☆41Updated this week
KD-TAO / OmniZip
View on GitHub
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
☆56Feb 1, 2026Updated last month
ByungKwanLee / Adavanced-ECMS
View on GitHub
Advanced Energy Control Management System (Advanced-ECMS) for Electrical Vehicle System using proposed Plus Version of Alternating Direct…
☆29Feb 15, 2022Updated 4 years ago
Yarayx / livelongbench
View on GitHub
The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…
☆12Jun 28, 2025Updated 8 months ago
JeongHun0716 / zero-avsr
View on GitHub
Official PyTorch implementation for "Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech …
☆33May 11, 2025Updated 9 months ago
qishisuren123 / AnyCap
View on GitHub
A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca…
☆52Jul 24, 2025Updated 7 months ago
Hon-Wong / ByteVideoLLM
View on GitHub
[ICCV 2025] Dynamic-VLM
☆28Dec 16, 2024Updated last year
joslefaure / HERMES
View on GitHub
[ICCV'25] HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics
☆38Sep 10, 2025Updated 5 months ago
ByungKwanLee / Meteor
View on GitHub
[NeurIPS 2024] Official PyTorch implementation code for realizing the technical part of Mamba-based traversal of rationale (Meteor) to im…
☆116May 30, 2024Updated last year
2toinf / IVM
View on GitHub
[NeurIPS-2024] The offical Implementation of "Instruction-Guided Visual Masking"
☆42Nov 15, 2024Updated last year
ByungKwanLee / Double-Debiased-Adversary
View on GitHub
[ICCV 2023] Official PyTorch Implementation for "Mitigating Adversarial Vulnerability through Causal Parameter Estimation by Adversarial …
☆31Oct 13, 2023Updated 2 years ago
haowei-freesky / HERMES
View on GitHub
Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding"
☆57Jan 23, 2026Updated last month
Zhao-Jianing-SUDA / Hawkeye
View on GitHub
The official implementation of our work Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanc…
☆12Oct 14, 2024Updated last year
wln20 / CSKV
View on GitHub
[NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios
☆16Oct 18, 2024Updated last year
marinero4972 / CyberV
View on GitHub
☆18Jun 10, 2025Updated 8 months ago
techmonsterwang / iLLaMA
View on GitHub
Adapting LLaMA Decoder to Vision Transformer
☆30May 20, 2024Updated last year
ahaliassos / usr
View on GitHub
Official implementation of USR (NeurIPS 2024)
☆39Dec 21, 2024Updated last year