IVY-LVLM / Video-MA2MBA
Official Implementation of Video-MA2MBA
☆12 · Dec 3, 2024 · Updated last year
Alternatives and similar repositories for Video-MA2MBA
Users who are interested in Video-MA2MBA are comparing it to the repositories listed below.
- Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation (ACM MM 2024) ☆20 · Mar 17, 2025 · Updated 10 months ago
- This repository contains the speaker labeled information of VoxCeleb2 and LRS3 audio-visual datasets. (AAAI 2025) ☆12 · Sep 6, 2024 · Updated last year
- Pytorch implementation of "Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal T… ☆12 · Mar 9, 2024 · Updated last year
- Visual Speech Recognition For Low-Resource Languages with Automatic Labels (ICASSP 2024) ☆16 · Mar 17, 2025 · Updated 10 months ago
- All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment ☆19 · Feb 11, 2025 · Updated last year
- Official PyTorch implementation for "MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens… ☆45 · Jun 12, 2025 · Updated 8 months ago
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model ☆17 · Feb 13, 2025 · Updated last year
- Official code repository of Shuffle-R1 ☆25 · Jan 27, 2026 · Updated 2 weeks ago
- ☆14 · Jul 15, 2024 · Updated last year
- Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language (AAAI 2025) ☆21 · Mar 17, 2025 · Updated 10 months ago
- TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u… ☆25 · Jun 4, 2025 · Updated 8 months ago
- This repo holds the implementation of PAVE: Patching and Adapting Video Large Language Models (CVPR2025) ☆26 · Sep 6, 2025 · Updated 5 months ago
- TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages ☆18 · May 23, 2024 · Updated last year
- [NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1" ☆31 · Nov 15, 2025 · Updated 2 months ago
- ☆30 · Jan 18, 2026 · Updated 3 weeks ago
- Modification to YOLO for improving Dynamic Real-Time Processing on Robotics Operating Systems for Autonomous Vehicle System ☆21 · Feb 16, 2022 · Updated 3 years ago
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model ☆22 · Aug 5, 2024 · Updated last year
- [OpenReview] Official PyTorch Implementation for "Towards Adversarial Robustness of Bayesian Neural Network through Hierarchical Variatio… ☆23 · Feb 15, 2022 · Updated 3 years ago
- ☆28 · Apr 8, 2025 · Updated 10 months ago
- [ACL 2024 Findings] Official PyTorch Implementation code for realizing the technical part of CoLLaVO: Crayon Large Language and Vision mO… ☆99 · Jun 28, 2024 · Updated last year
- Robustly Converting Camera View from Normal View to Top View for Autonomous Vehicle System on Robotics Operating System (ROS) ☆24 · Jan 29, 2020 · Updated 6 years ago
- OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models ☆54 · Feb 1, 2026 · Updated last week
- Advanced Energy Control Management System (Advanced-ECMS) for Electrical Vehicle System using proposed Plus Version of Alternating Direct… ☆29 · Feb 15, 2022 · Updated 3 years ago
- ☆28 · Oct 27, 2023 · Updated 2 years ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆13 · Jun 28, 2025 · Updated 7 months ago
- A Text2SQL benchmark for evaluation of Large Language Models ☆41 · Updated this week
- Official PyTorch implementation for "Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech … ☆32 · May 11, 2025 · Updated 9 months ago
- Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding" ☆57 · Jan 23, 2026 · Updated 3 weeks ago
- A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca… ☆52 · Jul 24, 2025 · Updated 6 months ago
- [ICCV'25] HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics ☆38 · Sep 10, 2025 · Updated 5 months ago
- [ICCV 2025] Dynamic-VLM ☆28 · Dec 16, 2024 · Updated last year
- [NeurIPS 2024] Official PyTorch implementation code for realizing the technical part of Mamba-based traversal of rationale (Meteor) to im… ☆116 · May 30, 2024 · Updated last year
- [NeurIPS-2024] The official Implementation of "Instruction-Guided Visual Masking" ☆41 · Nov 15, 2024 · Updated last year
- [ICCV 2023] Official PyTorch Implementation for "Mitigating Adversarial Vulnerability through Causal Parameter Estimation by Adversarial … ☆31 · Oct 13, 2023 · Updated 2 years ago
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios ☆16 · Oct 18, 2024 · Updated last year
- ☆18 · Jun 10, 2025 · Updated 8 months ago
- The official implementation of our work Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanc… ☆12 · Oct 14, 2024 · Updated last year
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆38 · Jan 26, 2026 · Updated 2 weeks ago
- Adapting LLaMA Decoder to Vision Transformer ☆30 · May 20, 2024 · Updated last year