The Source Code for OmniVideoBench @ICLR 2026
☆73 · Feb 12, 2026 · Updated 3 months ago
Alternatives and similar repositories for OmniVideoBench
Users who are interested in OmniVideoBench are comparing it to the repositories listed below.
- This is the official repository of Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities ☆42 · Apr 28, 2026 · Updated last month
- [CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding ☆63 · Mar 16, 2026 · Updated 2 months ago
- https://avocado-captioner.github.io/ ☆36 · Oct 16, 2025 · Updated 7 months ago
- Awesome Audio-Visual Intelligence, Survey of Audio-Visual Intelligence ☆77 · May 8, 2026 · Updated last month
- [arXiv 2023] This repository contains the code for "MUPPET: Multi-Modal Few-Shot Temporal Action Detection" ☆16 · Aug 30, 2023 · Updated 2 years ago
- Universal Video Temporal Grounding with Generative Multi-modal Large Language Models ☆55 · May 20, 2026 · Updated 2 weeks ago
- [NeurIPS 2025] The official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tun… ☆40 · Feb 20, 2025 · Updated last year
- ☆16 · May 18, 2026 · Updated 3 weeks ago
- ☆20 · Apr 23, 2024 · Updated 2 years ago
- ☆16 · Aug 8, 2023 · Updated 2 years ago
- A project for tri-modal LLM benchmarking and instruction tuning. ☆61 · Mar 27, 2025 · Updated last year
- ☆12 · Jun 12, 2024 · Updated last year
- [NeurIPS 2025 Spotlight] Official PyTorch implementation of Vgent ☆48 · Nov 30, 2025 · Updated 6 months ago
- Official PyTorch implementation of 'Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?' (ICLR 2024) ☆13 · Mar 8, 2024 · Updated 2 years ago
- An unofficial re-implementation of the article "[ECCV 18] Image Inpainting for Irregular Holes Using Partial Convolutions" ☆12 · Mar 1, 2025 · Updated last year
- ☆28 · Mar 10, 2026 · Updated 2 months ago
- SimX-OR: Extending Any Simulation Benchmark to Evaluate the Observational Robustness of VLA Models ☆33 · Nov 4, 2025 · Updated 7 months ago
- SODA: Story Oriented Dense Video Captioning Evaluation Framework ☆14 · May 3, 2024 · Updated 2 years ago
- FNIN: A Fourier Neural Operator-based Numerical Integration Network for Surface-form-gradients ☆14 · Jan 22, 2025 · Updated last year
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts ☆23 · Apr 10, 2026 · Updated last month
- SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability ☆17 · May 8, 2025 · Updated last year
- VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation [TMLR26] ☆15 · Jun 1, 2026 · Updated last week
- Official repository for the ICCV 2023 paper SAFE: Sensitivity-Aware Features for Out-of-Distribution Object Detection ☆14 · Jul 28, 2024 · Updated last year
- Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning ☆28 · Oct 30, 2024 · Updated last year
- F-16 is a powerful video large language model (LLM) that perceives high-frame-rate videos, which is developed by the Department of Electr… ☆37 · Jul 3, 2025 · Updated 11 months ago
- Official code for DAM: Dynamic Adapter Merging for Continual Video QA Learning ☆15 · Apr 25, 2024 · Updated 2 years ago
- A minimal JUCE console app to compare the performance of FIR filtering algorithms ☆24 · Sep 7, 2021 · Updated 4 years ago
- Builds VGG16 with PyTorch 0.4.0 for classification on the CIFAR datasets ☆10 · Mar 31, 2019 · Updated 7 years ago
- Official code for DeepSound-V1 ☆12 · May 14, 2025 · Updated last year
- A simple, out-of-the-box, cross-platform bbox annotation tool written in Python. Try it with `pip install easybox` ☆10 · May 28, 2021 · Updated 5 years ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆146 · Dec 26, 2024 · Updated last year
- Score-aligned loudness, beat, and expressive markings data for 2000 Chopin Mazurka recordings ☆14 · Jul 6, 2023 · Updated 2 years ago
- video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is d… ☆194 · Feb 23, 2026 · Updated 3 months ago
- [Nature Computational Science] ⚡️ PSE/PSRN: Fast and efficient symbolic expression discovery through paralleliz… ☆22 · May 17, 2026 · Updated 3 weeks ago
- [ICLR 2026] Official code repository for "⚡️VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration" ☆49 · Feb 24, 2026 · Updated 3 months ago
- [MICCAI 2025] GL-LCM: Global-Local Latent Consistency Models for Fast High-Resolution Bone Suppression in Chest X-Ray Images ☆15 · Mar 12, 2026 · Updated 2 months ago
- [ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models ☆24 · Apr 18, 2026 · Updated last month
- UMB: Understanding Model Behavior for Open-World object Detection (NeurIPS 2024) ☆12 · May 26, 2024 · Updated 2 years ago
- Official PyTorch implementation of CVPR2022 paper “Learning to Imagine: Diversify Memory for Incremental Learning using Unlabeled Data” ☆13 · Jul 25, 2022 · Updated 3 years ago