The Source Code for OmniVideoBench @ICLR 2026
☆72 · Feb 12, 2026 · Updated 3 months ago
Alternatives and similar repositories for OmniVideoBench
Users that are interested in OmniVideoBench are comparing it to the libraries listed below.
- This is the official repository of Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities ☆40 · Apr 28, 2026 · Updated 3 weeks ago
- https://avocado-captioner.github.io/ ☆34 · Oct 16, 2025 · Updated 7 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆49 · May 7, 2026 · Updated last week
- [CVPR 2026] OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models ☆79 · Apr 20, 2026 · Updated 3 weeks ago
- [arXiv 2023] This repository contains the code for "MUPPET: Multi-Modal Few-Shot Temporal Action Detection" ☆15 · Aug 30, 2023 · Updated 2 years ago
- Universal Video Temporal Grounding with Generative Multi-modal Large Language Models ☆52 · Mar 20, 2026 · Updated last month
- [NeurIPS 2025] The official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tun… ☆40 · Feb 20, 2025 · Updated last year
- ☆16 · Sep 17, 2024 · Updated last year
- ☆20 · Apr 23, 2024 · Updated 2 years ago
- A project for tri-modal LLM benchmarking and instruction tuning. ☆60 · Mar 27, 2025 · Updated last year
- [NeurIPS 2025 Spotlight] Official PyTorch implementation of Vgent ☆45 · Nov 30, 2025 · Updated 5 months ago
- ☆13 · Jun 2, 2022 · Updated 3 years ago
- Official PyTorch implementation of 'Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?' (ICLR 2024) ☆13 · Mar 8, 2024 · Updated 2 years ago
- ☆28 · Mar 10, 2026 · Updated 2 months ago
- PyTorch implementation of the paper Learning Multi-Level Representations for Hierarchical Music Structure Analysis presented at ISMIR 202… ☆14 · Jan 2, 2023 · Updated 3 years ago
- SimX-OR: Extending Any Simulation Benchmark to Evaluate the Observational Robustness of VLA Models ☆33 · Nov 4, 2025 · Updated 6 months ago
- (ICLR 2025) AgentRefine: Enhancing Agent Generalization through Refinement Tuning ☆19 · Nov 22, 2025 · Updated 5 months ago
- [ICCV 2025] Official PyTorch Code for "Describe, Adapt and Combine: Empowering CLIP Encoders for Open-set 3D Object Retrieval" ☆17 · Aug 23, 2025 · Updated 8 months ago
- This repository contains the code for the paper “Neuro-Symbolic Query Compiler”, accepted to the Findings of ACL 2025. ☆17 · Oct 20, 2025 · Updated 6 months ago
- SODA: Story Oriented Dense Video Captioning Evaluation Framework ☆14 · May 3, 2024 · Updated 2 years ago
- FNIN: A Fourier Neural Operator-based Numerical Integration Network for Surface-form-gradients ☆14 · Jan 22, 2025 · Updated last year
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts ☆23 · Apr 10, 2026 · Updated last month
- More reliable Video Understanding Evaluation ☆15 · Sep 23, 2025 · Updated 7 months ago
- ☆43 · Jan 16, 2026 · Updated 4 months ago
- Generative Multi-modal Models are Good Class Incremental Learners, CVPR 2024 [PyTorch Code] ☆51 · Nov 21, 2024 · Updated last year
- Official code for DAM: Dynamic Adapter Merging for Continual Video QA Learning ☆15 · Apr 25, 2024 · Updated 2 years ago
- A minimal JUCE console app to compare the performance of FIR filtering algorithms ☆24 · Sep 7, 2021 · Updated 4 years ago
- Official code for DeepSound-V1 ☆12 · May 14, 2025 · Updated last year
- A simple, out-of-the-box, cross-platform bbox annotation tool written in Python. Try it with `pip install easybox` ☆10 · May 28, 2021 · Updated 4 years ago
- video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is d… ☆189 · Feb 23, 2026 · Updated 2 months ago
- [Nature Computational Science] ⚡️ PSE/PSRN: Fast and efficient symbolic expression discovery through paralleliz… ☆22 · Feb 3, 2026 · Updated 3 months ago
- Omni Model Benchmark with high quality and diversity, which reveals the Compositional Law. We’re now focused on Chinese scenarios, and a… ☆78 · Jan 12, 2026 · Updated 4 months ago
- [ICLR 2026] Official code repository for "⚡️VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration" ☆42 · Feb 24, 2026 · Updated 2 months ago
- ☆33 · May 27, 2025 · Updated 11 months ago
- [MICCAI 2025] GL-LCM: Global-Local Latent Consistency Models for Fast High-Resolution Bone Suppression in Chest X-Ray Images ☆15 · Mar 12, 2026 · Updated 2 months ago
- [ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models ☆24 · Apr 18, 2026 · Updated last month
- UMB: Understanding Model Behavior for Open-World object Detection (NeurIPS 2024) ☆11 · May 26, 2024 · Updated last year
- 🔥🔥 [NeurIPS 2025] Exploring and mitigating semantic hallucinations in scene text perception and reasoning ☆28 · Dec 11, 2025 · Updated 5 months ago
- A dataset of Ottoman-Turkish makam music to test makam recognition (and tonic identification) methodologies ☆19 · May 31, 2021 · Updated 4 years ago