☆39 Aug 26, 2025 Updated 9 months ago
Alternatives and similar repositories for video-SALMONN-o1
Users that are interested in video-SALMONN-o1 are comparing it to the libraries listed below.
- Official repository for "IntentQA: Context-aware Video Intent Reasoning" from ICCV 2023. ☆24 Nov 29, 2024 Updated last year
- ICML2025 ☆64 Aug 28, 2025 Updated 9 months ago
- Official repository of paper "LOVE-R1: Advancing Long Video Understanding with Adaptive Zoom-in Mechanism via Multi-Step Reasoning" ☆24 Nov 1, 2025 Updated 6 months ago
- ☆10 Feb 10, 2022 Updated 4 years ago
- PeMS crawler ☆15 Jan 2, 2019 Updated 7 years ago
- Official Github Repo for the Findings of EMNLP 2021 paper "An animated picture says at least a thousand words: Selecting Gif-based Replie… ☆32 Oct 2, 2021 Updated 4 years ago
- Useful tools for pulling data from CalTrans-PeMS. ☆11 Jan 20, 2023 Updated 3 years ago
- Cross-Self KV Cache Pruning for Efficient Vision-Language Inference ☆10 Dec 15, 2024 Updated last year
- Official repository of Siggraph Asia 2025 paper "LSF-Animation: Label-Free Speech-Driven Facial Animation via Implicit Feature Representa… ☆26 Dec 24, 2025 Updated 5 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆95 Jul 13, 2025 Updated 10 months ago
- ☆13 Apr 11, 2022 Updated 4 years ago
- [AAAI 2024] UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning ☆12 Dec 10, 2023 Updated 2 years ago
- The source code of ExFunTube ☆10 Aug 8, 2025 Updated 9 months ago
- RelayGS: Reconstructing Dynamic Scenes with Large-Scale and Complex Motions via Relay Gaussians ☆14 Dec 5, 2024 Updated last year
- Vector Index Benchmark for Embeddings (VIBE) is an extensible benchmark for approximate nearest neighbor search methods, or vector index… ☆38 Mar 23, 2026 Updated 2 months ago
- This repository contains the files related to the project on frame-by-frame Drowsiness Detection in Drivers in videos using facial featur… ☆12 Jul 22, 2024 Updated last year
- [ECCV 2024] Efficient Inference of Vision Instruction-Following Models with Elastic Cache ☆43 Jul 26, 2024 Updated last year
- Hugging Face Transformers Course notes ☆41 May 1, 2022 Updated 4 years ago
- YesBut - Multimodal Satire Comprehension Dataset ☆19 Oct 23, 2024 Updated last year
- [CVPR 2025] Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation ☆85 Dec 24, 2025 Updated 5 months ago
- HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering (CVPR'23) ☆14 Nov 4, 2025 Updated 6 months ago
- [ICCV 2025 DeepID Challenge] Official 1st Place in both tracks (Detection & Localization) ☆18 Apr 4, 2026 Updated last month
- [IJCV 2025] OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation ☆15 Feb 13, 2026 Updated 3 months ago
- [Neural Networks 2025] The official code for the paper "MNet: A Multi-Scale Network for Visible Watermark Removal." ☆17 Jun 16, 2025 Updated 11 months ago
- [ICML 2026] LaST$_0$: Latent Spatio-Temporal Chain-of-Thought for Robotic Vision-Language-Action Model ☆70 Apr 30, 2026 Updated 3 weeks ago
- We propose MMAD, a novel automated pipeline for precise AD generation. MMAD introduces ambient music alongside visual and linguistic, enh… ☆17 Dec 31, 2024 Updated last year
- The demo for "Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem". ☆12 Oct 25, 2021 Updated 4 years ago
- ☆15 Updated this week
- CLAIR: A (surprisingly) simple semantic text metric with large language models. ☆22 Jan 28, 2024 Updated 2 years ago
- [NeurIPS 2023] Official PyTorch implementation for the paper "CRoSS: Diffusion Model Makes Controllable, Robust and Secure Image Steganog… ☆11 Sep 28, 2023 Updated 2 years ago
- (AAAI2024) Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving ☆21 Dec 20, 2023 Updated 2 years ago
- UniVid: The Open-Source Unified Video Model ☆32 Oct 13, 2025 Updated 7 months ago
- ☆13 Sep 25, 2024 Updated last year
- [ICLR'25] Official repository for "AVHBench: A Cross-Modal Hallucination Evaluation for Audio-Visual Large Language Models" ☆24 Mar 8, 2026 Updated 2 months ago
- ☆15 Jan 9, 2026 Updated 4 months ago
- Official code for ICCV25 paper: "CanonSwap: High-Fidelity and Consistent Video Face Swapping via Canonical Space Modulation" ☆125 Sep 1, 2025 Updated 8 months ago
- [ICLR2026] Any-to-Bokeh is a novel one-step video bokeh framework that converts arbitrary input videos into temporally coherent, depth-aw… ☆138 Feb 4, 2026 Updated 3 months ago
- [IJCAI 2022] Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds (official pytorch implementation) ☆21 Aug 31, 2022 Updated 3 years ago
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? ☆144 Jul 24, 2025 Updated 10 months ago