☆39Aug 26, 2025Updated 7 months ago
Alternatives and similar repositories for video-SALMONN-o1
Users that are interested in video-SALMONN-o1 are comparing it to the libraries listed below.
- Official repository for "IntentQA: Context-aware Video Intent Reasoning" from ICCV 2023.☆24Nov 29, 2024Updated last year
- Machine Learning Course From Scratch☆13Jul 24, 2024Updated last year
- ☆40Dec 19, 2025Updated 3 months ago
- ICML2025☆65Aug 28, 2025Updated 7 months ago
- Cross-Self KV Cache Pruning for Efficient Vision-Language Inference☆10Dec 15, 2024Updated last year
- Jupyter Notebooks from book UNDERSTANDING DEEP LEARNING (Prof Simon Prince) that I could solve.☆13Mar 20, 2024Updated 2 years ago
- Official repository of Siggraph Asia 2025 paper "LSF-Animation: Label-Free Speech-Driven Facial Animation via Implicit Feature Representa…☆26Dec 24, 2025Updated 3 months ago
- [ICASSP2025] Official code for VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis☆52Apr 9, 2025Updated 11 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?☆90Jul 13, 2025Updated 8 months ago
- OmniSVG: A Unified Scalable Vector Graphics Generation Model, you can try it in ComfyUI☆28Dec 5, 2025Updated 3 months ago
- The official implementation of the paper **LVChat: Facilitating Long Video Comprehension**☆14Apr 15, 2024Updated last year
- [AAAI 2024] UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning☆12Dec 10, 2023Updated 2 years ago
- [ECCV 2024] Efficient Inference of Vision Instruction-Following Models with Elastic Cache☆43Jul 26, 2024Updated last year
- [CVPR 2025] Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation☆83Dec 24, 2025Updated 3 months ago
- YesBut - Multimodal Satire Comprehension Dataset☆18Oct 23, 2024Updated last year
- HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering (CVPR'23)☆14Nov 4, 2025Updated 4 months ago
- [ICCV 2025 DeepID Challenge] Official 1st Place in both tracks (Detection & Localization)☆17Dec 24, 2025Updated 3 months ago
- [ICLR'25] Official repository for "AVHBench: A Cross-Modal Hallucination Evaluation for Audio-Visual Large Language Models"☆20Mar 8, 2026Updated 3 weeks ago
- ☆22Sep 16, 2025Updated 6 months ago
- [Neural Networks 2025] The official code for the paper "MNet: A Multi-Scale Network for Visible Watermark Removal."☆17Jun 16, 2025Updated 9 months ago
- EmoCAST: Emotional Talking Portrait via Emotive Text Description☆29Dec 23, 2025Updated 3 months ago
- The demo for "Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem".☆12Oct 25, 2021Updated 4 years ago
- ☆15Mar 16, 2026Updated last week
- UniVid: The Open-Source Unified Video Model☆30Oct 13, 2025Updated 5 months ago
- [NeurIPS 2023] Official PyTorch implementation for the paper "CRoSS: Diffusion Model Makes Controllable, Robust and Secure Image Steganog…☆11Sep 28, 2023Updated 2 years ago
- (AAAI2024) Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving☆21Dec 20, 2023Updated 2 years ago
- [ICLR 2025] Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs☆19Mar 20, 2025Updated last year
- ☆15Jan 9, 2026Updated 2 months ago
- Official code for ICCV25 paper: "CanonSwap: High-Fidelity and Consistent Video Face Swapping via Canonical Space Modulation"☆121Sep 1, 2025Updated 6 months ago
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?☆128Jul 24, 2025Updated 8 months ago
- CVPR 24 paper: Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs☆14Mar 19, 2024Updated 2 years ago
- Segment Anything (SAM) at Home web app using Gradio☆14Aug 7, 2023Updated 2 years ago
- [CVPR 2021] Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection☆27Jul 13, 2022Updated 3 years ago
- Professor and Group List of CS☆10Mar 12, 2024Updated 2 years ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs☆44Mar 6, 2026Updated 3 weeks ago
- Code and dataset release for "PACS: A Dataset for Physical Audiovisual CommonSense Reasoning" (ECCV 2022)☆17Dec 20, 2022Updated 3 years ago
- Adaptive Multimodal Reasoning via Reinforcement Learning☆23Jan 11, 2026Updated 2 months ago
- Multivariate time-series forecasting transformer☆17Sep 13, 2022Updated 3 years ago
- SKT A.X LLM 3.1☆13Jul 24, 2025Updated 8 months ago