☆40Aug 26, 2025Updated 9 months ago
Alternatives and similar repositories for video-SALMONN-o1
Users that are interested in video-SALMONN-o1 are comparing it to the libraries listed below.
- ☆39Dec 19, 2025Updated 5 months ago
- ICML2025☆65Aug 28, 2025Updated 9 months ago
- Official repository of paper "LOVE-R1: Advancing Long Video Understanding with Adaptive Zoom-in Mechanism via Multi-Step Reasoning"☆24Nov 1, 2025Updated 7 months ago
- [ACL2023, Findings] Source codes for the paper "Werewolf Among Us: Multimodal Resources for Modeling Persuasion Behaviors in Social Deduc…☆16Feb 22, 2025Updated last year
- Official Github Repo for the Findings of EMNLP 2021 paper "An animated picture says at least a thousand words: Selecting Gif-based Replie…☆32Oct 2, 2021Updated 4 years ago
- Constrained learning using boxes for event-event relation extraction☆12Aug 5, 2022Updated 3 years ago
- OmniSVG: A Unified Scalable Vector Graphics Generation Model, you can try it in ComfyUI☆29Dec 5, 2025Updated 6 months ago
- The official implementation of the paper **LVChat: Facilitating Long Video Comprehension**☆14Apr 15, 2024Updated 2 years ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?☆94Jul 13, 2025Updated 11 months ago
- The source code of ExFunTube☆10Aug 8, 2025Updated 10 months ago
- Official Repo of "Flow-OPD: On-Policy Distillation for Flow Matching Models"☆238Jun 7, 2026Updated last week
- RelayGS: Reconstructing Dynamic Scenes with Large-Scale and Complex Motions via Relay Gaussians☆14Dec 5, 2024Updated last year
- [ECCV 2024] Efficient Inference of Vision Instruction-Following Models with Elastic Cache☆43Jul 26, 2024Updated last year
- Hugging Face Transformers Course notes☆41May 1, 2022Updated 4 years ago
- YesBut - Multimodal Satire Comprehension Dataset☆19Oct 23, 2024Updated last year
- Deep neural network architecture for representing robot experiences in an episodic-like memory which facilitates encoding, recalling, and…☆15Sep 12, 2018Updated 7 years ago
- [CVPR 2025] Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation☆85Dec 24, 2025Updated 5 months ago
- ☆22Sep 16, 2025Updated 9 months ago
- [ICML 2026] LaST$_0$: Latent Spatio-Temporal Chain-of-Thought for Robotic Vision-Language-Action Model☆78Apr 30, 2026Updated last month
- [Neural Networks 2025] The official code for the paper "MNet: A Multi-Scale Network for Visible Watermark Removal."☆17Jun 16, 2025Updated last year
- We propose MMAD, a novel automated pipeline for precise AD generation. MMAD introduces ambient music alongside visual and linguistic, enh…☆17Dec 31, 2024Updated last year
- [ACL 2024] FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model☆17Apr 28, 2025Updated last year
- CLAIR: A (surprisingly) simple semantic text metric with large language models.☆22Jan 28, 2024Updated 2 years ago
- [NeurIPS 2023] Official PyTorch implementation for the paper "CRoSS: Diffusion Model Makes Controllable, Robust and Secure Image Steganog…☆11Sep 28, 2023Updated 2 years ago
- [ICLR 2025] Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs☆19Mar 20, 2025Updated last year
- UniVid: The Open-Source Unified Video Model☆32Oct 13, 2025Updated 8 months ago
- ☆13Sep 25, 2024Updated last year
- ☆15Jan 9, 2026Updated 5 months ago
- [ICLR'25] Official repository for "AVHBench: A Cross-Modal Hallucination Evaluation for Audio-Visual Large Language Models"☆25Mar 8, 2026Updated 3 months ago
- Faithfully Explainable Recommendation via Neural Logic Reasoning☆16May 3, 2021Updated 5 years ago
- [IJCAI 2022] Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds (official pytorch implementation)☆21Aug 31, 2022Updated 3 years ago
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?☆145Jul 24, 2025Updated 10 months ago
- Segment Anything (SAM) at Home web app using Gradio☆14Aug 7, 2023Updated 2 years ago
- CVPR 24 paper: Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs☆14Mar 19, 2024Updated 2 years ago
- Some articles by flipradio anchor --- Li HouChen☆17Mar 26, 2025Updated last year
- Official code for SongEcho☆64Mar 3, 2026Updated 3 months ago
- Code and dataset release for "PACS: A Dataset for Physical Audiovisual CommonSense Reasoning" (ECCV 2022)☆18Dec 20, 2022Updated 3 years ago
- Official implementation of paper "OED: Towards One-stage End-to-End Dynamic Scene Graph Generation".☆29Mar 26, 2024Updated 2 years ago
- ☆42May 5, 2026Updated last month