☆39Aug 26, 2025Updated 6 months ago
Alternatives and similar repositories for video-SALMONN-o1
Users who are interested in video-SALMONN-o1 are comparing it to the repositories listed below.
- Official repository for "IntentQA: Context-aware Video Intent Reasoning" from ICCV 2023.☆23Nov 29, 2024Updated last year
- [IJCAI 2022] Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds (official pytorch implementation)☆21Aug 31, 2022Updated 3 years ago
- Pytorch implementation for Egoinstructor at CVPR 2024☆28Dec 1, 2024Updated last year
- (AAAI2024) Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving☆21Dec 20, 2023Updated 2 years ago
- Official implementation of "A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives", accepted at CVPR 2…☆24Jun 13, 2024Updated last year
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?☆88Jul 13, 2025Updated 7 months ago
- [Communication in Transportation Research] Official PyTorch Implementation of "GPT-4 enhanced multimodal grounding for autonomous driv…☆26Nov 11, 2024Updated last year
- [CVPR 2021] Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection☆27Jul 13, 2022Updated 3 years ago
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?☆122Jul 24, 2025Updated 7 months ago
- Official GitHub Repo for the Findings of EMNLP 2021 paper "An animated picture says at least a thousand words: Selecting Gif-based Replie…☆32Oct 2, 2021Updated 4 years ago
- [Neural Networks 2025] The official code for the paper "MNet: A Multi-Scale Network for Visible Watermark Removal."☆17Jun 16, 2025Updated 8 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs☆42Updated this week
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models"☆10Jul 19, 2024Updated last year
- Official repository of Siggraph Asia 2025 paper "LSF-Animation: Label-Free Speech-Driven Facial Animation via Implicit Feature Representa…☆26Dec 24, 2025Updated 2 months ago
- Egocentric Video Understanding Dataset (EVUD)☆33Jul 4, 2024Updated last year
- MiniGPT-Pancreas: Multimodal Large language Model for Pancreas Cancer Classification and Detection☆11Sep 19, 2025Updated 5 months ago
- ☆16Jul 20, 2025Updated 7 months ago
- CN Dota, Best Dota.☆11Dec 14, 2020Updated 5 years ago
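- A database of advisors in Hong Kong, Macau, Singapore, and other regions for Chinese students (especially those affected by the 10043 policy). An open-source database of professors in HK/MO/SG/etc. for Chinese students (esp. those affected…☆38Nov 26, 2025Updated 3 months ago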
- Code repository supporting the paper "Auto-Generating Weak Labels for Real & Synthetic Data to Improve Label-Scarce Medical Image Segment…☆11Apr 29, 2024Updated last year
- A semi print-in-place hand for human-like manipulation, designed to be built by anyone.☆17Jan 5, 2026Updated 2 months ago
- [IROS 2025] EgoLoc: Zero-Shot Temporal Interaction Localization for Egocentric Videos☆33Jan 13, 2026Updated last month
- Reimplementation of "Mask-Guided Attention Network for Occluded Pedestrian Detection" based on the mmdetection toolbox☆10Aug 20, 2020Updated 5 years ago
- A benchmark dataset designed to support the development and evaluation of large language models (LLMs) for conversational mental health a…☆17Feb 24, 2025Updated last year
- ☆33Feb 26, 2026Updated last week
- Official PyTorch implementation of The Linear Attention Resurrection in Vision Transformer☆16Sep 7, 2024Updated last year
- Locally Enhanced Self-Attention: Rethinking Self-Attention as Local and Context Terms☆11Nov 29, 2021Updated 4 years ago
- Gaussian Splatting for Robotic Simulation☆22Nov 7, 2025Updated 4 months ago
- [Computers & Graphics 2021] Pair-wise Relation Module for 3D Object Detection☆14Mar 6, 2022Updated 4 years ago
- ☆20Jul 23, 2025Updated 7 months ago
- Professor and Group List of CS☆10Mar 12, 2024Updated last year
- Implementation of Boundary Attributions for Normal (Vector) Explanations☆11Aug 13, 2021Updated 4 years ago
- PyPi package for KaniTTS-2 model☆54Feb 14, 2026Updated 3 weeks ago
- High-performance ASR tool using Faster Whisper, supporting custom models, multi-language transcription, and real-time processing feedback…☆10Sep 17, 2025Updated 5 months ago
- [ICLR 2025] Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception☆14Jul 4, 2025Updated 8 months ago
- Open-source API for Touch Sensors☆13Aug 19, 2025Updated 6 months ago
- [ECCV 2020] Official Matlab implementation of rOSD: Toward unsupervised, multi-object discovery in large-scale image collections.☆10Nov 4, 2021Updated 4 years ago
- ☆25Oct 13, 2025Updated 4 months ago
- Self-Supervised Learning with Multi-View Rendering for 3D Point Cloud Analysis (ACCV 2022)☆10Jul 22, 2024Updated last year