OuyangKun10/Conan



Multi-step reasoning MLLM

☆25

Alternatives and similar repositories for Conan

Users who are interested in Conan are comparing it to the repositories listed below.


OuyangKun10 / SpaceR
SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning
☆111 · Jul 9, 2025 · Updated last year
Koreyoshi01 / VISD
This repository is the official implementation for VISD.
☆22 · May 17, 2026 · Updated 2 months ago
TIGER-AI-Lab / VideoEval-Pro
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation [TMLR26]
☆15 · Jun 1, 2026 · Updated last month
MAC-AutoML / WFS-SB
[CVPR 2026] Wavelet-based Frame Selection by Detecting Semantic Boundary for Long Video Understanding
☆32 · Apr 12, 2026 · Updated 3 months ago
MANGA-UOFA / TokMem
☆25 · Mar 14, 2026 · Updated 4 months ago
llyx97 / video_reason_bench
[ICLR 2026] "VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?", Yuanxin Liu, Kun Ouyang, Haoning Wu, Yi Liu, L…
☆41 · Jan 30, 2026 · Updated 5 months ago
lcqysl / FrameThinker
[ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting"
☆50 · Oct 9, 2025 · Updated 9 months ago
MCG-NJU / Video-o3
[ICML 2026] Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning
☆134 · Jul 2, 2026 · Updated 3 weeks ago
zsgvivo / VideoZoomer
☆34 · Feb 12, 2026 · Updated 5 months ago
Haiyang0226 / Symphony
Code for the CVPR 2026 paper Symphony
☆17 · Apr 7, 2026 · Updated 3 months ago
KD-TAO / LVOmniBench
LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs
☆41 · Apr 2, 2026 · Updated 3 months ago
KD-TAO / OmniAgent
OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding
☆22 · Apr 9, 2026 · Updated 3 months ago
huawei-lin / Agent-Omni
The official implementation for the paper "Agent-Omni: Test-Time Multimodal Reasoning via Model Coordination for Understanding Anything".
☆23 · Nov 5, 2025 · Updated 8 months ago
mlvlab / VidChain
Official Implementation (PyTorch) of the "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Capti…
☆25 · Jan 26, 2025 · Updated last year
wangruohui / EfficientVideoAgent
EVA: Efficient Reinforcement Learning for End-to-End Video Agent
☆26 · May 6, 2026 · Updated 2 months ago
EIT-NLP / StreamingLLM
Repository of Streaming LLMs
☆90 · Jun 20, 2026 · Updated last month
chenlong-clock / RULE-Unlearn
[NeurIPS25] RULE: Reinforcement UnLEarning Achieves Forget-retain Pareto Optimality
☆20 · Oct 22, 2025 · Updated 9 months ago
chuntianli666 / CrossVid
[AAAI 2026] CrossVid: A Comprehensive Benchmark for Evaluating Cross-Video Reasoning in Multimodal Large Language Models
☆23 · Jul 9, 2026 · Updated 2 weeks ago
OpenGVLab / VRBench
[ICCV 2025] A Benchmark for Multi-Step Reasoning in Long Narrative Videos
☆28 · Jun 4, 2026 · Updated last month
HuiGuanLab / RaTSG
This repository contains the implementation of our NeurIPS'24 paper "Temporal Sentence Grounding with Relevance Feedback in Videos"
☆13 · Aug 22, 2025 · Updated 11 months ago
MILVLG / videoarm
☆27 · Apr 9, 2026 · Updated 3 months ago
OpenGVLab / VKnowU
[ECCV 2026] VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs
☆15 · Feb 3, 2026 · Updated 5 months ago
jylins / videoseek
[CVPR 2026] VideoSeek: Long-Horizon Video Agent with Tool-Guided Seeking
☆64 · Mar 23, 2026 · Updated 4 months ago
louieworth / trd
Official Implementation of Trajectory-Refined Distillation
☆29 · Jun 9, 2026 · Updated last month
nusnlp / d2vlm
[ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models
☆24 · Apr 18, 2026 · Updated 3 months ago
maifoundations / Visionary-R1
Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
☆44 · Jul 2, 2025 · Updated last year
LiBingyu01 / U3M
[Pattern Recognition 2025 🌟] Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation
☆10 · Jun 12, 2024 · Updated 2 years ago
adxcreative / COPE
☆15 · Dec 20, 2024 · Updated last year
lucidrains / AoA-pytorch
A PyTorch implementation of the Attention on Attention module (both self and guided variants), for Visual Question Answering
☆43 · Nov 8, 2020 · Updated 5 years ago
dingyue772 / OmniSIFT
[ICML 2026] OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models
☆25 · May 21, 2026 · Updated 2 months ago
Lux0926 / ASPRM
AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence
☆10 · Mar 2, 2025 · Updated last year
renjie-liang / HUAL
Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-based Active Learning
☆15 · Dec 12, 2023 · Updated 2 years ago
XiaoJinNK / FCMNet
Jin, Xiao, et al. "FCMNet: Frequency-aware cross-modality attention networks for RGB-D salient object detection." Neurocomputing 491 (202…
☆11 · Apr 11, 2024 · Updated 2 years ago
EvolvingLMMs-Lab / MGPO
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
☆55 · Jul 23, 2025 · Updated last year
I-ESC / Project-Ava
An implementation of the paper "Empowering Agentic Video Analytics Systems with Video Language Models"
☆31 · Nov 5, 2025 · Updated 8 months ago
OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆65 · Jul 22, 2025 · Updated last year
Dixin-Lab / Automatic-Movie-Trailer-Generator
[ACMMM 2024] An Inverse Partial Optimal Transport Framework for Music-guided Movie Trailer Generation
☆16 · Mar 15, 2025 · Updated last year
Svardfox / LaViT
Official codebase for the paper LaViT
☆34 · Feb 15, 2026 · Updated 5 months ago
QCius / DataBaseLab-XJTU
DataBaseLab, XJTU: XJTU (Xi'an Jiaotong University) database experiments
☆14 · Jun 25, 2024 · Updated 2 years ago