Ahnsun/merlin

Ahnsun / merlin

[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds

☆97

Alternatives and similar repositories for merlin

Users who are interested in merlin often compare it to the repositories listed below.


Ahnsun / LTrack
Official implementation of the paper "LTrack: Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Rep…
☆12 · Jul 26, 2023 · Updated 2 years ago
linkangheng / Video-UTR
[ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs
☆61 · Feb 27, 2025 · Updated last year
Open-Reasoner-Zero / Open-Vision-Reasoner
[NeurIPS 2025] The official repository for our paper, "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reason…
☆157 · Sep 12, 2025 · Updated 10 months ago
kennymckormick / ARAS-Dataset
☆11 · Nov 5, 2024 · Updated last year
WayneMao / RoboMatrix
The Official Implementation of RoboMatrix
☆108 · May 19, 2025 · Updated last year
LancasterLi / RefSAM
☆28 · Oct 31, 2024 · Updated last year
chen-si-jia / ReaMOT
🚀 Reasoning-based Multi-Object Tracking
☆26 · Apr 30, 2026 · Updated 2 months ago
IVGSZ / Flash-VStream
This is the official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams"
☆285 · Oct 15, 2025 · Updated 9 months ago
CeeZh / LLoVi
Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"
☆106 · Oct 27, 2024 · Updated last year
anonymous0769 / DreamVideo
☆17 · Jul 30, 2024 · Updated last year
linkangheng / PR1
[NeurIPS 2025] Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning
☆289 · Jul 15, 2025 · Updated last year
yuangpeng / dreambench_plus
[ICLR 2025] Official code implementation of DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
☆137 · Feb 23, 2025 · Updated last year
qiujihao19 / Artemis
[NeurIPS 2024] Artemis: Towards Referential Understanding in Complex Videos
☆27 · Apr 8, 2025 · Updated last year
MinghanLi / MDQE_CVPR2023
Code release for "MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos" (CVPR2023)
☆15 · Dec 14, 2023 · Updated 2 years ago
Aurora-slz / MM-Verify
☆19 · Oct 28, 2025 · Updated 8 months ago
MinghanLi / BoxVIS
Code release for "BoxVIS: Video Instance Segmentation with Box Annotation"
☆12 · Dec 22, 2023 · Updated 2 years ago
Rh-Dang / ECBench
A Holistic Embodied Cognition Benchmark
☆18 · Apr 3, 2025 · Updated last year
ioanacroi / longmoment-detr
Moment Detection in Long Tutorial Videos
☆20 · May 8, 2024 · Updated 2 years ago
SooLab / CoTDet
[ICCV2023] CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection
☆19 · Apr 23, 2025 · Updated last year
wenhe-jia / TIVE
☆11 · Jan 18, 2024 · Updated 2 years ago
NVlabs / FRAG
☆15 · Apr 25, 2025 · Updated last year
aniki-ly / FlowZero
FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax
☆18 · Nov 23, 2023 · Updated 2 years ago
yuweihao / MM-Vet
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
☆329 · Jan 20, 2025 · Updated last year
CASIA-IVA-Lab / VAST
[NIPS2023] Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
☆302 · Mar 14, 2024 · Updated 2 years ago
huangb23 / VTimeLLM
[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
☆295 · Jun 13, 2024 · Updated 2 years ago
RuizeHan / CVMHT
CVMHT: Complementary-View Multiple Human Tracking (AAAI 2020).
☆10 · Dec 9, 2021 · Updated 4 years ago
CASIA-IVA-Lab / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆57 · Mar 9, 2025 · Updated last year
Hon-Wong / Elysium
[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM
☆88 · Oct 25, 2024 · Updated last year
wudongming97 / OnlineRefer
[ICCV 2023] OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation
☆58 · Oct 7, 2023 · Updated 2 years ago
Yui010206 / SeViLA
[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
☆198 · Jan 14, 2024 · Updated 2 years ago
Tencent-QQMM / PureMM
☆21 · Feb 29, 2024 · Updated 2 years ago
sukjunhwang / VITA
VITA: Video Instance Segmentation via Object Token Association (NeurIPS 2022)
☆107 · Jan 4, 2024 · Updated 2 years ago
WesLee88524 / LG-MOT
Multi-Granularity Language-Guided Multi-Object Tracking
☆26 · Nov 3, 2025 · Updated 8 months ago
jshilong / GPT4RoI
(ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
☆556 · Jun 3, 2025 · Updated last year
zyxElsa / MotionCrafter
Official implementation of the paper "MotionCrafter: One-Shot Motion Customization of Diffusion Models"
☆29 · Jan 4, 2024 · Updated 2 years ago
mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆963 · Aug 5, 2025 · Updated 11 months ago
mbzuai-oryx / VideoGPT-plus
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
☆293 · Aug 5, 2025 · Updated 11 months ago
Ali2500 / BURST-benchmark
☆81 · Aug 19, 2023 · Updated 2 years ago
RUCAIBox / Event-Bench
Official code of "Towards Event-oriented Long Video Understanding"
☆12 · Jul 26, 2024 · Updated last year