wenhaochai/aurora

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/wenhaochai/aurora)

wenhaochai / aurora

[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

☆147

Alternatives and similar repositories for aurora

Users that are interested in aurora are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Espere-1119-Song / Video-MMLU
View on GitHub
A Massive Multi-Discipline Lecture Understanding Benchmark
☆34Apr 20, 2026Updated 3 months ago
GXYM / VCapsBench
View on GitHub
VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation
☆20Jun 2, 2025Updated last year
wenhaochai / claude-plugins
View on GitHub
Personal Claude Code plugin marketplace
☆16Jul 21, 2026Updated last week
wenhaochai / MovieChat
View on GitHub
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
☆706Jan 29, 2025Updated last year
Espere-1119-Song / VideoNSA
View on GitHub
VideoNSA: Native Sparse Attention Scales Video Understanding
☆88Nov 16, 2025Updated 8 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
wenhaochai / STEVE
View on GitHub
[ECCV 2024] STEVE in Minecraft is for See and Think: Embodied Agent in Virtual Environment
☆41Dec 27, 2023Updated 2 years ago
DCDmllm / Momentor
View on GitHub
☆81Nov 24, 2024Updated last year
Share14 / ShareGemini
View on GitHub
☆32Jul 29, 2024Updated 2 years ago
wenhaochai / PoseDA
View on GitHub
[ICCV 2023] Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation
☆24Aug 26, 2023Updated 2 years ago
Andy-Cheng / TEMPURA
View on GitHub
TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u…
☆27Jun 4, 2025Updated last year
gyxxyg / TRACE
View on GitHub
[ICLR 2025] TRACE: Temporal Grounding Video LLM via Casual Event Modeling
☆156Aug 22, 2025Updated 11 months ago
llyx97 / TempCompass
View on GitHub
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆133Apr 4, 2025Updated last year
yunlong10 / CAT-V
View on GitHub
[AAAI 26 Demo] Offical repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal P…
☆68Jan 27, 2026Updated 6 months ago
jiyt17 / IDA-VLM
View on GitHub
[ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
☆37Nov 27, 2024Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
OpenGVLab / VideoChat-Flash
View on GitHub
[ICLR2026] VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
☆526Jul 19, 2026Updated last week
Jialuo-Li / Science-T2I
View on GitHub
[CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis
☆62Mar 31, 2026Updated 3 months ago
yale-nlp / TOMATO
View on GitHub
☆42Nov 8, 2024Updated last year
longvideobench / LongVideoBench
View on GitHub
[Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
☆134Jul 27, 2024Updated 2 years ago
ziqipang / MR-Video
View on GitHub
MR. Video: MapReduce is the Principle for Long Video Understanding
☆31Jun 18, 2026Updated last month
vision-x-nyu / thinking-in-space
View on GitHub
Official repo and evaluation implementation of VSI-Bench
☆734Aug 5, 2025Updated 11 months ago
bytedance / Shot2Story
View on GitHub
A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
☆179Jan 30, 2025Updated last year
bytedance / tarsier
View on GitHub
Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions , together with g…
☆548Aug 14, 2025Updated 11 months ago
tulerfeng / Video-R1
View on GitHub
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆884Dec 14, 2025Updated 7 months ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
zai-org / LVBench
View on GitHub
[ICCV 2025] LVBench: An Extreme Long Video Understanding Benchmark
☆145Jul 9, 2025Updated last year
snap-research / Panda-70M
View on GitHub
[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
☆700Oct 25, 2024Updated last year
Ziyang412 / VideoTree
View on GitHub
Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
☆166Jun 23, 2025Updated last year
MME-Benchmarks / Video-MME
View on GitHub
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
☆788Dec 8, 2025Updated 7 months ago
www-Ye / Time-R1
View on GitHub
R1-like Video-LLM for Temporal Grounding
☆138Jun 20, 2025Updated last year
CASIA-IVA-Lab / VideoNIAH
View on GitHub
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆57Mar 9, 2025Updated last year
AVoCaDO-Captioner / AVoCaDO
View on GitHub
https://avocado-captioner.github.io/
☆37Oct 16, 2025Updated 9 months ago
appletea233 / Temporal-R1
View on GitHub
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency
☆62Jun 6, 2025Updated last year
TencentARC / ARC-Hunyuan-Video-7B
View on GitHub
Structured Video Comprehension of Real-World Shorts
☆239Sep 21, 2025Updated 10 months ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
RenShuhuai-Andy / TimeChat
View on GitHub
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
☆425May 8, 2025Updated last year
aiming-lab / MJ-Video
View on GitHub
[NeurIPS'25 Spotlight] MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation
☆20Feb 23, 2025Updated last year
ByteVisionLab / TokenFlow
View on GitHub
[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
☆464Aug 8, 2025Updated 11 months ago
TencentARC / Video-Holmes
View on GitHub
[ECCV 2026] Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆95Jul 13, 2025Updated last year
magic-research / PLLaVA
View on GitHub
Official repository for the paper PLLaVA
☆669Jul 28, 2024Updated 2 years ago
VITA-Group / o1-planning
View on GitHub
[NeurIPS'24 LanGame workshop] On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability
☆42Apr 10, 2026Updated 3 months ago
LLaVA-VL / LLaVA-NeXT
View on GitHub
☆4,712Jun 15, 2026Updated last month