rese1f/aurora

rese1f / aurora

[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

☆141

Alternatives and similar repositories for aurora

Users who are interested in aurora are comparing it to the libraries listed below.

Espere-1119-Song / Video-MMLU
A Massive Multi-Discipline Lecture Understanding Benchmark
☆34 · Nov 1, 2025 · Updated 5 months ago
rese1f / STEVE
[ECCV 2024] STEVE in Minecraft is for See and Think: Embodied Agent in Virtual Environment
☆41 · Dec 27, 2023 · Updated 2 years ago
Share14 / ShareGemini
☆32 · Jul 29, 2024 · Updated last year
rese1f / MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
☆690 · Jan 29, 2025 · Updated last year
rese1f / PoseDA
[ICCV 2023] Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation
☆24 · Aug 26, 2023 · Updated 2 years ago
Jialuo-Li / Science-T2I
[CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis
☆62 · Mar 31, 2026 · Updated 2 weeks ago
llyx97 / TempCompass
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆131 · Apr 4, 2025 · Updated last year
jiyt17 / IDA-VLM
[ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
☆37 · Nov 27, 2024 · Updated last year
zeyofu / Commonsense-T2I
Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024]
☆24 · Aug 13, 2024 · Updated last year
ziqipang / MR-Video
MR. Video: MapReduce is the Principle for Long Video Understanding
☆31 · Apr 23, 2025 · Updated 11 months ago
vision-x-nyu / thinking-in-space
Official repo and evaluation implementation of VSI-Bench
☆695 · Aug 5, 2025 · Updated 8 months ago
Owen718 / LongPrompt-LLamaGen
This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt…
☆30 · Oct 21, 2024 · Updated last year
Owen718 / AWRCP
ICCV'23 | Adverse Weather Removal with Codebook Priors
☆10 · Aug 28, 2023 · Updated 2 years ago
CASIA-IVA-Lab / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆55 · Mar 9, 2025 · Updated last year
yale-nlp / TOMATO
☆37 · Nov 8, 2024 · Updated last year
Andy-Cheng / TEMPURA
TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u…
☆25 · Jun 4, 2025 · Updated 10 months ago
longvideobench / LongVideoBench
[NeurIPS '24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
☆118 · Jul 27, 2024 · Updated last year
zai-org / LVBench
[ICCV 2025] LVBench: An Extreme Long Video Understanding Benchmark
☆143 · Jul 9, 2025 · Updated 9 months ago
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆845 · Dec 14, 2025 · Updated 4 months ago
OpenGVLab / VideoChat-Flash
[ICLR 2026] VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
☆516 · Nov 18, 2025 · Updated 4 months ago
NVlabs / FRAG
☆14 · Apr 25, 2025 · Updated 11 months ago
DCDmllm / Momentor
☆80 · Nov 24, 2024 · Updated last year
tang-bd / fuse-dit
[CVPR 2025] Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
☆133 · May 16, 2025 · Updated 10 months ago
snap-research / Panda-70M
[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
☆684 · Oct 25, 2024 · Updated last year
appletea233 / Temporal-R1
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency
☆62 · Jun 6, 2025 · Updated 10 months ago
MME-Benchmarks / Video-MME
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
☆756 · Dec 8, 2025 · Updated 4 months ago
TIGER-AI-Lab / VISTA
The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR 2025]
☆21 · Feb 27, 2025 · Updated last year
aiming-lab / MJ-Video
[NeurIPS'25 Spotlight] MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation
☆21 · Feb 23, 2025 · Updated last year
VITA-Group / o1-planning
[NeurIPS'24 LanGame workshop] On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability
☆42 · Jul 7, 2025 · Updated 9 months ago
Fr0zenCrane / Cockatiel
The official implementation of our paper "Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption"
☆38 · May 21, 2025 · Updated 10 months ago
LLaVA-VL / LLaVA-NeXT
☆4,628 · Sep 14, 2025 · Updated 7 months ago
MMStar-Benchmark / MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
☆206 · Sep 26, 2024 · Updated last year
ByteVisionLab / TokenFlow
[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
☆454 · Aug 8, 2025 · Updated 8 months ago
VidCapBench / VidCapBench
☆13 · May 17, 2025 · Updated 10 months ago
egoschema / EgoSchema
☆110 · Dec 30, 2024 · Updated last year
CG-Bench / CG-Bench
☆19 · Jan 26, 2025 · Updated last year
bytedance / tarsier
Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions, together with g…
☆537 · Aug 14, 2025 · Updated 8 months ago
SCZwangxiao / video-ReTaKe
Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
☆40 · Mar 16, 2025 · Updated last year
magic-research / PLLaVA
Official repository for the paper PLLaVA
☆674 · Jul 28, 2024 · Updated last year