InternLM / StarBench
☆24Updated this week
Alternatives and similar repositories for StarBench
Users who are interested in StarBench are comparing it to the repositories listed below.
- [ECCV 2024 Oral] Audio-Synchronized Visual Animation☆56Updated last year
- [CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners☆150Updated last year
- Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation☆61Updated 4 months ago
- [ICCV2025] TokenBridge: Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation. https://yuqingwang1029.github.io/To…☆145Updated 3 months ago
- The author's implementation of FUDOKI, a multimodal large language model purely based on discrete flow matching.☆62Updated last month
- The official UniVerse-1 code.☆100Updated 2 weeks ago
- ☆183Updated 10 months ago
- [ICCV 2025] Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation"☆189Updated 4 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation☆159Updated last month
- [🏆 IJCV 2025 & ACCV 2024 Best Paper Honorable Mention] Official pytorch implementation of the paper "High-Quality Visually-Guided Sound …☆20Updated 2 weeks ago
- ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer☆38Updated 10 months ago
- [Arxiv 2024] Official code for MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions☆33Updated 8 months ago
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation☆71Updated last month
- ☆130Updated 2 weeks ago
- [NeurIPS 2025 NextVid Workshop Oral✨] Official Implementation of VideoGen-of-Thought: Step-by-step generating multi-shot video with minim…☆48Updated last month
- [CVPR 2025] VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?☆29Updated 5 months ago
- Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing'☆57Updated 4 months ago
- [AAAI 2025] Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding☆31Updated 7 months ago
- Official implementation of Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning☆173Updated last month
- Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042)☆74Updated 7 months ago
- Tracking the latest and greatest research papers on video generation.☆78Updated last week
- Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP2024]☆103Updated 8 months ago
- ICML2025