Gen-Verse / Paper2Video
[ICCV 2025] Preacher: Paper-to-Video Agentic System
☆26 · Updated 3 months ago
Alternatives and similar repositories for Paper2Video
Users interested in Paper2Video are comparing it to the repositories listed below.
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆40 · Updated 9 months ago
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ☆73 · Updated 2 months ago
- ☆64 · Updated 6 months ago
- ☆135 · Updated last month
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ☆74 · Updated last year
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆52 · Updated 4 months ago
- [Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models ☆40 · Updated 8 months ago
- Inference-only implementation of "One-Step Diffusion Distillation through Score Implicit Matching" [NeurIPS 2024] ☆84 · Updated last year
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆49 · Updated 5 months ago
- Codebase for the paper "Elucidating the design space of language models for image generation" ☆46 · Updated last year
- [ICCV 2025] TokenBridge: Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation. https://yuqingwang1029.github.io/To… ☆150 · Updated 4 months ago
- Explore how to get VQ-VAE models efficiently! ☆63 · Updated 4 months ago
- The author's implementation of FUDOKI, a multimodal large language model based purely on discrete flow matching. ☆63 · Updated 2 months ago
- ☆139 · Updated last year
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆31 · Updated 11 months ago
- Dimple, the first Discrete Diffusion Multimodal Large Language Model ☆112 · Updated 5 months ago
- ☆42 · Updated 6 months ago
- ☆19 · Updated last year
- [CVPR 2025] CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient ☆107 · Updated 2 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆94 · Updated 9 months ago
- Code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR 2025] ☆20 · Updated 9 months ago
- ☆30 · Updated 8 months ago
- Code for the paper "Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers" [ICCV 2025] ☆95 · Updated 4 months ago
- [CVPR 2025] HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation ☆58 · Updated 5 months ago
- Official PyTorch implementation of the paper "Equivariant Image Modeling" (https://arxiv.org/abs/2503.18948) ☆34 · Updated 4 months ago
- ☆46 · Updated 11 months ago
- LMM solved catastrophic forgetting, AAAI 2025 ☆44 · Updated 7 months ago
- Autoregressive Image Generation with Randomized Parallel Decoding ☆81 · Updated last month
- [ICLR 2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models ☆40 · Updated last month
- Official PyTorch implementation of "Latent Denoising Makes Good Visual Tokenizers" ☆159 · Updated last month