elad-amrani / xtra
PyTorch implementation of "Sample- and Parameter-Efficient Auto-Regressive Image Models" from CVPR 2025
☆11Updated 2 weeks ago
Alternatives and similar repositories for xtra:
Users that are interested in xtra are comparing it to the libraries listed below
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆63Updated 3 weeks ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling☆26Updated last month
- The official repo of continuous speculative decoding☆25Updated this week
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆35Updated 9 months ago
- HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation☆51Updated last month
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones".☆27Updated last year
- A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning.☆19Updated last week
- Unifying Specialized Visual Encoders for Video Language Models☆16Updated 2 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆30Updated this week
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆44Updated last year
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]☆14Updated last month
- Codebase for the paper-Elucidating the design space of language models for image generation☆45Updated 4 months ago
- ☆40Updated 4 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)"☆35Updated 9 months ago
- ☆28Updated last month
- Code for paper "Principal Components" Enable A New Language of Images☆23Updated last week
- A big_vision inspired repo that implements a generic Auto-Encoder class capable in representation learning and generative modeling.☆34Updated 9 months ago
- ☆17Updated 4 months ago
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆56Updated last year
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model☆42Updated 7 months ago
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.☆18Updated 9 months ago
- Code for the paper "Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers"☆54Updated last week
- Code for the paper "If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection"☆27Updated last year
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models☆34Updated last month
- Official Repository of Personalized Visual Instruct Tuning☆28Updated 3 weeks ago
- ☆29Updated 8 months ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"☆25Updated 6 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆45Updated 2 months ago
- ☆19Updated last year
- [ICLR 2025] Implementation of Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding☆29Updated 3 weeks ago