elad-amrani / xtra
PyTorch implementation of "Sample- and Parameter-Efficient Auto-Regressive Image Models" from CVPR 2025
☆11 · Updated 3 months ago
Alternatives and similar repositories for xtra
Users that are interested in xtra are comparing it to the libraries listed below
- ☆42 · Updated 7 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆37 · Updated last year
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025] ☆16 · Updated 3 months ago
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling. ☆35 · Updated 11 months ago
- Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data ☆34 · Updated last year
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones". ☆27 · Updated last year
- The official repo for continuous speculative decoding ☆27 · Updated 2 months ago
- The official repo for "D-AR: Diffusion via Autoregressive Models" ☆98 · Updated this week
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆35 · Updated 4 months ago
- Codebase for the paper "Elucidating the design space of language models for image generation" ☆45 · Updated 7 months ago
- ☆26 · Updated 8 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆75 · Updated 3 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆70 · Updated this week
- A curated list of papers and resources for text-to-image evaluation. ☆29 · Updated last year
- ☆36 · Updated 4 months ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆45 · Updated last year
- Code for ICML 2025 Paper "Highly Compressed Tokenizer Can Generate Without Training" ☆71 · Updated 2 weeks ago
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆56 · Updated last year
- ☆32 · Updated last month
- HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ☆63 · Updated 4 months ago
- Code for the paper "If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection" ☆27 · Updated last year
- Syphus: Automatic Instruction-Response Generation Pipeline ☆14 · Updated last year
- ☆23 · Updated last year
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆28 · Updated 9 months ago
- ☆37 · Updated last month
- TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation ☆30 · Updated 6 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆51 · Updated 5 months ago
- Unifying Specialized Visual Encoders for Video Language Models ☆19 · Updated this week
- Code for "VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement" ☆47 · Updated 6 months ago
- Official implementation of "Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization" ☆77 · Updated last year