elad-amrani / xtra
PyTorch implementation of "Sample- and Parameter-Efficient Auto-Regressive Image Models" from CVPR 2025
☆13 Updated 7 months ago
Alternatives and similar repositories for xtra
Users who are interested in xtra are comparing it to the libraries listed below.
- Unifying Specialized Visual Encoders for Video Language Models ☆22 Updated 3 months ago
- ☆30 Updated last month
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆50 Updated 3 months ago
- Image Tokenizer Needs Post-Training ☆24 Updated 3 weeks ago
- The official repo of continuous speculative decoding ☆30 Updated 7 months ago
- Official implementation of "Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization" ☆81 Updated last year
- ☆21 Updated 5 months ago
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones". ☆28 Updated last year
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆39 Updated 8 months ago
- Single-pass Adaptive Image Tokenization for Minimum Program Search | What's the Kolmogorov Complexity of an Image? ☆41 Updated 3 months ago
- Syphus: Automatic Instruction-Response Generation Pipeline ☆14 Updated last year
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆91 Updated 8 months ago
- [ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding ☆77 Updated 2 years ago
- ☆34 Updated 5 months ago
- Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence" ☆48 Updated last week
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… ☆98 Updated last year
- ☆22 Updated 10 months ago
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025] ☆19 Updated 8 months ago
- Official pytorch implementation of "AlphaFlow: Understanding and Improving MeanFlow Models" ☆42 Updated last week
- [CVPR 2025] Test-Time Visual In-Context Tuning ☆25 Updated 7 months ago
- Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs. ☆100 Updated 7 months ago
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?" ☆41 Updated 4 months ago
- ☆71 Updated 11 months ago
- Code for the paper "If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection" ☆27 Updated 2 years ago
- ☆37 Updated 8 months ago
- (ICLR 2024, CVPR 2024) SparseFormer ☆75 Updated 11 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆60 Updated 3 months ago
- M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning ☆45 Updated 3 months ago
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression" ☆86 Updated last year
- Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023) ☆32 Updated 2 years ago