qhfan / RALA
[CVPR2025] Breaking the Low-Rank Dilemma of Linear Attention
☆3 Updated this week
Alternatives and similar repositories for RALA:
Users who are interested in RALA are comparing it to the repositories listed below.
- SimCMF: A Simple Cross-modal Fine-tuning Strategy from Vision Foundation Models to Any Imaging Modality ☆30 Updated 3 months ago
- Video Diffusion State Space Models ☆19 Updated 11 months ago
- ☆42 Updated this week
- ☆34 Updated 4 months ago
- ☆16 Updated last year
- Learning 1D Causal Visual Representation with De-focus Attention Networks ☆32 Updated 8 months ago
- Implementation of Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding ☆28 Updated 3 months ago
- This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt… ☆30 Updated 4 months ago
- [NeurIPS 2024] Official code release for our paper "Revisiting the Integration of Convolution and Attention for Vision Backbone". ☆33 Updated last month
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆30 Updated this week
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" ☆46 Updated 4 months ago
- Official PyTorch implementation - Video Motion Transfer with Diffusion Transformers ☆36 Updated 2 months ago
- [NeurIPS'24] I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing ☆16 Updated 2 months ago
- [NeurIPS 2024 D&B Track] Implementation for "FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models" ☆66 Updated 2 months ago
- Towards training VQ-VAE models robustly! ☆54 Updated last month
- ☆20 Updated 8 months ago
- ☆38 Updated last year
- Code for "VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement" ☆44 Updated 2 months ago
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones". ☆27 Updated last year
- Official Repository of Personalized Visual Instruct Tuning ☆26 Updated 3 months ago
- Official implementation of LaVin-DiT ☆22 Updated last month
- ☆23 Updated 2 months ago
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆41 Updated last month
- [ICLR 2024] Official implementation of the paper "Toss: High-quality text-guided novel view synthesis from a single image" ☆21 Updated 9 months ago
- [ECCV-24] This is the official implementation of the paper "SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation". ☆20 Updated 4 months ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ☆62 Updated 4 months ago