rccchoudhury / apt
Public release of the code for "Accelerating Vision Transformers with Adaptive Patches"
☆65 · Updated last week
Alternatives and similar repositories for apt
Users who are interested in apt are comparing it to the repositories listed below.
- ☆21 · Updated last year
- Code for ICML 2025 Paper "Highly Compressed Tokenizer Can Generate Without Training" ☆187 · Updated 5 months ago
- Program synthesis for 3D spatial reasoning ☆53 · Updated 5 months ago
- Personalized Representation from Personalized Generation (ICLR 2025) ☆67 · Updated 8 months ago
- ☆113 · Updated 3 months ago
- Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers" ☆157 · Updated last month
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis ☆62 · Updated 6 months ago
- This repository provides the official implementation of VTBench, a benchmark designed to evaluate the performance of visual tokenizers (V… ☆34 · Updated 3 months ago
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models … ☆75 · Updated 5 months ago
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆170 · Updated last week
- Official PyTorch Implementation for Dual-Process Image Generation, ICCV 2025 ☆111 · Updated 2 months ago
- [ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation ☆47 · Updated 2 months ago
- [NeurIPS '25 Spotlight] Official PyTorch implementation of "Vision Transformers Don't Need Trained Registers" ☆142 · Updated 2 months ago
- Official PyTorch Implementation for Diffusion Hyperfeatures, NeurIPS 2023 ☆109 · Updated last year
- ☆70 · Updated 2 weeks ago
- Visual Spatial Tuning ☆133 · Updated this week
- [ICML 2024] Compositional Image Decomposition with Diffusion Models ☆51 · Updated last year
- ☆37 · Updated 9 months ago
- ☆51 · Updated 11 months ago
- Official implementation of "Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive" (ICLR 2024) ☆57 · Updated last year
- [AAAI 2025] GFlow: Recovering 4D World from Monocular Video ☆55 · Updated 6 months ago
- [NeurIPS 2025 Oral] Exploring Diffusion Transformer Designs via Grafting ☆61 · Updated 5 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆93 · Updated 8 months ago
- [CVPR 2025] EntitySAM: Segment Everything in Video ☆53 · Updated 4 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆127 · Updated 3 months ago
- Code release for NeurIPS 2023 paper SlotDiffusion: Object-centric Learning with Diffusion Models ☆92 · Updated last year
- [CVPR 2025] GPS as a Control Signal for Image Generation ☆24 · Updated 8 months ago
- Code for "How far can we go with ImageNet for Text-to-Image generation?" paper ☆94 · Updated last week
- [NeurIPS 2025] VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models ☆115 · Updated 2 weeks ago
- Official implementation of "VIRAL: Visual Representation Alignment for MLLMs". ☆136 · Updated 2 months ago