EvolvingLMMs-Lab / sae
A framework that allows you to apply Sparse AutoEncoders to any model
☆41 · Updated 3 months ago
Alternatives and similar repositories for sae
Users who are interested in sae are comparing it to the libraries listed below.
- [ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM ☆20 · Updated 5 months ago
- ☆76 · Updated 4 months ago
- Syphus: Automatic Instruction-Response Generation Pipeline ☆14 · Updated last year
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆165 · Updated last month
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024] ☆25 · Updated last year
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆91 · Updated 8 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆50 · Updated 3 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆126 · Updated 2 months ago
- The official implementation of the paper “VChain: Chain-of-Visual-Thought for Reasoning in Video Generation” ☆96 · Updated 3 weeks ago
- Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers" ☆143 · Updated last week
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis ☆62 · Updated 6 months ago
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆170 · Updated 3 weeks ago
- ☆58 · Updated last month
- Codebase for the paper "Elucidating the design space of language models for image generation" ☆46 · Updated 11 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆94 · Updated 2 months ago
- This repository provides the official implementation of VTBench, a benchmark designed to evaluate the performance of visual tokenizers (V… ☆34 · Updated 3 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆39 · Updated 8 months ago
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?" ☆41 · Updated 5 months ago
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models … ☆73 · Updated 5 months ago
- Unifying Specialized Visual Encoders for Video Language Models ☆22 · Updated 3 months ago
- [CVPR 2025] HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation ☆56 · Updated 3 months ago
- T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation ☆30 · Updated last month
- PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos ☆46 · Updated 3 months ago
- Benchmarking and Analyzing Generative Data for Visual Recognition ☆26 · Updated 2 years ago
- Test-time Scaling for VAR models ☆25 · Updated last month
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression" ☆86 · Updated last year
- [CVPR 2025] Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation ☆17 · Updated 6 months ago
- ☆21 · Updated last year
- [NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models. ☆50 · Updated last year
- Official Implementation for "Editing Massive Concepts in Text-to-Image Diffusion Models" ☆19 · Updated last year