EvolvingLMMs-Lab / sae
A framework for applying sparse autoencoders (SAEs) to any model
☆40 · Updated 2 months ago
Alternatives and similar repositories for sae
Users who are interested in sae are comparing it to the libraries listed below:
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆117 · Updated 3 weeks ago
- Syphus: Automatic Instruction-Response Generation Pipeline ☆14 · Updated last year
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆39 · Updated 7 months ago
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆114 · Updated last week
- ☆74 · Updated 2 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆87 · Updated 6 months ago
- [ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM ☆19 · Updated 3 months ago
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024] ☆22 · Updated last year
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆77 · Updated last month
- Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers" ☆127 · Updated last month
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression" ☆86 · Updated 11 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆48 · Updated last month
- The code repository of UniRL ☆40 · Updated 3 months ago
- Benchmarking and Analyzing Generative Data for Visual Recognition ☆26 · Updated 2 years ago
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis ☆60 · Updated 4 months ago
- [CVPR 2025] A benchmark for evaluating video generative models in generating short stories ☆17 · Updated 4 months ago
- ☆50 · Updated 3 weeks ago
- This repository provides the official implementation of VTBench, a benchmark designed to evaluate the performance of visual tokenizers (V… ☆34 · Updated last month
- Official PyTorch implementation of the paper "Equivariant Image Modeling" (https://arxiv.org/abs/2503.18948) ☆34 · Updated last month
- Empowering Unified MLLM with Multi-granular Visual Generation ☆130 · Updated 8 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆73 · Updated 2 months ago
- ☆31 · Updated last week
- Official repository for LLaVA-Reward (ICCV 2025): Multimodal LLMs as Customized Reward Models for Text-to-Image Generation ☆20 · Updated last month
- Test-time Scaling for VAR models ☆23 · Updated last month
- PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos ☆45 · Updated 2 months ago
- ☆56 · Updated 2 weeks ago
- T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation ☆29 · Updated last week
- [CVPR 2025] Test-Time Visual In-Context Tuning ☆25 · Updated 5 months ago
- Codebase for the paper "Elucidating the design space of language models for image generation" ☆46 · Updated 10 months ago
- [NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models ☆49 · Updated 11 months ago