EvolvingLMMs-Lab / multimodal-sae
Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.
☆100 Updated this week
Alternatives and similar repositories for multimodal-sae:
Users who are interested in multimodal-sae are comparing it to the repositories listed below:
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆62 Updated 7 months ago
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆128 Updated 2 months ago
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models ☆72 Updated 4 months ago
- ☆70 Updated 5 months ago
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆62 Updated 2 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆58 Updated 6 months ago
- Official Pytorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" ☆53 Updated this week
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆80 Updated 2 weeks ago
- Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆127 Updated last month
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆115 Updated 6 months ago
- Official implementation of MAIA, A Multimodal Automated Interpretability Agent ☆72 Updated 5 months ago
- [NAACL 2025] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models ☆36 Updated this week
- Model Merging with SVD to Tie the KnOTS [ICLR 2025] ☆40 Updated this week
- Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image … ☆63 Updated last month
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" ☆20 Updated 3 weeks ago
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch ☆150 Updated 3 weeks ago
- Official PyTorch Implementation for Task Vectors are Cross-Modal ☆21 Updated last month
- [ICLR2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ☆79 Updated 2 months ago
- ☆24 Updated 3 weeks ago
- Official implementation of the Law of Vision Representation in MLLMs ☆148 Updated 2 months ago
- ☆59 Updated 9 months ago
- AnchorAttention: Improved attention for LLMs long-context training ☆203 Updated 2 weeks ago
- ☆134 Updated 8 months ago
- [ICLR 2025] SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights ☆47 Updated last week
- Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models ☆156 Updated 3 months ago
- A brief and partial summary of RLHF algorithms. ☆89 Updated 2 months ago
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective ☆55 Updated 2 months ago
- ☆36 Updated 2 months ago
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding ☆45 Updated last month
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆92 Updated 4 months ago