neilwen987 / CSR_Adaptive_Rep
Official code for the paper: Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
☆128 · Updated 4 months ago
Alternatives and similar repositories for CSR_Adaptive_Rep
Users who are interested in CSR_Adaptive_Rep are comparing it to the repositories listed below.
- [ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆163 · Updated last month
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆54 · Updated 5 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆135 · Updated 4 months ago
- Diffusion Language Models For Code Infilling Beyond Fixed-size Canvas ☆88 · Updated 2 months ago
- [ICLR 2025 Oral] "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free" ☆82 · Updated last year
- JudgeLRM: Large Reasoning Models as a Judge ☆40 · Updated 2 months ago
- Esoteric Language Models ☆106 · Updated last month
- Code for Heima ☆58 · Updated 7 months ago
- AnchorAttention: Improved attention for long-context LLM training ☆213 · Updated 10 months ago
- Geometric-Mean Policy Optimization ☆92 · Updated this week
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains ☆49 · Updated 6 months ago
- Code for the paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ☆48 · Updated last year
- [ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers ☆73 · Updated 5 months ago
- ☆131 · Updated 8 months ago
- [NeurIPS 2024 Main Track] Code for the paper titled "Instruction Tuning With Loss Over Instructions" ☆38 · Updated last year
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆110 · Updated this week
- [EMNLP'25 Industry] Repo for "Z1: Efficient Test-time Scaling with Code" ☆66 · Updated 7 months ago
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025 ☆31 · Updated 6 months ago
- [NeurIPS 2025] Thinkless: LLM Learns When to Think ☆242 · Updated last month
- One-shot Entropy Minimization ☆187 · Updated 5 months ago
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433 ☆110 · Updated 11 months ago
- GitHub repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025) ☆81 · Updated 2 months ago
- TraceRL & TraDo-8B: Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models ☆317 · Updated last week
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch ☆180 · Updated 5 months ago
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025) ☆33 · Updated 9 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆89 · Updated last year
- A holistic benchmark for LLM abstention ☆57 · Updated 2 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆31 · Updated last year
- [COLM 2025] "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing" ☆18 · Updated 7 months ago
- ☆317 · Updated 2 weeks ago