neilwen987 / CSR_Adaptive_Rep
Official Code for Paper: Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
☆125 · Updated 3 months ago
Alternatives and similar repositories for CSR_Adaptive_Rep
Users who are interested in CSR_Adaptive_Rep are comparing it to the libraries listed below.
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆126 · Updated 3 months ago
- [ICLR 2025 Oral] "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free" ☆82 · Updated 11 months ago
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆53 · Updated 4 months ago
- ☆129 · Updated 7 months ago
- [ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers ☆73 · Updated 3 months ago
- JudgeLRM: Large Reasoning Models as a Judge ☆39 · Updated 3 weeks ago
- Official repository for paper "DeepCritic: Deliberate Critique with Large Language Models" ☆36 · Updated 3 months ago
- AnchorAttention: Improved attention for LLMs long-context training ☆213 · Updated 8 months ago
- ☆85 · Updated last year
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆30 · Updated 11 months ago
- [ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆155 · Updated 2 weeks ago
- [NeurIPS 2024 Main Track] Code for the paper titled "Instruction Tuning With Loss Over Instructions" ☆39 · Updated last year
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch ☆180 · Updated 3 months ago
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ☆46 · Updated 11 months ago
- Diffusion Language Models For Code Infilling Beyond Fixed-size Canvas ☆80 · Updated last month
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆106 · Updated this week
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆102 · Updated this week
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆55 · Updated 8 months ago
- [EMNLP 2025 Industry] Repo for "Z1: Efficient Test-time Scaling with Code" ☆64 · Updated 6 months ago
- [COLING'25] Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers? ☆80 · Updated 8 months ago
- Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod… ☆101 · Updated last month
- Tree Search for LLM Agent Reinforcement Learning ☆127 · Updated 2 weeks ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting ☆33 · Updated last year
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆43 · Updated 11 months ago
- PyTorch library for Active Fine-Tuning ☆93 · Updated 2 weeks ago
- [COLM 2025] "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing" ☆18 · Updated 6 months ago
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example ☆361 · Updated this week
- Code for Heima ☆55 · Updated 5 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆86 · Updated last year
- ☆133 · Updated last month