neilwen987 / CSR_Adaptive_Rep
Official Code for Paper: Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
☆133 Updated 3 weeks ago
Alternatives and similar repositories for CSR_Adaptive_Rep
Users who are interested in CSR_Adaptive_Rep are comparing it to the repositories listed below.
- Diffusion Language Models For Code Infilling Beyond Fixed-size Canvas ☆99 Updated 4 months ago
- [ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆174 Updated 4 months ago
- LLaDA2.0 is the diffusion language model series developed by InclusionAI team, Ant Group. ☆236 Updated last month
- JudgeLRM: Large Reasoning Models as a Judge ☆40 Updated last month
- [ICLR 2026] Geometric-Mean Policy Optimization ☆98 Updated this week
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ☆51 Updated last year
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆57 Updated 8 months ago
- AnchorAttention: Improved attention for LLMs long-context training ☆213 Updated last year
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains ☆50 Updated 8 months ago
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025 ☆31 Updated 9 months ago
- ☆91 Updated last year
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆31 Updated last year
- [ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers ☆75 Updated 7 months ago
- [ICLR 2025 Oral] "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free" ☆87 Updated last year
- Defeating the Training-Inference Mismatch via FP16 ☆180 Updated 2 months ago
- Easy and Efficient dLLM Fine-Tuning ☆203 Updated last week
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆152 Updated 6 months ago
- ☆141 Updated 10 months ago
- ☆21 Updated 4 months ago
- Model Merging with SVD to Tie the KnOTS [ICLR 2025] ☆85 Updated 9 months ago
- Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025) ☆88 Updated 4 months ago
- Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod… ☆119 Updated 3 weeks ago
- [ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508) ☆64 Updated this week
- FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones ☆57 Updated last week
- [NeurIPS 2025] Thinkless: LLM Learns When to Think ☆250 Updated 4 months ago
- Esoteric Language Models ☆110 Updated 2 months ago
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework ☆71 Updated 8 months ago
- The official github repo for "Diffusion Language Models are Super Data Learners". ☆219 Updated 2 months ago
- [MTI-LLM@NeurIPS 2025] Official implementation of "PyVision: Agentic Vision with Dynamic Tooling." ☆145 Updated 6 months ago
- [ICML 2025] Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction ☆84 Updated 8 months ago