neilwen987 / CSR_Adaptive_Rep
Official Code for Paper: Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
☆80 · Updated 2 weeks ago
Alternatives and similar repositories for CSR_Adaptive_Rep
Users interested in CSR_Adaptive_Rep are comparing it to the repositories listed below.
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆40 · Updated 8 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆29 · Updated 7 months ago
- ☆29 · Updated 4 months ago
- ☆183 · Updated last year
- Model Merging with SVD to Tie the KnOTS [ICLR 2025] ☆57 · Updated 2 months ago
- Official implementation of "BERTs are Generative In-Context Learners" ☆28 · Updated 3 months ago
- PyTorch library for Active Fine-Tuning ☆80 · Updated 4 months ago
- [NeurIPS 2024 Main Track] Code for the paper titled "Instruction Tuning With Loss Over Instructions" ☆38 · Updated last year
- ☆79 · Updated 10 months ago
- ☆32 · Updated last year
- [ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning ☆45 · Updated 10 months ago
- Official implementation of the ICML 2024 paper RoSA (Robust Adaptation) ☆42 · Updated last year
- Code for PHATGOOSE, introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆85 · Updated last year
- Implementation of Bitune: Bidirectional Instruction-Tuning ☆19 · Updated last week
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 · Updated 9 months ago
- ☆32 · Updated 5 months ago
- Code for the NeurIPS 2024 Spotlight paper "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆74 · Updated 7 months ago
- ☆26 · Updated 4 months ago
- Efficient scaling laws and collaborative pretraining ☆16 · Updated 4 months ago
- [ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers ☆69 · Updated this week
- ☆25 · Updated last year
- Towards Understanding the Mixture-of-Experts Layer in Deep Learning ☆31 · Updated last year
- Stick-breaking attention ☆57 · Updated last week
- [ICLR 2025 Spotlight] When Attention Sink Emerges in Language Models: An Empirical View ☆88 · Updated 8 months ago
- Code and pretrained models for the paper "MatMamba: A Matryoshka State Space Model" ☆59 · Updated 7 months ago
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆89 · Updated 7 months ago
- MergeBench: A Benchmark for Merging Domain-Specialized LLMs ☆14 · Updated last month
- Model Stock: All we need is just a few fine-tuned models ☆117 · Updated 9 months ago
- Code for the paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models" ☆42 · Updated 8 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆33 · Updated 3 months ago