neilwen987 / CSR_Adaptive_Rep
Official Code for Paper: Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
☆58 Updated 3 weeks ago
Alternatives and similar repositories for CSR_Adaptive_Rep:
Users who are interested in CSR_Adaptive_Rep are comparing it to the libraries listed below.
- Official implementation of "BERTs are Generative In-Context Learners" ☆26 Updated last month
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆28 Updated last month
- ☆77 Updated 8 months ago
- Official Repository for "Hypencoder: Hypernetworks for Information Retrieval" ☆23 Updated last month
- Using FlexAttention to compute attention with different masking patterns ☆43 Updated 7 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆25 Updated 5 months ago
- MEXMA: Token-level objectives improve sentence representations ☆40 Updated 3 months ago
- ☆47 Updated 7 months ago
- Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model" ☆59 Updated 5 months ago
- ☆79 Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 Updated 7 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆71 Updated 5 months ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆39 Updated 6 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆26 Updated 7 months ago
- ☆22 Updated 2 months ago
- Official implementation of the ICML 2024 paper RoSA (Robust Adaptation) ☆40 Updated last year
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆65 Updated 6 months ago
- ☆25 Updated last year
- Implementation of Bitune: Bidirectional Instruction-Tuning ☆19 Updated 10 months ago
- Source code of our paper "PairDistill: Pairwise Relevance Distillation for Dense Retrieval", EMNLP 2024 Main. ☆22 Updated 4 months ago
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆84 Updated last year
- The repository contains code for Adaptive Data Optimization ☆23 Updated 4 months ago
- Aioli: A unified optimization framework for language model data mixing ☆23 Updated 3 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆29 Updated last month
- Easily run PyTorch on multiple GPUs & machines ☆45 Updated last month
- ☆54 Updated 7 months ago
- [ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers ☆64 Updated 3 months ago
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆18 Updated last month
- ☆67 Updated 8 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 Updated 2 months ago