alvin-zyl / CoLA
Implementation of CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation
☆20 · Updated 2 months ago
Alternatives and similar repositories for CoLA:
Users interested in CoLA are comparing it to the repositories listed below.
- An effective weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study uncovering how reasoning length… ☆10 · Updated this week
- ☆49 · Updated last year
- Test-time training on nearest neighbors for large language models ☆41 · Updated last year
- Repo for the ACL 2023 Findings paper "Emergent Modularity in Pre-trained Transformers" ☆23 · Updated last year
- Code for the ICLR 2025 paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆63 · Updated last month
- Source code of the ACL 2023 main conference paper "PAD-Net: An Efficient Framework for Dynamic Networks" ☆9 · Updated last year
- ☆130 · Updated 9 months ago
- Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024] ☆17 · Updated last year
- Stick-breaking attention ☆52 · Updated last month
- [ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So… ☆16 · Updated 2 weeks ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆51 · Updated 2 years ago
- SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024) ☆30 · Updated 6 months ago
- Preprint: Asymmetry in Low-Rank Adapters of Foundation Models ☆36 · Updated last year
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) ☆59 · Updated 7 months ago
- ☆13 · Updated last year
- A Kernel-Based View of Language Model Fine-Tuning (https://arxiv.org/abs/2210.05643) ☆76 · Updated last year
- Multi-Layer Sparse Autoencoders (ICLR 2025) ☆23 · Updated 2 months ago
- Official code for SEAL: Steerable Reasoning Calibration of Large Language Models for Free ☆20 · Updated last month
- A curated list of awesome resources dedicated to Scaling Laws for LLMs ☆71 · Updated 2 years ago
- ☆93 · Updated last year
- Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation (ICML'24 Oral) ☆14 · Updated 9 months ago
- Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models" ☆101 · Updated last year
- Code for the paper "A Sober Look at Progress in Language Model Reasoning" ☆41 · Updated 3 weeks ago
- [ICLR 2025 Spotlight] When Attention Sink Emerges in Language Models: An Empirical View ☆72 · Updated 6 months ago
- LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation ☆23 · Updated 2 weeks ago
- [NeurIPS'24 Spotlight] Observational Scaling Laws ☆54 · Updated 7 months ago
- Bayesian low-rank adaptation for large language models ☆23 · Updated last year
- [NeurIPS 2023 Spotlight] Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training ☆34 · Updated last month
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆136 · Updated last month
- ☆17 · Updated 11 months ago