JeanKaddour / NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
Related projects
Alternatives and complementary repositories for NoTrainNoGain
- Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021)
- [ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So…
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
- NanoGPT-like codebase for LLM training
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
- Code for the paper "Why Transformers Need Adam: A Hessian Perspective"
- A fusion of a linear layer and a cross-entropy loss, written for PyTorch in Triton
- The Efficiency Spectrum of LLMs
- Code accompanying the paper "Massive Activations in Large Language Models"
- Code release for "Dataless Knowledge Fusion by Merging Weights of Language Models" (https://openreview.net/forum?id=FCnohuR6AnM)