google-deepmind / language_modeling_is_compression
☆171 · Updated last year
Alternatives and similar repositories for language_modeling_is_compression
Users interested in language_modeling_is_compression are comparing it to the repositories listed below.
- Some preliminary explorations of Mamba's context scaling. ☆218 · Updated 2 years ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆247 · Updated 8 months ago
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆147 · Updated last year
- Language models scale reliably with over-training and on downstream tasks ☆99 · Updated last year
- [NeurIPS 2024] Official repository of "The Mamba in the Llama: Distilling and Accelerating Hybrid Models" ☆237 · Updated 3 months ago
- Code accompanying the paper "Massive Activations in Large Language Models" ☆195 · Updated last year
- ☆112 · Updated last year
- ☆74 · Updated last year
- ☆108 · Updated last year
- Physics of Language Models: Part 4.2, Canon Layers at Scale where Synthetic Pretraining Resonates in Reality ☆317 · Updated last month
- [NeurIPS'24 Spotlight] Observational Scaling Laws ☆58 · Updated last year
- ☆208 · Updated 3 weeks ago
- ☆150 · Updated 2 years ago
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ☆123 · Updated last year
- Sparse Backpropagation for Mixture-of-Expert Training ☆29 · Updated last year
- Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality" ☆229 · Updated last year
- [ICLR 2023] "Learning to Grow Pretrained Models for Efficient Transformer Training" by Peihao Wang, Rameswar Panda, Lucas Torroba Hennige… ☆92 · Updated last year
- Code for studying the super weight in LLMs ☆121 · Updated last year
- Kinetics: Rethinking Test-Time Scaling Laws ☆86 · Updated 6 months ago
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch ☆182 · Updated 7 months ago
- ☆91 · Updated last year
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆163 · Updated 9 months ago
- Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆177 · Updated last year
- [ICML 2025] From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories and Applications ☆52 · Updated 3 months ago
- Code for creating the iGSM datasets in the papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces… ☆84 · Updated last year
- ☆203 · Updated 9 months ago
- Replicating O1 inference-time scaling laws ☆92 · Updated last year
- [ICLR 2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM ☆105 · Updated last year
- ☆105 · Updated 11 months ago
- The HELMET Benchmark ☆198 · Updated 2 months ago