google-deepmind / language_modeling_is_compression
☆130 · Updated 7 months ago
Alternatives and similar repositories for language_modeling_is_compression:
Users interested in language_modeling_is_compression are comparing it to the repositories listed below.
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆132 · Updated 7 months ago
- ☆69 · Updated 2 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆232 · Updated 2 months ago
- Some preliminary explorations of Mamba's context scaling. ☆212 · Updated last year
- ☆89 · Updated 6 months ago
- Normalized Transformer (nGPT) ☆168 · Updated 5 months ago
- Code accompanying the paper "Massive Activations in Large Language Models" ☆156 · Updated last year
- ☆60 · Updated 11 months ago
- Async pipelined version of Verl ☆60 · Updated 2 weeks ago
- Code and data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR 2025] ☆105 · Updated 2 months ago
- 🔥 A minimal training framework for scaling FLA models ☆107 · Updated last week
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆81 · Updated 10 months ago
- Homepage for ProLong (Princeton long-context language models) and the paper "How to Train Long-Context Language Models (Effectively)" ☆175 · Updated last month
- Simple and efficient PyTorch-native transformer training and inference (batched) ☆73 · Updated last year
- Code for studying the super weight in LLMs ☆98 · Updated 4 months ago
- ☆83 · Updated last year
- [NeurIPS 2024] Official repository of "The Mamba in the Llama: Distilling and Accelerating Hybrid Models" ☆214 · Updated last week
- The HELMET Benchmark ☆135 · Updated last week
- Replicating O1 inference-time scaling laws ☆83 · Updated 4 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆151 · Updated 5 months ago
- Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality" ☆185 · Updated 8 months ago
- Language models scale reliably with over-training and on downstream tasks ☆96 · Updated last year
- [ICML 2024] The official implementation of "Rethinking Optimization and Architecture for Tiny Language Models" ☆121 · Updated 3 months ago
- ☆185 · Updated this week
- ☆102 · Updated last year
- ☆78 · Updated 8 months ago
- Efficient Triton implementation of Native Sparse Attention. ☆139 · Updated 2 weeks ago
- [NeurIPS 2024 Spotlight] Observational Scaling Laws ☆54 · Updated 6 months ago
- ☆137 · Updated 5 months ago
- [NeurIPS 2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies (https://arxiv.org/abs/2407.13623) ☆82 · Updated 6 months ago
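
For readers arriving at this list cold: the headline repo accompanies the paper "Language Modeling Is Compression" (ICLR 2024), whose core observation is that a language model's log-loss on a sequence equals, in bits, the length an arithmetic coder driven by that model would produce. Below is a minimal sketch of that correspondence, using an invented toy unigram character model in place of an LLM; the sample text and all names are illustrative, not taken from the repo:

```python
import math
from collections import Counter

def ideal_code_length_bits(text, probs):
    """Sum of -log2 p(c) over characters: the size an arithmetic coder
    driven by this model would achieve, up to a few bits of overhead."""
    return sum(-math.log2(probs[c]) for c in text)

# Toy stand-in for an LLM: a unigram model fit on the text itself.
text = "abracadabra"
counts = Counter(text)
probs = {c: n / len(text) for c, n in counts.items()}

print(f"model: {ideal_code_length_bits(text, probs):.1f} bits "
      f"vs raw 8-bit encoding: {8 * len(text)} bits")
```

Swapping the unigram model for a stronger predictor (ultimately, an LLM's conditional next-token distribution) lowers the same sum, which is the sense in which better language modeling is better compression.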