google-deepmind / language_modeling_is_compression (☆101, updated 2 months ago)
Related projects
Alternatives and complementary repositories for language_modeling_is_compression
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" (COLM 2024) (☆127, updated last month)
- Source code for "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" (☆56, updated 3 weeks ago)
- [NeurIPS 2024 Spotlight] Observational Scaling Laws (☆42, updated last month)
- Code and data for "Long-context LLMs Struggle with Long In-context Learning" (☆91, updated 4 months ago)
- Homepage for ProLong (Princeton long-context language models) and the paper "How to Train Long-Context Language Models (Effectively)" (☆111, updated last week)
- Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality" (☆156, updated 3 months ago)
- ☆132, updated last year
- Unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" (☆134, updated 4 months ago)
- ☆112, updated 3 months ago
- [NeurIPS 2024] Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies, https://arxiv.org/abs/2407.13623 (☆67, updated last month)
- ☆61, updated 2 months ago
- Official PyTorch implementation of "DistiLLM: Towards Streamlined Distillation for Large Language Models" (ICML 2024) (☆133, updated last month)
- Language Models Scale Reliably with Over-Training and on Downstream Tasks (☆94, updated 7 months ago)
- Layer-condensed KV cache with 10× larger batch size, fewer parameters, and less computation; dramatic speed-up with better task performance… (☆137, updated this week)
- Simple and efficient PyTorch-native transformer training and inference (batched) (☆61, updated 7 months ago)
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" (☆64, updated 5 months ago)
- ☆97, updated 8 months ago
- Code accompanying the paper "Massive Activations in Large Language Models" (☆121, updated 8 months ago)
- ☆49, updated 6 months ago
- ☆50, updated 5 months ago
- [NeurIPS 2024] Official code for "DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving" (☆75, updated last month)
- Code for the paper "VinePPO: Unlocking RL Potential for LLM Reasoning Through Refined Credit Assignment" (☆77, updated 2 weeks ago)
- Stick-breaking attention (☆33, updated this week)
- Preliminary explorations of Mamba's context scaling (☆190, updated 9 months ago)
- ☆65, updated 7 months ago
- Parameter-Efficient Sparsity Crafting: From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (☆129, updated last month)
- Benchmarking LLMs with Challenging Tasks from Real Users (☆194, updated last week)
- ☆107, updated 3 months ago
- ☆79, updated last year
- ☆50, updated last week