google-deepmind / language_modeling_is_compression
☆150 · Updated 11 months ago
Alternatives and similar repositories for language_modeling_is_compression
Users interested in language_modeling_is_compression are comparing it to the repositories listed below.
- Some preliminary explorations of Mamba's context scaling. ☆216 · Updated last year
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆140 · Updated 10 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆238 · Updated 2 months ago
- [NeurIPS'24 Spotlight] Observational Scaling Laws ☆56 · Updated 10 months ago
- ☆101 · Updated 10 months ago
- Language models scale reliably with over-training and on downstream tasks ☆97 · Updated last year
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch ☆178 · Updated last month
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆92 · Updated 2 weeks ago
- ☆147 · Updated 2 years ago
- ☆192 · Updated this week
- Code accompanying the paper "Massive Activations in Large Language Models" ☆174 · Updated last year
- Understand and test language model architectures on synthetic tasks. ☆221 · Updated 3 weeks ago
- ☆106 · Updated last year
- Stick-breaking attention ☆59 · Updated last month
- Official implementation of Phi-Mamba. A MOHAWK-distilled model ("Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models") ☆113 · Updated 10 months ago
- [ICLR 2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ☆259 · Updated 2 months ago
- ☆184 · Updated last year
- ☆90 · Updated last year
- ☆67 · Updated last year
- AnchorAttention: Improved attention for LLMs long-context training ☆212 · Updated 6 months ago
- Physics of Language Models, Part 4 ☆219 · Updated 2 weeks ago
- ☆187 · Updated 3 months ago
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance. ☆151 · Updated 4 months ago
- ☆50 · Updated last year
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ☆119 · Updated last year
- Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality" ☆205 · Updated last year
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆226 · Updated 3 months ago
- ☆81 · Updated last year
- The HELMET Benchmark ☆163 · Updated 3 months ago
- Replicating O1 inference-time scaling laws ☆89 · Updated 8 months ago