lucidrains / coconut-pytorch
Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch
☆179 · Updated 4 months ago
Alternatives and similar repositories for coconut-pytorch
Users interested in coconut-pytorch are comparing it to the repositories listed below.
- ☆108 · Updated last year
- ☆197 · Updated 6 months ago
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆231 · Updated 3 weeks ago
- Some preliminary explorations of Mamba's context scaling ☆216 · Updated last year
- ☆124 · Updated 8 months ago
- ☆87 · Updated last year
- [NeurIPS 2024] Low-rank memory-efficient optimizer without SVD ☆30 · Updated 4 months ago
- ☆85 · Updated this week
- [COLM 2025] Code for the paper "Learning Adaptive Parallel Reasoning with Language Models" ☆132 · Updated 2 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" [COLM 2025] ☆178 · Updated 4 months ago
- Code for the ICLR 2025 paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆104 · Updated last month
- AnchorAttention: Improved attention for LLM long-context training ☆213 · Updated 9 months ago
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆131 · Updated last week
- The official repository for Inheritune ☆115 · Updated 9 months ago
- Language models scale reliably with over-training and on downstream tasks ☆100 · Updated last year
- A large-scale, high-quality math dataset for reinforcement learning in language models ☆67 · Updated 8 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆162 · Updated 6 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS 2025] ☆198 · Updated 2 weeks ago
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google DeepMind ☆177 · Updated last year
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP 2024) ☆147 · Updated last year
- Long Context Extension and Generalization in LLMs ☆62 · Updated last year
- ☆75 · Updated last year
- Code for the NeurIPS 2024 paper "Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization" ☆232 · Updated 3 months ago
- [ICLR 2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM ☆98 · Updated 10 months ago
- Code for the paper "Patch-Level Training for Large Language Models" ☆92 · Updated 11 months ago
- The official GitHub repo for "Diffusion Language Models are Super Data Learners" ☆145 · Updated last week
- Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆175 · Updated last year
- Physics of Language Models, Part 4 ☆255 · Updated 3 months ago
- Replicating o1 inference-time scaling laws ☆90 · Updated 11 months ago
- [ICLR 2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ☆327 · Updated 5 months ago