lucidrains / coconut-pytorch
Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch
☆179 Updated 2 months ago
Alternatives and similar repositories for coconut-pytorch
Users who are interested in coconut-pytorch are comparing it to the libraries listed below.
- ☆101 Updated 10 months ago
- ☆187 Updated 4 months ago
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆226 Updated 3 months ago
- Some preliminary explorations of Mamba's context scaling. ☆216 Updated last year
- ☆85 Updated last year
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind ☆127 Updated last year
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆60 Updated 5 months ago
- ☆120 Updated 6 months ago
- Language models scale reliably with over-training and on downstream tasks ☆98 Updated last year
- Physics of Language Models, Part 4 ☆232 Updated 3 weeks ago
- This is the official repository for Inheritune. ☆112 Updated 6 months ago
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models ☆122 Updated last week
- [NeurIPS 2024] Low rank memory efficient optimizer without SVD ☆30 Updated last month
- AnchorAttention: Improved attention for LLMs long-context training ☆212 Updated 7 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains ☆163 Updated 2 months ago
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆228 Updated last month
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆169 Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆152 Updated last month
- 📖 This is a repository for organizing papers, codes, and other resources related to Latent Reasoning. ☆192 Updated last week
- [ICLR2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ☆276 Updated 2 months ago
- ☆100 Updated last year
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆160 Updated 4 months ago
- ☆85 Updated 7 months ago
- [NeurIPS 2024] Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study ☆53 Updated 8 months ago
- Function Vectors in Large Language Models (ICLR 2024) ☆176 Updated 4 months ago
- Long Context Extension and Generalization in LLMs ☆58 Updated 11 months ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆95 Updated 3 weeks ago