lucidrains / coconut-pytorch
Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch
☆165 Updated 4 months ago
Alternatives and similar repositories for coconut-pytorch:
Users who are interested in coconut-pytorch are comparing it to the libraries listed below:
- ☆91 Updated 7 months ago
- ☆170 Updated 2 weeks ago
- This is the official repository for Inheritune. ☆111 Updated 2 months ago
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind ☆123 Updated 8 months ago
- [ICLR2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ☆158 Updated last month
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆215 Updated this week
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆105 Updated this week
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆153 Updated 3 weeks ago
- Some preliminary explorations of Mamba's context scaling. ☆213 Updated last year
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172 Updated 3 months ago
- ☆69 Updated 2 months ago
- The HELMET Benchmark ☆142 Updated 2 weeks ago
- Implementation of Infini-Transformer in Pytorch ☆110 Updated 4 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆177 Updated last month
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… ☆148 Updated 3 weeks ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆84 Updated 7 months ago
- ☆114 Updated 2 months ago
- ☆77 Updated 3 months ago
- EvaByte: Efficient Byte-level Language Models at Scale ☆91 Updated 2 weeks ago
- ☆97 Updated 10 months ago
- ☆25 Updated 3 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆232 Updated 2 months ago
- ☆78 Updated 8 months ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆51 Updated 2 months ago
- ☆125 Updated last year
- Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" ☆100 Updated 2 weeks ago
- official repository for “Reinforcement Learning for Reasoning in Large Language Models with One Training Example” ☆52 Updated this week
- minimal GRPO implementation from scratch ☆87 Updated last month
- Language models scale reliably with over-training and on downstream tasks ☆96 Updated last year
- Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆128 Updated 3 months ago