lucidrains / coconut-pytorch
Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch
☆170 Updated 5 months ago
Alternatives and similar repositories for coconut-pytorch
Users that are interested in coconut-pytorch are comparing it to the libraries listed below
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆155 Updated last month
- ☆92 Updated 8 months ago
- Language models scale reliably with over-training and on downstream tasks ☆97 Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆126 Updated 3 weeks ago
- Some preliminary explorations of Mamba's context scaling. ☆212 Updated last year
- ☆79 Updated 9 months ago
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆221 Updated 3 weeks ago
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind ☆124 Updated 9 months ago
- Understand and test language model architectures on synthetic tasks. ☆195 Updated 2 months ago
- ☆97 Updated 11 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆233 Updated 3 months ago
- ☆174 Updated last month
- EvaByte: Efficient Byte-level Language Models at Scale ☆98 Updated last month
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆55 Updated 3 months ago
- This is the official repository for Inheritune. ☆111 Updated 3 months ago
- ☆114 Updated 3 months ago
- ☆74 Updated 3 months ago
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆160 Updated 11 months ago
- ☆25 Updated 4 months ago
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… ☆149 Updated last month
- [ICLR2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ☆179 Updated 2 months ago
- Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" ☆164 Updated 3 weeks ago
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆108 Updated 8 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆184 Updated 2 months ago
- A brief and partial summary of RLHF algorithms. ☆128 Updated 2 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆143 Updated 8 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆171 Updated 4 months ago
- The official implementation of Self-Exploring Language Models (SELM) ☆64 Updated 11 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆126 Updated 5 months ago
- The HELMET Benchmark ☆148 Updated last month