lucidrains / coconut-pytorch
Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch
★177 · Updated 3 weeks ago
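The repository implements Coconut's continuous-thought idea: between reasoning steps, the model's last hidden state is fed back directly as the next input embedding instead of being decoded into a token. A minimal illustrative sketch of that feedback loop follows; it is not the repository's actual API, and all module and parameter names here (`TinyContinuousThought`, `num_latent_steps`, the GRU stand-in for a transformer) are hypothetical.

```python
# Hedged sketch of Coconut-style "continuous thought" (not the repo's API).
# Key idea: append the final hidden state itself as the next "thought"
# embedding, skipping token decoding between latent reasoning steps.
import torch
import torch.nn as nn

class TinyContinuousThought(nn.Module):
    def __init__(self, vocab_size=32, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # stand-in for a transformer: any model mapping (B, T, D) -> (B, T, D)
        self.backbone = nn.GRU(dim, dim, batch_first=True)
        self.to_logits = nn.Linear(dim, vocab_size)

    def forward(self, token_ids, num_latent_steps=3):
        x = self.embed(token_ids)                 # (B, T, D)
        for _ in range(num_latent_steps):
            out, _ = self.backbone(x)
            last_hidden = out[:, -1:, :]          # (B, 1, D)
            # feed the hidden state back as the next input embedding
            x = torch.cat([x, last_hidden], dim=1)
        out, _ = self.backbone(x)
        return self.to_logits(out[:, -1])         # logits for the answer token

model = TinyContinuousThought()
ids = torch.randint(0, 32, (2, 5))
logits = model(ids)
print(logits.shape)  # torch.Size([2, 32])
```

In the actual Coconut setup this loop runs inside a pretrained transformer and is trained with a curriculum that gradually replaces chain-of-thought tokens with latent steps; the sketch only shows the hidden-state feedback mechanism.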
Alternatives and similar repositories for coconut-pytorch
Users interested in coconut-pytorch are comparing it to the repositories listed below.
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ★222 · Updated 2 months ago
- ★96 · Updated 9 months ago
- ★182 · Updated 2 months ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ★58 · Updated 4 months ago
- Some preliminary explorations of Mamba's context scaling. ★214 · Updated last year
- ★82 · Updated 10 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ★158 · Updated 2 months ago
- PyTorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at DeepMind ★127 · Updated 10 months ago
- ★117 · Updated 4 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ★141 · Updated 2 weeks ago
- [COLM 2025] Code for the paper: Learning Adaptive Parallel Reasoning with Language Models ★114 · Updated 2 months ago
- AnchorAttention: Improved attention for LLM long-context training ★208 · Updated 5 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ★258 · Updated 3 weeks ago
- Official implementation of Phi-Mamba, a MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode…) ★108 · Updated 9 months ago
- ★98 · Updated last year
- Code for the NeurIPS 2024 paper "Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization" ★220 · Updated 7 months ago
- ★80 · Updated 5 months ago
- EvaByte: Efficient Byte-level Language Models at Scale ★103 · Updated 2 months ago
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google DeepMind ★177 · Updated 10 months ago
- OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ★404 · Updated this week
- Language models scale reliably with over-training and on downstream tasks ★97 · Updated last year
- Code accompanying the paper "Massive Activations in Large Language Models" ★169 · Updated last year
- The official repository for Inheritune. ★111 · Updated 5 months ago
- Normalized Transformer (nGPT) ★184 · Updated 7 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains ★147 · Updated last month
- [ICLR 2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ★234 · Updated last month
- [ICML 2025] Predictive Data Selection: The Data That Predicts Is the Data That Teaches ★51 · Updated 4 months ago
- A repository organizing papers, code, and other resources related to Latent Reasoning. ★86 · Updated this week
- Official implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" ★235 · Updated last week
- Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ★165 · Updated last year