lucidrains / coconut-pytorch
Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch
★179 · Updated 4 months ago
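As context for the comparisons below: Coconut (Chain of Continuous Thought) has the model "think" in latent space by feeding its last hidden state back in as the next input embedding for a few steps, instead of decoding a token at each step. The following is only a minimal conceptual sketch of that loop, not the coconut-pytorch API; the class, module names, and hyperparameters are illustrative assumptions, and causal masking is omitted for brevity.

```python
# Minimal sketch of a Coconut-style continuous-thought loop (illustrative only,
# not the coconut-pytorch API). During the latent phase, the last hidden state
# is appended to the input sequence as an "embedding" instead of sampling a token.
import torch
import torch.nn as nn

class TinyCoconutLM(nn.Module):
    def __init__(self, vocab_size=100, dim=64, n_latent_steps=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.to_logits = nn.Linear(dim, vocab_size)
        self.n_latent_steps = n_latent_steps

    def forward(self, token_ids):
        x = self.embed(token_ids)                    # (batch, seq, dim) prompt embeddings
        # continuous-thought phase: feed the final hidden state back as the
        # next input position rather than decoding a token
        for _ in range(self.n_latent_steps):
            h = self.backbone(x)                     # (batch, seq, dim)
            x = torch.cat([x, h[:, -1:, :]], dim=1)  # append the latent "thought"
        # switch back to ordinary language modeling over tokens + thoughts
        h = self.backbone(x)
        return self.to_logits(h[:, -1])              # next-token logits

# usage: batch of 2 prompts, 8 token ids each
logits = TinyCoconutLM()(torch.randint(0, 100, (2, 8)))
print(logits.shape)  # torch.Size([2, 100])
```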
Alternatives and similar repositories for coconut-pytorch
Users who are interested in coconut-pytorch are comparing it to the libraries listed below.
- ★107 · Updated last year
- Some preliminary explorations of Mamba's context scaling. ★216 · Updated last year
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ★231 · Updated last week
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ★162 · Updated 6 months ago
- ★85 · Updated 9 months ago
- ★195 · Updated 6 months ago
- ★86 · Updated last year
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at DeepMind ★129 · Updated last year
- This is the official repository for Inheritune. ★115 · Updated 8 months ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ★65 · Updated 7 months ago
- Language models scale reliably with over-training and on downstream tasks ★100 · Updated last year
- ★122 · Updated 8 months ago
- Physics of Language Models, Part 4 ★250 · Updated 2 months ago
- [NeurIPS 2024] Low rank memory efficient optimizer without SVD ★30 · Updated 3 months ago
- AnchorAttention: Improved attention for LLMs long-context training ★213 · Updated 9 months ago
- Normalized Transformer (nGPT) ★192 · Updated 11 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ★241 · Updated 4 months ago
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models ★132 · Updated 2 months ago
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ★173 · Updated last year
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ★116 · Updated last year
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ★102 · Updated last week
- ★93 · Updated 7 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24) ★147 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ★166 · Updated 3 months ago
- Understand and test language model architectures on synthetic tasks. ★233 · Updated 3 weeks ago
- ★73 · Updated last year
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ★142 · Updated last year
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu, … ★49 · Updated 6 months ago
- [NeurIPS-2024] Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ★88 · Updated last year
- ★55 · Updated 4 months ago