lucidrains / coconut-pytorch
Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch
⭐145 · Updated 2 weeks ago
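Coconut's chain of continuous thought skips discrete token sampling during the reasoning phase: the model's last hidden state at the final position is fed back in as the next input embedding for a fixed number of latent steps, and ordinary decoding resumes afterwards. Below is a minimal sketch of that idea, assuming a GPT-2-style causal LM from Hugging Face `transformers` whose hidden size matches its embedding size; it illustrates the paradigm only and is not the API of this repository.

```python
# Minimal sketch of the "continuous thought" idea (not the coconut-pytorch API):
# feed the last hidden state back as the next input embedding instead of a token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM whose hidden size equals its embedding size
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

prompt = "2 + 3 * 4 ="
input_ids = tok(prompt, return_tensors="pt").input_ids
embeds = model.get_input_embeddings()(input_ids)            # (1, T, d)

num_latent_steps = 4  # hypothetical number of continuous "thoughts"
with torch.no_grad():
    for _ in range(num_latent_steps):
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        last_hidden = out.hidden_states[-1][:, -1:, :]       # final layer, final position
        # Append the hidden state as the next input embedding (the latent thought).
        embeds = torch.cat([embeds, last_hidden], dim=1)

    # Switch back to ordinary token decoding after the latent phase.
    logits = model(inputs_embeds=embeds).logits[:, -1, :]
    next_token = logits.argmax(dim=-1)

print(tok.decode(next_token))
```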
Alternatives and similar repositories for coconut-pytorch:
Users who are interested in coconut-pytorch are comparing it to the libraries listed below.
- ⭐69 · Updated 4 months ago
- Pytorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind · ⭐115 · Updated 4 months ago
- [NeurIPS 2024] Official repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models · ⭐188 · Updated 2 weeks ago
- Language models scale reliably with over-training and on downstream tasks · ⭐96 · Updated 9 months ago
- ⭐78 · Updated 3 months ago
- The official repository for Inheritune. · ⭐108 · Updated 3 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) · ⭐148 · Updated last month
- ⭐135 · Updated 3 months ago
- ⭐89 · Updated this week
- OpenCoconut implements a latent reasoning paradigm where thoughts are generated before decoding. · ⭐157 · Updated this week
- Some preliminary explorations of Mamba's context scaling. · ⭐206 · Updated 11 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. · ⭐90 · Updated last month
- Normalized Transformer (nGPT) · ⭐145 · Updated last month
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" · ⭐219 · Updated last month
- ⭐93 · Updated 6 months ago
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google DeepMind · ⭐172 · Updated 4 months ago
- The official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" (ICML 2024). · ⭐92 · Updated 6 months ago
- ⭐65 · Updated 6 months ago
- ⭐69 · Updated this week
- The official implementation of Self-Exploring Language Models (SELM) · ⭐59 · Updated 7 months ago
- Implementation of Infini-Transformer in Pytorch · ⭐107 · Updated 2 weeks ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) · ⭐182 · Updated 7 months ago
- Official implementation of Phi-Mamba, a MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode…) · ⭐92 · Updated 4 months ago
- ⭐124 · Updated 11 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks · ⭐139 · Updated 3 months ago
- Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. · ⭐97 · Updated 2 weeks ago
- ⭐58 · Updated 8 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" · ⭐96 · Updated 3 months ago
- [NeurIPS 2024] Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies (https://arxiv.org/abs/2407.13623) · ⭐75 · Updated 3 months ago
- Understand and test language model architectures on synthetic tasks. · ⭐175 · Updated this week