facebookresearch / coconut
Training Large Language Model to Reason in a Continuous Latent Space
☆388 Updated this week
Alternatives and similar repositories for coconut:
Users who are interested in coconut are comparing it to the libraries listed below.
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆157 Updated this week
- A Self-adaptation Framework that adapts LLMs for unseen tasks in real-time! ☆343 Updated this week
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆277 Updated last month
- System 2 Reasoning Link Collection ☆722 Updated this week
- ☆96 Updated 3 weeks ago
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch ☆145 Updated 2 weeks ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆182 Updated 7 months ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆278 Updated last month
- Long context evaluation for large language models ☆195 Updated this week
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆119 Updated this week
- ☆116 Updated this week
- Fast bare-bones BPE for modern tokenizer training ☆141 Updated 2 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆259 Updated last week
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆90 Updated last month
- Sparse autoencoders ☆407 Updated this week
- smolLM with Entropix sampler on pytorch ☆147 Updated 2 months ago
- Minimalistic 4D-parallelism distributed training framework for education purpose ☆644 Updated this week
- Simple Transformer in Jax ☆128 Updated 6 months ago
- Normalized Transformer (nGPT) ☆145 Updated last month
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆273 Updated 2 months ago
- Code for the paper 🌳 Tree Search for Language Model Agents ☆163 Updated 5 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆233 Updated 2 months ago
- DeMo: Decoupled Momentum Optimization ☆170 Updated last month
- All credits go to HuggingFace's Daily AI papers (https://huggingface.co/papers) and the research community. Audio summaries here (https… ☆143 Updated this week
- A simple unified framework for evaluating LLMs ☆164 Updated 3 weeks ago
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆831 Updated last month
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆270 Updated 2 months ago
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead ☆210 Updated last week
- Automatic Evals for Instruction-Tuned Models ☆100 Updated this week