facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, sparsely activated memory layers complement compute-heavy dense feed-forward layers, providing dedicated capacity to store and retrieve information cheaply.
⭐ 371 · Updated last year
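To make the idea concrete, below is a minimal PyTorch sketch of a sparsely activated key-value memory layer. It is an illustration under assumptions, not the facebookresearch/memory implementation: the class name, slot count, and top-k value are invented, and production memory layers typically use product-key lookups so they never score every key, whereas this sketch does a full scan for clarity.

```python
# Minimal sketch of a sparsely activated key-value memory layer.
# Illustrative only; hyperparameters and names are assumptions, not the repo's API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeyValueMemoryLayer(nn.Module):
    """Trainable key/value table queried with a top-k lookup.

    Parameter count grows with `num_slots`, but each token only reads and
    mixes `top_k` value vectors, so per-token compute stays nearly flat.
    """

    def __init__(self, d_model: int, num_slots: int = 65536, top_k: int = 32):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_slots, d_model) * d_model**-0.5)
        self.values = nn.Embedding(num_slots, d_model)  # memory payload
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        scores = x @ self.keys.t()                        # similarity to every key
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)          # normalize over selected slots
        selected = self.values(topk_idx)                  # (batch, seq, top_k, d_model)
        return (weights.unsqueeze(-1) * selected).sum(dim=-2)
```

In a transformer block, a layer like this would sit alongside (or replace) the dense feed-forward sublayer; only the gathered `top_k` rows of the value table receive gradients for a given token, which is what keeps the added capacity cheap at inference time.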
Alternatives and similar repositories for memory
Users that are interested in memory are comparing it to the libraries listed below
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ⭐ 623 · Updated last week
- Tina: Tiny Reasoning Models via LoRA ⭐ 316 · Updated 4 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ⭐ 356 · Updated 2 weeks ago
- ⭐ 232 · Updated 2 months ago
- [ICML 2024] CLLMs: Consistency Large Language Models ⭐ 410 · Updated last year
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ⭐ 344 · Updated last month
- LongRoPE is a novel method that extends the context window of pre-trained LLMs to an impressive 2048k tokens. ⭐ 277 · Updated 3 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"β252Updated last year
- Implementation of π₯₯ Coconut, Chain of Continuous Thought, in Pytorchβ182Updated 7 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.β175Updated last year
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.β201Updated last year
- Normalized Transformer (nGPT)β198Updated last year
- Parallel Scaling Law for Language Model β Beyond Parameter and Inference Time Scalingβ468Updated 8 months ago
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"β316Updated 2 years ago
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) modelsβ227Updated 3 months ago
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention…" ⭐ 294 · Updated last year
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning"β344Updated 2 months ago
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.β456Updated last year
- β208Updated 3 weeks ago
- Reproducible, flexible LLM evaluationsβ337Updated last week
- Exploring Applications of GRPOβ251Updated 5 months ago
- An extension of the nanoGPT repository for training small MOE models.β233Updated 10 months ago
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Modelsβ237Updated 3 months ago
- [ICLR 2026] Learning to Reason without External Rewardsβ389Updated last week
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'β235Updated 6 months ago
- BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.β238Updated 5 months ago
- Pretraining and inference code for a large-scale depth-recurrent language modelβ861Updated last month
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates.β361Updated this week
- PyTorch implementation of models from the Zamba2 series.β186Updated last year
- A family of compressed models obtained via pruning and knowledge distillationβ364Updated 3 months ago