facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, sparsely activated memory layers complement compute-heavy dense feed-forward layers, providing dedicated capacity to store and retrieve information cheaply.
☆320 · Updated 4 months ago
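The lookup described above can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration under simplifying assumptions, not the repo's implementation: the class name, slot count, and plain top-k routing are invented for clarity, and the actual memory layers use a product-key lookup so that a query never has to score every slot.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMemoryLayer(nn.Module):
    """Illustrative sparse key-value memory lookup (hypothetical, simplified).

    A large table of trainable keys/values adds parameters, but each token
    only reads its top-k matching slots, so per-token compute stays small.
    """

    def __init__(self, dim: int, num_slots: int = 4096, topk: int = 8):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_slots, dim) * dim ** -0.5)
        self.values = nn.Parameter(torch.randn(num_slots, dim) * dim ** -0.5)
        self.query_proj = nn.Linear(dim, dim, bias=False)
        self.topk = topk

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim)
        q = self.query_proj(x)                    # one query per token
        # NOTE: scoring all slots here is for brevity; the real memory layers
        # use product keys to avoid materializing this full score matrix.
        scores = q @ self.keys.t()                # (batch, seq, num_slots)
        topk_scores, topk_idx = scores.topk(self.topk, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)  # normalize over selected slots only
        selected = self.values[topk_idx]          # (batch, seq, topk, dim)
        return x + torch.einsum("bsk,bskd->bsd", weights, selected)


if __name__ == "__main__":
    layer = SimpleMemoryLayer(dim=64)
    out = layer(torch.randn(2, 16, 64))
    print(out.shape)  # torch.Size([2, 16, 64])
```

Because only `topk` value rows participate in each token's output, the slot count can be grown to add capacity without a proportional increase in FLOPs, which is the trade-off the description above refers to.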
Alternatives and similar repositories for memory:
Users interested in memory are comparing it to the libraries listed below
- 🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc. ☆331 · Updated this week
- [ICML 2024] CLLMs: Consistency Large Language Models ☆391 · Updated 5 months ago
- ☆175 · Updated 4 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆198 · Updated 9 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆231 · Updated 2 months ago
- PyTorch implementation of models from the Zamba2 series. ☆179 · Updated 3 months ago
- ☆185 · Updated last week
- PyTorch building blocks for the OLMo ecosystem ☆197 · Updated this week
- An extension of the nanoGPT repository for training small MoE models. ☆131 · Updated last month
- Normalized Transformer (nGPT) ☆171 · Updated 5 months ago
- ☆419 · Updated this week
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ☆217 · Updated 3 weeks ago
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆182 · Updated last week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆240 · Updated last week
- ☆519 · Updated last week
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆214 · Updated last week
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆419 · Updated 2 weeks ago
- ☆169 · Updated 2 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆171 · Updated 3 months ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆304 · Updated 5 months ago
- Code for NeurIPS'24 paper "Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization" ☆187 · Updated 4 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆511 · Updated 6 months ago
- A project to improve skills of large language models ☆295 · Updated this week
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch ☆165 · Updated 3 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆286 · Updated last week
- Muon optimizer: >30% sample efficiency with <3% wallclock overhead ☆577 · Updated last month
- Quick implementation of nGPT, learning entirely on the hypersphere, from Nvidia AI ☆279 · Updated last month
- Pretraining code for a large-scale depth-recurrent language model ☆745 · Updated last week
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆152 · Updated last week
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ☆453 · Updated 2 months ago