facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, sparsely activated memory layers complement compute-heavy dense feed-forward layers, providing dedicated capacity to store and retrieve information cheaply.
⭐ 370 · Updated last year
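For intuition, here is a minimal PyTorch sketch of the idea: a large trainable key-value table from which each token reads only its top-k entries, so capacity grows with the table size while per-token compute stays small. This is an illustrative sketch, not this repository's implementation; the class and parameter names (`MemoryLayer`, `num_keys`, `topk`) are invented for the example, and the dense key scoring below is a simplification of the product-key retrieval used in practice.

```python
# Minimal sketch of a sparsely activated key-value memory layer (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryLayer(nn.Module):
    def __init__(self, d_model: int, num_keys: int = 4096, topk: int = 4):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_keys, d_model) * d_model ** -0.5)
        self.values = nn.Embedding(num_keys, d_model)  # large table, read sparsely
        self.topk = topk

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model). Score every key, keep only the top-k per token.
        scores = x @ self.keys.t()                    # (B, S, num_keys)
        top_scores, top_idx = scores.topk(self.topk, dim=-1)
        weights = F.softmax(top_scores, dim=-1)       # (B, S, topk)
        picked = self.values(top_idx)                 # (B, S, topk, d_model)
        # Weighted sum over the few selected values: capacity grows with num_keys,
        # but per-token FLOPs for the value read grow only with topk.
        return x + (weights.unsqueeze(-1) * picked).sum(dim=-2)

# Example: drop in alongside (or in place of) a dense feed-forward block.
layer = MemoryLayer(d_model=64)
out = layer(torch.randn(2, 16, 64))   # -> torch.Size([2, 16, 64])
```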
Alternatives and similar repositories for memory
Users that are interested in memory are comparing it to the libraries listed below
- Tina: Tiny Reasoning Models via LoRA (⭐ 313 · Updated 4 months ago)
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. (⭐ 621 · Updated 3 weeks ago)
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 (⭐ 355 · Updated last week)
- [ICML 2024] CLLMs: Consistency Large Language Models (⭐ 411 · Updated last year)
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM) (⭐ 344 · Updated last month)
- A project to improve skills of large language models (⭐ 786 · Updated last week)
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" (⭐ 251 · Updated last year)
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models (⭐ 227 · Updated 2 months ago)
- Parallel Scaling Law for Language Model - Beyond Parameter and Inference Time Scaling (⭐ 469 · Updated 8 months ago)
- ⭐ 206 · Updated last year
- ⭐ 230 · Updated 2 months ago
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache (⭐ 140 · Updated 5 months ago)
- Normalized Transformer (nGPT) (⭐ 197 · Updated last year)
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. (⭐ 175 · Updated last year)
- LongRoPE is a novel method that extends the context window of pre-trained LLMs to an impressive 2048k tokens. (⭐ 276 · Updated 3 months ago)
- Simple & Scalable Pretraining for Neural Architecture Research (⭐ 307 · Updated last month)
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch (⭐ 182 · Updated 7 months ago)
- Pretraining and inference code for a large-scale depth-recurrent language model (⭐ 859 · Updated last month)
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention… (⭐ 294 · Updated last year)
- A family of compressed models obtained via pruning and knowledge distillation (⭐ 364 · Updated 2 months ago)
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. (⭐ 201 · Updated last year)
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models (⭐ 236 · Updated 3 months ago)
- Exploring Applications of GRPO (⭐ 251 · Updated 5 months ago)
- A simplified implementation for experimenting with RLVR on GSM8K; this repository provides a starting point for exploring reasoning. (⭐ 158 · Updated 11 months ago)
- Training teachers with reinforcement learning to make LLMs learn how to reason for test-time scaling. (⭐ 358 · Updated 7 months ago)
- PyTorch implementation of models from the Zamba2 series. (⭐ 186 · Updated last year)
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) (⭐ 163 · Updated 9 months ago)
- Code for NeurIPS'24 paper "Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization" (⭐ 235 · Updated 6 months ago)
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" (⭐ 343 · Updated 2 months ago)
- Reproducible, flexible LLM evaluations (⭐ 331 · Updated this week)