facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, sparsely activated memory layers complement compute-heavy dense feed-forward layers, providing dedicated capacity to store and retrieve information cheaply.
☆342 · Updated 10 months ago
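To make the comparisons below concrete, here is a minimal sketch of a sparsely activated memory layer in PyTorch: a trainable key table is scored against each token's hidden state, and only the top-k values are gathered, so value retrieval scales with k rather than with the size of the memory. The class name, the flat (non-product) key table, and all hyperparameters are illustrative assumptions, not the repo's actual API.

```python
# Minimal sketch of a sparsely activated memory layer, assuming a flat
# key table and top-k lookup. Names and sizes are illustrative, not the
# facebookresearch/memory API; the actual work uses product keys.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMemoryLayer(nn.Module):
    def __init__(self, d_model: int, num_keys: int = 4096, top_k: int = 4):
        super().__init__()
        self.top_k = top_k
        # Trainable keys and values: extra capacity queried sparsely.
        self.keys = nn.Parameter(torch.randn(num_keys, d_model) * d_model**-0.5)
        self.values = nn.Embedding(num_keys, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model). Score every key, keep only the top-k.
        scores = x @ self.keys.t()                        # (B, S, num_keys)
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)          # (B, S, top_k)
        # Gather only the k selected value rows; the value table can grow
        # arbitrarily large without changing this step's FLOPs.
        selected = self.values(topk_idx)                  # (B, S, top_k, d_model)
        return (weights.unsqueeze(-1) * selected).sum(dim=-2)

if __name__ == "__main__":
    layer = SimpleMemoryLayer(d_model=64)
    out = layer(torch.randn(2, 10, 64))
    print(out.shape)  # torch.Size([2, 10, 64])
```

Note that the scoring step above still touches every key; the actual implementation factors the keys into product keys so that even the search is sublinear in memory size, which the flat table here trades away for brevity.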
Alternatives and similar repositories for memory
Users interested in memory are comparing it to the libraries listed below
- Tina: Tiny Reasoning Models via LoRA ☆299 · Updated 3 weeks ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ☆535 · Updated last week
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆343 · Updated 5 months ago
- ☆222 · Updated 2 weeks ago
- LongRoPE is a novel method that extends the context window of pre-trained LLMs to an impressive 2048k tokens. ☆260 · Updated last year
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache ☆125 · Updated 2 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆248 · Updated 8 months ago
- PyTorch building blocks for the OLMo ecosystem ☆307 · Updated this week
- [ICML 2024] CLLMs: Consistency Large Language Models ☆404 · Updated 11 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆201 · Updated last year
- An extension of the nanoGPT repository for training small MoE models. ☆202 · Updated 7 months ago
- A project to improve the skills of large language models ☆587 · Updated this week
- Parallel Scaling Law for Language Models – Beyond Parameter and Inference Time Scaling ☆447 · Updated 5 months ago
- PyTorch implementation of models from the Zamba2 series. ☆185 · Updated 8 months ago
- Exploring Applications of GRPO ☆248 · Updated last month
- Simple & Scalable Pretraining for Neural Architecture Research ☆296 · Updated 2 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172 · Updated 9 months ago
- ☆201 · Updated 10 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ☆294 · Updated this week
- Normalized Transformer (nGPT) ☆192 · Updated 11 months ago
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆538 · Updated 2 weeks ago
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch ☆179 · Updated 4 months ago
- Build your own visual reasoning model ☆413 · Updated 2 weeks ago
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆220 · Updated last month
- A simplified implementation for experimenting with RLVR on GSM8K; this repository provides a starting point for exploring reasoning. ☆136 · Updated 8 months ago
- Training teachers with reinforcement learning to teach LLMs how to reason for test-time scaling. ☆345 · Updated 3 months ago
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention… ☆291 · Updated last year
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ☆240 · Updated 11 months ago
- Async RL Training at Scale ☆709 · Updated this week
- Pretraining and inference code for a large-scale depth-recurrent language model ☆836 · Updated this week