facebookresearch / memoryLinks

Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, sparsely activated memory layers complement compute-heavy dense feed-forward layers, providing dedicated capacity to store and retrieve information cheaply.

☆342

Alternatives and similar repositories for memory

Users that are interested in memory are comparing it to the libraries listed below

Sorting:

shangshang-wang / Tina
Tina: Tiny Reasoning Models via LoRA
☆299Updated 3 weeks ago
sail-sg / oat
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
☆535Updated last week
facebookresearch / LayerSkip
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
☆343Updated 5 months ago
huggingface / picotron_tutorial
☆222Updated 2 weeks ago
microsoft / LongRoPE
LongRoPE is a novel method that can extends the context window of pre-trained LLMs to an impressive 2048k tokens.
☆260Updated last year
eqimp / hogwild_llm
Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache
☆125Updated 2 months ago
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆248Updated 8 months ago
allenai / OLMo-core
PyTorch building blocks for the OLMo ecosystem
☆307Updated this week
hao-ai-lab / Consistency_LLM
[ICML 2024] CLLMs: Consistency Large Language Models
☆404Updated 11 months ago
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆201Updated last year
wolfecameron / nanoMoE
An extension of the nanoGPT repository for training small MOE models.
☆202Updated 7 months ago
NVIDIA-NeMo / Skills
A project to improve skills of large language models
☆587Updated this week
QwenLM / ParScale
Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling
☆447Updated 5 months ago
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆185Updated 8 months ago
brendanhogan / DeepSeekRL-Extended
Exploring Applications of GRPO
☆248Updated last month
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆296Updated 2 months ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆172Updated 9 months ago
NVlabs / hymba
☆201Updated 10 months ago
facebookresearch / RAM
A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).
☆294Updated this week
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆192Updated 11 months ago
McGill-NLP / nano-aha-moment
Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs"
☆538Updated 2 weeks ago
lucidrains / coconut-pytorch
Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch
☆179Updated 4 months ago
groundlight / r1_vlm
Build your own visual reasoning model
☆413Updated 2 weeks ago
ZihanWang314 / CoE
Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models
☆220Updated last month
Mohammadjafari80 / GSM8K-RLVR
A simplified implementation for experimenting with RLVR on GSM8K, This repository provides a starting point for exploring reasoning.
☆136Updated 8 months ago
SakanaAI / RLT
Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling.
☆345Updated 3 months ago
dingo-actual / infini-transformer
PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention…
☆291Updated last year
arcee-ai / EvolKit
EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M…
☆240Updated 11 months ago
PrimeIntellect-ai / prime-rl
Async RL Training at Scale
☆709Updated this week
seal-rg / recurrent-pretraining
Pretraining and inference code for a large-scale depth-recurrent language model
☆836Updated this week