SakanaAI / evo-memory
Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers.
☆314 Updated 8 months ago
Alternatives and similar repositories for evo-memory
Users who are interested in evo-memory are comparing it to the repositories listed below.
- PyTorch implementation of models from the Zamba2 series. ☆183 Updated 5 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆341 Updated 7 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 Updated 5 months ago
- ☆115 Updated 6 months ago
- Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input ☆798 Updated last month
- Train your own SOTA deductive reasoning model ☆96 Updated 4 months ago
- Code for ExploreTom ☆84 Updated 2 weeks ago
- smolLM with Entropix sampler on pytorch ☆150 Updated 8 months ago
- ☆179 Updated 7 months ago
- A compact LLM pretrained in 9 days by using high quality data ☆317 Updated 3 months ago
- GRadient-INformed MoE ☆263 Updated 9 months ago
- Long context evaluation for large language models ☆219 Updated 4 months ago
- EvaByte: Efficient Byte-level Language Models at Scale ☆103 Updated 2 months ago
- prime-rl is a codebase for decentralized async RL training at scale ☆362 Updated this week
- code for training & evaluating Contextual Document Embedding models ☆194 Updated last month
- Build your own visual reasoning model ☆395 Updated this week
- ☆98 Updated 5 months ago
- ☆134 Updated 10 months ago
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆101 Updated 4 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆243 Updated 5 months ago
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆220 Updated 7 months ago
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache ☆112 Updated this week
- DeMo: Decoupled Momentum Optimization ☆189 Updated 7 months ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆318 Updated 7 months ago
- ☆118 Updated 10 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆198 Updated 11 months ago
- Alice in Wonderland code base for experiments and raw experiments data ☆131 Updated 3 weeks ago
- Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling. ☆295 Updated 2 weeks ago
- ☆162 Updated 2 months ago
- Fast parallel LLM inference for MLX ☆198 Updated last year