kmccleary3301 / nested_learning
A Reproduction of GDM's Nested Learning Paper
☆524 · Updated last month
Alternatives and similar repositories for nested_learning
Users who are interested in nested_learning are comparing it to the libraries listed below.
- This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" ☆279 · Updated last month
- PyTorch Code for Energy-Based Transformers paper -- generalizable reasoning and scalable learning ☆569 · Updated last month
- ☆163 · Updated 4 months ago
- Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling. ☆355 · Updated 6 months ago
- ☆365 · Updated last month
- Official implementation of "Continuous Autoregressive Language Models" ☆676 · Updated last month
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆345 · Updated last year
- RLP: Reinforcement as a Pretraining Objective ☆220 · Updated 2 months ago
- Open-source release accompanying Gao et al. 2025 ☆478 · Updated 3 weeks ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆341 · Updated last month
- Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation (NeurIPS 2025) ☆527 · Updated 3 months ago
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆233 · Updated 2 months ago
- PyTorch implementation of models from the Zamba2 series. ☆186 · Updated 11 months ago
- H-Net: Hierarchical Network with Dynamic Chunking ☆798 · Updated last month
- Library for text-to-text regression, applicable to any input string representation and allows pretraining and fine-tuning over multiple r… ☆304 · Updated 2 weeks ago
- Pretraining and inference code for a large-scale depth-recurrent language model ☆857 · Updated this week
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆365 · Updated last year
- An open source implementation of LFMs from Liquid AI: Liquid Foundation Models ☆198 · Updated last week
- dLLM: Simple Diffusion Language Modeling ☆1,526 · Updated last week
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆138 · Updated 7 months ago
- Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States ☆441 · Updated 2 months ago
- Official JAX implementation of End-to-End Test-Time Training for Long Context ☆102 · Updated this week
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆227 · Updated 2 months ago
- [ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models ☆927 · Updated 5 months ago
- Physics of Language Models, Part 4 ☆281 · Updated 3 weeks ago
- Simple & Scalable Pretraining for Neural Architecture Research ☆305 · Updated 3 weeks ago
- EvaByte: Efficient Byte-level Language Models at Scale ☆112 · Updated 8 months ago
- ☆205 · Updated last year
- PyTorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at DeepMind ☆132 · Updated 2 months ago
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆294 · Updated 7 months ago