kmccleary3301/nested_learning

A Reproduction of GDM's Nested Learning Paper

☆704

Alternatives and similar repositories for nested_learning

Users interested in nested_learning are comparing it to the repositories listed below.

obekt / HOPE-nested-learning
This is a clean, from-scratch PyTorch implementation of the **HOPE architecture**, based on the groundbreaking paper *"Nested Learning: T…
☆85 · Jun 11, 2026 · Updated last month
lucidrains / titans-pytorch
Unofficial implementation of Titans, SOTA memory for transformers, in Pytorch
☆1,969 · Jul 13, 2026 · Updated last week
fabienfrfr / tptt
😊 TPTT: Transforming Pretrained Transformers into Titans
☆65 · Jun 7, 2026 · Updated last month
test-time-training / e2e
Official JAX implementation of End-to-End Test-Time Training for Long Context
☆626 · Feb 15, 2026 · Updated 5 months ago
tokenbender / mHC-manifold-constrained-hyper-connections
implementations and experimentation on mHC by deepseek - https://arxiv.org/abs/2512.24880
☆369 · Feb 17, 2026 · Updated 5 months ago
SakanaAI / continuous-thought-machines
Continuous Thought Machines, because thought takes time and reasoning is a process.
☆2,000 · Dec 29, 2025 · Updated 6 months ago
test-time-training / discover
☆611 · May 24, 2026 · Updated 2 months ago
ankit-vaidya19 / Share
The Official PyTorch implementation of Shared LoRA Subspaces for almost Strict Continual Learning
☆33 · May 7, 2026 · Updated 2 months ago
deepseek-ai / Engram
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
☆4,561 · Jan 14, 2026 · Updated 6 months ago
goombalab / hnet
H-Net: Hierarchical Network with Dynamic Chunking
☆869 · Nov 20, 2025 · Updated 8 months ago
ByteDance-Seed / In-Place-TTT
☆248 · Apr 21, 2026 · Updated 3 months ago
mzbac / qlora-inference-multi-gpu
☆14 · May 25, 2023 · Updated 3 years ago
KellerJordan / Muon
Muon is an optimizer for hidden layers in neural networks
☆2,731 · May 24, 2026 · Updated 2 months ago
MatN23 / AdaptiveTrainingSystem
A PyTorch framework for training transformer language models with Mixture of Experts (MoE) architecture support, Mixture of Depths (MoD),…
☆21 · Updated this week
facebookresearch / vjepa2
PyTorch code and models for VJEPA2 self-supervised learning from video.
☆4,392 · Mar 23, 2026 · Updated 4 months ago
tyler-romero / nanogpt-speedrun
NanoGPT (124M) as fast as possible
☆20 · Apr 15, 2025 · Updated last year
HazyResearch / cartridges
Storing long contexts in tiny caches with self-study
☆305 · Mar 23, 2026 · Updated 4 months ago
idanshen / Self-Distillation
☆663 · Apr 7, 2026 · Updated 3 months ago
fla-org / flash-linear-attention
🚀 Efficient implementations for emerging model architectures
☆5,414 · Updated this week
DreamLM / Dream
Dream 7B, a large diffusion language model
☆1,255 · Nov 21, 2025 · Updated 8 months ago
kuleshov-group / bd3lms
[ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
☆1,024 · Jul 10, 2025 · Updated last year
ZHZisZZ / dllm
dLLM: Simple Diffusion Language Modeling
☆2,653 · Jul 17, 2026 · Updated last week
rohinmanvi / Capability-Aware-and-Mid-Generation-Self-Evaluations
☆21 · Jul 25, 2025 · Updated last year
state-spaces / mamba
Mamba SSM architecture
☆18,662 · Updated this week
jaepil / geometric-adam
A Ray Tracing-Inspired Approach to Neural Network Optimization
☆17 · Jun 11, 2025 · Updated last year
tilde-research / nitrobrew-release
Fused KL divergence from hidden states for knowledge distillation
☆19 · Apr 28, 2026 · Updated 2 months ago
metauto-ai / HGM
🧬 The Huxley-Gödel Machine
☆404 · Feb 7, 2026 · Updated 5 months ago
SakanaAI / text-to-lora
Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input
☆1,296 · Jun 8, 2025 · Updated last year
galilai-group / lejepa
☆1,287 · Jan 25, 2026 · Updated 6 months ago
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆630 · Mar 13, 2026 · Updated 4 months ago
thu-ml / i-DODE
Official code for "Improved Techniques for Maximum Likelihood Estimation for Diffusion ODEs" (ICML 2023)
☆19 · Jan 27, 2026 · Updated 5 months ago
olivkoch / TinyRecursiveModels
☆35 · Nov 11, 2025 · Updated 8 months ago
yifanzhang-pro / deep-delta-learning
Official Project Page for Deep Delta Learning (https://arxiv.org/abs/2601.00417)
☆356 · Jun 15, 2026 · Updated last month
algorithmicsuperintelligence / openevolve
Open-source implementation of AlphaEvolve
☆6,794 · Jul 18, 2026 · Updated last week
facebookresearch / coconut
Training Large Language Model to Reason in a Continuous Latent Space
☆1,667 · Jul 2, 2026 · Updated 3 weeks ago
ML-GSAI / LLaDA
Official PyTorch implementation for "Large Language Diffusion Models"
☆3,912 · Jul 15, 2026 · Updated last week
hustvl / MoDA
A hardware-aware Efficient Implementation for "Mixture-of-Depths Attention".
☆274 · May 6, 2026 · Updated 2 months ago
lucas-maes / le-wm
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
☆4,202 · May 26, 2026 · Updated 2 months ago
Chengsong-Huang / R-Zero
[ICLR2026] codes for R-Zero: Self-Evolving Reasoning LLM from Zero Data (https://www.arxiv.org/pdf/2508.05004)
☆823 · Feb 4, 2026 · Updated 5 months ago