goombalab/hnet

H-Net: Hierarchical Network with Dynamic Chunking

☆869

Alternatives and similar repositories for hnet

Users who are interested in hnet are comparing it to the libraries listed below.

main-horse / hnet-old
H-Net Dynamic Hierarchical Architecture
☆81 · Sep 11, 2025 · Updated 10 months ago
lucidrains / h-net-dynamic-chunking
Implementation of the dynamic chunking mechanism in H-net by Hwang et al. of Carnegie Mellon
☆79 · Jun 14, 2026 · Updated last month
main-horse / hnet-impl
Trainable H-Net Package
☆34 · Sep 3, 2025 · Updated 10 months ago
fla-org / flash-linear-attention
🚀 Efficient implementations for emerging model architectures
☆5,463 · Updated this week
facebookresearch / PhysicsLM4
Physics of Language Models: Part 4.2, Canon Layers at Scale where Synthetic Pretraining Resonates in Reality
☆356 · May 20, 2026 · Updated 2 months ago
facebookresearch / blt
Code for BLT research paper
☆2,053 · Nov 3, 2025 · Updated 8 months ago
raymin0223 / mixture_of_recursions
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation (NeurIPS 2025)
☆579 · Sep 26, 2025 · Updated 10 months ago
HanGuo97 / log-linear-attention
☆284 · Jun 6, 2025 · Updated last year
KellerJordan / Muon
Muon is an optimizer for hidden layers in neural networks
☆2,747 · May 24, 2026 · Updated 2 months ago
goombalab / hydra
Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"
☆175 · Jan 30, 2025 · Updated last year
declare-lab / EFLA
Official Repo for Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics
☆76 · Mar 26, 2026 · Updated 4 months ago
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆340 · Mar 31, 2026 · Updated 3 months ago
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆1,076 · Updated this week
KellerJordan / modded-nanogpt
NanoGPT (124M) in 90 seconds
☆5,600 · Updated this week
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆258 · Jun 15, 2025 · Updated last year
state-spaces / mamba
Mamba SSM architecture
☆18,675 · Jul 22, 2026 · Updated last week
seal-rg / recurrent-pretraining
Pretraining and inference code for a large-scale depth-recurrent language model
☆903 · Dec 29, 2025 · Updated 7 months ago
lucidrains / titans-pytorch
Unofficial implementation of Titans, SOTA memory for transformers, in Pytorch
☆1,970 · Jul 13, 2026 · Updated 2 weeks ago
tilde-research / nsa-release
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆133 · Jun 24, 2025 · Updated last year
kuleshov-group / bd3lms
[ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
☆1,025 · Jul 10, 2025 · Updated last year
ML-GSAI / LLaDA
Official PyTorch implementation for "Large Language Diffusion Models"
☆3,917 · Jul 15, 2026 · Updated 2 weeks ago
Doraemonzzz / xmixers
Xmixers: A collection of SOTA efficient token/channel mixers
☆29 · Sep 4, 2025 · Updated 10 months ago
fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆406 · Apr 22, 2026 · Updated 3 months ago
test-time-training / e2e
Official JAX implementation of End-to-End Test-Time Training for Long Context
☆627 · Feb 15, 2026 · Updated 5 months ago
HazyResearch / cartridges
Storing long contexts in tiny caches with self-study
☆305 · Mar 23, 2026 · Updated 4 months ago
JinjieNi / dlms-are-super-data-learners
The official github repo for "Diffusion Language Models are Super Data Learners".
☆227 · Nov 6, 2025 · Updated 8 months ago
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆634 · Mar 13, 2026 · Updated 4 months ago
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆256 · Jun 6, 2025 · Updated last year
wmn-231314 / diffusion-data-constraint
Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod…
☆127 · Jan 10, 2026 · Updated 6 months ago
zhixuan-lin / forgetting-transformer
[ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning
☆150 · Feb 25, 2026 · Updated 5 months ago
Dao-AILab / gemm-cublas
☆22 · May 5, 2025 · Updated last year
Dao-AILab / grouped-latent-attention
☆136 · May 29, 2025 · Updated last year
srush / annotated-mamba
Annotated version of the Mamba paper
☆501 · Feb 27, 2024 · Updated 2 years ago
apple / ml-flextok
FlexTok: Resampling Images into 1D Token Sequences of Flexible Length
☆322 · Jun 2, 2025 · Updated last year
a1600012888 / LaCT
Code release for paper "Test-Time Training Done Right"
☆499 · Jan 5, 2026 · Updated 6 months ago
MoonshotAI / Kimi-Linear
☆1,521 · Nov 17, 2025 · Updated 8 months ago
sustcsonglin / linear-attention-and-beyond-slides
☆119 · Feb 25, 2025 · Updated last year
buoyancy99 / diffusion-forcing
code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"
☆1,280 · Jul 6, 2026 · Updated 3 weeks ago
Gen-Verse / MMaDA
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models (dLLMs with block diffusion, mixed-CoT, unified RL)
☆1,660 · Feb 14, 2026 · Updated 5 months ago