H-Net: Hierarchical Network with Dynamic Chunking
☆817 · Nov 20, 2025 · Updated 3 months ago
Alternatives and similar repositories for hnet
Users who are interested in hnet are comparing it to the libraries listed below.
- H-Net Dynamic Hierarchical Architecture ☆81 · Sep 11, 2025 · Updated 5 months ago
- 🚀 Efficient implementations of state-of-the-art linear attention models ☆4,474 · Updated this week
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆129 · Jun 24, 2025 · Updated 8 months ago
- Code for BLT research paper ☆2,029 · Nov 3, 2025 · Updated 4 months ago
- Muon is an optimizer for hidden layers in neural networks ☆2,350 · Jan 19, 2026 · Updated last month
- Pretraining and inference code for a large-scale depth-recurrent language model ☆865 · Dec 29, 2025 · Updated 2 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆121 · Jan 27, 2026 · Updated last month
- Physics of Language Models: Part 4.2, Canon Layers at Scale where Synthetic Pretraining Resonates in Reality ☆327 · Jan 5, 2026 · Updated 2 months ago
- FlexiTokens ☆18 · Dec 27, 2025 · Updated 2 months ago
- Mamba SSM architecture ☆17,311 · Feb 18, 2026 · Updated 2 weeks ago
- [NeurIPS 2024] Simple and Effective Masked Diffusion Language Model ☆639 · Sep 29, 2025 · Updated 5 months ago
- Official Repo for Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics ☆71 · Jan 13, 2026 · Updated last month
- Some preliminary explorations of Mamba's context scaling. ☆218 · Feb 8, 2024 · Updated 2 years ago
- NanoGPT (124M) in 2 minutes ☆4,734 · Feb 27, 2026 · Updated last week
- [AAAI26] LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs ☆53 · Dec 7, 2025 · Updated 3 months ago
- Xmixers: A collection of SOTA efficient token/channel mixers ☆28 · Sep 4, 2025 · Updated 6 months ago
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆170 · Jan 30, 2025 · Updated last year
- [CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models ☆1,409 · Dec 16, 2025 · Updated 2 months ago
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆477 · Feb 17, 2026 · Updated 2 weeks ago
- Long-RL: Scaling RL to Long Sequences (NeurIPS 2025) ☆693 · Sep 24, 2025 · Updated 5 months ago
- ☆208 · Dec 11, 2024 · Updated last year
- CLaMR: Contextualized Late-Interaction for Multimodal Content Retrieval ☆23 · Jun 28, 2025 · Updated 8 months ago
- A PyTorch native platform for training generative AI models ☆5,111 · Updated this week
- Fast and memory-efficient exact kmeans ☆140 · Feb 18, 2026 · Updated 2 weeks ago
- Official PyTorch implementation for "Large Language Diffusion Models" ☆3,643 · Nov 12, 2025 · Updated 3 months ago
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey ☆476 · Jan 17, 2025 · Updated last year
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆19 · Mar 10, 2025 · Updated 11 months ago
- [ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models ☆967 · Jul 10, 2025 · Updated 7 months ago
- Simple & Scalable Pretraining for Neural Architecture Research ☆308 · Dec 6, 2025 · Updated 3 months ago
- Tile primitives for speedy kernels ☆3,202 · Feb 24, 2026 · Updated last week
- State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More! ☆2,181 · Feb 11, 2026 · Updated 3 weeks ago
- MMaDA - Open-Sourced Multimodal Large Diffusion Language Models (dLLMs with block diffusion, mixed-CoT, unified RL) ☆1,591 · Feb 14, 2026 · Updated 3 weeks ago
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling ☆472 · May 17, 2025 · Updated 9 months ago
- Annotated version of the Mamba paper ☆497 · Feb 27, 2024 · Updated 2 years ago
- LLM training in simple, raw C/CUDA ☆15 · Dec 5, 2024 · Updated last year
- A partial implementation of Generative Infinite Vocabulary Transformer (GIVT) from Google Deepmind, in PyTorch. ☆21 · Mar 28, 2024 · Updated last year
- [NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL ☆2,045 · Nov 4, 2025 · Updated 4 months ago
- [ACMMM'2024] Generative Expressive Conversational Speech Synthesis ☆44 · Oct 28, 2024 · Updated last year
- [ICML 2024 Best Paper] Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution (https://arxiv.org/abs/2310.16834) ☆713 · Feb 29, 2024 · Updated 2 years ago