H-Net Dynamic Hierarchical Architecture
☆81 · Sep 11, 2025 · Updated 5 months ago
Alternatives and similar repositories for hnet-old
Users interested in hnet-old are comparing it to the libraries listed below.
- Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism" · ☆14 · Apr 30, 2025 · Updated 10 months ago
- ☆19 · Dec 4, 2025 · Updated 3 months ago
- ☆67 · Mar 21, 2025 · Updated 11 months ago
- Code for the paper "Function-Space Learning Rates" · ☆25 · Jun 3, 2025 · Updated 9 months ago
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models" · ☆10 · Jul 19, 2024 · Updated last year
- Minimal Transformer base in JAX. A single backbone for language modelling, diffusion, classification, etc. · ☆14 · May 28, 2025 · Updated 9 months ago
- smolLM with Entropix sampler on PyTorch · ☆149 · Oct 31, 2024 · Updated last year
- Lightweight package that tracks and summarizes code changes using LLMs (Large Language Models) · ☆34 · Feb 27, 2025 · Updated last year
- ☆15 · Mar 2, 2025 · Updated last year
- An efficient implementation of the NSA (Native Sparse Attention) kernel · ☆129 · Jun 24, 2025 · Updated 8 months ago
- Code accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient" · ☆29 · Jul 30, 2020 · Updated 5 years ago
- PathPiece tokenizer · ☆13 · Nov 10, 2024 · Updated last year
- MLX binary vectors and associated algorithms · ☆14 · Mar 13, 2025 · Updated 11 months ago
- Training Models Daily · ☆16 · Dec 19, 2023 · Updated 2 years ago
- look how they massacred my boy · ☆63 · Oct 16, 2024 · Updated last year
- Schedule free optimiser implemented in JAX using Optimistix · ☆15 · May 29, 2024 · Updated last year
- Minimal implementation of VCRec (2024) for collapse prevention · ☆18 · Jan 28, 2025 · Updated last year
- Official code for the paper "Attention as a Hypernetwork" · ☆52 · Feb 24, 2026 · Updated last week
- H-Net: Hierarchical Network with Dynamic Chunking · ☆819 · Nov 20, 2025 · Updated 3 months ago
- Fork of Flame repo for training of some new stuff in development · ☆19 · Feb 27, 2026 · Updated last week
- ☆24 · Dec 11, 2024 · Updated last year
- ROSA+: RWKV's ROSA implementation with fallback statistical predictor · ☆34 · Oct 13, 2025 · Updated 4 months ago
- High quality implementations of imitation and inverse reinforcement learning algorithms · ☆23 · Aug 19, 2025 · Updated 6 months ago
- Entropy Based Sampling and Parallel CoT Decoding · ☆17 · Oct 9, 2024 · Updated last year
- Custom Triton kernels for training Karpathy's nanoGPT · ☆19 · Oct 21, 2024 · Updated last year
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" · ☆92 · Oct 30, 2024 · Updated last year
- DeMo: Decoupled Momentum Optimization · ☆198 · Dec 2, 2024 · Updated last year
- LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence · ☆61 · Feb 21, 2022 · Updated 4 years ago
- Next-gen Foundation Model for Embodied AI · ☆25 · Nov 21, 2025 · Updated 3 months ago
- Where we keep our notes about model training runs · ☆16 · Mar 12, 2023 · Updated 2 years ago
- FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration · ☆20 · Jun 27, 2025 · Updated 8 months ago
- ☆136 · May 29, 2025 · Updated 9 months ago
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) · ☆110 · Mar 7, 2025 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs · ☆188 · Jan 19, 2026 · Updated last month
- Mapping out the "memory" of neural nets with data attribution · ☆47 · Updated this week
- ☆123 · Feb 4, 2026 · Updated last month
- Stick-breaking attention · ☆62 · Jul 1, 2025 · Updated 8 months ago
- ☆63 · Oct 3, 2024 · Updated last year
- WIP · ☆94 · Aug 13, 2024 · Updated last year