Understand and test language model architectures on synthetic tasks.
☆257 · Feb 24, 2026 · Updated last week
Alternatives and similar repositories for zoology
Users who are interested in zoology compare it to the libraries listed below.
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆248 · Jun 6, 2025 · Updated 9 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆138 · Dec 17, 2024 · Updated last year
- HGRN2: Gated Linear RNNs with State Expansion ☆56 · Aug 20, 2024 · Updated last year
- ☆53 · May 20, 2024 · Updated last year
- Parallel Associative Scan for Language Models ☆18 · Jan 8, 2024 · Updated 2 years ago
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆56 · Dec 4, 2024 · Updated last year
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆67 · Apr 24, 2024 · Updated last year
- ☆58 · Jul 9, 2024 · Updated last year
- ☆36 · Feb 26, 2024 · Updated 2 years ago
- Accelerated First Order Parallel Associative Scan ☆195 · Jan 7, 2026 · Updated 2 months ago
- ☆51 · Jan 28, 2024 · Updated 2 years ago
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆27 · Apr 17, 2024 · Updated last year
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" ☆18 · Mar 15, 2024 · Updated last year
- Here we will test various linear attention designs. ☆62 · Apr 25, 2024 · Updated last year
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 · Jun 6, 2024 · Updated last year
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆562 · Dec 28, 2024 · Updated last year
- Efficient implementations of state-of-the-art linear attention models ☆4,474 · Updated this week
- ☆11 · Oct 11, 2023 · Updated 2 years ago
- Official Code Repository for the paper "Key-value memory in the brain" ☆31 · Feb 25, 2025 · Updated last year
- Official code for the paper "Attention as a Hypernetwork" ☆51 · Feb 24, 2026 · Updated last week
- Experiment of using Tangent to autodiff triton ☆82 · Jan 22, 2024 · Updated 2 years ago
- Triton Implementation of HyperAttention Algorithm ☆48 · Dec 11, 2023 · Updated 2 years ago
- train with kittens! ☆63 · Oct 25, 2024 · Updated last year
- Annotated version of the Mamba paper ☆497 · Feb 27, 2024 · Updated 2 years ago
- Combining SOAP and MUON ☆19 · Feb 11, 2025 · Updated last year
- ☆20 · May 30, 2024 · Updated last year
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"