HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆217 · Updated last week
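For context, here is a minimal sketch of the kind of synthetic task these repositories are used to study: multi-query associative recall, where a model must look up the value paired with a queried key earlier in the sequence. This is a generic illustration only, not zoology's actual API; the function and parameter names are invented for the example.

```python
# Illustrative sketch of a synthetic associative-recall task
# (hypothetical helper, not part of the zoology codebase).
import random

def make_recall_example(num_pairs=8, vocab_size=64, seed=0):
    """Build one key-value context sequence plus a query and its expected answer."""
    rng = random.Random(seed)
    keys = rng.sample(range(vocab_size), num_pairs)
    values = [rng.randrange(vocab_size) for _ in keys]
    # Interleave as k1 v1 k2 v2 ... to form the context the model reads.
    context = [tok for kv in zip(keys, values) for tok in kv]
    query = rng.choice(keys)
    answer = values[keys.index(query)]
    return context, query, answer

if __name__ == "__main__":
    context, query, answer = make_recall_example()
    print("context:", context)
    print("query:", query, "-> expected answer:", answer)
```

An architecture's accuracy on tasks like this (as a function of sequence length and number of key-value pairs) is a common proxy for its in-context recall ability.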
Alternatives and similar repositories for zoology
Users interested in zoology are comparing it to the libraries listed below.
- A MAD laboratory to improve AI architecture designs 🧪 ☆120 · Updated 6 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆235 · Updated 2 weeks ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆134 · Updated this week
- Some preliminary explorations of Mamba's context scaling. ☆214 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆126 · Updated 6 months ago
- ☆53 · Updated last year
- Normalized Transformer (nGPT) ☆183 · Updated 7 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) ☆190 · Updated last year
- Supporting PyTorch FSDP for optimizers ☆82 · Updated 6 months ago
- ☆78 · Updated 11 months ago
- Language models scale reliably with over-training and on downstream tasks ☆97 · Updated last year
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆239 · Updated 4 months ago
- Triton-based implementation of Sparse Mixture of Experts. ☆217 · Updated 6 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆70 · Updated last week
- Accelerated First Order Parallel Associative Scan ☆182 · Updated 10 months ago
- ☆190 · Updated 2 weeks ago
- ☆166 · Updated last year
- seqax = sequence modeling + JAX ☆159 · Updated last week
- Token Omission Via Attention ☆127 · Updated 8 months ago
- Multipack distributed sampler for fast padding-free training of LLMs ☆191 · Updated 10 months ago
- ☆147 · Updated 2 years ago
- Mixture of A Million Experts ☆46 · Updated 10 months ago
- ☆189 · Updated this week
- nanoGPT-like codebase for LLM training ☆98 · Updated last month
- EvaByte: Efficient Byte-level Language Models at Scale ☆101 · Updated 2 months ago
- ☆98 · Updated 5 months ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆127 · Updated last year
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆108 · Updated 9 months ago
- WIP ☆93 · Updated 10 months ago
- 🔥 A minimal training framework for scaling FLA models ☆170 · Updated last week