HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆221 · Updated last month
Alternatives and similar repositories for zoology
Users interested in zoology are comparing it to the libraries listed below.
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" · ☆240 · Updated 2 months ago
- A MAD laboratory to improve AI architecture designs 🧪 · ☆124 · Updated 8 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. · ☆152 · Updated last month
- ☆53 · Updated last year
- Some preliminary explorations of Mamba's context scaling. · ☆216 · Updated last year
- nanoGPT-like codebase for LLM training · ☆102 · Updated 3 months ago
- ☆194 · Updated 2 weeks ago
- Normalized Transformer (nGPT) · ☆186 · Updated 9 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) · ☆191 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU Clusters · ☆128 · Updated 8 months ago
- ☆148 · Updated 2 years ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" · ☆244 · Updated 6 months ago
- Token Omission Via Attention · ☆128 · Updated 10 months ago
- Simple and efficient PyTorch-native transformer training and inference (batched) · ☆78 · Updated last year
- ☆81 · Updated last year
- Language models scale reliably with over-training and on downstream tasks · ☆98 · Updated last year
- The simplest implementation of recent sparse attention patterns for efficient LLM inference. · ☆84 · Updated last month
- ☆166 · Updated 2 years ago
- ☆87 · Updated last year
- Some common Hugging Face transformers in maximal update parametrization (µP) · ☆82 · Updated 3 years ago
- Code for the NeurIPS 2024 Spotlight "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" · ☆81 · Updated 9 months ago
- Supporting PyTorch FSDP for optimizers · ☆84 · Updated 8 months ago
- ☆118 · Updated last year
- Accelerated First-Order Parallel Associative Scan · ☆187 · Updated last year
- Multipack distributed sampler for fast padding-free training of LLMs · ☆199 · Updated last year
- Triton-based implementation of Sparse Mixture of Experts. · ☆233 · Updated 8 months ago
- ☆184 · Updated last year
- A fusion of a linear layer and a cross-entropy loss, written for PyTorch in Triton. · ☆70 · Updated last year
- Official implementation of Phi-Mamba, a MOHAWK-distilled model ("Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode…") · ☆114 · Updated 11 months ago
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind · ☆127 · Updated last year