HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
★237 · Updated last month
Alternatives and similar repositories for zoology
Users interested in zoology are comparing it to the libraries listed below.
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ★241 · Updated 5 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ★132 · Updated 10 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ★171 · Updated 4 months ago
- Some preliminary explorations of Mamba's context scaling. ★216 · Updated last year
- ★53 · Updated last year
- ★204 · Updated 2 weeks ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) ★193 · Updated last year
- Language models scale reliably with over-training and on downstream tasks ★100 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ★130 · Updated 11 months ago
- Simple and efficient pytorch-native transformer training and inference (batched) ★78 · Updated last year
- ★149 · Updated 2 years ago
- nanoGPT-like codebase for LLM training ★110 · Updated this week
- ★91 · Updated last year
- Accelerated First Order Parallel Associative Scan (the scan primitive is sketched after this list) ★189 · Updated last year
- Normalized Transformer (nGPT) (its residual update rule is sketched after this list) ★192 · Updated 11 months ago
- ★185 · Updated last year
- Minimal (400 LOC) implementation of maximum (multi-node, FSDP) GPT training ★132 · Updated last year
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ★84 · Updated last year
- Token Omission Via Attention ★127 · Updated last year
- ★166 · Updated 2 years ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ★249 · Updated 9 months ago
- ★83 · Updated last year
- Some common Huggingface transformers in maximal update parametrization (µP) ★86 · Updated 3 years ago
- The simplest implementation of recent sparse attention patterns for efficient LLM inference (a sliding-window example is sketched after this list). ★92 · Updated 3 months ago
- ★121 · Updated last year
- seqax = sequence modeling + JAX ★168 · Updated 3 months ago
- A toolkit for scaling law research ★53 · Updated 9 months ago
- A fusion of a linear layer and a cross-entropy loss, written for PyTorch in Triton (the fusion idea is sketched after this list). ★70 · Updated last year
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode…) ★116 · Updated last year
- Supporting PyTorch FSDP for optimizers ★83 · Updated 11 months ago
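
A few of the entries above name concrete techniques; the sketches below illustrate them. All function names, signatures, and hyperparameters in these sketches are illustrative assumptions, not the repositories' actual APIs.

For the parallel associative scan entry: first-order linear recurrences of the form h_t = a_t * h_{t-1} + b_t can be evaluated by a scan because the per-step affine maps compose associatively. A minimal PyTorch sketch (sequential reference loop for clarity; the accelerated repo applies the combine in a tree to get logarithmic depth):

```python
import torch

def combine(left, right):
    """Associative combine for the recurrence h_t = a_t * h_{t-1} + b_t.
    Each element is a pair (a, b) representing the affine map h -> a*h + b;
    composing two steps gives (a2 * a1, a2 * b1 + b2). Associativity is what
    lets a parallel scan evaluate the recurrence in O(log T) depth."""
    a1, b1 = left
    a2, b2 = right
    return a2 * a1, a2 * b1 + b2

def linear_recurrence_scan(a, b):
    """Sequential reference scan. a, b: (T, ...) coefficient and input
    tensors; returns the stacked hidden states h_1..h_T. A parallel
    implementation applies `combine` in a tree instead of this loop."""
    state = (torch.ones_like(a[0]), torch.zeros_like(b[0]))  # identity element
    out = []
    for t in range(a.shape[0]):
        state = combine(state, (a[t], b[t]))
        out.append(state[1])
    return torch.stack(out)
```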
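For the nGPT entry: the paper's core idea is to keep token representations on the unit hypersphere, replacing the usual residual add with a normalized interpolation step. A rough paraphrase of that update, with the paper's learned per-dimension scaling factors collapsed into a single illustrative scalar:

```python
import torch.nn.functional as F

def ngpt_residual_step(h, sublayer_out, alpha=0.05):
    """Hypersphere residual update in the spirit of nGPT: interpolate from
    the current (unit-norm) state toward the normalized sublayer output,
    then project back onto the sphere. `alpha` stands in for the paper's
    learned per-dimension eigen learning rates. h, sublayer_out: (..., d)."""
    target = F.normalize(sublayer_out, dim=-1)
    return F.normalize(h + alpha * (target - h), dim=-1)
```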
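For the sparse attention entry: the simplest widely used pattern is a causal sliding window, where query t attends only to the most recent `window` keys. A mask-based sketch (dense compute for readability; efficient kernels skip the masked blocks entirely, and the window size here is arbitrary):

```python
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window=256):
    """Causal sliding-window attention: query t sees keys in (t - window, t].
    q, k, v: (T, d). Mask-based sketch only; a real implementation never
    computes the masked entries."""
    T, d = q.shape
    scores = (q @ k.T) / d ** 0.5                   # (T, T) attention logits
    i = torch.arange(T)
    keep = (i[:, None] >= i[None, :]) & (i[:, None] - i[None, :] < window)
    scores = scores.masked_fill(~keep, float("-inf"))
    return F.softmax(scores, dim=-1) @ v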
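For the fused linear + cross-entropy entry: the point of the fusion is to avoid materializing the full (tokens × vocab) logits tensor at once. The repo does this in a single Triton kernel; the pure-PyTorch sketch below captures only the memory-saving structure by chunking over tokens (function name and chunk size are made up):

```python
import torch
import torch.nn.functional as F

def chunked_linear_cross_entropy(hidden, weight, targets, chunk_size=1024):
    """Mean cross-entropy over `hidden @ weight.T` without keeping the full
    (N, vocab) logits tensor alive at once. hidden: (N, d) token
    representations; weight: (vocab, d) output projection; targets: (N,)
    class indices. A pure-PyTorch sketch of the idea the Triton kernel fuses."""
    total, count = hidden.new_zeros(()), 0
    for start in range(0, hidden.shape[0], chunk_size):
        h = hidden[start:start + chunk_size]
        t = targets[start:start + chunk_size]
        logits = h @ weight.T            # (chunk, vocab), freed each iteration
        total = total + F.cross_entropy(logits, t, reduction="sum")
        count += t.numel()
    return total / count
```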