HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆226 · Updated last week
Alternatives and similar repositories for zoology
Users who are interested in zoology are comparing it to the libraries listed below.
- A MAD laboratory to improve AI architecture designs 🧪 ☆129 · Updated 9 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆240 · Updated 3 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆164 · Updated 3 months ago
- Some preliminary explorations of Mamba's context scaling. ☆217 · Updated last year
- ☆196 · Updated last month
- ☆53 · Updated last year
- nanoGPT-like codebase for LLM training ☆107 · Updated 4 months ago
- Normalized Transformer (nGPT) ☆191 · Updated 10 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆193 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆130 · Updated 10 months ago
- ☆149 · Updated 2 years ago
- ☆83 · Updated last year
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆246 · Updated 8 months ago
- Token Omission Via Attention ☆128 · Updated 11 months ago
- ☆166 · Updated 2 years ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆84 · Updated 11 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆89 · Updated 2 months ago
- Simple and efficient pytorch-native transformer training and inference (batched) ☆78 · Updated last year
- ☆89 · Updated last year
- Multipack distributed sampler for fast padding-free training of LLMs ☆201 · Updated last year
- Language models scale reliably with over-training and on downstream tasks ☆100 · Updated last year
- Accelerated First Order Parallel Associative Scan ☆188 · Updated last year
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023 ☆136 · Updated last year
- JAX bindings for Flash Attention v2 ☆92 · Updated 3 weeks ago
- ☆102 · Updated 2 months ago
- some common Huggingface transformers in maximal update parametrization (µP) ☆82 · Updated 3 years ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆132 · Updated last year
- Physics of Language Models, Part 4 ☆247 · Updated 2 months ago
- ☆122 · Updated last year
- supporting pytorch FSDP for optimizers ☆84 · Updated 9 months ago