HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆219 Updated last month
Alternatives and similar repositories for zoology
Users who are interested in zoology are comparing it to the libraries listed below.
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆237 Updated last month
- A MAD laboratory to improve AI architecture designs 🧪 ☆123 Updated 6 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆141 Updated 2 weeks ago
- ☆53 Updated last year
- Some preliminary explorations of Mamba's context scaling. ☆214 Updated last year
- ☆191 Updated this week
- nanoGPT-like codebase for LLM training ☆99 Updated last month
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) ☆190 Updated last year
- ☆259 Updated this week
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆127 Updated 7 months ago
- ☆81 Updated last year
- Token Omission Via Attention ☆128 Updated 8 months ago
- Normalized Transformer (nGPT) ☆184 Updated 7 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆243 Updated 5 months ago
- ☆79 Updated last year
- ☆166 Updated 2 years ago
- ☆112 Updated last year
- some common Huggingface transformers in maximal update parametrization (µP) ☆81 Updated 3 years ago
- Simple and efficient pytorch-native transformer training and inference (batched) ☆77 Updated last year
- Accelerated First Order Parallel Associative Scan ☆182 Updated 10 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆78 Updated 3 weeks ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆129 Updated last year
- ☆147 Updated 2 years ago
- Language models scale reliably with over-training and on downstream tasks ☆97 Updated last year
- Griffin MQA + Hawk Linear RNN Hybrid ☆87 Updated last year
- supporting pytorch FSDP for optimizers ☆82 Updated 7 months ago
- Triton-based implementation of Sparse Mixture of Experts. ☆224 Updated 7 months ago
- Multipack distributed sampler for fast padding-free training of LLMs ☆194 Updated 11 months ago
- JAX bindings for Flash Attention v2 ☆90 Updated 11 months ago
- A fusion of a linear layer and a cross entropy loss, written for pytorch in triton. ☆68 Updated 11 months ago