TobiasNorlund / retro
Official repo for "On the Generalization Ability of Retrieval-Enhanced Transformers"
☆35 · Updated 3 months ago
Related projects:
- Repository for sparse fine-tuning of LLMs via a modified version of the MosaicML llmfoundry ☆36 · Updated 8 months ago
- [NeurIPS '23] Speculative Decoding with Big Little Decoder ☆84 · Updated 7 months ago
- NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference ☆58 · Updated this week
- Repo for ICML '23 "Why do Nearest Neighbor Language Models Work?" ☆56 · Updated last year
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore" ☆81 · Updated 2 weeks ago
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) ☆51 · Updated 3 months ago
- Simple implementation of speculative sampling in NumPy for GPT-2 ☆87 · Updated last year
- Explorations into some recent techniques surrounding speculative decoding ☆190 · Updated 11 months ago
- Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers ☆183 · Updated last month
- A fusion of a linear layer and a cross-entropy loss, written for PyTorch in Triton ☆48 · Updated last month
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" ☆57 · Updated 5 months ago
- A toolkit for scaling-law research ⚖ ☆41 · Updated 6 months ago
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023 ☆123 · Updated 4 months ago
- Triton-based implementation of Sparse Mixture of Experts ☆166 · Updated last month
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆73 · Updated last month
- Simple and efficient PyTorch-native transformer training and inference (batched) ☆53 · Updated 5 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆47 · Updated 2 weeks ago
- Some common Hugging Face Transformers models in maximal update parametrization (µP) ☆76 · Updated 2 years ago
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting ☆60 · Updated 6 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆104 · Updated 3 months ago
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆77 · Updated last year