dust-tt/llama-ssp
Experiments on speculative sampling with Llama models
☆122 · Updated last year
Alternatives and similar repositories for llama-ssp:
Users interested in llama-ssp are comparing it to the repositories listed below.
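Several of the repositories below center on speculative sampling: a small draft model proposes several tokens cheaply, and the large target model verifies them in one pass. A minimal NumPy sketch of one verification round follows; the `p_target`/`q_draft` callables and the tiny vocabulary are hypothetical toy stand-ins, not the API of any listed repo:

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(p_target, q_draft, k=4):
    """One round of speculative sampling over a toy vocabulary.

    p_target, q_draft: callables mapping a token prefix (list of ints)
    to a next-token probability vector. They stand in for the large
    target model and the small draft model.
    """
    prefix = []
    # 1. Draft model proposes k tokens autoregressively.
    drafts, q_probs = [], []
    for _ in range(k):
        q = q_draft(prefix + drafts)
        drafts.append(int(rng.choice(len(q), p=q)))
        q_probs.append(q)
    # 2. Target model scores every drafted prefix
    #    (a single batched forward pass in practice).
    p_probs = [p_target(prefix + drafts[:i]) for i in range(k)]
    # 3. Accept each draft token with probability min(1, p/q); on the
    #    first rejection, resample from the residual norm(max(p - q, 0)).
    #    (The full algorithm also draws one bonus token from the target
    #    when all k drafts are accepted; omitted here for brevity.)
    accepted = []
    for t, p, q in zip(drafts, p_probs, q_probs):
        if rng.random() < min(1.0, p[t] / q[t]):
            accepted.append(t)
        else:
            residual = np.maximum(p - q, 0.0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(len(residual), p=residual)))
            break
    return accepted
```

The accept/resample rule makes the accepted tokens exactly distributed according to the target model, so the speedup comes without changing output quality.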
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Long Length (ICLR 2024) ☆204 · Updated 7 months ago
- Code repository for the c-BTM paper ☆105 · Updated last year
- Multipack distributed sampler for fast padding-free training of LLMs ☆184 · Updated 5 months ago
- REST: Retrieval-Based Speculative Decoding, NAACL 2024 ☆186 · Updated last month
- Explorations into some recent techniques surrounding speculative decoding ☆229 · Updated 3 weeks ago
- ☆190 · Updated last month
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ☆297 · Updated last year
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models" ☆263 · Updated last year
- Official implementation for "Extending LLMs’ Context Window with 100 Samples" ☆76 · Updated last year
- ☆124 · Updated 11 months ago
- A simple unified framework for evaluating LLMs ☆164 · Updated 3 weeks ago
- ☆107 · Updated 3 months ago
- Scalable and robust tree-based speculative decoding algorithm ☆329 · Updated 5 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆114 · Updated 7 months ago
- Simple implementation of Speculative Sampling in NumPy for GPT-2 ☆90 · Updated last year
- Spherical merging of PyTorch/HF-format language models with minimal feature loss ☆115 · Updated last year
- Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens" ☆125 · Updated 5 months ago
- A pipeline for LLM knowledge distillation ☆83 · Updated 5 months ago
- PB-LLM: Partially Binarized Large Language Models ☆150 · Updated last year
- Pre-training code for the Amber 7B LLM ☆160 · Updated 8 months ago
- Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆145 · Updated 6 months ago
- Batched LoRAs ☆336 · Updated last year
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speedup with better task performance… ☆146 · Updated last month
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆106 · Updated 5 months ago
- [ICML 2024] CLLMs: Consistency Large Language Models ☆368 · Updated 2 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆139 · Updated 3 months ago
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models" ☆377 · Updated 3 months ago
- Codes for the paper "