microsoft / LongRoPE
LongRoPE is a novel method that extends the context window of pre-trained LLMs to 2048k tokens.
☆82 · Updated 3 weeks ago
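For context, LongRoPE builds on rotary position embeddings (RoPE): it extends the context window by rescaling RoPE's per-dimension rotation frequencies non-uniformly, with the rescale factors found by search. The sketch below illustrates only the rescaling mechanism in NumPy; the function names are mine, and the search procedure from the paper is not shown (uniform positional interpolation is the special case where every factor equals the extension ratio).

```python
import numpy as np

def rope_frequencies(dim, base=10000.0):
    # Standard RoPE inverse frequencies, one per pair of channels.
    return base ** (-np.arange(0, dim, 2) / dim)

def rescaled_rope_angles(positions, dim, scale_factors):
    # Non-uniform interpolation: each frequency band gets its own
    # rescale factor (LongRoPE searches for these per dimension).
    inv_freq = rope_frequencies(dim) / np.asarray(scale_factors)
    return np.outer(positions, inv_freq)  # shape (seq_len, dim // 2)

def apply_rope(x, angles):
    # Rotate each (even, odd) channel pair by the per-position angles.
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out
```

Dividing `inv_freq` by a factor greater than 1 slows the rotation in that band, so positions beyond the original training range map back into angles the model has already seen.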
Related projects:
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind ☆69 · Updated 6 months ago
- Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens" ☆119 · Updated 2 months ago
- Simple implementation of Speculative Sampling in NumPy for GPT-2 ☆87 · Updated last year
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) ☆106 · Updated this week
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (Official Code) ☆118 · Updated 2 weeks ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Lengths (ICLR 2024) ☆195 · Updated 4 months ago
- [ICML'24] The official implementation of "Rethinking Optimization and Architecture for Tiny Language Models" ☆114 · Updated 2 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆123 · Updated 6 months ago
- Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆55 · Updated this week
- Explorations into some recent techniques surrounding speculative decoding ☆190 · Updated 11 months ago
- Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆123 · Updated 3 months ago
- Official repository for Inheritune ☆89 · Updated 4 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆133 · Updated 3 months ago
- Official repository of "The Mamba in the Llama: Distilling and Accelerating Hybrid Models" ☆130 · Updated this week
- Low-bit optimizers for PyTorch ☆109 · Updated 11 months ago
- Expert Specialized Fine-Tuning ☆129 · Updated last month
- Spherical merging of PyTorch/HF-format language models with minimal feature loss ☆107 · Updated last year
- A pipeline to improve the skills of large language models ☆149 · Updated this week
- Unofficial implementation of AlpaGasus ☆83 · Updated 11 months ago
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting ☆60 · Updated 6 months ago
- REST: Retrieval-Based Speculative Decoding (NAACL 2024) ☆158 · Updated 4 months ago
- Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆48 · Updated last week
- Official implementation for "Extending LLMs' Context Window with 100 Samples" ☆72 · Updated 8 months ago
- Official implementation for the paper "LongEmbed: Extending Embedding Models for Long Context Retrieval" ☆108 · Updated 4 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆87 · Updated 8 months ago
- Unofficial implementations of block/layer-wise pruning methods for LLMs ☆45 · Updated 4 months ago
- Multipack distributed sampler for fast padding-free training of LLMs ☆170 · Updated last month
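Several of the projects above implement speculative sampling. As a rough illustration of what they share, the accept/reject core of the algorithm can be sketched in NumPy as follows; this is a minimal sketch of my own (not code from any listed repo), and it omits the bonus token sampled from the target model when every draft token is accepted.

```python
import numpy as np

def speculative_sample(target_probs, draft_probs, draft_tokens, rng):
    # Accept each draft token t with probability
    # min(1, p_target(t) / p_draft(t)); on the first rejection,
    # resample from the normalized residual max(0, p_target - p_draft).
    # This yields samples distributed exactly as the target model.
    accepted = []
    for i, t in enumerate(draft_tokens):
        p, q = target_probs[i][t], draft_probs[i][t]
        # q > 0 since the draft model itself proposed token t.
        if rng.random() < min(1.0, p / q):
            accepted.append(t)
        else:
            residual = np.maximum(target_probs[i] - draft_probs[i], 0.0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(len(residual), p=residual)))
            break
    return accepted
```

The speedup comes from scoring all draft positions with a single target-model forward pass, so accepted tokens cost far less than autoregressive decoding.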