jshuadvd / LongRoPE
Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens"
☆128 · Updated 8 months ago
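For orientation, here is a minimal sketch of the core idea the paper builds on: rather than one uniform RoPE interpolation ratio, LongRoPE assigns each frequency dimension its own rescale factor (found by evolutionary search in the paper). The `build_rope_cache` helper and the `rescale_factors` values below are illustrative assumptions, not this repository's API or the paper's searched factors.

```python
import torch

def build_rope_cache(seq_len, head_dim, base=10000.0, rescale_factors=None):
    # Standard RoPE inverse frequencies, one per rotated dimension pair.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    if rescale_factors is not None:
        # Non-uniform per-dimension interpolation (the LongRoPE idea):
        # each frequency is slowed by its own factor instead of one
        # global ratio shared across all dimensions.
        inv_freq = inv_freq / rescale_factors
    positions = torch.arange(seq_len).float()
    angles = torch.outer(positions, inv_freq)  # (seq_len, head_dim // 2)
    return torch.cos(angles), torch.sin(angles)

# Hypothetical factors: little scaling for high-frequency dims, more for low.
head_dim = 128
factors = torch.linspace(1.0, 8.0, head_dim // 2)
cos, sin = build_rope_cache(32768, head_dim, rescale_factors=factors)
```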
Alternatives and similar repositories for LongRoPE:
Users interested in LongRoPE are comparing it to the libraries listed below.
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Long Sequences (ICLR 2024) ☆206 · Updated 10 months ago
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models" ☆393 · Updated 5 months ago
- Code for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718 ☆313 · Updated 5 months ago
- Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale" ☆229 · Updated last month
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs ☆246 · Updated 3 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆140 · Updated 6 months ago
- ☆263 · Updated 7 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆152 · Updated 9 months ago
- A pipeline for LLM knowledge distillation ☆98 · Updated last month
- Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context" ☆454 · Updated last year
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆147 · Updated 6 months ago
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement ☆179 · Updated 11 months ago
- LongRoPE is a novel method that can extend the context window of pre-trained LLMs to an impressive 2048k tokens. ☆209 · Updated 7 months ago
- ☆312 · Updated 6 months ago
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) ☆201 · Updated last week
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆129 · Updated 9 months ago
- FuseAI Project ☆84 · Updated last month
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind (see the sketch after this list) ☆91 · Updated last year
- Reformatted Alignment ☆115 · Updated 6 months ago
- A project to improve the skills of large language models ☆256 · Updated this week
- The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory" ☆344 · Updated 11 months ago
- [EMNLP 2023] Adapting Language Models to Compress Long Contexts ☆296 · Updated 6 months ago
- ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models ☆179 · Updated 5 months ago
- REST: Retrieval-Based Speculative Decoding, NAACL 2024 ☆198 · Updated 3 months ago
- Spherical merging of PyTorch/HF-format language models with minimal feature loss. ☆117 · Updated last year
- This is the official repository for Inheritune. ☆109 · Updated last month
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆154 · Updated 9 months ago
- Experiments on speculative sampling with Llama models ☆125 · Updated last year
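Two entries above implement speculative sampling, so a minimal sketch of the accept/reject rule from the DeepMind paper may help for comparison. It assumes the draft and target distributions are already computed; `speculative_step` and its tensor shapes are illustrative, not any listed repository's API.

```python
import torch

def speculative_step(target_probs, draft_probs, draft_tokens):
    """One verification pass over K drafted tokens.

    target_probs: (K + 1, vocab) target-model distributions (one extra
        position supplies the bonus token when every draft is accepted).
    draft_probs:  (K, vocab) draft-model distributions.
    draft_tokens: (K,) token ids sampled from the draft model.
    Returns accepted token ids plus one corrected or bonus token.
    """
    out = []
    for i, tok in enumerate(draft_tokens.tolist()):
        p, q = target_probs[i, tok], draft_probs[i, tok]
        # Accept the drafted token with probability min(1, p / q).
        if torch.rand(()) < torch.clamp(p / q, max=1.0):
            out.append(tok)
        else:
            # First rejection: resample from the residual max(0, p - q),
            # renormalized, so the output distribution matches the target.
            residual = torch.clamp(target_probs[i] - draft_probs[i], min=0.0)
            out.append(torch.multinomial(residual / residual.sum(), 1).item())
            return out
    # All K drafts accepted: take a free bonus token from the target model.
    out.append(torch.multinomial(target_probs[-1], 1).item())
    return out
```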