MooreThreads / TurboRAG
☆73 Updated 3 months ago
Alternatives and similar repositories for TurboRAG:
Users who are interested in TurboRAG are comparing it to the libraries listed below:
- Modular and structured prompt caching for low-latency LLM inference ☆89 Updated 4 months ago
- Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale" ☆229 Updated last month
- Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper ☆129 Updated 8 months ago
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆87 Updated this week
- ☆181 Updated last month
- Simple extension on vLLM to help you speed up reasoning model without training. ☆137 Updated 2 weeks ago
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆110 Updated 3 months ago