MooreThreads / TurboRAG
☆71 Updated 2 months ago
Alternatives and similar repositories for TurboRAG:
Users who are interested in TurboRAG are comparing it to the libraries listed below.
- Modular and structured prompt caching for low-latency LLM inference ☆87 Updated 3 months ago
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆107 Updated 2 months ago
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆84 Updated 4 months ago
- LLM Serving Performance Evaluation Harness ☆68 Updated this week
- PGRAG ☆47 Updated 7 months ago
- Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper ☆125 Updated 7 months ago
- ☆45 Updated 3 months ago
- ☆30 Updated 6 months ago
- StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization ☆105 Updated last month
- ☆168 Updated 2 months ago
- Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024] ☆77 Updated last month
- Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale" ☆217 Updated this week
- ☆125 Updated 3 weeks ago
- ☆42 Updated 2 months ago
- ☆77 Updated last month
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details. ☆152 Updated this week
- The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Mem… ☆326 Updated 10 months ago
- A flexible and efficient training framework for large-scale alignment tasks ☆304 Updated last week
- A pipeline for LLM knowledge distillation ☆91 Updated 3 weeks ago
- ☆258 Updated 6 months ago
- Evaluation tools for Retrieval-augmented Generation (RAG) methods. ☆147 Updated 3 months ago
- The driver for LMCache core to run in vLLM ☆29 Updated 2 weeks ago
- [ACL24] Official repo for "Synthesizing Text-to-SQL Data from Weak and Strong LLMs" ☆64 Updated 6 months ago
- Repository of LV-Eval Benchmark ☆58 Updated 5 months ago
- ☆57 Updated 2 months ago
- The Multi-Faceted Optimizer for GenAI Workflows ☆186 Updated this week
- Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception ☆114 Updated 3 weeks ago
- Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang ☆35 Updated 3 months ago
- ☆220 Updated 9 months ago