MooreThreads / TurboRAG
☆92 · Updated last year
Alternatives and similar repositories for TurboRAG
Users interested in TurboRAG are comparing it to the libraries listed below.
- Modular and structured prompt caching for low-latency LLM inference ☆110 · Updated last year
- [NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning models without training. ☆217 · Updated 7 months ago
- LLM Serving Performance Evaluation Harness ☆82 · Updated 10 months ago
- Multi-Faceted AI Agent and Workflow Autotuning. Automatically optimizes LangChain, LangGraph, DSPy programs for better quality, lower exe… ☆266 · Updated 8 months ago
- A High-Efficiency System of Large Language Model Based Search Agents ☆74 · Updated 6 months ago
- ☆63 · Updated 8 months ago
- ☆47 · Updated 8 months ago
- PGRAG ☆52 · Updated last year
- ☆96 · Updated last year
- ☆34 · Updated last year
- The driver for LMCache core to run in vLLM ☆58 · Updated 11 months ago
- Official repo for "LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs". ☆241 · Updated last year
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆113 · Updated 9 months ago
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆136 · Updated last year
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆250 · Updated last year
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec… ☆216 · Updated 3 months ago
- Evaluation tools for Retrieval-Augmented Generation (RAG) methods. ☆167 · Updated last year
- A Comprehensive Library for Memory of LLM-based Agents. ☆95 · Updated 8 months ago
- Self-host LLMs with LMDeploy and BentoML ☆22 · Updated 3 weeks ago
- Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang ☆61 · Updated last year
- ☆83 · Updated last year
- The code for the paper "Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search" ☆63 · Updated 6 months ago
- The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Mem… ☆397 · Updated last year
- KV cache compression for high-throughput LLM inference ☆148 · Updated 11 months ago
- The code for the LaRA Benchmark ☆46 · Updated 7 months ago
- Benchmark baseline for retrieval QA applications ☆118 · Updated last year
- ☆27 · Updated 9 months ago
- [ICLR 2024] Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation ☆184 · Updated last year
- A Text-to-SQL Agent with Self-Refinement, Format Restriction, and Column Exploration ☆118 · Updated 5 months ago
- [EMNLP 2024] LongRAG: A Dual-perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering ☆118 · Updated 11 months ago