MooreThreads / TurboRAGLinks

☆90

Alternatives and similar repositories for TurboRAG

Users that are interested in TurboRAG are comparing it to the libraries listed below

Sorting:

yale-sys / prompt-cache
Modular and structured prompt caching for low-latency LLM inference
☆103Updated last year
hao-ai-lab / Dynasor
[NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning model without training.
☆209Updated 6 months ago
GenseeAI / cognify
Multi-Faceted AI Agent and Workflow Autotuning. Automatically optimizes LangChain, LangGraph, DSPy programs for better quality, lower exe…
☆266Updated 6 months ago
project-etalon / etalon
LLM Serving Performance Evaluation Harness
☆81Updated 9 months ago
zhaochenyang20 / ModelServer
Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang
☆61Updated last year
codefuse-ai / D2LLM
☆35Updated last year
IAAR-Shanghai / PGRAG
PGRAG
☆51Updated last year
inferflow / inferflow
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
☆249Updated last year
tiannuo-yang / SearchAgent-X
A High-Efficiency System of Large Language Model Based Search Agents
☆75Updated 5 months ago
gomate-community / rageval
Evaluation tools for Retrieval-augmented Generation (RAG) methods.
☆167Updated last year
Bui1dMySea / MemLong
☆95Updated 11 months ago
zenrran4nlp / Awesome-LLM-Inference-Serving
☆46Updated 7 months ago
OpenBMB / CPM.cu
CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec…
☆205Updated last month
spcl / MRAG
Official Implementation of "Multi-Head RAG: Solving Multi-Aspect Problems with LLMs"
☆235Updated 2 months ago
jshuadvd / LongRoPE
Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper
☆152Updated last year
LMCache / lmcache-vllm
The driver for LMCache core to run in vLLM
☆58Updated 10 months ago
thunlp / Ouroboros
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
☆112Updated 8 months ago
OpenBMB / RAGEval
☆203Updated 8 months ago
RUC-NLPIR / HiRA
The code for paper: Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search
☆63Updated 5 months ago
thunlp / InfLLM
The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Mem…
☆390Updated last year
Snowflake-Labs / ReFoRCE
A Text-to-SQL Agent with Self-Refinement, Format Restriction, and Column Exploration
☆110Updated 4 months ago
wdlctc / headinfer
☆61Updated 6 months ago
vllm-project / speculators
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
☆132Updated last week
Infini-AI-Lab / MagicDec
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆132Updated last year
imagination-research / sot
[ICLR 2024] Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation
☆182Updated last year
TIGER-AI-Lab / LongRAG
Official repo for "LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs".
☆242Updated last year
GAIR-NLP / ProX
[ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
☆263Updated 4 months ago
QwenLM / WorldPM
☆91Updated 6 months ago
IsaacRe / vllm-kvcompress
KV cache compression for high-throughput LLM inference
☆145Updated 9 months ago
QingFei1 / LongRAG
[EMNLP 2024] LongRAG: A Dual-perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering
☆114Updated 10 months ago