Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang
☆61 · Nov 8, 2024 · Updated last year
Alternatives and similar repositories for ModelServer
Users who are interested in ModelServer are comparing it to the libraries listed below.
- Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across vLLM, SGLang, TRT-LLM, OpenAI, Gemini & more. Indus… ☆108 · Updated this week
- ☆23 · May 30, 2025 · Updated 9 months ago
- ☆150 · Jan 9, 2025 · Updated last year
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection ☆55 · Oct 29, 2024 · Updated last year
- ☆155 · Mar 4, 2025 · Updated last year
- Materials for learning SGLang ☆775 · Jan 5, 2026 · Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆17 · Jun 3, 2024 · Updated last year
- Getting Started with Triton: A Tutorial for Python Beginners ☆45 · Oct 21, 2025 · Updated 5 months ago
- DeepSeek-V3/R1 inference performance simulator ☆189 · Mar 27, 2025 · Updated 11 months ago
- An auxiliary project analyzing the characteristics of KV in DiT attention. ☆33 · Nov 29, 2024 · Updated last year
- Memory footprint reduction for transformer models ☆11 · Jan 24, 2023 · Updated 3 years ago
- Open Model Engine (OME): Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T… ☆397 · Updated this week
- Code for the MTEB leaderboard ☆30 · Feb 4, 2025 · Updated last year
- A demo PDF parser (including OCR and object-detection tools) ☆36 · Oct 14, 2024 · Updated last year
- Zeta implementation of a reusable, plug-and-play feedforward from the paper "Exponentially Faster Language Modeling" ☆16 · Nov 11, 2024 · Updated last year
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆27 · Apr 17, 2024 · Updated last year
- Implementation of algorithms for memory-optimized deep neural network training ☆10 · Jul 23, 2020 · Updated 5 years ago
- Fine-tunes the Qwen1.5-0.5B-Chat model for general information extraction, with three aims: to verify how well a generative approach performs compared with extractive NER; to give beginners a simple model fine-tuning workflow with as little code as possible; and to show how to process data formats for LLM training. ☆15 · Sep 6, 2024 · Updated last year
- ☆15 · Nov 22, 2023 · Updated 2 years ago
- 📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉 ☆526 · Updated this week
- 📚 LaTeX templates and tools for creating beautiful, structured documents 📝 ☆14 · Oct 24, 2025 · Updated 4 months ago
- Long Context Research ☆29 · Jan 26, 2026 · Updated last month
- InfiniBand SR-IOV CNI ☆13 · Mar 17, 2026 · Updated last week
- Python C++ Code Manager ☆15 · Sep 29, 2024 · Updated last year
- [EMNLP 2024 Findings] Benchmarking Language Model Agents for Data-Driven Science ☆35 · Oct 25, 2024 · Updated last year
- https://bbuf.github.io/gpu-glossary-zh/ ☆26 · Nov 7, 2025 · Updated 4 months ago
- Coded with and corrected by Google Anti-Gravity ☆13 · Nov 23, 2025 · Updated 4 months ago
- Tool for converting LLMs from uni-directional to bi-directional by removing causal mask for tasks like classification and sentence embedd… ☆65 · Dec 12, 2024 · Updated last year
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi… ☆23 · Oct 1, 2025 · Updated 5 months ago
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. ☆4,953 · Updated this week
- Applies Iprompt to GLM with new methods. Currently supports Chinese QA, English QA, and Chinese poem generation. ☆20 · Jun 16, 2022 · Updated 3 years ago
- FlashInfer: Kernel Library for LLM Serving ☆5,194 · Updated this week
- ☆530 · Feb 10, 2026 · Updated last month
- Implement Flash Attention using Cute. ☆102 · Dec 17, 2024 · Updated last year
- Yet Another Papers With Code ☆37 · Sep 7, 2025 · Updated 6 months ago
- CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter ☆23 · May 28, 2025 · Updated 9 months ago
- Code for Robust Fine-tuning (RbFT) ☆17 · Jan 31, 2025 · Updated last year
- ☆18 · Mar 18, 2024 · Updated 2 years ago
- A highly contextualized retrieval system integrating Large Language Models (LLMs), embeddings, and a dynamic agent-driven framework. Supp… ☆27 · Sep 24, 2025 · Updated 6 months ago