Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang
☆61Nov 8, 2024Updated last year
Alternatives and similar repositories for ModelServer
Users who are interested in ModelServer are comparing it to the libraries listed below
- coded with and corrected by Google Anti-Gravity☆13Nov 23, 2025Updated 3 months ago
- Inference deployment of the llama3☆11Apr 21, 2024Updated last year
- Code for the MTEB leaderboard☆30Feb 4, 2025Updated last year
- OS Signal Handlers in Go☆11Jan 6, 2021Updated 5 years ago
- InfiniBand SR-IOV CNI☆13Feb 13, 2026Updated 2 weeks ago
- ☆15Nov 22, 2023Updated 2 years ago
- A from-scratch re-implementation of Ultralytics YOLOv8☆20Jan 25, 2024Updated 2 years ago
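- ggml study notes; ggml is an inference framework for machine learning☆18Mar 24, 2024Updated last year
- Fine-tuning the Qwen1.5-0.5B-Chat model for general information extraction, aiming to: verify the effectiveness of generative methods compared with extractive NER; give beginners a simple model fine-tuning workflow with as little code as possible; and handle data formatting for large-model training.☆15Sep 6, 2024Updated last year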
- CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter☆22May 28, 2025Updated 9 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs☆17Jun 3, 2024Updated last year
- DRAM/SSD hybrid caching system☆14Mar 13, 2025Updated 11 months ago
- ☆14Feb 3, 2022Updated 4 years ago
- Here is a demo for PDF parser (Including OCR, object detection tools)☆36Oct 14, 2024Updated last year
- Tool for converting LLMs from uni-directional to bi-directional by removing causal mask for tasks like classification and sentence embedd…☆63Dec 12, 2024Updated last year
- ☆15Apr 3, 2025Updated 10 months ago
- Yet Another Papers With Code☆35Sep 7, 2025Updated 5 months ago
- Code for Robust Fine-tuning (RbFT)☆17Jan 31, 2025Updated last year
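- Batch-process data with large models; currently supports OCR with various large models, including Tongyi Qianwen, Moonshot AI, Baidu PaddlePaddle OCR, OpenAI, and LLAVA. Use LLM to generate or clean data for academic use. Support OCR with qwen, m…☆16Sep 15, 2024Updated last year
- vllm hybrid inference extension plugin; supports multi-NUMA hybrid inference, reaching 1000+ prefill for the Qwen3-Next model on a single card☆31Nov 7, 2025Updated 3 months ago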
- Getting Started with Triton: A Tutorial for Python Beginners☆37Oct 21, 2025Updated 4 months ago
- Leveraging passage embeddings for efficient listwise reranking with large language models.☆50Dec 7, 2024Updated last year
- Train deepseek r1-like reasoning LLM with ease☆18Feb 15, 2025Updated last year
- Lighter, cheaper and faster RAG toolkit (Graph RAG) supported by TargetPilot☆46Jun 9, 2025Updated 8 months ago
- An introduction to using docker and docker compose.☆21Sep 4, 2024Updated last year
- Gemma2(9B), Llama3-8B-Finetune-and-RAG, code base for sample, implemented in Kaggle platform☆22Feb 8, 2025Updated last year
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection☆55Oct 29, 2024Updated last year
- ☆141Apr 23, 2024Updated last year
- Code for KaLM-Embedding models☆114Jun 30, 2025Updated 8 months ago
- Materials for learning SGLang☆753Jan 5, 2026Updated last month
- ☆155Mar 4, 2025Updated 11 months ago
- A highly contextualized retrieval system integrating Large Language Models (LLMs), embeddings, and a dynamic agent-driven framework. Supp…☆27Sep 24, 2025Updated 5 months ago
- ☆152Jan 9, 2025Updated last year
- This is a meta-model distilled from LLMs for information extraction. This is an intermediate checkpoint that can be well-transferred to a…☆28Feb 23, 2025Updated last year
- ☆26May 11, 2025Updated 9 months ago
- A travel agent based on Qwen2.5, fine-tuned by SFT + DPO/PPO/GRPO using traveling question-answer dataset, a mindmap can be output using …☆56Nov 14, 2025Updated 3 months ago
- Estimate MFU for DeepSeekV3☆26Jan 5, 2025Updated last year
- Improving Text Embedding of Language Models Using Contrastive Fine-tuning☆64Aug 2, 2024Updated last year
- ☆120Jun 30, 2024Updated last year