ztxz16 / exvllm
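A hybrid-inference extension plugin for vLLM that supports multi-NUMA hybrid inference; single-GPU inference of the Qwen3-Next model can reach 1000+ prefill
☆31 Updated 2 months ago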
Alternatives and similar repositories for exvllm
Users that are interested in exvllm are comparing it to the libraries listed below
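- Let's raise an AI cat with its own dedicated memory! ☆10 Updated last year
- A web UI for MCP in which the client can connect to multiple SSE servers; supports large models such as OpenAI, DeepSeek, and Qwen, and includes complete examples of a simple weather-query agent built over stdio and SSE ☆39 Updated 8 months ago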
- Built on the robust XTuner backend framework, XTuner Chat GUI offers a user-friendly platform for quick and efficient local model inferen… ☆13 Updated last year
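- LLM intelligent routing gateway; Enterprise Intelligent AI-API Distribution Gateway ☆13 Updated last year
- Batch-process data with large models; currently supports OCR with various large models, including Tongyi Qianwen, Moonshot AI, Baidu PaddlePaddle OCR, OpenAI, and LLaVA. Use LLM to generate or clean data for academic use. Support OCR with qwen, m… ☆16 Updated last year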
- CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter ☆21 Updated 8 months ago
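- A pure C++ cross-platform LLM acceleration library, callable from Python; supports the ChatGLM-6B, LLaMA, Baichuan, and MOSS base models on x86 / ARM ☆12 Updated last week
- This project uses the PaddlePaddle platform to build an innovative multi-model collaborative system that converts PDF files to Markdown efficiently and accurately. ☆27 Updated 10 months ago
- An image search engine and management system built on multimodal vector models and vision multimodal models, offering accurate text-to-text, text-to-image, and image-to-image search. An image search engine management system built upon multimodal vector models… ☆77 Updated 4 months ago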
- You can play with any API server that is compatible with the OpenAI API ☆24 Updated last year
- xllamacpp - a Python wrapper of llama.cpp ☆71 Updated 2 weeks ago
- Chinese edition of hf-alignment-handbook: a complete set of large-model training tutorials covering SFT, DPO, ORPO, and CPT. ☆14 Updated last year
- Qwen-WisdomVast is a large model trained on 1 million high-quality Chinese multi-turn SFT data, 200,000 English multi-turn SFT data, and … ☆18 Updated last year
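- ✨🦋 illufly - 【幻蝶】 a self-evolving agent based on memory distillation and document retrieval ☆78 Updated last month
- HearSight is an intelligent audio and video content analysis tool. It supports importing video from multiple sources (via URL or file upload) and extracts contextual information from the input video to provide more accurate AI question answering. The platform segments video intelligently by semantic unit, and users can adjust the segmentation granularity through Q&A to quickly locate the content they need. HearSight also supports automatic… ☆32 Updated last month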
- 🔥Your Daily Dose of AI Research from Hugging Face 🔥 Stay updated with the latest AI breakthroughs! This bot automatically collects and… ☆56 Updated last week
- Deploy Qwen2.5 with FastAPI + vLLM ☆25 Updated last year
- ☆28 Updated last year
- Port of Facebook's LLaMA model in C/C++ ☆67 Updated 9 months ago
- GLM Series Edge Models ☆156 Updated 7 months ago
- ☆52 Updated last month
- LLM-related material, including code and docs ☆13 Updated 11 months ago
- [ACL2025 demo track] ROGRAG: A Robustly Optimized GraphRAG Framework ☆194 Updated last month
- ☆15 Updated last year
- An open-source chat text to control actions agentic workflow framework/showcase powered by Agently AI application development framework. ☆29 Updated last year
- MCP DeepResearch Server: a deep-research server based on LangGraph + Ollama + Tavily, with support for asynchronous execution, timeout control, and progress push ☆31 Updated 7 months ago
- A mini assistant to help you read papers quickly ☆54 Updated 8 months ago
- Python implementation of AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, w… ☆49 Updated 10 months ago
- AgileGen: Empowering Agile-Based Generative Software Development through Human-AI Teamwork (accepted by ACM TOSEM) ☆23 Updated last year
- ☆26 Updated 2 months ago