A collection of all available inference solutions for LLMs
☆95 · Mar 1, 2025 · Updated last year
Alternatives and similar repositories for llm-inference-solutions
Users who are interested in llm-inference-solutions are comparing it to the libraries listed below.
- Awesome-SLM: a curated list of Small Language Models ☆31 · Jun 24, 2024 · Updated last year
- Llama.cpp-qt is a Python-based GUI wrapper for the llama.cpp server, providing a user-friendly interface for configuring and running the … ☆16 · Oct 4, 2023 · Updated 2 years ago
- Chat with Phi 3.5/3 Vision LLMs. Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which includ… ☆36 · Jan 2, 2025 · Updated last year
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ and easy export to ONNX/ONNX Runtime. ☆190 · Mar 23, 2026 · Updated last month
- ☆51 · May 31, 2024 · Updated last year
- Unofficial quantized flux1 models ☆11 · Aug 14, 2024 · Updated last year
- ☆14 · Feb 7, 2024 · Updated 2 years ago
- Advanced Video Graph RAG using SAM2, CLIP, BLIP, Qwen2-VL, YOLO-World, Neo4j, WebGPU, and a local LLM ☆14 · Nov 25, 2024 · Updated last year
- Viva la machina. ☆66 · Apr 22, 2026 · Updated 2 weeks ago
- A model serving framework for various research and production scenarios, built on the PyTorch and Hugging Face ecosystems. ☆23 · Oct 11, 2024 · Updated last year
- Probably one of the lightest native RAG + Agent apps out there; experience the power of Agent-powered models and Agent-driven knowledge ba… ☆33 · May 30, 2025 · Updated 11 months ago
- Example of a Streamlit data app powered by Vaex ☆11 · Jul 7, 2022 · Updated 3 years ago
- QLoRA fine-tuning that is 5x faster and uses 60% less memory ☆21 · May 28, 2024 · Updated last year
- These are papers that I read and reviewed related to NLP, CV, and Deep Learning 😉 You can check paper links and my reviews 😊 ☆13 · Jan 3, 2024 · Updated 2 years ago
- NixOps VirtualBox backend [maintainer=@AmineChikhaoui] ☆25 · Aug 10, 2023 · Updated 2 years ago
- Offline-first, desktop AI assistant tailored for educators, enabling them to generate questions directly from source materials. ☆24 · Aug 2, 2025 · Updated 9 months ago
- 🤖 AI-powered CLI for file reorganization. Runs fully locally; no data leaves your machine. ☆20 · Jul 2, 2025 · Updated 10 months ago
- ☆16 · Nov 22, 2025 · Updated 5 months ago
- ☆59 · Aug 19, 2025 · Updated 8 months ago
- Dataset Resplitting for Generalization in KGQA. See also https://github.com/semantic-systems/KGQA-datasets ☆17 · Jun 29, 2022 · Updated 3 years ago
- Simple GUI to load a PDF/Docx/txt file and have LM Studio answer questions based on it. ☆14 · Jul 31, 2024 · Updated last year
- [ICDAR 2024] (Best Student Paper🏆) Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation ☆15 · Sep 6, 2024 · Updated last year
- MNN ASR demo. ☆26 · Mar 24, 2025 · Updated last year
- MTTN: Multi-Pair Text to Text Narratives for Prompt Generation ☆11 · Feb 4, 2023 · Updated 3 years ago
- Multi-LoRA inference server that scales to thousands of fine-tuned LLMs ☆3,771 · May 21, 2025 · Updated 11 months ago
- ☆16 · Feb 10, 2023 · Updated 3 years ago
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆273 · Aug 6, 2025 · Updated 9 months ago
- Convert a Hugging Face PyTorch checkpoint to a TensorFlow checkpoint ☆17 · Sep 4, 2023 · Updated 2 years ago
- A simple no-install web UI for Ollama and OAI-compatible APIs! ☆31 · Jan 30, 2025 · Updated last year
- LLaMA INT4 CUDA inference with AWQ ☆54 · Jan 20, 2025 · Updated last year
- Fine-tuning tutorial ☆18 · Apr 25, 2026 · Updated 2 weeks ago
- (ACL 2025 Findings) Official code for the paper "STeCa: Step-level Trajectory Calibration for LLM Agent Learning" ☆27 · Mar 2, 2026 · Updated 2 months ago
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆250 · Mar 15, 2024 · Updated 2 years ago
- QLoRA for Masked Language Modeling ☆23 · Sep 11, 2023 · Updated 2 years ago
- PyTorch library for cost-effective, fast and easy serving of MoE models. ☆303 · Updated this week
- Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale. ☆868 · Jan 15, 2024 · Updated 2 years ago
- Fine-tuning llama2 using the chain-of-thought approach ☆10 · Nov 18, 2023 · Updated 2 years ago
- ☆17 · Jun 9, 2024 · Updated last year
- An MCP server that helps you find MCP servers listed on PulseMCP.com ☆28 · Apr 23, 2025 · Updated last year