A collection of all available inference solutions for LLMs
☆94 · Mar 1, 2025 · Updated last year
Alternatives and similar repositories for llm-inference-solutions
Users who are interested in llm-inference-solutions are comparing it to the libraries listed below.
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization. ☆17 · Aug 30, 2024 · Updated last year
- Unofficial quantized models for flux1 ☆12 · Aug 14, 2024 · Updated last year
- A powerful, custom opencode configuration, complete with a suite of agents, commands, rules, skills, and a pre-configured MCP server. It'… ☆58 · Updated this week
- Llama.cpp-qt is a Python-based GUI wrapper for the LLama.cpp server, providing a user-friendly interface for configuring and running the … ☆16 · Oct 4, 2023 · Updated 2 years ago
- 🤖 AI-powered CLI for file reorganization. Runs fully locally; no data leaves your machine. ☆20 · Jul 2, 2025 · Updated 8 months ago
- NixOps VirtualBox backend [maintainer=@AmineChikhaoui] ☆25 · Aug 10, 2023 · Updated 2 years ago
- Awesome-SLM: a curated list of Small Language Models ☆28 · Jun 24, 2024 · Updated last year
- gRPC load balancing in Kubernetes ☆13 · Dec 15, 2021 · Updated 4 years ago
- Offline-first, desktop AI assistant tailored for educators, enabling them to generate questions directly from source materials. ☆23 · Aug 2, 2025 · Updated 7 months ago
- [⛔️ DEPRECATED] Friendli: the fastest serving engine for generative AI ☆49 · Jun 25, 2025 · Updated 8 months ago
- ATAT is an email client for AI Agents. Deploy dozens of AI agents through a single email address (IMAP/SMTP) using the OpenAI API. Just a… ☆30 · Feb 18, 2025 · Updated last year
- A simple speech-to-text and text-to-speech AI chatbot that can be run fully offline. ☆45 · Jan 28, 2024 · Updated 2 years ago
- ☆51 · May 31, 2024 · Updated last year
- Add ipython magic commands to Jupyter notebooks that provide LLM-driven enhancements ☆22 · Jul 1, 2024 · Updated last year
- Python client library for improving your LLM app accuracy ☆96 · Feb 11, 2025 · Updated last year
- Open Source AI with Granite and Granite Code ☆27 · Oct 6, 2025 · Updated 5 months ago
- A simple no-install web UI for Ollama and OAI-Compatible APIs! ☆31 · Jan 30, 2025 · Updated last year
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆251 · Mar 15, 2024 · Updated last year
- Extract information, summarize, ask questions, and search videos using OpenAI's Vision API 🚀🎦 ☆62 · Nov 7, 2023 · Updated 2 years ago
- GPT-4o-Realtime based AI Podcast Generator ☆38 · Oct 18, 2024 · Updated last year
- The GPT-4o Research Assistant is a tool designed to leverage the power of GPT-4o in assisting with academic research. It searches for aca… ☆119 · Jan 12, 2025 · Updated last year
- Probably one of the lightest native RAG + Agent apps out there; experience the power of Agent-powered models and Agent-driven knowledge ba… ☆32 · May 30, 2025 · Updated 9 months ago
- Text-to-Speech (TTS) engine for the Armenian language ☆12 · Sep 29, 2024 · Updated last year
- Running Microsoft's BitNet inference framework via FastAPI, Uvicorn and Docker. ☆36 · Jul 2, 2025 · Updated 8 months ago
- LLM-powered Q/A over arXiv preprints ☆32 · Apr 5, 2023 · Updated 2 years ago
- Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale. ☆866 · Jan 15, 2024 · Updated 2 years ago
- Use Codestral Mamba with Visual Studio Code and the Continue extension. A local LLM alternative to GitHub Copilot. ☆29 · Jul 18, 2024 · Updated last year
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆3,732 · May 21, 2025 · Updated 9 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆150 · Jan 7, 2026 · Updated 2 months ago
- Efficient non-uniform quantization with GPTQ for GGUF ☆61 · Sep 17, 2025 · Updated 5 months ago
- Run Ollama LLM models in Google Colab for free ☆38 · Nov 24, 2024 · Updated last year
- Experience the power of AI with this free AI voice generator demo. Utilizing Deepgram and Groq, we transform text into voice seamlessly. … ☆37 · Jun 12, 2024 · Updated last year
- LLM Serving Performance Evaluation Harness ☆83 · Feb 25, 2025 · Updated last year
- LCM Drawing app ☆12 · Dec 1, 2023 · Updated 2 years ago
- This repository contains the registries for components, agents and services, the second part of the autonolas-v1 protocol. ☆15 · Updated this week
- This repo is for the LinkedIn Learning course: Creating GitHub Portfolios ☆10 · Oct 3, 2023 · Updated 2 years ago
- fine-tuning tutorial ☆18 · Feb 20, 2026 · Updated 2 weeks ago
- AI system powered by large language models. ☆33 · Updated this week
- Natural language control for Python CLI tools using locally-trained SLMs (CPU inference) ☆30 · Feb 21, 2026 · Updated 2 weeks ago