mani-kantap / llm-inference-solutions
A collection of all available inference solutions for the LLMs
☆94Mar 1, 2025Updated 11 months ago
Alternatives and similar repositories for llm-inference-solutions
Users that are interested in llm-inference-solutions are comparing it to the libraries listed below
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization.☆17Aug 30, 2024Updated last year
- Unofficial quantized models for flux1☆12Aug 14, 2024Updated last year
- Inference Llama 2 with a model compiled to native code by TorchInductor☆14Feb 8, 2024Updated 2 years ago
- 🤖 AI-powered CLI for file reorganization. Runs fully locally — no data leaves your machine.☆19Jul 2, 2025Updated 7 months ago
- Llama.cpp-qt is a Python-based GUI wrapper for the LLama.cpp server, providing a user-friendly interface for configuring and running the …☆16Oct 4, 2023Updated 2 years ago
- NixOps VirtualBox backend [maintainer=@AmineChikhaoui]☆25Aug 10, 2023Updated 2 years ago
- Offline-first, desktop AI assistant tailored for educators, enabling them to generate questions directly from source materials.☆23Aug 2, 2025Updated 6 months ago
- A micro LLM multi-agent system for data analysis☆17Apr 27, 2025Updated 9 months ago
- A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily.☆184Apr 2, 2025Updated 10 months ago
- ATAT is an email client for AI Agents. Deploy dozens of AI agents through a single email address (IMAP/SMTP) using the OpenAI API. Just a…☆29Feb 18, 2025Updated 11 months ago
- This project allows you to plug in a GitHub repository URL, generate vectors for a LLM and use ChatGPT models to interact. The main frame…☆19Jun 4, 2023Updated 2 years ago
- A simple speech-to-text and text-to-speech AI chatbot that can be run fully offline.☆45Jan 28, 2024Updated 2 years ago
- QLoRA for Masked Language Modeling☆22Sep 11, 2023Updated 2 years ago
- MNN ASR demo.☆25Mar 24, 2025Updated 10 months ago
- ☆51May 31, 2024Updated last year
- llama INT4 cuda inference with AWQ☆54Jan 20, 2025Updated last year
- Add ipython magic commands to Jupyter notebooks that provide LLM-driven enhancements☆22Jul 1, 2024Updated last year
- Python client library for improving your LLM app accuracy☆96Feb 11, 2025Updated last year
- A simple no-install web UI for Ollama and OAI-Compatible APIs!☆31Jan 30, 2025Updated last year
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).☆250Mar 15, 2024Updated last year
- This repo provides a simple Gradio UI to run Qwen2 VL 72B AWQ in venv and have both image and video inferencing work.☆33Oct 3, 2024Updated last year
- Extract information, summarize, ask questions, and search videos using OpenAI's Vision API 🚀🎦☆62Nov 7, 2023Updated 2 years ago
- ☆17Sep 1, 2024Updated last year
- Probably one of the lightest native RAG + Agent apps out there. Experience the power of Agent-powered models and Agent-driven knowledge ba…☆32May 30, 2025Updated 8 months ago
- Text-to-Speech (TTS) engine for the Armenian language☆12Sep 29, 2024Updated last year
- Simple & Scalable Pretraining for Neural Architecture Research☆308Dec 6, 2025Updated 2 months ago
- LLM-powered Q/A over arXiv preprints☆32Apr 5, 2023Updated 2 years ago
- Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.☆867Jan 15, 2024Updated 2 years ago
- Assignments of "Machine Learning Engineering for Production (MLOps) Specialization" by Coursera (https://www.coursera.org/specializations…☆28Oct 5, 2021Updated 4 years ago
- Use Codestral Mamba with Visual Studio Code and the Continue extension. A local LLM alternative to GitHub Copilot.☆29Jul 18, 2024Updated last year
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs☆3,719May 21, 2025Updated 8 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens☆150Jan 7, 2026Updated last month
- AI system powered by large language models.☆33Feb 8, 2026Updated last week
- Experience the power of AI with this free AI voice generator demo. Utilizing Deepgram and Groq, we transform text into voice seamlessly. …☆37Jun 12, 2024Updated last year
- Run Ollama LLM models in Google Colab for free☆37Nov 24, 2024Updated last year
- A daemon that makes a desktop OS accessible to AI agents☆39May 29, 2025Updated 8 months ago
- All the projects from the Django course on the YouTube channel.☆10Jan 15, 2021Updated 5 years ago
- This repository contains the registries for components, agents and services, the second part of the autonolas-v1 protocol.☆15Updated this week
- fine-tuning tutorial☆17Dec 13, 2025Updated 2 months ago