A collection of all available inference solutions for LLMs
☆95Mar 1, 2025Updated last year
Alternatives and similar repositories for llm-inference-solutions
Users that are interested in llm-inference-solutions are comparing it to the libraries listed below.
- Awesome-SLM: a curated list of Small Language Models☆29Jun 24, 2024Updated last year
- The Code Converter App is a versatile tool that allows users to convert, debug, and analyze code written in various programming languages…☆14Aug 27, 2023Updated 2 years ago
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and easy export to onnx/onnx-runtime.☆187Mar 23, 2026Updated last week
- An open-source implementation for fine-tuning Phi3-Vision and Phi3.5-Vision by Microsoft.☆98Oct 2, 2025Updated 5 months ago
- AIRA is an open-source experiment (demo) created by Google Cloud Education Engineers that was put together to help educators across the g…☆23Mar 20, 2026Updated last week
- ☆51May 31, 2024Updated last year
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization.☆17Aug 30, 2024Updated last year
- Benchmarking tool for assessing LLM performance across different hardware☆17Dec 8, 2023Updated 2 years ago
- Unofficial quantized model for flux1☆11Aug 14, 2024Updated last year
- ☆14Feb 7, 2024Updated 2 years ago
- Notes and examples to get started with parallel computing with CUDA.☆13Nov 1, 2019Updated 6 years ago
- a fast and customizable CUDA int4 tensor core gemm☆15Aug 2, 2024Updated last year
- Advanced Video Graph RAG using SAM2, CLIP, BLIP, Qwen2-VL, YOLO-World, Neo4j, WebGPU, local LLM☆14Nov 25, 2024Updated last year
- flask and litegraph.js☆11Jun 10, 2021Updated 4 years ago
- Probably one of the lightest native RAG + Agent apps out there; experience the power of Agent-powered models and Agent-driven knowledge ba…☆32May 30, 2025Updated 10 months ago
- Example of a Streamlit data app powered by Vaex☆11Jul 7, 2022Updated 3 years ago
- 5X faster 60% less memory QLoRA finetuning☆21May 28, 2024Updated last year
- EPCC I/O benchmarking applications☆12Dec 15, 2021Updated 4 years ago
- Offline-first, desktop AI assistant tailored for educators, enabling them to generate questions directly from source materials.☆23Aug 2, 2025Updated 7 months ago
- 🤖 AI-powered CLI for file reorganization. Runs fully locally — no data leaves your machine.☆20Jul 2, 2025Updated 8 months ago
- Code repository for TIDMAD: Time series Dataset for Discovering Dark Matter with AI Denoising.☆15Mar 4, 2026Updated 3 weeks ago
- A powerful, custom opencode configuration, complete with a suite of agents, commands, rules, skills, and a pre-configured MCP server. It'…☆85Mar 20, 2026Updated last week
- Portable LLM - A rust library for LLM inference☆11Apr 13, 2024Updated last year
- 🎓Automatically update CV papers daily using GitHub Actions (updates every 12 hours)☆12Updated this week
- ☆58Aug 19, 2025Updated 7 months ago
- Shotit is a screenshot-to-video search engine tailored for TV & Film, blazing-fast and compute-efficient.☆20Jan 6, 2026Updated 2 months ago
- Glacier: Guided Locally Constrained Counterfactual Explanations for Time Series Classification (Machine Learning journal)☆10Mar 15, 2024Updated 2 years ago
- Simple GUI to load a PDF/Docx/txt file and have LM Studio Answer based off of it.☆14Jul 31, 2024Updated last year
- Run large language models easily.☆13Feb 12, 2026Updated last month
- This repository contains some examples of using borb in google colab. These examples enable you to try out the features of borb without i…☆13Sep 4, 2022Updated 3 years ago
- Steering LLM Thinking with Budget Guidance☆29Feb 19, 2026Updated last month
- project #1 of Digital Visual Effects, Spring 2011☆13Dec 11, 2016Updated 9 years ago
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs☆3,740May 21, 2025Updated 10 months ago
- Convert a dynamically linked binary to a statically linked binary going through LLVM IR, using mcsema☆12May 27, 2019Updated 6 years ago
- MiorSoft web pages☆12Feb 10, 2026Updated last month
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …☆275Aug 6, 2025Updated 7 months ago
- Model compression and optimization for deployment with PyTorch, including knowledge distillation, quantization, and pruning.☆20Sep 10, 2024Updated last year
- Bringing AI practically to science!☆24Feb 23, 2026Updated last month
- llama INT4 cuda inference with AWQ☆54Jan 20, 2025Updated last year