A collection of all available inference solutions for the LLMs
☆95Mar 1, 2025Updated last year
Alternatives and similar repositories for llm-inference-solutions
Users that are interested in llm-inference-solutions are comparing it to the libraries listed below.
- An open-source implementation for fine-tuning DINOv2 by Meta.☆14Jul 21, 2025Updated 10 months ago
- Awesome-SLM: a curated list of Small Language Model☆31Jun 24, 2024Updated last year
- Chat with Phi 3.5/3 Vision LLMs. Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which includ…☆34Jan 2, 2025Updated last year
- Inference Llama 2 in one file of pure Java☆19Nov 13, 2023Updated 2 years ago
- A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily.☆190Mar 23, 2026Updated 2 months ago
- ☆51May 31, 2024Updated 2 years ago
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization.☆17Aug 30, 2024Updated last year
- Benchmarking tool for assessing LLM performance across different hardware☆17Dec 8, 2023Updated 2 years ago
- YOLOv8 Knowledge Distillation☆10Dec 28, 2024Updated last year
- flux1 unofficial quantized model☆11Aug 14, 2024Updated last year
- ☆14Feb 7, 2024Updated 2 years ago
- Notes and Examples to get started with Parallel Computing with CUDA.☆13Nov 1, 2019Updated 6 years ago
- a fast and customizable CUDA int4 tensor core gemm☆15Aug 2, 2024Updated last year
- Efficient non-uniform quantization with GPTQ for GGUF☆63Sep 17, 2025Updated 9 months ago
- flask and litegraph.js☆11Jun 10, 2021Updated 5 years ago
- Multichannel Looper/Feedback System for Riffusion☆14May 6, 2023Updated 3 years ago
- Viva la machina.☆75Apr 22, 2026Updated last month
- An insanely secure password manager.☆17Mar 10, 2026Updated 3 months ago
- It shows how to deploy and use an agent with LLM.☆20Mar 1, 2025Updated last year
- Inference Llama 2 with a model compiled to native code by TorchInductor☆14Feb 8, 2024Updated 2 years ago
- Probably one of the lightest native RAG + Agent apps out there, experience the power of Agent-powered models and Agent-driven knowledge ba…☆33May 30, 2025Updated last year
- Example of a Streamlit data app powered by Vaex☆11Jul 7, 2022Updated 3 years ago
- 5X faster 60% less memory QLoRA finetuning☆21May 28, 2024Updated 2 years ago
- EPCC I/O benchmarking applications☆12Dec 15, 2021Updated 4 years ago
- These are papers that I read and reviewed related to NLP, CV, and Deep Learning 😉 You can check paper links and my reviews 😊☆13Jan 3, 2024Updated 2 years ago
- NixOps VirtualBox backend [maintainer=@AmineChikhaoui]☆24Aug 10, 2023Updated 2 years ago
- 🤖 AI-powered CLI for file reorganization. Runs fully locally — no data leaves your machine.☆20Jul 2, 2025Updated 11 months ago
- ☆16Nov 22, 2025Updated 6 months ago
- 🎓Automatically Update CV Papers Daily using Github Actions (Updates Every 12 Hours)☆12May 17, 2026Updated last month
- Dataset Resplitting for Generalization in KGQA. See also https://github.com/semantic-systems/KGQA-datasets☆17Jun 29, 2022Updated 3 years ago
- Shotit is a screenshot-to-video search engine tailored for TV & Film, blazing-fast and compute-efficient.☆23May 27, 2026Updated 3 weeks ago
- Simple GUI to load a PDF/Docx/txt file and have LM Studio Answer based off of it.☆14Jul 31, 2024Updated last year
- [ICDAR 2024] (Best Student Paper🏆) Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation☆15Sep 6, 2024Updated last year
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs☆3,795May 28, 2026Updated 3 weeks ago
- Convert a dynamically linked binary to a statically linked binary going through LLVM IR, using mcsema☆12May 27, 2019Updated 7 years ago
- ☆16Feb 10, 2023Updated 3 years ago
- lightsocks client implemented in golang☆13Sep 11, 2015Updated 10 years ago
- Convert Huggingface Pytorch checkpoint to Tensorflow checkpoint☆17Sep 4, 2023Updated 2 years ago
- Model compression and optimization for deployment for Pytorch, including knowledge distillation, quantization and pruning.☆21Sep 10, 2024Updated last year