aidatatools / ollama-benchmark
LLM Benchmark for Throughput via Ollama (Local LLMs)
☆162 · Updated last week
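The benchmark's core metric, generation throughput in tokens per second, can be derived from the fields Ollama's `/api/generate` endpoint returns (`eval_count` is the number of generated tokens and `eval_duration` is reported in nanoseconds, per the public Ollama REST API). The sketch below is illustrative, not ollama-benchmark's actual code, and the sample values are made up:

```python
def tokens_per_second(resp: dict) -> float:
    """Throughput from an Ollama /api/generate response.

    eval_count  -- tokens generated
    eval_duration -- generation time in nanoseconds
    """
    return resp["eval_count"] / resp["eval_duration"] * 1e9

# In a real run you would POST {"model": ..., "prompt": ..., "stream": False}
# to http://localhost:11434/api/generate and pass the JSON response here.
# Illustrative values: 450 tokens generated in 5 seconds.
sample = {"eval_count": 450, "eval_duration": 5_000_000_000}
print(round(tokens_per_second(sample), 1))  # 90.0
```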
Alternatives and similar repositories for ollama-benchmark:
Users interested in ollama-benchmark are comparing it to the repositories listed below.
- Dagger functions to import Hugging Face GGUF models into a local Ollama instance and optionally push them to ollama.com. ☆114 · Updated 8 months ago
- ☆74 · Updated last month
- A proxy server for multiple Ollama instances with key security ☆318 · Updated 3 weeks ago
- beep boop 🤖 ☆72 · Updated 3 weeks ago
- GPU Power and Performance Manager ☆52 · Updated 3 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆88 · Updated this week
- A fast batching API to serve LLM models ☆180 · Updated 9 months ago
- Code execution utilities for Open WebUI & Ollama ☆237 · Updated 2 months ago
- VSCode AI coding assistant powered by a self-hosted llama.cpp endpoint. ☆176 · Updated 2 weeks ago
- Review/check GGUF files and estimate the memory usage and maximum tokens per second. ☆76 · Updated this week
- Transparent proxy server for llama.cpp's server that provides automatic model swapping ☆152 · Updated last week
- A repository of Open-WebUI tools to use with your favourite LLMs ☆103 · Updated 2 weeks ago
- Distributed inference for MLX LLMs ☆79 · Updated 5 months ago
- A simple-to-use Ollama autocompletion engine with exposed options and streaming functionality ☆111 · Updated 3 months ago
- ☆40 · Updated 6 months ago
- Automatically quantize GGUF models ☆151 · Updated this week
- Integrates AI tools into Microsoft Word ☆107 · Updated last month
- An OpenAI-compatible API for chat with image input and questions about the images (aka multimodal) ☆220 · Updated last month
- AI-powered chatbot with real-time updates ☆44 · Updated 3 months ago
- Easily view and modify JSON datasets for large language models ☆69 · Updated 3 months ago
- This reference can be used with any existing OpenAI-integrated apps to run with TRT-LLM inference locally on GeForce GPU on Windows inste… ☆119 · Updated 11 months ago
- One-click templates for inferencing language models ☆145 · Updated 2 weeks ago
- Practical and advanced guide to LLMOps. It provides a solid understanding of large language models' general concepts, deployment techniqu… ☆58 · Updated 5 months ago
- Serving LLMs in the HF Transformers format via a PyFlask API ☆69 · Updated 4 months ago
- Self-host LLMs with vLLM and BentoML ☆79 · Updated 2 weeks ago
- Parse files (e.g. code repos) and websites to the clipboard or a file for ingestion by AI / LLMs ☆134 · Updated last month
- Comparison of the output quality of quantization methods, using Llama 3, Transformers, GGUF, and EXL2. ☆141 · Updated 8 months ago
- Ollama client written in Python ☆155 · Updated last month
- WIP: Open WebUI desktop application, based on Electron. ☆175 · Updated last week