MinhNgyuen / llm-benchmark
Benchmark LLM performance
☆101 Updated last year
Alternatives and similar repositories for llm-benchmark
Users who are interested in llm-benchmark are comparing it to the libraries listed below.
- LLM Benchmark for Throughput via Ollama (Local LLMs) ☆269 Updated last month
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆158 Updated last year
- InferX is an Inference Function as a Service Platform ☆123 Updated 2 weeks ago
- Automatically quantize GGUF models ☆190 Updated last week
- An Open WebUI function for a better R1 experience ☆79 Updated 5 months ago
- 🚀 Retrieval Augmented Generation (RAG) with txtai. Combine search and LLMs to find insights with your own data. ☆390 Updated 3 months ago
- Code execution utilities for Open WebUI & Ollama ☆295 Updated 9 months ago
- ☆208 Updated 3 weeks ago
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. ☆604 Updated 9 months ago
- A proxy server for multiple Ollama instances with key security ☆476 Updated last week
- Fully-featured, beautiful web interface for vLLM - built with NextJS. ☆150 Updated 3 months ago
- A fast batching API to serve LLM models ☆185 Updated last year
- This reference can be used with any existing OpenAI integrated apps to run with TRT-LLM inference locally on GeForce GPU on Windows inste… ☆126 Updated last year
- This small API downloads and exposes access to NeuML's txtai-wikipedia and full wikipedia datasets, taking in a query and returning full … ☆98 Updated 3 weeks ago
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM … ☆583 Updated 5 months ago
- Lemonade helps users run local LLMs with the highest performance by configuring state-of-the-art inference engines for their NPUs and GPU… ☆381 Updated this week
- Dataset Crafting w/ RAG/Wikipedia ground truth and Efficient Fine-Tuning Using MLX and Unsloth. Includes configurable dataset annotation … ☆184 Updated last year
- Open‑WebUI Tools is a modular toolkit designed to extend and enrich your Open WebUI instance, turning it into a powerful AI workstation. … ☆302 Updated this week
- The Fastest Way to Fine-Tune LLMs Locally ☆314 Updated 4 months ago
- ☆95 Updated 7 months ago
- This project demonstrates a basic chain-of-thought interaction with any LLM (Large Language Model) ☆321 Updated 10 months ago
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… ☆43 Updated 10 months ago
- A simple experiment on letting two local LLMs have a conversation about anything! ☆110 Updated last year
- An OpenAI API compatible API for chat with image input and questions about the images, aka multimodal. ☆260 Updated 5 months ago
- Web UI for ExLlamaV2 ☆505 Updated 6 months ago
- A curated list of awesome Large Language Model (LLM) Web User Interfaces. ☆501 Updated last month
- Lightweight Inference server for OpenVINO ☆193 Updated this week
- ☆117 Updated 9 months ago
- This is the Mixture-of-Agents (MoA) concept, adapted from the original work by TogetherAI. My version is tailored for local model usage a… ☆118 Updated last year
- Convenience scripts to finetune (chat-)LLaMa3 and other models for any language ☆310 Updated last year