MinhNgyuen / llm-benchmark
Benchmark LLM performance
☆104 · Updated last year
Alternatives and similar repositories for llm-benchmark
Users interested in llm-benchmark are comparing it to the libraries listed below.
- LLM Benchmark for Throughput via Ollama (Local LLMs) ☆286 · Updated 3 weeks ago
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆163 · Updated last year
- A fast batching API for serving LLMs ☆187 · Updated last year
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. ☆607 · Updated 10 months ago
- Fully-featured, beautiful web interface for vLLM, built with Next.js. ☆150 · Updated 3 months ago
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM … ☆587 · Updated 6 months ago
- Code execution utilities for Open WebUI & Ollama ☆297 · Updated 9 months ago
- ☆209 · Updated last month
- Dataset Crafting w/ RAG/Wikipedia ground truth and Efficient Fine-Tuning Using MLX and Unsloth. Includes configurable dataset annotation … ☆184 · Updated last year
- The RunPod worker template for serving our large language model endpoints. Powered by vLLM. ☆361 · Updated this week
- 🚀 Retrieval Augmented Generation (RAG) with txtai. Combine search and LLMs to find insights with your own data. ☆403 · Updated 3 months ago
- ☆96 · Updated last week
- Docker Compose setup to run vLLM on Windows ☆98 · Updated last year
- An Open WebUI function for a better R1 experience ☆79 · Updated 5 months ago
- This reference can be used with any existing OpenAI-integrated apps to run with TRT-LLM inference locally on GeForce GPU on Windows inste… ☆127 · Updated last year
- A Python-based web-assisted large language model (LLM) search assistant using llama.cpp ☆359 · Updated 10 months ago
- Your Trusty Memory-enabled AI Companion - Simple RAG chatbot optimized for local LLMs | 12 Languages Supported | OpenAI API Compatible ☆335 · Updated 6 months ago
- Automatically quantize GGUF models ☆197 · Updated this week
- A proxy server for multiple Ollama instances with key security ☆483 · Updated last month
- This project demonstrates a basic chain-of-thought interaction with any LLM (Large Language Model) ☆323 · Updated 11 months ago
- Dolphin System Messages ☆346 · Updated 6 months ago
- InferX is an Inference Function-as-a-Service platform ☆129 · Updated last week
- One-click templates for inferencing Language Models ☆213 · Updated 3 weeks ago
- This small API downloads and exposes access to NeuML's txtai-wikipedia and full Wikipedia datasets, taking in a query and returning full … ☆100 · Updated last week
- A simple experiment on letting two local LLMs have a conversation about anything! ☆110 · Updated last year
- Comparison of Language Model Inference Engines ☆229 · Updated 8 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆348 · Updated last year
- Web UI for ExLlamaV2 ☆510 · Updated 6 months ago
- Serving LLMs in the HF Transformers format via a PyFlask API ☆71 · Updated 11 months ago
- llama3.cuda is a pure C/CUDA implementation of the Llama 3 model. ☆342 · Updated 4 months ago