MinhNgyuen / llm-benchmark
Benchmark LLM performance
☆105 · Updated last year
Alternatives and similar repositories for llm-benchmark
Users interested in llm-benchmark are comparing it to the libraries listed below.
- LLM Benchmark for Throughput via Ollama (Local LLMs) ☆303 · Updated 2 months ago
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆165 · Updated last year
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM … ☆601 · Updated 8 months ago
- An OpenAI API-compatible API for chat with image input and questions about the images, aka multimodal. ☆262 · Updated 7 months ago
- Automatically quantize GGUF models ☆214 · Updated last week
- Fully-featured, beautiful web interface for vLLM, built with NextJS. ☆159 · Updated 5 months ago
- ☆206 · Updated last month
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. ☆618 · Updated last year
- ☆104 · Updated 2 months ago
- This project demonstrates a basic chain-of-thought interaction with any LLM (Large Language Model). ☆321 · Updated last year
- A fast batching API for serving LLMs ☆188 · Updated last year
- Code execution utilities for Open WebUI & Ollama ☆302 · Updated 11 months ago
- InferX: Inference as a Service Platform ☆137 · Updated this week
- A proxy server for multiple Ollama instances with key security ☆515 · Updated 2 weeks ago
- Convenience scripts to finetune (chat-)LLaMa3 and other models for any language ☆316 · Updated last year
- 🚀 Retrieval Augmented Generation (RAG) with txtai. Combine search and LLMs to find insights with your own data. ☆411 · Updated 5 months ago
- Use locally running LLMs directly from Siri 🦙🟣 ☆182 · Updated last year
- Your Trusty Memory-enabled AI Companion - Simple RAG chatbot optimized for local LLMs | 12 Languages Supported | OpenAI API Compatible ☆338 · Updated 8 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆349 · Updated last year
- The Fastest Way to Fine-Tune LLMs Locally ☆323 · Updated 7 months ago
- 1.58-bit LLaMa model ☆83 · Updated last year
- Docker Compose to run vLLM on Windows ☆103 · Updated last year
- Compare open-source local LLM inference projects by their metrics to assess popularity and activeness. ☆664 · Updated last month
- An Open WebUI function for a better R1 experience ☆77 · Updated 7 months ago
- This reference can be used with any existing OpenAI integrated apps to run with TRT-LLM inference locally on GeForce GPU on Windows inste… ☆126 · Updated last year
- Practical and advanced guide to LLMOps. It provides a solid understanding of large language models’ general concepts, deployment techniqu… ☆76 · Updated last year
- WilmerAI is one of the oldest LLM semantic routers. It uses multi-layer prompt routing and complex workflows to allow you to not only cre… ☆784 · Updated 2 weeks ago
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… ☆42 · Updated last year
- Comparison of Language Model Inference Engines ☆232 · Updated 10 months ago
- This small API downloads and exposes access to NeuML's txtai-wikipedia and full wikipedia datasets, taking in a query and returning full … ☆100 · Updated 2 months ago