MinhNgyuen / llm-benchmark
Benchmark LLM performance
☆108 · Updated last year
Alternatives and similar repositories for llm-benchmark
Users interested in llm-benchmark are comparing it to the libraries listed below.
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆165 · Updated last year
- A fast batching API to serve LLM models ☆188 · Updated last year
- LLM Benchmark for Throughput via Ollama (Local LLMs) ☆323 · Updated this week
- ☆109 · Updated 5 months ago
- This project demonstrates a basic chain-of-thought interaction with any LLM (Large Language Model) ☆322 · Updated last year
- An Open WebUI function for a better R1 experience ☆78 · Updated 10 months ago
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM … ☆612 · Updated 11 months ago
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. ☆626 · Updated last year
- An OpenAI API compatible API for chat with image input and questions about the images. aka Multimodal. ☆267 · Updated 10 months ago
- Automatically quant GGUF models ☆220 · Updated 3 weeks ago
- This reference can be used with any existing OpenAI integrated apps to run with TRT-LLM inference locally on GeForce GPU on Windows inste… ☆127 · Updated last year
- Dataset Crafting w/ RAG/Wikipedia ground truth and Efficient Fine-Tuning Using MLX and Unsloth. Includes configurable dataset annotation … ☆192 · Updated last year
- A simple experiment on letting two local LLMs have a conversation about anything! ☆112 · Updated last year
- A proxy server for multiple ollama instances with Key security ☆565 · Updated 2 months ago
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… ☆42 · Updated last year
- ☆210 · Updated 2 weeks ago
- An extension that lets the AI take the wheel, allowing it to use the mouse and keyboard, recognize UI elements, and prompt itself :3...no… ☆127 · Updated last year
- Docker compose to run vLLM on Windows ☆113 · Updated 2 years ago
- A Python-based web-assisted large language model (LLM) search assistant using Llama.cpp ☆367 · Updated last year
- Comparison of Language Model Inference Engines ☆239 · Updated last year
- 🚀 Retrieval Augmented Generation (RAG) with txtai. Combine search and LLMs to find insights with your own data. ☆435 · Updated last month
- This small API downloads and exposes access to NeuML's txtai-wikipedia and full wikipedia datasets, taking in a query and returning full … ☆103 · Updated 4 months ago
- The RunPod worker template for serving our large language model endpoints. Powered by vLLM. ☆391 · Updated last week
- Experimental LLM Inference UX to aid in creative writing ☆127 · Updated last year
- Practical and advanced guide to LLMOps. It provides a solid understanding of large language models’ general concepts, deployment techniqu… ☆77 · Updated last year
- InferX: Inference as a Service Platform ☆146 · Updated this week
- MVP of an idea using multiple local LLMs to simulate and play D&D ☆94 · Updated 8 months ago
- Convenience scripts to finetune (chat-)LLaMa3 and other models for any language ☆314 · Updated last year
- Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding and Rerank models over OpenAI endpoints. ☆274 · Updated this week
- The Fastest Way to Fine-Tune LLMs Locally ☆333 · Updated last month