DeutscheKI / llm-performance-tests
These are performance benchmarks we ran while preparing our own privacy-preserving, NDA-compliant in-house AI coding assistant. If, by any chance, you're a German KMU (small or medium-sized enterprise) and you want strong in-house AI too, feel free to contact us.
☆30 · Updated 10 months ago
Alternatives and similar repositories for llm-performance-tests
Users interested in llm-performance-tests are comparing it to the libraries listed below.
- Fast parallel LLM inference for MLX ☆246 · Updated last year
- Smart proxy for LLM APIs that enables model-specific parameter control, automatic mode switching (like Qwen3's /think and /no_think), and… ☆50 · Updated 8 months ago
- Distributed inference for MLX LLMs ☆100 · Updated last year
- ☆109 · Updated 5 months ago
- InferX: Inference as a Service Platform ☆156 · Updated this week
- Route LLM requests to the best model for the task at hand. ☆177 · Updated 3 weeks ago
- Sparse inferencing for transformer-based LLMs ☆217 · Updated 6 months ago
- DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs. ☆185 · Updated 8 months ago
- A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching, using MLX. ☆100 · Updated 7 months ago
- REAP: Router-weighted Expert Activation Pruning for SMoE compression ☆232 · Updated 2 months ago
- Docs for GGUF quantization (unofficial) ☆366 · Updated 6 months ago
- ☆304 · Updated 3 months ago
- 1.58-bit LLM on Apple Silicon using MLX ☆243 · Updated last year
- Verify precision of all Kimi K2 API vendors ☆507 · Updated 2 weeks ago
- Enhancing LLMs with LoRA ☆206 · Updated 3 months ago
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆165 · Updated last year
- ☆134 · Updated 2 months ago
- API server for Transformer Lab ☆83 · Updated 2 months ago
- Tool to download models from the Hugging Face Hub and convert them to GGML/GGUF for llama.cpp ☆170 · Updated 9 months ago
- Benchmarking tool for vLLM inference performance with GPU monitoring ☆40 · Updated 2 months ago
- Tutorial for building an LLM router ☆244 · Updated last year
- Practical and advanced guide to LLMOps. It provides a solid understanding of large language models' general concepts, deployment techniqu… ☆79 · Updated last year
- Community-maintained hardware plugin for vLLM on Apple Silicon ☆400 · Updated last week
- Train large language models on MLX. ☆258 · Updated this week
- LLM inference in C/C++ ☆104 · Updated 2 weeks ago
- This is the documentation repository for SGLang. It is auto-generated from https://github.com/sgl-project/sglang ☆100 · Updated this week
- A command-line interface tool for serving LLMs using vLLM. ☆471 · Updated 2 weeks ago
- The RunPod worker template for serving our large language model endpoints. Powered by vLLM. ☆401 · Updated 2 weeks ago
- NVIDIA Linux open GPU with P2P support ☆129 · Updated this week
- Very basic framework for composable, parameterized large language model (Q)LoRA / (Q)DoRA fine-tuning using mlx, mlx_lm, and OgbujiPT. ☆43 · Updated 7 months ago