dmatora / LLM-inference-speed-benchmarks
☆20 · Updated last year
Alternatives and similar repositories for LLM-inference-speed-benchmarks
Users who are interested in LLM-inference-speed-benchmarks are comparing it to the repositories listed below
- Trying to deconstruct RWKV in understandable terms ☆14 · Updated 2 years ago
- The heart of The Pulsar App, fast, secure and shared inference with modern UI ☆58 · Updated 11 months ago
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆81 · Updated last week
- ☆24 · Updated 9 months ago
- AirLLM 70B inference with single 4GB GPU ☆14 · Updated 4 months ago
- Lightweight continuous batching OpenAI compatibility using HuggingFace Transformers, including T5 and Whisper. ☆29 · Updated 7 months ago
- ☆26 · Updated 2 years ago
- Modified Beam Search with periodical restart ☆12 · Updated last year
- BlinkDL's RWKV-v4 running in the browser ☆46 · Updated 2 years ago
- Run ollama & gguf easily with a single command ☆52 · Updated last year
- Course Project for COMP4471 on RWKV ☆17 · Updated last year
- Loader extension for tabbyAPI in SillyTavern ☆24 · Updated 4 months ago
- Yet another frontend for LLM, written using .NET and WinUI 3 ☆10 · Updated last month
- Easily convert HuggingFace models to GGUF-format for llama.cpp ☆23 · Updated last year
- PowerShell automation to rebuild llama.cpp for a Windows environment. ☆32 · Updated last month
- Experimental sampler to make LLMs more creative ☆31 · Updated 2 years ago
- Simple LLM inference server ☆20 · Updated last year
- ☆27 · Updated 2 months ago
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… ☆42 · Updated last year
- ☆30 · Updated last year
- Attend - to what matters. ☆17 · Updated 8 months ago
- Yet Another (LLM) Web UI, made with Gemini ☆12 · Updated 10 months ago
- 33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU ☆13 · Updated last year
- Experiments with BitNet inference on CPU ☆54 · Updated last year
- A list of language models with permissive licenses such as MIT or Apache 2.0 ☆24 · Updated 8 months ago
- ☆51 · Updated last year
- Port of Microsoft's BioGPT in C/C++ using ggml ☆85 · Updated last year
- MilimoChat: Privacy-first, self-hosted AI chat with customizable personas, context-aware memory, and local analytics. Built on Python/Str… ☆14 · Updated 7 months ago
- OpenPipe Reinforcement Learning Experiments ☆32 · Updated 7 months ago
- LLM Divergent Thinking Creativity Benchmark. LLMs generate 25 unique words that start with a given letter with no connections to each oth… ☆32 · Updated 7 months ago