dmatora / LLM-inference-speed-benchmarks
☆21 · Updated last year
Alternatives and similar repositories for LLM-inference-speed-benchmarks
Users who are interested in LLM-inference-speed-benchmarks are comparing it to the repositories listed below
- ☆51 · Updated last year
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆88 · Updated this week
- Modified Beam Search with periodical restart ☆12 · Updated last year
- Trying to deconstruct RWKV in understandable terms ☆14 · Updated 2 years ago
- AirLLM 70B inference with single 4GB GPU ☆17 · Updated 7 months ago
- Llama cute voice assistant ☆27 · Updated 2 years ago
- LLM Divergent Thinking Creativity Benchmark. LLMs generate 25 unique words that start with a given letter with no connections to each oth… ☆35 · Updated 10 months ago
- Lightweight continuous batching OpenAI compatibility using HuggingFace Transformers include T5 and Whisper. ☆29 · Updated 10 months ago
- run ollama & gguf easily with a single command ☆52 · Updated last year
- Controllable Language Model Interactions in TypeScript ☆10 · Updated last year
- Simple, Fast, Parallel Huggingface GGML model downloader written in python ☆24 · Updated 2 years ago
- ☆11 · Updated 2 years ago
- Loader extension for tabbyAPI in SillyTavern ☆26 · Updated 7 months ago
- Chat WebUI is an easy-to-use user interface for interacting with AI, and it comes with multiple useful built-in tools such as web search … ☆49 · Updated 5 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · Updated last year
- Who needs o1 anyways. Add CoT to any OpenAI compatible endpoint. ☆44 · Updated last year
- Attend - to what matters. ☆17 · Updated 11 months ago
- Experiments with BitNet inference on CPU ☆55 · Updated last year
- Course Project for COMP4471 on RWKV ☆17 · Updated 2 years ago
- A Qt GUI for large language models ☆45 · Updated 2 years ago
- ☆63 · Updated 7 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 · Updated last year
- A simple GUI utility for gathering LIMA-like chat data. ☆23 · Updated 4 months ago
- Yet Another (LLM) Web UI, made with Gemini ☆12 · Updated last year
- ☆24 · Updated last year
- An unsupervised model merging algorithm for Transformers-based language models. ☆108 · Updated last year
- llama.cpp to PyTorch Converter ☆37 · Updated last year
- Accepts a Hugging Face model URL, automatically downloads and quantizes it using Bits and Bytes. ☆38 · Updated last year
- Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs ☆110 · Updated 2 years ago
- ☆27 · Updated 2 years ago