dmatora / LLM-inference-speed-benchmarks
☆20, updated 11 months ago
Alternatives and similar repositories for LLM-inference-speed-benchmarks
Users interested in LLM-inference-speed-benchmarks are comparing it to the repositories listed below
- ☆24, updated 7 months ago
- Run multiple resource-heavy Large Models (LM) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… (☆73, updated 2 weeks ago)
- Lightweight continuous-batching OpenAI compatibility layer using HuggingFace Transformers, including T5 and Whisper. (☆26, updated 5 months ago)
- AirLLM 70B inference with a single 4GB GPU (☆14, updated 2 months ago)
- Trying to deconstruct RWKV in understandable terms (☆14, updated 2 years ago)
- Attend - to what matters. (☆17, updated 6 months ago)
- Who needs o1 anyway? Add CoT to any OpenAI-compatible endpoint. (☆44, updated 11 months ago)
- 33B Chinese LLM, DPO QLoRA, 100K context, AirLLM 70B inference with a single 4GB GPU (☆13, updated last year)
- Llama cute voice assistant (☆27, updated last year)
- OpenPipe Reinforcement Learning Experiments (☆30, updated 5 months ago)
- Chat WebUI is an easy-to-use user interface for interacting with AI, and it comes with multiple useful built-in tools such as web search… (☆43, updated this week)
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… (☆26, updated 9 months ago)
- Yet Another (LLM) Web UI, made with Gemini (☆12, updated 8 months ago)
- Tcurtsni: Reverse Instruction Chat, ever wonder what your LLM wants to ask you? (☆22, updated last year)
- llama.cpp to PyTorch Converter (☆34, updated last year)
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA's TensorRT-LLM for GPU a… (☆42, updated 11 months ago)
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization. (☆16, updated last year)
- V.I.S.O.R., my in-development AI-powered voice assistant with integrated memory! (☆38, updated this week)
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (☆31, updated last year)
- Loader extension for tabbyAPI in SillyTavern (☆26, updated 2 months ago)
- Demo of an "always-on" AI assistant. (☆24, updated last year)
- LLM Divergent Thinking Creativity Benchmark. LLMs generate 25 unique words that start with a given letter with no connections to each oth… (☆33, updated 5 months ago)
- cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server a… (☆42, updated 2 months ago)
- Accepts a Hugging Face model URL, automatically downloads and quantizes it using Bits and Bytes. (☆38, updated last year)
- PowerShell automation to rebuild llama.cpp for a Windows environment. (☆32, updated last week)
- ☆22, updated last year
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. (☆64, updated last year)
- ☆67, updated last year
- Senna is an advanced AI-powered search engine designed to provide users with immediate answers to their queries by leveraging natural lan… (☆19, updated 11 months ago)
- Modified Beam Search with periodic restarts (☆12, updated 11 months ago)