dmatora / LLM-inference-speed-benchmarks
☆20 · Updated last year
Alternatives and similar repositories for LLM-inference-speed-benchmarks
Users that are interested in LLM-inference-speed-benchmarks are comparing it to the libraries listed below
- AirLLM 70B inference with single 4GB GPU ☆14 · Updated 5 months ago
- Trying to deconstruct RWKV in understandable terms ☆14 · Updated 2 years ago
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆85 · Updated this week
- run ollama & gguf easily with a single command ☆52 · Updated last year
- 33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU ☆13 · Updated last year
- LLM Divergent Thinking Creativity Benchmark. LLMs generate 25 unique words that start with a given letter with no connections to each oth… ☆34 · Updated 9 months ago
- ☆24 · Updated 10 months ago
- Experiments with BitNet inference on CPU ☆54 · Updated last year
- Yet Another (LLM) Web UI, made with Gemini ☆12 · Updated 11 months ago
- Controllable Language Model Interactions in TypeScript ☆10 · Updated last year
- The heart of The Pulsar App: fast, secure and shared inference with a modern UI ☆59 · Updated last year
- Loader extension for tabbyAPI in SillyTavern ☆26 · Updated 5 months ago
- Simple, Fast, Parallel Huggingface GGML model downloader written in python ☆24 · Updated 2 years ago
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… ☆42 · Updated last year
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · Updated last year
- A simple GUI utility for gathering LIMA-like chat data. ☆23 · Updated 2 months ago
- Easy to use, High Performant Knowledge Distillation for LLMs ☆97 · Updated 7 months ago
- Easily convert HuggingFace models to GGUF-format for llama.cpp ☆23 · Updated last year
- llama.cpp to PyTorch Converter ☆34 · Updated last year
- OpenPipe Reinforcement Learning Experiments ☆32 · Updated 9 months ago
- ☆51 · Updated last year
- GGML implementation of BERT model with Python bindings and quantization. ☆58 · Updated last year
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization. ☆17 · Updated last year
- Senna is an advanced AI-powered search engine designed to provide users with immediate answers to their queries by leveraging natural lan… ☆19 · Updated last year
- BlinkDL's RWKV-v4 running in the browser ☆47 · Updated 2 years ago
- Yet another frontend for LLM, written using .NET and WinUI 3 ☆10 · Updated 3 months ago
- cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server a… ☆41 · Updated 5 months ago
- ☆74 · Updated 2 years ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆47 · Updated last month
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 · Updated last year