dmatora / LLM-inference-speed-benchmarks
☆20 · Updated last year
Alternatives and similar repositories for LLM-inference-speed-benchmarks
Users interested in LLM-inference-speed-benchmarks are comparing it to the libraries listed below.
- Trying to deconstruct RWKV in understandable terms ☆14 · Updated 2 years ago
- AirLLM 70B inference with single 4GB GPU ☆14 · Updated 5 months ago
- LLM Divergent Thinking Creativity Benchmark. LLMs generate 25 unique words that start with a given letter with no connections to each oth… ☆33 · Updated 8 months ago
- Training a reward model for RLHF using RWKV. ☆15 · Updated 2 years ago
- Experiments with BitNet inference on CPU ☆54 · Updated last year
- A simple GUI utility for gathering LIMA-like chat data. ☆23 · Updated last month
- Lightweight continuous batching with OpenAI compatibility using HuggingFace Transformers, including T5 and Whisper. ☆29 · Updated 8 months ago
- Who needs o1 anyways. Add CoT to any OpenAI-compatible endpoint. ☆44 · Updated last year
- ☆22 · Updated last year
- A list of language models with permissive licenses such as MIT or Apache 2.0 ☆24 · Updated 9 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 · Updated last year
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆84 · Updated this week
- ☆24 · Updated 10 months ago
- Easy-to-use, high-performance knowledge distillation for LLMs ☆96 · Updated 6 months ago
- Simple implementation of a GPT (training and inference) in PyTorch. ☆13 · Updated last year
- Port of Microsoft's BioGPT in C/C++ using ggml ☆85 · Updated last year
- The heart of The Pulsar App: fast, secure, and shared inference with a modern UI ☆58 · Updated 11 months ago
- GGUF quantization of any LLM. ☆41 · Updated last year
- ☆11 · Updated 2 years ago
- ☆63 · Updated 10 months ago
- LLM-based agents with proactive interactions, long-term memory, external tool integration, and local deployment capabilities. ☆106 · Updated 4 months ago
- ☆74 · Updated 2 years ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · Updated last year
- A simple, easy-to-customize pipeline for local RAG evaluation. Starter prompts and metric definitions included. ☆25 · Updated last month
- Forces DeepSeek R1 models to engage in extended reasoning by intercepting early termination tokens. ☆19 · Updated 9 months ago
- ☆51 · Updated last year
- Convert a saved PyTorch model to GGUF and generate as much corresponding ggml C code as possible ☆15 · Updated last year
- BlinkDL's RWKV-v4 running in the browser ☆47 · Updated 2 years ago
- Modified beam search with periodic restarts ☆12 · Updated last year
- Attend - to what matters. ☆17 · Updated 9 months ago