dmatora / LLM-inference-speed-benchmarks
☆20, updated 11 months ago
Alternatives and similar repositories for LLM-inference-speed-benchmarks
Users interested in LLM-inference-speed-benchmarks are comparing it to the repositories listed below
- ☆24, updated 7 months ago
- Run multiple resource-heavy Large Models (LM) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… (☆73, updated 2 weeks ago)
- Lightweight continuous-batching OpenAI compatibility layer using HuggingFace Transformers, including T5 and Whisper. (☆26, updated 5 months ago)
- AirLLM 70B inference with a single 4GB GPU (☆14, updated 2 months ago)
- Trying to deconstruct RWKV in understandable terms (☆14, updated 2 years ago)
- Attend - to what matters. (☆17, updated 6 months ago)
- Who needs o1 anyway? Add CoT to any OpenAI-compatible endpoint. (☆44, updated 11 months ago)
- 33B Chinese LLM, DPO QLoRA, 100K context, AirLLM 70B inference with a single 4GB GPU (☆13, updated last year)
- Llama cute voice assistant (☆27, updated last year)
- OpenPipe Reinforcement Learning Experiments (☆30, updated 5 months ago)
- Chat WebUI is an easy-to-use user interface for interacting with AI, and it comes with multiple useful built-in tools such as web search… (☆43, updated this week)
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… (☆26, updated 9 months ago)
- Yet Another (LLM) Web UI, made with Gemini (☆12, updated 8 months ago)
- Tcurtsni: Reverse Instruction Chat, ever wonder what your LLM wants to ask you? (☆22, updated last year)
- llama.cpp to PyTorch Converter (☆34, updated last year)
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA's TensorRT-LLM for GPU a… (☆42, updated 11 months ago)
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization. (☆16, updated last year)
- V.I.S.O.R., my in-development AI-powered voice assistant with integrated memory! (☆38, updated this week)
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (☆31, updated last year)
- Loader extension for tabbyAPI in SillyTavern (☆26, updated 2 months ago)
- Demo of an "always-on" AI assistant. (☆24, updated last year)
- LLM Divergent Thinking Creativity Benchmark. LLMs generate 25 unique words that start with a given letter with no connections to each oth… (☆33, updated 5 months ago)
- cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server a… (☆42, updated 2 months ago)
- Accepts a Hugging Face model URL, automatically downloads and quantizes it using Bits and Bytes. (☆38, updated last year)
- PowerShell automation to rebuild llama.cpp for a Windows environment. (☆32, updated last week)
- ☆22, updated last year
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. (☆64, updated last year)
- ☆67, updated last year
- Senna is an advanced AI-powered search engine designed to provide users with immediate answers to their queries by leveraging natural lan… (☆19, updated 11 months ago)
- Modified Beam Search with periodic restarts (☆12, updated 11 months ago)