dmatora / LLM-inference-speed-benchmarks
☆18Updated 8 months ago
Alternatives and similar repositories for LLM-inference-speed-benchmarks
Users who are interested in LLM-inference-speed-benchmarks are comparing it to the repositories listed below
- A combination of Oobabooga's fork and the main cuda branch of GPTQ-for-LLaMa in a package format.☆22Updated last year
- ☆28Updated 9 months ago
- Yet Another (LLM) Web UI, made with Gemini☆12Updated 5 months ago
- Trying to deconstruct RWKV in understandable terms☆14Updated 2 years ago
- PowerShell automation to rebuild llama.cpp for a Windows environment.☆30Updated last week
- run ollama & gguf easily with a single command☆50Updated last year
- Modified Beam Search with periodical restart☆12Updated 8 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min…☆26Updated 6 months ago
- 33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU☆13Updated last year
- ☆11Updated 2 weeks ago
- Y'all thought the dead internet theory wasn't real, but HERE IT IS☆14Updated last year
- Local LLM inference & management server with built-in OpenAI API☆31Updated last year
- Nexusflow function call, tool use, and agent benchmarks.☆19Updated 5 months ago
- ☆20Updated 2 months ago
- LLM Chat is an open-source serverless alternative to ChatGPT.☆34Updated 8 months ago
- Lightweight continuous batching with OpenAI compatibility using HuggingFace Transformers, including T5 and Whisper.☆23Updated 2 months ago
- An open source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO☆29Updated this week
- Public reports detailing responses to sets of prompts by Large Language Models.☆30Updated 5 months ago
- Experimental sampler to make LLMs more creative☆31Updated last year
- Experiments with BitNet inference on CPU☆54Updated last year
- Build HTML artefacts with Ollama☆11Updated 5 months ago
- LLMtranslator translates and generates text in multiple languages.☆46Updated last year
- AirLLM 70B inference with single 4GB GPU☆13Updated 10 months ago
- Yet another frontend for LLM, written using .NET and WinUI 3☆10Updated 6 months ago
- Make Qwen3 Think like Gemini 2.5 Pro | Open webui function☆21Updated 3 weeks ago
- The heart of The Pulsar App: fast, secure and shared inference with a modern UI☆56Updated 6 months ago
- A Windows tool to query various LLM AIs. Supports branched conversations, history and summaries among others.☆30Updated this week
- Glyphs, acting as collaboratively defined symbols linking related concepts, add a layer of multidimensional semantic richness to user-AI …☆49Updated 3 months ago
- Large-Language-Model to Machine Interface project.☆19Updated last year
- Simple, fast, parallel Huggingface GGML model downloader written in Python☆24Updated last year