gpustack / gguf-parser-go
Review/Check GGUF files and estimate the memory usage and maximum tokens per second.
☆221 · Updated 4 months ago
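For orientation, every GGUF file opens with a fixed header (magic bytes, version, tensor count, metadata key/value count) that a parser such as this reads first. Below is a minimal standalone Go sketch of that header check; it does not use gguf-parser-go's actual API, and the file path is a placeholder:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	// Open a local GGUF file; "model.gguf" is a placeholder path.
	f, err := os.Open("model.gguf")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// The GGUF container starts with a 4-byte magic ("GGUF"), followed by a
	// little-endian uint32 version, a uint64 tensor count, and a uint64
	// metadata key/value count.
	var magic [4]byte
	if _, err := io.ReadFull(f, magic[:]); err != nil {
		log.Fatal(err)
	}
	if string(magic[:]) != "GGUF" {
		log.Fatalf("not a GGUF file: magic = %q", magic[:])
	}

	var version uint32
	var tensorCount, metadataKV uint64
	for _, dst := range []any{&version, &tensorCount, &metadataKV} {
		if err := binary.Read(f, binary.LittleEndian, dst); err != nil {
			log.Fatal(err)
		}
	}
	fmt.Printf("GGUF v%d: %d tensors, %d metadata entries\n",
		version, tensorCount, metadataKV)
}
```

Tools like gguf-parser-go walk the metadata and tensor tables that follow this header to derive model size and, from there, memory-usage estimates.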
Alternatives and similar repositories for gguf-parser-go
Users interested in gguf-parser-go are comparing it to the libraries listed below.
- LM inference server implementation based on *.cpp. ☆294 · Updated last month
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends. ☆184 · Updated 3 weeks ago
- Download models from the Ollama library, without Ollama. ☆117 · Updated last year
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU). ☆758 · Updated this week
- Run DeepSeek-R1 GGUFs on KTransformers. ☆258 · Updated 9 months ago
- Comparison of language model inference engines. ☆238 · Updated last year
- Automatically quantize GGUF models. ☆219 · Updated 2 months ago
- Evaluate and enhance your LLM deployments for real-world inference needs. ☆765 · Updated this week
- A proxy server for multiple Ollama instances with key security. ☆549 · Updated last month
- ☆109 · Updated 4 months ago
- Wraps any OpenAI API interface as Responses with MCPs support so it supports Codex, adding any missing stateful features. Ollama and vLLM… ☆139 · Updated last month
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆78 · Updated last year
- Docker Compose to run vLLM on Windows. ☆111 · Updated last year
- VSCode AI coding assistant powered by a self-hosted llama.cpp endpoint. ☆183 · Updated 10 months ago
- The main repository for building Pascal-compatible versions of ML applications and libraries. ☆156 · Updated 4 months ago
- Library for model distillation. ☆158 · Updated 3 months ago
- Docs for GGUF quantization (unofficial). ☆340 · Updated 5 months ago
- No-code CLI designed for accelerating ONNX workflows. ☆221 · Updated 6 months ago
- The LLM API Benchmark Tool is a flexible Go-based utility designed to measure and analyze the performance of OpenAI-compatible API endpoi… ☆62 · Updated last month
- Code execution utilities for Open WebUI & Ollama. ☆310 · Updated last year
- Fully-featured, beautiful web interface for vLLM, built with NextJS. ☆165 · Updated last week
- LLM model quantization (compression) toolkit with hardware acceleration support for NVIDIA CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPU vi… ☆943 · Updated this week
- 🌟 Yi-Coder is a series of open-source code language models that delivers state-of-the-art coding performance with fewer than 10 billion … ☆436 · Updated last year
- OpenAI-compatible API for the TensorRT-LLM Triton backend. ☆218 · Updated last year
- Local Qwen3 LLM inference in one easy-to-understand file of C source with no dependencies. ☆150 · Updated 5 months ago
- LLM inference in C/C++. ☆104 · Updated last week
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… ☆42 · Updated last year
- ☆108 · Updated 2 weeks ago
- xllamacpp, a Python wrapper of llama.cpp. ☆67 · Updated this week
- Self-hosted Hugging Face mirror service. ☆209 · Updated 5 months ago