gpustack / gguf-parser-go
Review/check GGUF files and estimate memory usage and maximum tokens per second.
☆43 · Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for gguf-parser-go
- HTTP proxy for on-demand model loading with llama.cpp (or other OpenAI-compatible backends) ☆38 · Updated this week
- ☆117 · Updated this week
- A Python application that routes incoming prompts to an LLM by category, and can support a single incoming connection from a front end to… ☆164 · Updated this week
- CLI tool to quantize GGUF, GPTQ, AWQ, HQQ, and EXL2 models ☆62 · Updated last month
- A fast batching API for serving LLMs ☆172 · Updated 6 months ago
- An endpoint server for efficiently serving quantized open-source LLMs for code ☆53 · Updated last year
- ☆39 · Updated 2 months ago
- Automatically quantize GGUF models ☆137 · Updated this week
- A library and CLI utilities for managing performance states of NVIDIA GPUs ☆19 · Updated last month
- Serving LLMs in the HF-Transformers format via a PyFlask API ☆68 · Updated 2 months ago
- Dataset Crafting with RAG/Wikipedia ground truth and Efficient Fine-Tuning Using MLX and Unsloth. Includes configurable dataset annotation… ☆160 · Updated 3 months ago
- klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs ☆58 · Updated last month
- Deploy your GGML models to Hugging Face Spaces with Docker and Gradio ☆35 · Updated last year
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2 ☆126 · Updated 5 months ago
- 🚀 Retrieval Augmented Generation (RAG) with txtai. Combine search and LLMs to find insights with your own data. ☆274 · Updated 3 weeks ago
- This is the Mixture-of-Agents (MoA) concept, adapted from the original work by TogetherAI. My version is tailored for local model usage a… ☆105 · Updated 4 months ago
- This small API downloads and exposes access to NeuML's txtai-wikipedia and full Wikipedia datasets, taking in a query and returning full… ☆46 · Updated 3 months ago
- Something similar to Apple Intelligence? ☆57 · Updated 4 months ago
- Local LLM inference & management server with built-in OpenAI API ☆31 · Updated 6 months ago
- Dagger functions to import Hugging Face GGUF models into a local Ollama instance and optionally push them to ollama.com ☆110 · Updated 5 months ago
- After my server UI improvements were successfully merged, consider this repo a playground for experimenting, tinkering, and hacking around… ☆56 · Updated 2 months ago
- Real-time TTS reading of large text files by your favourite voice, plus translation via LLM (Python script) ☆47 · Updated 3 weeks ago
- ☆64 · Updated last month
- A simple, light, terminal-style chat app that lets you connect to your local llama.cpp server ☆27 · Updated 4 months ago
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… ☆39 · Updated last month
- ☆148 · Updated 3 months ago
- Gradio-based tool to run open-source LLMs directly from Hugging Face ☆87 · Updated 4 months ago
- A local AI companion that uses a collection of free, open-source AI models to create two virtual companions that will follow you… ☆92 · Updated this week
- Easily view and modify JSON datasets for large language models ☆62 · Updated last month
- A practical and advanced guide to LLMOps. It provides a solid understanding of large language models’ general concepts, deployment techniqu… ☆52 · Updated 2 months ago