akx / ggify
Tool to download models from the Hugging Face Hub and convert them to GGML/GGUF for llama.cpp
☆128 · Updated 6 months ago
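The workflow ggify automates (fetch a model repository from the Hub, then run llama.cpp's converter on it) can be sketched roughly as below. This is an illustrative assumption, not ggify's actual implementation: the `convert_hf_to_gguf.py` script path and the output-file naming are assumed here, and only `snapshot_download` from `huggingface_hub` is a known API.

```python
# Hypothetical sketch of the download-then-convert workflow a tool like
# ggify automates. Assumes a local llama.cpp checkout containing its
# convert_hf_to_gguf.py script (the path is an assumption).
from pathlib import Path


def convert_command(model_dir: str, outfile: str,
                    llama_cpp_dir: str = "llama.cpp") -> list[str]:
    """Build the llama.cpp conversion command for a downloaded model."""
    script = Path(llama_cpp_dir) / "convert_hf_to_gguf.py"
    return ["python", str(script), model_dir, "--outfile", outfile]


def download_and_convert(repo_id: str) -> list[str]:
    # snapshot_download fetches the whole repo into the local HF cache
    # and returns the local directory path (pip install huggingface_hub).
    from huggingface_hub import snapshot_download
    local_dir = snapshot_download(repo_id)
    out = repo_id.replace("/", "_") + ".gguf"  # assumed naming scheme
    return convert_command(local_dir, out)


if __name__ == "__main__":
    # Would download the repo, then print the conversion command to run.
    print(download_and_convert("TinyLlama/TinyLlama-1.1B-Chat-v1.0"))
```

In practice the returned command would be passed to `subprocess.run`; it is kept as a list here so the sketch stays inspectable without touching the network.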
Alternatives and similar repositories for ggify:
Users interested in ggify are comparing it to the libraries listed below.
- GRDN.AI app for garden optimization ☆70 · Updated last year
- Dagger functions to import Hugging Face GGUF models into a local ollama instance and optionally push them to ollama.com. ☆115 · Updated 10 months ago
- ☆83 · Updated 3 months ago
- Extract structured data from local or remote LLM models ☆41 · Updated 9 months ago
- Download models from the Ollama library, without Ollama ☆68 · Updated 4 months ago
- A guidance compatibility layer for llama-cpp-python ☆34 · Updated last year
- A fast batching API to serve LLM models ☆183 · Updated 11 months ago
- Unsloth Studio ☆74 · Updated 3 weeks ago
- Phi-3.5 for Mac: Locally-run Vision and Language Models for Apple Silicon ☆264 · Updated 6 months ago
- An endpoint server for efficiently serving quantized open-source LLMs for code. ☆54 · Updated last year
- Lightweight Inference server for OpenVINO ☆143 · Updated this week
- Local LLM inference & management server with built-in OpenAI API ☆31 · Updated 11 months ago
- Kosmos-2.5 is a cutting-edge Multimodal-LLM (MLLM) specializing in image OCR. However, its stringent software requirements & Python-scrip… ☆59 · Updated 8 months ago
- A lightweight proxy for filtering `<think>` tags from any OpenAI-compatible API endpoint. Designed for chain-of-thought language models t… ☆36 · Updated 2 months ago
- ☆66 · Updated 10 months ago
- The hearth of The Pulsar App, fast, secure and shared inference with modern UI ☆56 · Updated 4 months ago
- ☆38 · Updated last year
- LLM inference in C/C++ ☆67 · Updated last week
- Accepts a Hugging Face model URL, automatically downloads and quantizes it using Bits and Bytes. ☆38 · Updated last year
- ☆125 · Updated last week
- For inferring and serving local LLMs using the MLX framework ☆99 · Updated last year
- This is our own implementation of 'Layer Selective Rank Reduction' ☆233 · Updated 10 months ago
- A python package for serving LLM on OpenAI-compatible API endpoints with prompt caching using MLX. ☆76 · Updated 3 months ago
- A benchmark for emotional intelligence in large language models ☆253 · Updated 8 months ago
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… ☆43 · Updated 6 months ago
- Maybe the new state of the art vision model? we'll see 🤷‍♂️ ☆161 · Updated last year
- Guaranteed Structured Output from any Language Model via Hierarchical State Machines ☆122 · Updated this week
- An OpenAI API compatible API for chat with image input and questions about the images. aka Multimodal. ☆243 · Updated 3 weeks ago
- ☆152 · Updated 8 months ago
- llama.cpp fork used by GPT4All ☆54 · Updated last month