akx / ggify
Tool to download models from the Hugging Face Hub and convert them to GGML/GGUF for llama.cpp
☆170 · Updated 9 months ago
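For orientation, here is a minimal sketch of the workflow ggify automates: pull a model from the Hugging Face Hub, then run llama.cpp's HF-to-GGUF converter on the downloaded directory. The repo id, converter path, and flags below are illustrative assumptions, not ggify's own command-line interface.

```python
# Sketch of the download-and-convert flow (assumptions, not ggify's actual CLI).
import subprocess
from pathlib import Path

from huggingface_hub import snapshot_download  # pip install huggingface_hub

repo_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"   # hypothetical example model
model_dir = Path(snapshot_download(repo_id=repo_id))  # download the HF repo locally

# convert_hf_to_gguf.py ships with the llama.cpp source tree; adjust the script
# path and --outtype (e.g. f16, q8_0) to match your checkout and needs.
subprocess.run(
    [
        "python",
        "llama.cpp/convert_hf_to_gguf.py",
        str(model_dir),
        "--outfile", f"{repo_id.split('/')[-1]}.gguf",
        "--outtype", "f16",
    ],
    check=True,
)
```

The resulting `.gguf` file can then be loaded directly by llama.cpp and compatible runtimes.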
Alternatives and similar repositories for ggify
Users interested in ggify are comparing it to the libraries listed below.
- Download models from the Ollama library, without Ollama ☆121 · Updated last year
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆64 · Updated 2 years ago
- LLM inference in C/C++ ☆104 · Updated last month
- Unsloth Studio ☆125 · Updated 9 months ago
- Falcon LLM ggml framework with CPU and GPU support ☆249 · Updated 2 years ago
- LLaVA server (llama.cpp). ☆183 · Updated 2 years ago
- ☆109 · Updated 5 months ago
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… ☆42 · Updated last year
- Some simple scripts that I use day-to-day when working with LLMs and the Hugging Face Hub ☆161 · Updated 2 years ago
- Maybe the new state-of-the-art vision model? We'll see 🤷‍♂️ ☆171 · Updated 2 years ago
- A simple Jupyter Notebook for learning MLX text-completion fine-tuning! ☆123 · Updated last year
- Automatically quantize GGUF models ☆219 · Updated last month
- An OpenAI API-compatible API for chat with image input and questions about the images (aka multimodal). ☆266 · Updated 10 months ago
- LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI. ☆130 · Updated 2 years ago
- ☆165 · Updated 5 months ago
- An endpoint server for efficiently serving quantized open-source LLMs for code. ☆58 · Updated 2 years ago
- 1.58-bit LLM on Apple Silicon using MLX ☆240 · Updated last year
- Gemma 2 optimized for your local machine. ☆378 · Updated last year
- Unofficial Python bindings for the Rust llm library. 🐍❤️🦀 ☆76 · Updated 2 years ago
- Inference of Large Multimodal Models in C/C++. LLaVA and others ☆48 · Updated 2 years ago
- Extract structured data from local or remote LLMs ☆54 · Updated last year
- Distributed inference for MLX LLMs ☆100 · Updated last year
- Phi-3.5 for Mac: Locally-run Vision and Language Models for Apple Silicon ☆273 · Updated 2 months ago
- A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching, using MLX. ☆100 · Updated 7 months ago
- Dagger functions to import Hugging Face GGUF models into a local Ollama instance and optionally push them to ollama.com. ☆118 · Updated last year
- API Server for Transformer Lab ☆82 · Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆53 · Updated 2 years ago
- VSCode AI coding assistant powered by a self-hosted llama.cpp endpoint. ☆183 · Updated last year
- GRDN.AI app for garden optimization ☆69 · Updated 2 months ago
- Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first app… ☆169 · Updated 2 years ago