akx / ggify
Tool to download models from the Hugging Face Hub and convert them to GGML/GGUF for llama.cpp
☆158 · Updated 3 months ago
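The listing itself carries no code, but the workflow ggify automates can be sketched in a few lines of Python: pull a model snapshot from the Hugging Face Hub, then hand it to llama.cpp's conversion script. This is a minimal sketch of that general workflow, not ggify's actual implementation; the repo id, directory paths, and the converter script name are assumptions (the script has been renamed across llama.cpp versions, with `convert_hf_to_gguf.py` being the current name).

```python
# Minimal sketch of the download-then-convert workflow that ggify automates.
# Assumptions: huggingface_hub is installed, and a llama.cpp checkout exists
# at LLAMA_CPP_DIR with its conversion script (convert_hf_to_gguf.py in
# recent versions; older trees shipped it under other names).
import subprocess
from pathlib import Path

from huggingface_hub import snapshot_download

LLAMA_CPP_DIR = Path("llama.cpp")               # assumed checkout location
REPO_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # example model repo

# 1. Download the full model snapshot (config, tokenizer, weights).
local_dir = snapshot_download(repo_id=REPO_ID)

# 2. Convert the HF checkpoint to GGUF with llama.cpp's converter.
out_file = REPO_ID.split("/")[-1] + ".gguf"
subprocess.run(
    [
        "python",
        str(LLAMA_CPP_DIR / "convert_hf_to_gguf.py"),
        local_dir,
        "--outfile", out_file,
    ],
    check=True,
)
print(f"Wrote {out_file}")
```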
Alternatives and similar repositories for ggify
Users interested in ggify are comparing it to the libraries listed below.
- Download models from the Ollama library, without Ollama ☆90 · Updated 8 months ago
- LLM inference in C/C++ ☆98 · Updated last week
- Automatically quantize GGUF models (see the quantization sketch after this list) ☆190 · Updated this week
- Unsloth Studio ☆98 · Updated 4 months ago
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆64 · Updated last year
- ☆95 · Updated 7 months ago
- Phi-3.5 for Mac: Locally-run Vision and Language Models for Apple Silicon ☆271 · Updated 11 months ago
- 1.58 Bit LLM on Apple Silicon using MLX ☆217 · Updated last year
- Wheels for llama-cpp-python compiled with cuBLAS support ☆97 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆53 · Updated last year
- An endpoint server for efficiently serving quantized open-source LLMs for code. ☆56 · Updated last year
- An OpenAI-compatible chat API that accepts image input and answers questions about the images, i.e. multimodal. ☆260 · Updated 5 months ago
- Distributed inference for MLX LLMs ☆94 · Updated last year
- Gemma 2 optimized for your local machine. ☆376 · Updated last year
- Inference of Large Multimodal Models in C/C++. LLaVA and others ☆47 · Updated last year
- GRDN.AI app for garden optimization ☆70 · Updated last year
- Self-host LLMs with vLLM and BentoML ☆139 · Updated last week
- Falcon LLM ggml framework with CPU and GPU support ☆246 · Updated last year
- LLaVA server (llama.cpp). ☆181 · Updated last year
- The RunPod worker template for serving our large language model endpoints. Powered by vLLM. ☆342 · Updated this week
- Minimal, clean-code implementation of RAG with MLX using GGUF model weights ☆52 · Updated last year
- ☆157 · Updated last year
- LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI. ☆128 · Updated 2 years ago
- Simple, fast, parallel Hugging Face GGML model downloader written in Python ☆24 · Updated 2 years ago
- ⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud or AI HW. ☆144 · Updated last year
- Fast parallel LLM inference for MLX ☆204 · Updated last year
- Python bindings for ggml ☆143 · Updated 11 months ago
- Dagger functions to import Hugging Face GGUF models into a local ollama instance and optionally push them to ollama.com. ☆116 · Updated last year
- Port of Facebook's LLaMA model in C/C++ ☆22 · Updated last year
- Extract structured data from local or remote LLM models ☆44 · Updated last year
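As a companion to the auto-quantization entry above: quantizing a GGUF file by hand comes down to one call to llama.cpp's quantize tool. A minimal sketch under assumed paths, not that project's code; note the binary is named `llama-quantize` in recent llama.cpp builds (older builds called it `quantize`).

```python
# Minimal sketch of quantizing a GGUF model with llama.cpp's quantize tool,
# the per-model step that bulk auto-quantization projects wrap.
# Assumptions: a built llama.cpp tree at LLAMA_CPP_DIR, and an existing
# f16 GGUF such as one produced by the conversion sketch earlier.
import subprocess
from pathlib import Path

LLAMA_CPP_DIR = Path("llama.cpp")   # assumed build location
src = "model-f16.gguf"              # assumed input file
quant_type = "Q4_K_M"               # a common quality/size trade-off
dst = f"model-{quant_type}.gguf"

# Usage: llama-quantize <input.gguf> <output.gguf> <type>
subprocess.run(
    [str(LLAMA_CPP_DIR / "llama-quantize"), src, dst, quant_type],
    check=True,
)
print(f"Wrote {dst}")
```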