akx / ggify
Tool to download models from the Hugging Face Hub and convert them to GGML/GGUF for llama.cpp
☆148 · Updated last month
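As a rough illustration of the workflow ggify automates (a minimal sketch, not ggify's actual code): fetch a model repository with huggingface_hub, then run llama.cpp's convert_hf_to_gguf.py script on it. The model name, output path, and llama.cpp checkout location below are placeholders.

```python
# Minimal sketch of a Hub-download-then-GGUF-convert pipeline.
# Assumes `huggingface_hub` is installed and a llama.cpp checkout
# exists at ./llama.cpp; the repo_id and paths are placeholders.
import subprocess
from huggingface_hub import snapshot_download

# Download the full model repository (weights, tokenizer, config).
model_dir = snapshot_download(repo_id="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Convert the Hugging Face checkpoint to a quantized GGUF file
# using llama.cpp's converter script.
subprocess.run(
    [
        "python", "llama.cpp/convert_hf_to_gguf.py",
        model_dir,
        "--outfile", "tinyllama-1.1b-chat-q8_0.gguf",
        "--outtype", "q8_0",
    ],
    check=True,
)
```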
Alternatives and similar repositories for ggify
Users interested in ggify are comparing it to the repositories listed below.
- Download models from the Ollama library, without Ollama ☆84 · Updated 6 months ago
- ☆90 · Updated 5 months ago
- LLaVA server (llama.cpp). ☆179 · Updated last year
- Pressure testing the context window of open LLMs ☆25 · Updated 9 months ago
- For inference and serving of local LLMs using the MLX framework ☆104 · Updated last year
- 1.58-bit LLM on Apple Silicon using MLX ☆212 · Updated last year
- Scripts to create your own MoE models using MLX ☆89 · Updated last year
- ☆157 · Updated 10 months ago
- A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching, using MLX. ☆84 · Updated 5 months ago
- This is our own implementation of 'Layer-Selective Rank Reduction' ☆238 · Updated last year
- GRDN.AI app for garden optimization ☆70 · Updated last year
- These are performance benchmarks we ran to prepare for our own privacy-preserving, NDA-compliant in-house AI coding assistant. If by a… ☆23 · Updated 2 months ago
- Dagger functions to import Hugging Face GGUF models into a local Ollama instance and optionally push them to ollama.com. ☆115 · Updated last year
- Distributed inference for MLX LLMs ☆92 · Updated 10 months ago
- Automatically quantize GGUF models ☆181 · Updated this week
- A fast batching API to serve LLMs ☆181 · Updated last year
- Falcon LLM ggml framework with CPU and GPU support ☆245 · Updated last year
- ☆22 · Updated last year
- LLM inference in C/C++ ☆77 · Updated 3 weeks ago
- Unofficial Python bindings for the Rust llm library. 🐍❤️🦀 ☆75 · Updated last year
- Blazing-fast Whisper Turbo for ASR (speech-to-text) tasks ☆208 · Updated 7 months ago
- Embedding models from Jina AI ☆60 · Updated last year
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆63 · Updated last year
- ☆114 · Updated 5 months ago
- Fast parallel LLM inference for MLX ☆189 · Updated 10 months ago
- Gemma 2 optimized for your local machine. ☆370 · Updated 9 months ago
- Minimal, clean-code implementation of RAG with MLX using GGUF model weights ☆50 · Updated last year
- LLM-based code completion engine ☆188 · Updated 4 months ago
- Extends the original llama.cpp repo to support the RedPajama model. ☆117 · Updated 9 months ago
- Web UI for ExLlamaV2 ☆495 · Updated 4 months ago