akx / ggify
Tool to download models from Huggingface Hub and convert them to GGML/GGUF for llama.cpp
☆70 Updated 4 months ago
Related projects:
- Unofficial python bindings for the rust llm library. ☆72 Updated last year
- Local LLM inference & management server with built-in OpenAI API ☆30 Updated 5 months ago
- Simple, Fast, Parallel Huggingface GGML model downloader written in python ☆24 Updated last year
- Inference of Large Multimodal Models in C/C++. LLaVA and others ☆46 Updated 11 months ago
- automatically quant GGUF models ☆119 Updated this week
- Python bindings for ggml ☆125 Updated 2 weeks ago
- A collection of prompts to challenge the reasoning abilities of large language models in presence of misguiding information ☆51 Updated this week
- Embedding models from Jina AI ☆55 Updated 8 months ago
- Formatron empowers everyone to control the format of language models' output with minimal overhead. ☆116 Updated last week
- An endpoint server for efficiently serving quantized open-source LLMs for code. ☆52 Updated 11 months ago
- A prompting library ☆83 Updated last week
- A guidance compatibility layer for llama-cpp-python ☆35 Updated last year
- ☆64 Updated 3 months ago
- Something similar to Apple Intelligence? ☆54 Updated 2 months ago
- Serving LLMs in the HF-Transformers format via a PyFlask API ☆68 Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆48 Updated 9 months ago
- Dagger functions to import Hugging Face GGUF models into a local ollama instance and optionally push them to ollama.com. ☆109 Updated 3 months ago
- A high performance batching router optimises max throughput for text inference workload ☆15 Updated last year
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ☆55 Updated 2 months ago
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ☆58 Updated 2 weeks ago
- GRDN.AI app for garden optimization ☆68 Updated 7 months ago
- Chat with Phi 3.5/3 Vision LLMs. Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which includ… ☆25 Updated last week
- an AI interaction tool with RAG hybrid search, conversation context, web content processing and structured data analysis with LLM / GPT ☆79 Updated this week
- Self-hosted LLM chatbot arena, with yourself as the only judge ☆36 Updated 7 months ago
- Tcurtsni: Reverse Instruction Chat, ever wonder what your LLM wants to ask you? ☆19 Updated 2 months ago
- Scale your RAG pipeline using Ragswift: A scalable centralized embeddings management platform ☆36 Updated 7 months ago
- An OpenAI API compatible API for chat with image input and questions about the images. aka Multimodal. ☆157 Updated last week
- RunPod worker of the faster-whisper model for Serverless Endpoint. ☆64 Updated last month
- Self-host LLMs with vLLM and BentoML ☆62 Updated this week
- LLaVA server (llama.cpp). ☆173 Updated 11 months ago