akx / ollama-dl
Download models from the Ollama library, without Ollama
☆84 · Updated 6 months ago
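A downloader like this works because the Ollama library is served from an OCI-style container registry: a manifest lists the model's layers (weights, template, parameters) as content-addressed blobs that plain HTTP can fetch. The sketch below illustrates that flow under stated assumptions — the `registry.ollama.ai` host and the `/v2/library/...` paths follow the OCI distribution spec's conventions and are not taken from this repository's code.

```python
# Minimal sketch of pulling a model from an OCI-style registry the way
# a tool such as ollama-dl might. Host and URL layout are ASSUMPTIONS
# based on the OCI distribution spec, not code from the repo.
import json
import urllib.request

REGISTRY = "https://registry.ollama.ai"  # assumed public registry host


def manifest_url(name: str, tag: str = "latest") -> str:
    """URL of the manifest that lists a model's layers (blobs)."""
    return f"{REGISTRY}/v2/library/{name}/manifests/{tag}"


def blob_url(name: str, digest: str) -> str:
    """URL of one content-addressed layer, e.g. the GGUF weights."""
    return f"{REGISTRY}/v2/library/{name}/blobs/{digest}"


def list_layers(name: str, tag: str = "latest") -> list:
    """Fetch the manifest and return its layer descriptors."""
    req = urllib.request.Request(
        manifest_url(name, tag),
        headers={"Accept": "application/vnd.docker.distribution.manifest.v2+json"},
    )
    with urllib.request.urlopen(req) as resp:
        manifest = json.load(resp)
    return manifest.get("layers", [])


if __name__ == "__main__":
    # Each layer descriptor carries a mediaType and a sha256 digest;
    # downloading the blobs by digest reconstructs the model on disk.
    for layer in list_layers("llama3"):
        print(layer["mediaType"], blob_url("llama3", layer["digest"]))
```

Because the blobs are addressed by digest, a partial download can be resumed and verified without any Ollama runtime involved — ordinary HTTP range requests and a sha256 check suffice.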
Alternatives and similar repositories for ollama-dl
Users interested in ollama-dl are comparing it to the repositories listed below.
- Tool to download models from Hugging Face Hub and convert them to GGML/GGUF for llama.cpp ☆148 · Updated last month
- A more memory-efficient rewrite of the HF Transformers implementation of Llama for use with quantized weights. ☆63 · Updated last year
- LLM inference in C/C++ ☆77 · Updated 3 weeks ago
- VSCode AI coding assistant powered by a self-hosted llama.cpp endpoint. ☆182 · Updated 4 months ago
- "a towel is about the most massively useful thing an interstellar AI hitchhiker can have" ☆48 · Updated 7 months ago
- ☆90 · Updated 5 months ago
- A stable, fast, and easy-to-use inference library with a focus on a sync-to-async API ☆45 · Updated 8 months ago
- Self-host LLMs with vLLM and BentoML ☆114 · Updated this week
- Smart proxy for LLM APIs that enables model-specific parameter control, automatic mode switching (like Qwen3's /think and /no_think), and… ☆47 · Updated 2 weeks ago
- Lightweight inference server for OpenVINO ☆176 · Updated last week
- ☆87 · Updated last year
- LM inference server implementation based on *.cpp. ☆203 · Updated this week
- Testing LLM reasoning abilities with lineage-relationship quizzes. ☆27 · Updated 2 months ago
- Extract structured data from local or remote LLM models ☆42 · Updated 11 months ago
- Review/check GGUF files and estimate the memory usage and maximum tokens per second. ☆173 · Updated this week
- A guidance compatibility layer for llama-cpp-python ☆34 · Updated last year
- Distributed inference for MLX LLMs ☆92 · Updated 10 months ago
- ☆71 · Updated last week
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… ☆43 · Updated 8 months ago
- klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs ☆72 · Updated 8 months ago
- Serving LLMs in the HF Transformers format via a PyFlask API ☆71 · Updated 8 months ago
- LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI. ☆123 · Updated last year
- Something similar to Apple Intelligence? ☆60 · Updated 11 months ago
- ☆18 · Updated 3 months ago
- A simple speech-to-text and text-to-speech AI chatbot that can be run fully offline. ☆45 · Updated last year
- A Python package for serving LLMs on OpenAI-compatible API endpoints, with prompt caching, using MLX. ☆84 · Updated 5 months ago
- Pybind11 bindings for whisper.cpp ☆57 · Updated this week
- A browser GUI for nvidia-smi ☆18 · Updated 2 months ago
- Self-hosted LLM chatbot arena, with yourself as the only judge ☆41 · Updated last year
- Dagger functions to import Hugging Face GGUF models into a local Ollama instance and optionally push them to ollama.com. ☆115 · Updated last year