mlc-ai / binary-mlc-llm-libs
☆256 · Updated 3 weeks ago
Alternatives and similar repositories for binary-mlc-llm-libs
Users who are interested in binary-mlc-llm-libs are comparing it to the repositories listed below.
- A mobile implementation of llama.cpp · ☆319 · Updated last year
- llama.cpp tutorial on an Android phone · ☆128 · Updated 4 months ago
- IRIS is an Android app for interfacing with GGUF / llama.cpp models locally · ☆235 · Updated 7 months ago
- LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI · ☆129 · Updated 2 years ago
- llama.cpp fork used by GPT4All · ☆56 · Updated 6 months ago
- Falcon LLM ggml framework with CPU and GPU support · ☆247 · Updated last year
- [ICLR-2025-SLLM Spotlight 🔥] MobiLlama: Small Language Model tailored for edge devices · ☆660 · Updated 4 months ago
- Tool to download models from the Hugging Face Hub and convert them to GGML/GGUF for llama.cpp · ☆159 · Updated 4 months ago
- MiniCPM on the Android platform · ☆634 · Updated 5 months ago
- Automatically quantize GGUF models · ☆200 · Updated this week
- C++ implementation for 💫StarCoder · ☆456 · Updated 2 years ago
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… · ☆42 · Updated 11 months ago
- Run Stable Diffusion inference on the CPU of an Android phone · ☆159 · Updated last year
- An innovative library for efficient LLM inference via low-bit quantization · ☆348 · Updated last year
- Gemma 2 optimized for your local machine · ☆375 · Updated last year
- Making offline AI models accessible to all types of edge devices · ☆145 · Updated last year
- A mobile implementation of llama.cpp · ☆25 · Updated last year
- ☆161 · Updated last month
- This reference can be used with any existing OpenAI-integrated apps to run with TRT-LLM inference locally on a GeForce GPU on Windows inste… · ☆128 · Updated last year
- Download models from the Ollama library, without Ollama · ☆97 · Updated 10 months ago
- High-speed and easy-to-use LLM serving framework for local deployment · ☆118 · Updated last month
- Train your own small BitNet model · ☆75 · Updated 10 months ago
- Visual Studio Code extension for WizardCoder · ☆149 · Updated 2 years ago
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) · ☆696 · Updated last week
- Local ML voice chat using high-end models · ☆175 · Updated 3 weeks ago
- ☆59 · Updated last year
- ☆63 · Updated 10 months ago
- Review/check GGUF files and estimate the memory usage and maximum tokens per second · ☆205 · Updated 3 weeks ago
- WebAssembly (Wasm) build and bindings for llama.cpp · ☆280 · Updated last year
- Octogen is an open-source Code Interpreter agent framework · ☆259 · Updated last year