mlc-ai / binary-mlc-llm-libs
☆262 · Updated 2 weeks ago
Alternatives and similar repositories for binary-mlc-llm-libs
Users interested in binary-mlc-llm-libs are comparing it to the libraries listed below.
- A mobile implementation of llama.cpp (a minimal usage sketch follows this list) · ☆321 · Updated last year
- llama.cpp tutorial on an Android phone · ☆134 · Updated 6 months ago
- LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI · ☆129 · Updated 2 years ago
- Falcon LLM ggml framework with CPU and GPU support · ☆247 · Updated last year
- Automatically quantize GGUF models · ☆214 · Updated 3 weeks ago
- Stable Diffusion inference on an Android phone's CPU · ☆158 · Updated last year
- [ICLR 2025 SLLM Spotlight 🔥] MobiLlama: small language model tailored for edge devices · ☆664 · Updated 6 months ago
- AMD-related optimizations for transformer models · ☆95 · Updated last month
- MiniCPM on iOS · ☆67 · Updated 7 months ago
- llama.cpp fork used by GPT4All · ☆56 · Updated 8 months ago
- Train your own small BitNet model · ☆74 · Updated last year
- LLM-based code completion engine · ☆190 · Updated 9 months ago
- C++ implementation for 💫StarCoder · ☆455 · Updated 2 years ago
- On-device LLM inference powered by X-Bit quantization · ☆272 · Updated this week
- MiniCPM on the Android platform · ☆636 · Updated 7 months ago
- Python bindings for ggml · ☆146 · Updated last year
- WebAssembly (Wasm) build and bindings for llama.cpp · ☆284 · Updated last year
- High-speed and easy-to-use LLM serving framework for local deployment · ☆132 · Updated 3 months ago
- A mobile implementation of llama.cpp · ☆26 · Updated 2 years ago
- Making offline AI models accessible to all types of edge devices · ☆142 · Updated last year
- Implementation of the RWKV language model in pure WebGPU/Rust · ☆327 · Updated 3 weeks ago
- Local ML voice chat using high-end models · ☆178 · Updated 3 weeks ago
- An innovative library for efficient LLM inference via low-bit quantization · ☆349 · Updated last year
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… · ☆42 · Updated last year
- TTS support with GGML · ☆193 · Updated last month
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) · ☆740 · Updated last week
- ☆163 · Updated 3 months ago
- ☆565 · Updated last year
- 1.58-bit LLaMA model · ☆83 · Updated last year
- INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model · ☆1,553 · Updated 7 months ago
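
Many of the repositories above are wrappers or forks of llama.cpp, and all of them consume quantized GGUF checkpoints. As a rough illustration of the workflow they build on, here is a minimal sketch using the llama-cpp-python bindings (an external project, not one of the entries listed here); the model path and model name are placeholder assumptions, so substitute any local GGUF file.

```python
# Minimal sketch: running a quantized GGUF model via llama-cpp-python
# (pip install llama-cpp-python). The model path is a placeholder
# assumption; point it at any GGUF checkpoint you have on disk.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b.Q4_K_M.gguf",  # placeholder path
    n_ctx=2048,    # context window size
    n_threads=4,   # CPU threads; tune for the target device
)

out = llm("Q: Why quantize models for edge devices? A:", max_tokens=64)
print(out["choices"][0]["text"])
```

The same quantized GGUF artifacts loaded here are what the mobile and WebAssembly ports above typically run on-device, which is why low-bit formats such as Q4_K_M recur throughout this list.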