mlc-ai / binary-mlc-llm-libs
☆236Updated 4 months ago
Alternatives and similar repositories for binary-mlc-llm-libs:
Users who are interested in binary-mlc-llm-libs are comparing it to the repositories listed below:
- llama.cpp tutorial on Android phone☆97Updated 8 months ago
- A mobile Implementation of llama.cpp☆305Updated last year
- llama.cpp fork with additional SOTA quants and improved performance☆231Updated this week
- automatically quant GGUF models☆164Updated last week
- ☆157Updated this week
- Stable Diffusion inference on an Android phone's CPU☆150Updated last year
- LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI.☆122Updated last year
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models☆261Updated 5 months ago
- Python bindings for ggml☆140Updated 6 months ago
- High-speed and easy-use LLM serving framework for local deployment☆94Updated 2 weeks ago
- ☆529Updated 5 months ago
- MobiLlama : Small Language Model tailored for edge devices☆628Updated last year
- A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily.☆164Updated 3 weeks ago
- Run RWKV V4 ONNX on an Android CPU☆70Updated last year
- On-device LLM Inference Powered by X-Bit Quantization☆225Updated 2 weeks ago
- ☆116Updated 11 months ago
- MiniCPM on Android platform.☆630Updated last week
- Running any GGUF SLMs/LLMs locally, on-device in Android☆232Updated last week
- AMD related optimizations for transformer models☆72Updated 4 months ago
- Inference of Mamba models in pure C☆187Updated last year
- Based on mlc-llm; a personal attempt to deploy and run large models on Android phones☆85Updated 7 months ago
- INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model☆1,492Updated last week
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization☆683Updated 7 months ago
- A mobile Implementation of llama.cpp☆25Updated last year
- ☆55Updated 4 months ago
- Train llama with lora on one 4090 and merge weight of lora to work as stanford alpaca.☆51Updated last year
- Deploying a large language model on Android phones with MNN-llm: Qwen1.5-0.5B-Chat☆70Updated 11 months ago
- C++ implementation for 💫StarCoder☆453Updated last year
- Falcon LLM ggml framework with CPU and GPU support☆246Updated last year
- Automated prompting and scoring framework to evaluate LLMs using updated human knowledge prompts☆112Updated last year