mlc-ai / binary-mlc-llm-libs
☆241 · Updated last week
Alternatives and similar repositories for binary-mlc-llm-libs
Users interested in binary-mlc-llm-libs are comparing it to the libraries listed below.
- llama.cpp tutorial on an Android phone ☆101 · Updated last week
- A mobile implementation of llama.cpp ☆311 · Updated last year
- IRIS is an Android app for interfacing with GGUF / llama.cpp models locally. ☆203 · Updated 3 months ago
- Run Stable Diffusion inference on an Android phone's CPU ☆152 · Updated last year
- MiniCPM on the Android platform. ☆630 · Updated last month
- Automatically quantize GGUF models ☆175 · Updated this week
- ☆532 · Updated 6 months ago
- Falcon LLM ggml framework with CPU and GPU support ☆246 · Updated last year
- An Ollama client for Android! ☆84 · Updated last year
- A mobile implementation of llama.cpp ☆25 · Updated last year
- LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI. ☆123 · Updated last year
- A lightweight LLM inference framework ☆728 · Updated last year
- A set of bash scripts to automate deployment of GGML/GGUF models [default: RWKV] with KoboldCpp on Android (Termux) ☆41 · Updated 10 months ago
- ☆156 · Updated 10 months ago
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, and EXL2. ☆152 · Updated 11 months ago
- On-device LLM Inference Powered by X-Bit Quantization ☆237 · Updated last week
- Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes… ☆147 · Updated last year
- Python bindings for ggml ☆140 · Updated 8 months ago
- MobiLlama: Small Language Model tailored for edge devices ☆636 · Updated this week
- Extension for using an alternative GitHub Copilot (StarCoder API) in VSCode ☆100 · Updated last year
- 4-bit quantization of LLaMA using GPTQ ☆130 · Updated last year
- Offline voice input panel & keyboard with punctuation for Android. ☆103 · Updated 11 months ago
- ☆156 · Updated last month
- INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model ☆1,514 · Updated last month
- ☆543 · Updated 4 months ago
- Making offline AI models accessible to all types of edge devices. ☆138 · Updated last year
- Merge Transformers language models by using gradient parameters. ☆208 · Updated 9 months ago
- ggml implementation of BERT ☆488 · Updated last year
- Demonstration of running a native LLM on an Android device. ☆136 · Updated this week
- Inference code for Mistral and Mixtral hacked into the original Llama implementation ☆371 · Updated last year