mlc-ai / binary-mlc-llm-libs
☆283 · Updated last month
Alternatives and similar repositories for binary-mlc-llm-libs
Users interested in binary-mlc-llm-libs are comparing it to the libraries listed below.
- A mobile implementation of llama.cpp ☆324 · Updated last year
- llama.cpp tutorial on an Android phone ☆143 · Updated 8 months ago
- IRIS is an Android app for interfacing with GGUF / llama.cpp models locally. ☆262 · Updated 11 months ago
- MiniCPM on the Android platform. ☆634 · Updated 9 months ago
- LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI. ☆130 · Updated 2 years ago
- [ICLR-2025-SLLM Spotlight 🔥] MobiLlama: Small Language Model tailored for edge devices ☆669 · Updated 8 months ago
- A mobile implementation of llama.cpp ☆26 · Updated 2 years ago
- ☆59 · Updated last year
- ☆65 · Updated last year
- Automatically quantize GGUF models ☆220 · Updated 3 weeks ago
- Making offline AI models accessible to all types of edge devices. ☆145 · Updated last year
- Falcon LLM ggml framework with CPU and GPU support ☆249 · Updated last year
- On-device LLM Inference Powered by X-Bit Quantization ☆276 · Updated this week
- Tool to download models from Huggingface Hub and convert them to GGML/GGUF for llama.cpp ☆167 · Updated 8 months ago
- llama.cpp fork used by GPT4All ☆55 · Updated 10 months ago
- Visual Studio Code extension for WizardCoder ☆148 · Updated 2 years ago
- ☆576 · Updated last year
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆763 · Updated last week
- C++ implementation for 💫StarCoder ☆459 · Updated 2 years ago
- CPU inference code for LLaMA models ☆137 · Updated 2 years ago
- Train your own small BitNet model ☆76 · Updated last year
- A multimodal, function-calling-powered LLM web UI. ☆217 · Updated last year
- ggml implementation of BERT ☆499 · Updated last year
- Based on mlc-llm; a personal attempt to deploy and run a large model on an Android phone ☆90 · Updated last year
- WebAssembly (Wasm) Build and Bindings for llama.cpp ☆285 · Updated last year
- INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model ☆1,556 · Updated 9 months ago
- AMD-related optimizations for transformer models ☆96 · Updated 3 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆351 · Updated last year
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… ☆42 · Updated last year
- Locally run an Instruction-Tuned Chat-Style LLM (Android/Linux/Windows/Mac) ☆263 · Updated 2 years ago