google-ai-edge / LiteRT-LM
☆445 · Updated this week
Alternatives and similar repositories for LiteRT-LM
Users interested in LiteRT-LM are comparing it to the libraries listed below.
- ☆695 · Updated 2 weeks ago
- Train Large Language Models on MLX. ☆203 · Updated last month
- Inference, fine-tuning, and many more recipes with the Gemma family of models ☆273 · Updated 3 months ago
- ☆300 · Updated 2 months ago
- A command-line interface tool for serving LLMs using vLLM. ☆433 · Updated last week
- Welcome to the official repository of SINQ! A novel, fast, and high-quality quantization method designed to make any Large Language Model … ☆541 · Updated last week
- ☆155 · Updated last month
- Real-Time Speech Transcription with FastRTC ⚡️ and Local Whisper 🤗 ☆687 · Updated 3 months ago
- Fast Streaming TTS with Orpheus + WebRTC (with FastRTC) ☆339 · Updated 6 months ago
- FastMLX is a high-performance, production-ready API to host MLX models. ☆332 · Updated 7 months ago
- 1.58-bit LLM on Apple Silicon using MLX ☆225 · Updated last year
- Fast parallel LLM inference for MLX ☆224 · Updated last year
- Examples of how to use various LLM providers with a Wine Classification problem ☆131 · Updated 2 weeks ago
- WebAssembly binding for llama.cpp, enabling in-browser LLM inference ☆920 · Updated 3 weeks ago
- LiteRT, successor to TensorFlow Lite, is Google's on-device framework for high-performance ML & GenAI deployment on edge platforms, via e… ☆894 · Updated this week (a minimal usage sketch follows this list)
- Docs for GGUF quantization (unofficial) ☆293 · Updated 3 months ago
- Official Python implementation of UTCP. UTCP is an open standard that lets AI agents call any API directly, without extra middleware. ☆583 · Updated 3 weeks ago
- Sparse inferencing for transformer-based LLMs ☆201 · Updated 2 months ago
- On-device LLM Inference Powered by X-Bit Quantization ☆271 · Updated 2 months ago
- MLX-Embeddings is the best package for running Vision and Language Embedding models locally on your Mac using MLX. ☆215 · Updated last month
- API Server for Transformer Lab ☆79 · Updated this week
- A Tree Search Library with Flexible API for LLM Inference-Time Scaling ☆479 · Updated last week
- ☆1,986 · Updated last week
- Code to accompany the Universal Deep Research paper (https://arxiv.org/abs/2509.00244) ☆445 · Updated 2 months ago
- Big & Small LLMs working together ☆1,187 · Updated this week
- Phi-3.5 for Mac: Locally-run Vision and Language Models for Apple Silicon ☆273 · Updated last year
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆140 · Updated 3 months ago
- ☆93 · Updated 3 weeks ago
- No-code CLI designed for accelerating ONNX workflows ☆215 · Updated 4 months ago
- Gemma 2 optimized for your local machine. ☆376 · Updated last year
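
For context on the LiteRT entry above: a minimal sketch of running a `.tflite` model through LiteRT's Python runtime, assuming the `ai-edge-litert` pip package and a placeholder `model.tflite` file. This illustrates the base LiteRT interpreter (the successor to `tf.lite.Interpreter`), not the LiteRT-LM pipeline itself:

```python
import numpy as np
from ai_edge_litert.interpreter import Interpreter

# Load a compiled .tflite model (the file name here is a placeholder).
interpreter = Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Run one inference with a dummy input of the model's declared shape/dtype.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()

print(interpreter.get_tensor(output_details[0]["index"]).shape)
```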