google-ai-edge / LiteRT-LM
☆445 · Updated this week
Alternatives and similar repositories for LiteRT-LM
Users interested in LiteRT-LM are comparing it to the libraries listed below.
- ☆695 · Updated 2 weeks ago
- Train Large Language Models on MLX. ☆203 · Updated last month
- Inference, fine-tuning, and many more recipes with the Gemma family of models ☆273 · Updated 3 months ago
- ☆300 · Updated 2 months ago
- A command-line interface tool for serving LLMs using vLLM. ☆433 · Updated last week
- Welcome to the official repository of SINQ! A novel, fast, and high-quality quantization method designed to make any Large Language Model … ☆541 · Updated last week
- ☆155 · Updated last month
- Real-Time Speech Transcription with FastRTC ⚡️ and Local Whisper 🤗 ☆687 · Updated 3 months ago
- Fast Streaming TTS with Orpheus + WebRTC (with FastRTC) ☆339 · Updated 6 months ago
- FastMLX is a high-performance, production-ready API to host MLX models. ☆332 · Updated 7 months ago
- 1.58-bit LLM on Apple Silicon using MLX ☆225 · Updated last year
- Fast parallel LLM inference for MLX ☆224 · Updated last year
- Examples of how to use various LLM providers with a Wine Classification problem ☆131 · Updated 2 weeks ago
- WebAssembly binding for llama.cpp, enabling in-browser LLM inference ☆920 · Updated 3 weeks ago
- LiteRT, successor to TensorFlow Lite, is Google's on-device framework for high-performance ML & GenAI deployment on edge platforms, via e… ☆894 · Updated this week (a minimal usage sketch follows this list)
- Docs for GGUF quantization (unofficial) ☆293 · Updated 3 months ago
- Official Python implementation of UTCP. UTCP is an open standard that lets AI agents call any API directly, without extra middleware. ☆583 · Updated 3 weeks ago
- Sparse inferencing for transformer-based LLMs ☆201 · Updated 2 months ago
- On-device LLM Inference Powered by X-Bit Quantization ☆271 · Updated 2 months ago
- MLX-Embeddings is the best package for running Vision and Language Embedding models locally on your Mac using MLX. ☆215 · Updated last month
- API Server for Transformer Lab ☆79 · Updated this week
- A Tree Search Library with Flexible API for LLM Inference-Time Scaling ☆479 · Updated last week
- ☆1,986 · Updated last week
- Code to accompany the Universal Deep Research paper (https://arxiv.org/abs/2509.00244) ☆445 · Updated 2 months ago
- Big & Small LLMs working together ☆1,187 · Updated this week
- Phi-3.5 for Mac: Locally-run Vision and Language Models for Apple Silicon ☆273 · Updated last year
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆140 · Updated 3 months ago
- ☆93 · Updated 3 weeks ago
- No-code CLI designed for accelerating ONNX workflows ☆215 · Updated 4 months ago
- Gemma 2 optimized for your local machine. ☆376 · Updated last year
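
For context on the LiteRT entry above: a minimal sketch of running a `.tflite` model through LiteRT's Python runtime, assuming the `ai-edge-litert` pip package and a placeholder `model.tflite` file. This illustrates the base LiteRT interpreter (the successor to `tf.lite.Interpreter`), not the LiteRT-LM pipeline itself:

```python
import numpy as np
from ai_edge_litert.interpreter import Interpreter

# Load a compiled .tflite model (the file name here is a placeholder).
interpreter = Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Run one inference with a dummy input of the model's declared shape/dtype.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()

print(interpreter.get_tensor(output_details[0]["index"]).shape)
```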