Picovoice / picollm
On-device LLM Inference Powered by X-Bit Quantization
☆238 · Updated this week
Alternatives and similar repositories for picollm
Users that are interested in picollm are comparing it to the libraries listed below
- Recipes for on-device voice AI and local LLM ☆82 · Updated this week
- On-device streaming text-to-speech engine powered by deep learning ☆79 · Updated last week
- VSCode AI coding assistant powered by self-hosted llama.cpp endpoint. ☆180 · Updated 3 months ago
- Open source LLM UI, compatible with all local LLM providers. ☆174 · Updated 7 months ago
- Local LLM Server with NPU Acceleration ☆180 · Updated last week
- ☆89 · Updated 4 months ago
- A ggml (C++) re-implementation of tortoise-tts ☆182 · Updated 8 months ago
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM … ☆562 · Updated 3 months ago
- Run Orpheus 3B Locally With LM Studio ☆401 · Updated last month
- WebAssembly binding for llama.cpp - Enabling on-browser LLM inference ☆701 · Updated 2 weeks ago
- Something similar to Apple Intelligence? ☆60 · Updated 10 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆473 · Updated this week
- Automatically quantize GGUF models ☆175 · Updated this week
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… ☆43 · Updated 7 months ago
- A fast batching API to serve LLM models ☆182 · Updated last year
- Lightweight inference server for OpenVINO ☆166 · Updated this week
- Replace OpenAI with Llama.cpp Automagically. ☆317 · Updated 11 months ago
- An implementation of Nvidia's Parakeet models for Apple Silicon using MLX. ☆152 · Updated this week
- A mobile implementation of llama.cpp ☆311 · Updated last year
- Easy to use interface for the Whisper model optimized for all GPUs! ☆202 · Updated 2 weeks ago
- EntityDB is an in-browser vector database wrapping IndexedDB and Transformers.js over WebAssembly ☆154 · Updated last week
- Code for Papeg.ai ☆223 · Updated 4 months ago
- Pybind11 bindings for Whisper.cpp ☆57 · Updated 2 weeks ago
- WebAssembly (Wasm) Build and Bindings for llama.cpp ☆259 · Updated 9 months ago
- ☆130 · Updated 2 weeks ago
- Fast Streaming TTS with Orpheus + WebRTC (with FastRTC) ☆274 · Updated last month
- This small API downloads and exposes access to NeuML's txtai-wikipedia and full wikipedia datasets, taking in a query and returning full … ☆91 · Updated last month
- LLM Inference on consumer devices ☆114 · Updated 2 months ago
- Fast parallel LLM inference for MLX ☆187 · Updated 10 months ago
- SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework. ☆266 · Updated this week