ngxson / wllama
WebAssembly binding for llama.cpp - Enabling on-browser LLM inference
☆620 · Updated last week
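To give a feel for what on-browser inference with wllama looks like, here is a minimal sketch. It assumes the `@wllama/wllama` npm package's `Wllama` class with `loadModelFromUrl` and `createCompletion`; the WASM asset paths, the model URL, and the option names shown are assumptions and may differ between package versions.

```typescript
import { Wllama } from '@wllama/wllama';

// Paths to the prebuilt WASM binaries shipped with the package
// (assumed layout; check the package docs for the exact asset names).
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': '/assets/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/assets/multi-thread/wllama.wasm',
};

async function main(): Promise<void> {
  const wllama = new Wllama(CONFIG_PATHS);

  // Download a small quantized GGUF model over HTTP and load it into the
  // in-browser runtime. The URL is a placeholder; any GGUF that fits in
  // browser memory should work.
  await wllama.loadModelFromUrl(
    'https://huggingface.co/example/tiny-model-GGUF/resolve/main/model.Q4_K_M.gguf'
  );

  // Run a completion entirely client-side, with no server round-trip.
  const output = await wllama.createCompletion('The capital of France is', {
    nPredict: 32,
    sampling: { temp: 0.7, top_p: 0.9 },
  });
  console.log(output);
}

main();
```

Note that using the multi-threaded build generally requires the page to be cross-origin isolated (COOP/COEP headers), since it relies on SharedArrayBuffer; otherwise the single-threaded WASM build is used.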
Alternatives and similar repositories for wllama:
Users interested in wllama are comparing it to the libraries listed below.
- WebAssembly (Wasm) Build and Bindings for llama.cpp ☆246 · Updated 8 months ago
- A cross-platform browser ML framework. ☆669 · Updated 4 months ago
- FastMLX is a high-performance, production-ready API to host MLX models. ☆274 · Updated this week
- Stateful load balancer custom-tailored for llama.cpp 🏓🦙 ☆728 · Updated this week
- Apple MLX engine for LM Studio ☆466 · Updated this week
- Large Language Models (LLMs) applications and tools running on Apple Silicon in real-time with Apple MLX. ☆430 · Updated last month
- EntityDB is an in-browser vector database wrapping IndexedDB and Transformers.js over WebAssembly ☆137 · Updated 2 months ago
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. ☆566 · Updated 4 months ago
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ☆1,057 · Updated this week
- VS Code extension for LLM-assisted code/text completion ☆608 · Updated this week
- ☆694 · Updated this week
- 🕸️🦀 A WASM vector similarity search written in Rust ☆935 · Updated last year
- MLX Omni Server is a local inference server powered by Apple's MLX framework, specifically designed for Apple Silicon (M-series) chips. I… ☆278 · Updated last week
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM … ☆547 · Updated last month
- Efficient visual programming for AI language models ☆351 · Updated 6 months ago
- Blazing fast whisper turbo for ASR (speech-to-text) tasks ☆199 · Updated 5 months ago
- Vercel and web-llm template to run wasm models directly in the browser. ☆143 · Updated last year
- LLM-based code completion engine ☆181 · Updated 2 months ago
- Implementation of F5-TTS in MLX ☆504 · Updated this week
- Python tools for WhisperKit: model conversion, optimization and evaluation ☆205 · Updated 2 months ago
- Fast parallel LLM inference for MLX ☆174 · Updated 8 months ago
- 1.58 Bit LLM on Apple Silicon using MLX ☆192 · Updated 10 months ago
- An extremely fast implementation of Whisper optimized for Apple Silicon using MLX. ☆675 · Updated 10 months ago
- GGUF implementation in C as a library and a tools CLI program ☆261 · Updated 2 months ago
- An OpenAI API-compatible API for chat with image input and questions about the images (aka multimodal). ☆237 · Updated 2 weeks ago
- On-device Diffusion Models for Apple Silicon ☆599 · Updated 3 months ago
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆550 · Updated this week
- Web-optimized vector database (written in Rust). ☆219 · Updated 3 weeks ago
- TypeScript generator for llama.cpp Grammar directly from TypeScript interfaces ☆135 · Updated 8 months ago