ngxson / wllama
WebAssembly binding for llama.cpp - Enabling on-browser LLM inference
☆684 · Updated last week
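For reference, wllama is consumed as a JavaScript/TypeScript package and runs GGUF models entirely client-side. Below is a minimal sketch of the typical flow (load a model over HTTP, then run a completion). It assumes the `@wllama/wllama` npm package, the `Wllama` / `loadModelFromUrl` / `createCompletion` API, and the wasm/model paths shown; all of these are illustrative and may differ across versions, so check the repository README for the current API.

```ts
// Minimal sketch of in-browser inference with wllama.
// Assumptions (may differ across versions): the @wllama/wllama package
// name, the Wllama / loadModelFromUrl / createCompletion API, the wasm
// asset paths, and the model URL are illustrative, not authoritative.
import { Wllama } from '@wllama/wllama';

// Where your app serves the single- and multi-threaded wasm builds
// (hypothetical locations under your site's root).
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': '/wllama/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/wllama/multi-thread/wllama.wasm',
};

async function main(): Promise<void> {
  const wllama = new Wllama(CONFIG_PATHS);

  // Fetch a small GGUF model over HTTP and load it into wasm memory.
  await wllama.loadModelFromUrl(
    'https://huggingface.co/ggml-org/models/resolve/main/tinyllamas/stories260K.gguf',
  );

  // Generate a completion entirely in the browser: no server round-trip.
  const output = await wllama.createCompletion('Once upon a time,', {
    nPredict: 50,
    sampling: { temp: 0.5, top_k: 40, top_p: 0.9 },
  });
  console.log(output);
}

main().catch(console.error);
```

Note that multi-threaded wasm relies on SharedArrayBuffer, so the hosting page must be cross-origin isolated (served with COOP/COEP headers); otherwise only the single-threaded build can run.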
Alternatives and similar repositories for wllama:
Users interested in wllama are comparing it to the libraries listed below.
- WebAssembly (Wasm) Build and Bindings for llama.cpp ☆257 · Updated 9 months ago
- A cross-platform browser ML framework. ☆689 · Updated 5 months ago
- Stateful load balancer custom-tailored for llama.cpp 🏓🦙 ☆747 · Updated last week
- Apple MLX engine for LM Studio ☆535 · Updated last week
- FastMLX is a high-performance, production-ready API to host MLX models. ☆297 · Updated last month
- VS Code extension for LLM-assisted code/text completion ☆692 · Updated 3 weeks ago
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM… ☆556 · Updated 2 months ago
- Large-scale LLM inference engine ☆1,405 · Updated last week
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ☆1,228 · Updated this week
- 🕸️🦀 A WASM vector similarity search written in Rust ☆951 · Updated last year
- ☆863 · Updated 7 months ago
- Big & Small LLMs working together ☆733 · Updated this week
- Vercel and web-llm template to run wasm models directly in the browser. ☆148 · Updated last year
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆579 · Updated last week
- Large Language Models (LLMs) applications and tools running on Apple Silicon in real-time with Apple MLX. ☆442 · Updated 3 months ago
- Chat with AI large language models running natively in your browser. Enjoy private, server-free, seamless AI conversations. ☆732 · Updated this week
- On-device LLM Inference Powered by X-Bit Quantization ☆235 · Updated this week
- Replace OpenAI with Llama.cpp Automagically. ☆318 · Updated 10 months ago
- Run Large-Language Models (LLMs) directly in your browser! ☆202 · Updated 8 months ago
- An extremely fast implementation of whisper optimized for Apple Silicon using MLX. ☆696 · Updated 11 months ago
- LLM-based code completion engine ☆185 · Updated 3 months ago
- A SQLite extension for generating text embeddings from GGUF models using llama.cpp ☆185 · Updated 5 months ago
- Start a server from the MLX library. ☆185 · Updated 9 months ago
- JS tokenizer for LLaMA 1 and 2 ☆351 · Updated 10 months ago
- SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework. ☆264 · Updated this week
- Suno AI's Bark model in C/C++ for fast text-to-speech generation ☆808 · Updated 5 months ago
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆335 · Updated last week
- A collection of 🤗 Transformers.js demos and example applications ☆1,438 · Updated 2 weeks ago
- llama.cpp fork with additional SOTA quants and improved performance ☆400 · Updated this week
- MLX Omni Server is a local inference server powered by Apple's MLX framework, specifically designed for Apple Silicon (M-series) chips. I… ☆350 · Updated 3 weeks ago