ngxson / wllama
WebAssembly binding for llama.cpp - Enabling on-browser LLM inference
★653 · Updated last month
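In practice, on-browser inference with wllama follows a simple pattern: load the WebAssembly build, fetch a GGUF model over HTTP, and request a completion. Below is a minimal sketch, assuming the `@wllama/wllama` npm package and the `loadModelFromUrl`/`createCompletion` calls shown in the project's examples; the asset paths, model URL, and sampling option names are illustrative and may differ across versions.

```ts
// Minimal on-browser inference sketch (illustrative; check the wllama docs
// for the exact asset paths and options in your version).
import { Wllama } from '@wllama/wllama';

// Map wasm asset names to the URLs they are served from (hypothetical paths).
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': '/esm/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/esm/multi-thread/wllama.wasm',
};

async function main(): Promise<void> {
  const wllama = new Wllama(CONFIG_PATHS);

  // Any GGUF model small enough for the browser; placeholder URL.
  await wllama.loadModelFromUrl('https://example.com/models/tiny-model.gguf');

  // Request a completion with a short generation budget.
  const output = await wllama.createCompletion('Once upon a time,', {
    nPredict: 50,
    sampling: { temp: 0.5, top_k: 40, top_p: 0.9 },
  });
  console.log(output);
}

main();
```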
Alternatives and similar repositories for wllama:
Users interested in wllama are comparing it to the libraries listed below
- WebAssembly (Wasm) Build and Bindings for llama.cpp ★249 · Updated 8 months ago
- A cross-platform browser ML framework. ★683 · Updated 4 months ago
- Stateful load balancer custom-tailored for llama.cpp 🏓🦙 ★737 · Updated last week
- VS Code extension for LLM-assisted code/text completion ★656 · Updated 3 weeks ago
- FastMLX is a high-performance, production-ready API to host MLX models. ★288 · Updated 3 weeks ago
- 🕸️🦀 A WASM vector similarity search written in Rust ★944 · Updated last year
- Python & JS/TS SDK for running AI-generated code/code interpreting in your AI app ★1,662 · Updated last week
- Apple MLX engine for LM Studio ★506 · Updated this week
- MLX Omni Server is a local inference server powered by Apple's MLX framework, specifically designed for Apple Silicon (M-series) chips. I… ★301 · Updated last week
- Replace OpenAI with Llama.cpp Automagically. ★313 · Updated 10 months ago
- Large Language Models (LLMs) applications and tools running on Apple Silicon in real-time with Apple MLX. ★435 · Updated 2 months ago
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM… ★551 · Updated 2 months ago
- Big & Small LLMs working together ★686 · Updated this week
- Gemma 2 optimized for your local machine. ★367 · Updated 8 months ago
- Vercel and web-llm template to run wasm models directly in the browser. ★146 · Updated last year
- SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework. ★262 · Updated this week
- Minimal LLM inference in Rust ★985 · Updated 5 months ago
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ★1,155 · Updated this week
- A client-side vector search library that can embed, store, search, and cache vectors. Works in the browser and Node. It outperforms OpenA… ★197 · Updated 10 months ago
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. ★577 · Updated 5 months ago
- The easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge ★1,342 · Updated this week
- Connect home devices into a powerful cluster to accelerate LLM inference. More devices mean faster inference. ★2,019 · Updated this week
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ★570 · Updated this week
- Suno AI's Bark model in C/C++ for fast text-to-speech generation ★796 · Updated 5 months ago
- Local AI API Platform ★2,615 · Updated this week
- E2B Desktop Sandbox for LLMs: an E2B sandbox with a desktop graphical environment that you can connect to any LLM for secure computer use. ★587 · Updated this week
- EntityDB is an in-browser vector database wrapping IndexedDB and Transformers.js over WebAssembly ★146 · Updated 3 months ago
- An extremely fast implementation of whisper optimized for Apple Silicon using MLX. ★686 · Updated 11 months ago
- Qwen 2.5 Coder 1.5B with Code Interpreter ★280 · Updated 5 months ago
- SemanticFinder - frontend-only live semantic search with transformers.js ★268 · Updated 2 weeks ago