evilsocket / cake
Distributed LLM and StableDiffusion inference for mobile, desktop and server.
☆2,847 Updated 6 months ago
Alternatives and similar repositories for cake
Users that are interested in cake are comparing it to the libraries listed below
- Blazingly fast LLM inference. ☆5,568 Updated this week
- Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference. ☆2,054 Updated 2 weeks ago
- A lightweight library for portable low-level GPU computation using WebGPU. ☆3,864 Updated 2 months ago
- Examples using MLX Swift ☆1,774 Updated this week
- A fast multimodal LLM for real-time voice ☆3,916 Updated 2 months ago
- Local realtime voice AI ☆2,290 Updated 2 months ago
- The python library for real-time communication ☆3,851 Updated 2 weeks ago
- Fast and accurate automatic speech recognition (ASR) for edge devices ☆2,698 Updated 2 months ago
- On-device Speech Recognition for Apple Silicon ☆4,587 Updated this week
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆2,913 Updated 3 weeks ago
- Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models ☆2,746 Updated 4 months ago
- Speech To Speech: an effort for an open-sourced and modular GPT4-o ☆4,012 Updated 3 weeks ago
- A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speec… ☆1,345 Updated this week
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,580 Updated this week
- SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines. ☆1,710 Updated 3 weeks ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆8,188 Updated this week
- Deploy high-performance AI models and inference pipelines on FastAPI with built-in batching, streaming and more. ☆3,099 Updated this week
- 🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT) ☆6,340 Updated 4 months ago
- Implementation for MatMul-free LM. ☆2,997 Updated 6 months ago
- Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc. ☆8,313 Updated this week
- Minimal LLM inference in Rust ☆983 Updated 6 months ago
- A blazing fast inference solution for text embeddings models ☆3,520 Updated this week
- first base model for full-duplex conversational audio ☆1,738 Updated 4 months ago
- Open Source framework for voice and multimodal conversational AI ☆5,937 Updated this week
- AirLLM 70B inference with single 4GB GPU ☆5,767 Updated this week
- High-speed Large Language Model Serving for Local Deployment ☆8,191 Updated 2 months ago
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality ☆3,906 Updated 9 months ago
- g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains ☆4,213 Updated 3 months ago
- pingcap/autoflow is a Graph RAG based and conversational knowledge base tool built with TiDB Serverless Vector Storage. Demo: https://tid… ☆2,553 Updated this week
- Official inference framework for 1-bit LLMs ☆18,338 Updated this week