vitoplantamura / OnnxStream
Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 on a Raspberry Pi Zero 2 (or in 298 MB of RAM), as well as Mistral 7B on desktops and servers. ARM, x86, WASM, and RISC-V are supported. Accelerated by XNNPACK. Python, C#, and JS (WASM) bindings available.
☆2,025 · Updated last week
Alternatives and similar repositories for OnnxStream
Users who are interested in OnnxStream are comparing it to the libraries listed below.
- Llama 2 Everywhere (L2E) ☆1,527 · Updated 5 months ago
- ☆1,283 · Updated 2 years ago
- Port of MiniGPT4 in C++ (4-bit, 5-bit, 6-bit, 8-bit, 16-bit CPU inference with GGML) ☆568 · Updated 2 years ago
- Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, …) inference in pure C/C++ ☆5,275 · Updated this week
- This repository contains a pure C++ ONNX implementation of multiple offline AI models, such as StableDiffusion (1.5 and XL), ControlNet, … ☆628 · Updated 8 months ago
- INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model ☆1,562 · Updated 10 months ago
- ☆1,026 · Updated 2 years ago
- An extensible, easy-to-use, and portable diffusion web UI 👨🎨 ☆1,674 · Updated 2 years ago
- Fork of Facebook's LLaMA model to run on CPU ☆771 · Updated 2 years ago
- C++ implementation for BLOOM ☆809 · Updated 2 years ago
- MiniLLM is a minimal system for running modern LLMs on consumer-grade GPUs ☆944 · Updated 2 years ago
- Fast stable diffusion on CPU and AI PC ☆1,973 · Updated 3 weeks ago
- Suno AI's Bark model in C/C++ for fast text-to-speech generation ☆852 · Updated last year
- Simple UI for LLM Model Finetuning ☆2,062 · Updated 2 years ago
- Explore large language models in 512 MB of RAM ☆1,198 · Updated 3 weeks ago
- Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support. ☆3,713 · Updated last year
- Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference. ☆2,804 · Updated 2 weeks ago
- CLIP inference in plain C/C++ with no extra dependencies ☆548 · Updated 7 months ago
- BentoDiffusion: A collection of diffusion models served with BentoML ☆380 · Updated 9 months ago
- How to run Stable Diffusion on Raspberry Pi 4 ☆94 · Updated 3 years ago
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,426 · Updated last month
- Raspberry Pi Voice Assistant ☆811 · Updated last year
- NVIDIA Linux open GPU with P2P support ☆1,313 · Updated 7 months ago
- Quantized inference code for LLaMA models ☆1,047 · Updated 2 years ago
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆2,907 · Updated 2 years ago
- Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Tra… ☆1,292 · Updated 2 years ago
- A simple "Be My Eyes" web app with a llama.cpp/llava backend ☆492 · Updated 2 years ago
- Local voice chatbot for engaging conversations, powered by Ollama, Hugging Face Transformers, and Coqui TTS Toolkit ☆783 · Updated last year
- Inference Llama 2 in one file of pure 🔥 ☆2,116 · Updated 2 months ago
- A toolbox for working with WebRTC, Audio and AI ☆700 · Updated 2 years ago