vitoplantamura / OnnxStream
Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 on a Raspberry Pi Zero 2 (or in 298 MB of RAM), as well as Mistral 7B on desktops and servers. ARM, x86, WASM, and RISC-V are supported. Accelerated by XNNPACK.
☆1,909 · Updated 3 weeks ago
Alternatives and similar repositories for OnnxStream:
Users that are interested in OnnxStream are comparing it to the libraries listed below
- Llama 2 Everywhere (L2E) ☆1,510 · Updated last month
- Stable Diffusion and Flux in pure C/C++ ☆3,814 · Updated last week
- This repository contains a pure C++ ONNX implementation of multiple offline AI models, such as StableDiffusion (1.5 and XL), ControlNet, … ☆612 · Updated 10 months ago
- ☆1,271 · Updated last year
- Fork of Facebook's LLaMA model to run on CPU ☆772 · Updated last year
- Fast stable diffusion on CPU ☆1,596 · Updated this week
- ☆1,023 · Updated last year
- Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with GGML) ☆563 · Updated last year
- INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model ☆1,468 · Updated 3 weeks ago
- An extensible, easy-to-use, and portable diffusion web UI 👨‍🎨 ☆1,664 · Updated last year
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆3,944 · Updated this week
- C++ implementation for BLOOM ☆810 · Updated last year
- Quantized inference code for LLaMA models ☆1,051 · Updated last year
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆2,822 · Updated last year
- Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and in… ☆1,690 · Updated this week
- 4-bit quantization of LLaMA using GPTQ ☆3,033 · Updated 7 months ago
- BentoDiffusion: A collection of diffusion models served with BentoML ☆349 · Updated this week
- Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper` as your local llama2 backend… ☆1,962 · Updated 10 months ago
- Raspberry Pi Voice Assistant ☆762 · Updated last month
- Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support. ☆3,636 · Updated 11 months ago
- Suno AI's Bark model in C/C++ for fast text-to-speech generation ☆779 · Updated 2 months ago
- CLIP inference in plain C/C++ with no extra dependencies ☆476 · Updated 5 months ago
- Simple UI for LLM Model Finetuning ☆2,052 · Updated last year
- WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI. ☆1,577 · Updated 6 months ago
- ☆772 · Updated 2 years ago
- Use Code Llama with Visual Studio Code and the Continue extension. A local LLM alternative to GitHub Copilot. ☆566 · Updated 6 months ago
- Tiny Dream - An embedded, Header Only, Stable Diffusion C++ implementation ☆257 · Updated last year
- Stable diffusion for real-time music generation ☆3,527 · Updated 6 months ago
- ggml implementation of BERT ☆480 · Updated 11 months ago
- MiniLLM is a minimal system for running modern LLMs on consumer-grade GPUs ☆894 · Updated last year