vitoplantamura / OnnxStream
Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 on a RPI Zero 2 (or in 298MB of RAM) but also Mistral 7B on desktops and servers. ARM, x86, WASM, RISC-V supported. Accelerated by XNNPACK.
☆1,849 · Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for OnnxStream
- Stable Diffusion and Flux in pure C/C++ ☆3,455 · Updated 2 weeks ago
- Llama 2 Everywhere (L2E) ☆1,511 · Updated 2 weeks ago
- ☆1,258 · Updated last year
- Fast stable diffusion on CPU ☆1,484 · Updated last week
- Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with GGML) ☆557 · Updated last year
- An extensible, easy-to-use, and portable diffusion web UI 👨‍🎨 ☆1,666 · Updated last year
- ☆1,021 · Updated 10 months ago
- This repository contains a pure C++ ONNX implementation of multiple offline AI models, such as StableDiffusion (1.5 and XL), ControlNet, … ☆608 · Updated 6 months ago
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆3,634 · Updated last week
- Suno AI's Bark model in C/C++ for fast text-to-speech ☆719 · Updated this week
- INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model ☆1,420 · Updated 3 months ago
- Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support. ☆3,590 · Updated 7 months ago
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆4,921 · Updated 3 months ago
- Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and in… ☆1,480 · Updated 3 weeks ago
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate. ☆3,604 · Updated last week
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆2,759 · Updated last year
- Cross-Platform, GPU Accelerated Whisper 🏎️ ☆1,727 · Updated 8 months ago
- An Open Source text-to-speech system built by inverting Whisper. ☆3,956 · Updated 4 months ago
- C++ implementation for BLOOM ☆811 · Updated last year
- CLIP inference in plain C/C++ with no extra dependencies ☆456 · Updated 2 months ago
- Simple UI for LLM Model Finetuning ☆2,046 · Updated 10 months ago
- NVIDIA Linux open GPU with P2P support ☆901 · Updated 5 months ago
- Tensor computation with WebGPU acceleration ☆586 · Updated 3 months ago
- A diffusion model to colorize black and white images ☆633 · Updated last year
- Python bindings for the Transformer models implemented in C/C++ using GGML library. ☆1,811 · Updated 9 months ago
- Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs. ☆1,594 · Updated this week
- Fork of Facebook's LLaMA model to run on CPU ☆771 · Updated last year
- Stateful load balancer custom-tailored for llama.cpp ☆556 · Updated this week
- ☆759 · Updated 2 years ago