vitoplantamura / OnnxStream
Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 on a Raspberry Pi Zero 2 (or in 298MB of RAM), as well as Mistral 7B on desktops and servers. ARM, x86, WASM and RISC-V supported. Accelerated by XNNPACK.
☆1,956 · Updated last month
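For orientation, here is a minimal sketch of how an inference call with OnnxStream is typically structured, based on the usage pattern shown in the project's README. The identifiers used (Model, Tensor, read_file, push_tensor, m_data) and the exported-model layout are assumptions and should be verified against the library's current headers:

```cpp
// Hypothetical usage sketch (not verbatim from the project): class and member
// names follow the pattern from OnnxStream's README but may not match the
// current API exactly.
#include "onnxstream.h"

#include <utility>
#include <vector>

using namespace onnxstream;

int main()
{
    Model model;

    // Models are converted offline from ONNX into a text graph plus raw
    // weight files; the path here is purely illustrative.
    model.read_file("/path/to/exported_model/model.txt");

    // Feed one input tensor; name and shape depend on the exported graph.
    Tensor input;
    input.m_name = "input";
    input.m_shape = {1, 4, 64, 64};
    input.set_vector(std::vector<float>(1 * 4 * 64 * 64, 0.5f)); // dummy data

    model.push_tensor(std::move(input));

    // Run the graph; output tensors accumulate in model.m_data.
    model.run();

    auto& output = model.m_data[0];
    (void)output; // consume the result here

    return 0;
}
```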
Alternatives and similar repositories for OnnxStream
Users interested in OnnxStream are comparing it to the libraries listed below.
- Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with GGML) ☆567 · Updated last year
- Llama 2 Everywhere (L2E) ☆1,519 · Updated 5 months ago
- This repository contains a pure C++ ONNX implementation of multiple offline AI models, such as StableDiffusion (1.5 and XL), ControlNet, … ☆618 · Updated 3 weeks ago
- INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model ☆1,529 · Updated 3 months ago
- Stable Diffusion and Flux in pure C/C++ ☆4,175 · Updated 3 months ago
- Fork of Facebook's LLaMA model to run on CPU ☆773 · Updated 2 years ago
- Suno AI's Bark model in C/C++ for fast text-to-speech generation ☆826 · Updated 7 months ago
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,216 · Updated 3 weeks ago
- Python bindings for the Transformer models implemented in C/C++ using the GGML library. ☆1,867 · Updated last year
- CLIP inference in plain C/C++ with no extra dependencies ☆504 · Updated last week
- Connect home devices into a powerful cluster to accelerate LLM inference. More devices mean faster inference. ☆2,189 · Updated last week
- C++ implementation for BLOOM ☆810 · Updated 2 years ago
- An extensible, easy-to-use, and portable diffusion web UI 👨‍🎨 ☆1,672 · Updated last year
- Fast stable diffusion on CPU ☆1,719 · Updated last week
- Simple UI for LLM Model Finetuning ☆2,063 · Updated last year
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights ☆2,882 · Updated last year
- Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support. ☆3,668 · Updated last year
- Instruct-tune LLaMA on consumer hardware ☆362 · Updated 2 years ago
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ☆2,169 · Updated 8 months ago
- AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ☆4,647 · Updated 2 months ago
- Cross-Platform, GPU-Accelerated Whisper 🏎️ ☆1,801 · Updated last year
- Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than… ☆1,147 · Updated this week
- Quantized inference code for LLaMA models ☆1,051 · Updated 2 years ago
- WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI. ☆1,613 · Updated 10 months ago
- An Open Source text-to-speech system built by inverting Whisper. ☆4,288 · Updated 2 weeks ago
- Real-time audio to chords, lyrics, beat, and melody. ☆693 · Updated 10 months ago
- Tiny Dream - An embedded, header-only Stable Diffusion C++ implementation ☆262 · Updated last year
- Locally run an Instruction-Tuned Chat-Style LLM (Android/Linux/Windows/Mac) ☆263 · Updated 2 years ago