vitoplantamura / OnnxStream
Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 on a Raspberry Pi Zero 2 (or in 298 MB of RAM), as well as Mistral 7B on desktops and servers. ARM, x86, WASM, and RISC-V supported. Accelerated by XNNPACK.
☆1,928 · Updated last week
Alternatives and similar repositories for OnnxStream:
Users interested in OnnxStream are comparing it to the libraries listed below.
- Llama 2 Everywhere (L2E) · ☆1,517 · Updated 2 months ago
- Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with GGML) · ☆566 · Updated last year
- Fast stable diffusion on CPU · ☆1,646 · Updated this week
- ☆1,274 · Updated last year
- This repository contains a pure C++ ONNX implementation of multiple offline AI models, such as StableDiffusion (1.5 and XL), ControlNet, … · ☆613 · Updated 11 months ago
- Fork of Facebook's LLaMA model to run on CPU · ☆773 · Updated 2 years ago
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. · ☆2,846 · Updated last year
- Stable Diffusion and Flux in pure C/C++ · ☆3,979 · Updated 3 weeks ago
- ☆1,025 · Updated last year
- An extensible, easy-to-use, and portable diffusion web UI 👨‍🎨 · ☆1,668 · Updated last year
- Suno AI's Bark model in C/C++ for fast text-to-speech generation · ☆790 · Updated 4 months ago
- A fast inference library for running LLMs locally on modern consumer-class GPUs · ☆4,094 · Updated 3 weeks ago
- C++ implementation for BLOOM · ☆809 · Updated last year
- Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support. · ☆3,647 · Updated last year
- INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model · ☆1,494 · Updated 2 weeks ago
- ☆1,533 · Updated last year
- AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… · ☆4,622 · Updated this week
- A BERT that you can train on a (gaming) laptop. · ☆209 · Updated last year
- Instruct-tune LLaMA on consumer hardware · ☆362 · Updated last year
- CLIP inference in plain C/C++ with no extra dependencies · ☆489 · Updated 7 months ago
- SHARK Studio -- Web UI for SHARK+IREE High Performance Machine Learning Distribution · ☆1,438 · Updated last week
- Simple UI for LLM Model Finetuning · ☆2,063 · Updated last year
- WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI. · ☆1,591 · Updated 8 months ago
- BentoDiffusion: A collection of diffusion models served with BentoML · ☆361 · Updated last week
- Explore large language models in 512MB of RAM · ☆1,186 · Updated last month
- Beautiful and easy-to-use Stable Diffusion WebUI · ☆991 · Updated 9 months ago
- 4 bits quantization of LLaMA using GPTQ · ☆3,047 · Updated 8 months ago
- 🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading · ☆9,541 · Updated 6 months ago
- Python bindings for the Transformer models implemented in C/C++ using GGML library. · ☆1,855 · Updated last year
- Quantized inference code for LLaMA models · ☆1,052 · Updated 2 years ago