Noeda/rllama

Rust+OpenCL+AVX2 implementation of LLaMA inference code

☆554

Alternatives and similar repositories for rllama

Users that are interested in rllama are comparing it to the libraries listed below.

KerfuffleV2 / smolrsrwkv
A relatively basic implementation of RWKV in Rust written by someone with very little math and ML knowledge. Supports 32, 8 and 4 bit eva…
☆95 · Sep 2, 2023 · Updated 2 years ago
rustformers / llm
[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models
☆6,156 · Jun 24, 2024 · Updated 2 years ago
chelsea0x3b / dfdx
Deep learning in Rust, with shape checked tensors and neural networks
☆1,921 · Jul 23, 2024 · Updated 2 years ago
KerfuffleV2 / ggml-sys-bleedingedge
Bleeding edge low level Rust binding for GGML
☆18 · Jun 26, 2024 · Updated 2 years ago
LaurentMazare / diffusers-rs
An implementation of the diffusers api in Rust
☆592 · Apr 4, 2024 · Updated 2 years ago
srush / llama2.rs
A fast llama2 decoder in pure Rust.
☆1,063 · Nov 30, 2023 · Updated 2 years ago
sobelio / llm-chain
`llm-chain` is a powerful rust crate for building chains in large language models allowing you to summarise text and complete complex tas…
☆1,605 · Oct 31, 2024 · Updated last year
chelsea0x3b / llama-dfdx
LLaMa 7b with CUDA acceleration implemented in rust. Minimal GPU memory needed!
☆113 · Jul 27, 2023 · Updated 2 years ago
Narsil / smelte-rs
☆58 · Apr 6, 2023 · Updated 3 years ago
gaxler / llama2.rs
Inference Llama 2 in one file of pure Rust 🦀
☆235 · Sep 11, 2023 · Updated 2 years ago
tracel-ai / burn
Burn is a next generation tensor library and Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability.
☆15,633 · Updated this week
LaurentMazare / tch-rs
Rust bindings for the C++ api of PyTorch.
☆5,454 · Jul 17, 2026 · Updated last week
mdrokz / rust-llama.cpp
LLama.cpp rust bindings
☆423 · Jun 27, 2024 · Updated 2 years ago
huggingface / candle
Minimalist ML framework for Rust
☆20,717 · Updated this week
Gadersd / llama2-burn
Llama2 LLM ported to Rust burn
☆280 · Apr 16, 2024 · Updated 2 years ago
onehr / llama-rs
Run LLaMA inference on CPU, with Rust 🦀🚀🦙
☆35 · Jan 5, 2025 · Updated last year
guillaume-be / rust-bert
Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)
☆3,075 · Jan 13, 2026 · Updated 6 months ago
webonnx / wonnx
A WebGPU-accelerated ONNX inference run-time written 100% in Rust, ready for native and the web
☆1,755 · Jul 21, 2024 · Updated 2 years ago
sonos / tract
Tiny, no-nonsense, self-contained, Tensorflow and ONNX inference
☆3,008 · Updated this week
tazz4843 / whisper-rs
Rust bindings to https://github.com/ggerganov/whisper.cpp
☆946 · Jul 30, 2025 · Updated 11 months ago
EricLBuehler / mistral.rs
Fast, flexible LLM inference
☆7,518 · Updated this week
keyvank / femtoGPT
Pure Rust implementation of a minimal Generative Pretrained Transformer
☆936 · Oct 21, 2025 · Updated 9 months ago
sarah-quinones / faer-rs
Linear algebra foundation for the Rust programming language
☆2,564 · Jun 24, 2026 · Updated last month
edgenai / llama_cpp-rs
High-level, optionally asynchronous Rust bindings to llama.cpp
☆250 · Jun 5, 2024 · Updated 2 years ago
Narsil / ggblas
☆28 · Aug 10, 2023 · Updated 2 years ago
Rust-GPU / rust-cuda
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
☆5,280 · Apr 29, 2026 · Updated 2 months ago
neuronika / neuronika
Tensors and dynamic neural networks in pure Rust.
☆1,086 · Oct 10, 2022 · Updated 3 years ago
chelsea0x3b / cudarc
Safe rust wrapper around CUDA toolkit
☆1,186 · Jun 19, 2026 · Updated last month
rustformers / llmcord
A Discord bot, written in Rust, that generates responses using the LLaMA language model.
☆94 · Aug 13, 2023 · Updated 2 years ago
rust-ndarray / ndarray
ndarray: an N-dimensional array with array views, multidimensional slicing, and efficient operations
☆4,312 · Updated this week
rust-ml / linfa
A Rust machine learning framework.
☆4,711 · May 30, 2026 · Updated last month
sporksmith / objgraph
☆13 · Aug 29, 2022 · Updated 3 years ago
nmntz / bloomz.cpp
C++ implementation for BLOOM
☆811 · May 13, 2023 · Updated 3 years ago
floneum / kalosm
Instant, controllable, local pre-trained AI models in Rust
☆2,211 · Updated this week
pykeio / ort
Fast ML inference & training for ONNX models in Rust
☆2,417 · Updated this week
kenba / opencl3
A Rust implementation of the Khronos OpenCL 3.0 API.
☆134 · Mar 8, 2026 · Updated 4 months ago
skeskinen / bert.cpp
ggml implementation of BERT
☆501 · Feb 23, 2024 · Updated 2 years ago
PABannier / biogpt.cpp
Port of Microsoft's BioGPT in C/C++ using ggml
☆87 · Feb 21, 2024 · Updated 2 years ago
NolanoOrg / cformers
SoTA Transformers with C-backend for fast inference on your CPU.
☆312 · Dec 9, 2023 · Updated 2 years ago