okuvshynov/llama-sandbox

A collection of experiments related to LLM inference with llama.cpp/mlx

☆40

Alternatives and similar repositories for llama-sandbox

Users who are interested in llama-sandbox are comparing it to the repositories listed below.

diegohce / gogwave
Go language bindings for the ggwave C++ library
☆14 · Apr 9, 2025 · Updated last year
mscheong01 / speculative_decoding.c
minimal C implementation of speculative decoding based on llama2.c
☆30 · Jul 15, 2024 · Updated last year
triple-mu / Stable-Diffusion-TensorRT
Stable Diffusion in TensorRT 8.5+
☆15 · Mar 19, 2023 · Updated 3 years ago
triple-mu / HunyuanDiT-TensorRT-libtorch
HunyuanDiT with TensorRT and libtorch
☆18 · May 22, 2024 · Updated 2 years ago
ggerganov / bark.cpp
Port of Suno AI's Bark in C/C++ for fast inference
☆55 · Apr 15, 2024 · Updated 2 years ago
kroggen / mamba.c
Inference of Mamba, Mamba2 and Mamba3 models in pure C
☆202 · Mar 18, 2026 · Updated 3 months ago
EdVince / model_zoo
Recording models
☆12 · Sep 19, 2023 · Updated 2 years ago
l-sf / Nanodet_openvino_quant_deploy
This repository deploys the Nanodet detection algorithm on the OpenVINO inference framework, with rewritten pre-processing and post-processing for very high performance, so detection on Intel CPU platforms takes off. It also quantizes the model to int8 precision (PTQ) using the NNCF and PPQ tools for even faster inference.
☆16 · Jun 14, 2023 · Updated 3 years ago
ggerganov / hnguessr
Guess the Hacker News titles
☆13 · Mar 24, 2022 · Updated 4 years ago
astonzhang / Parameterization-of-Hypercomplex-Multiplications
Implementation for the PHM paper at ICLR'21
☆13 · Mar 1, 2023 · Updated 3 years ago
Akirato / PERM-GaussianKG
PERM GaussianKG
☆10 · Nov 24, 2021 · Updated 4 years ago
xbresson / Long_Tailed_Learning_Requires_Feature_Learning
Repository for ICLR'23 Long-tailed Learning Requires Feature Learning
☆10 · Feb 22, 2023 · Updated 3 years ago
Dominic23331 / rtmpose_tensorrt
☆22 · Apr 10, 2024 · Updated 2 years ago
yuxiaoranyu / stable_diffusion_trt_triton
☆20 · Dec 29, 2023 · Updated 2 years ago
edgenai / llama_cpp-rs
High-level, optionally asynchronous Rust bindings to llama.cpp
☆247 · Jun 5, 2024 · Updated 2 years ago
srush / anynp
Proof-of-concept of global switching between numpy/jax/pytorch in a library.
☆17 · Jun 18, 2024 · Updated 2 years ago
triple-mu / AI-on-Board
Examples of AI model running on the board, such as horizon/rockchip and so on.
☆20 · Jul 10, 2023 · Updated 3 years ago
Mihaiii / trivia
A live multiplayer trivia game where users can bid for the subject of the next question
☆29 · Jan 9, 2026 · Updated 6 months ago
benpry / chain-of-thought-metaphor
This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod…
☆14 · Apr 28, 2023 · Updated 3 years ago
AlexBuz / llama-zip
LLM-powered lossless compression tool
☆317 · Jun 16, 2026 · Updated 3 weeks ago
EdVince / llm-cpp
☆33 · Jul 23, 2024 · Updated last year
karlhigley / ranking-metrics-torch
Simple ranking metrics for PyTorch on CPU or GPU
☆15 · Nov 20, 2020 · Updated 5 years ago
hopef / llama3_chat
Llama3 Streaming Chat Sample
☆22 · Apr 24, 2024 · Updated 2 years ago
G-structure / PromptMutant
An implementation of Deepmind's Promptbreeder.
☆23 · Dec 22, 2023 · Updated 2 years ago
lavaman131 / dinov2.cpp
DINOv2 inference engine written in C/C++ using ggml and OpenCV.
☆97 · May 6, 2025 · Updated last year
rgerganov / vmdecrypt
Decrypt multicast Verimatrix streams
☆13 · Apr 21, 2022 · Updated 4 years ago
tangledgroup / llama-cpp-wasm
WebAssembly (Wasm) Build and Bindings for llama.cpp
☆293 · Jul 23, 2024 · Updated last year
giannisdaras / sgilo
[ICML 2022] Official implementation of "Score-Guided Intermediate Layer Optimization: Fast Langevin Mixing for Inverse Problems".
☆12 · Jul 19, 2022 · Updated 3 years ago
triple-mu / TensorRT2ONNX
A tool convert TensorRT engine/plan to a fake onnx
☆41 · Nov 22, 2022 · Updated 3 years ago
globaledgesoft / Unsupported-Operation-Development-in-SNPE
This project is intended to build and deploy an SNPE model on Qualcomm Devices, which are having unsupported layers which are not part of…
☆10 · Oct 4, 2021 · Updated 4 years ago
Kazuhito00 / MobileSAM-ONNX-Sample
Sample that converts the MobileSAM encoder/decoder to ONNX and runs inference
☆12 · Apr 11, 2024 · Updated 2 years ago
ZhimingZhou / AdaShift-LGANs-MaxGP-refactored
This is a joint implementation of AdaShift optimizer, LGANs, and MaxGP.
☆14 · Oct 7, 2020 · Updated 5 years ago
vedaldi / micro_llama
A tiny, didactical implementation of LLAMA 3
☆42 · Dec 2, 2024 · Updated last year
maxencenoble / tree-diffusion-schrodinger-bridge
Tree-Based Diffusion Schrödinger Bridge with Applications to Wasserstein Barycenters
☆10 · Mar 5, 2024 · Updated 2 years ago
raymond1123 / hgemm
☆30 · Nov 16, 2024 · Updated last year
DataXujing / YOLOv12-TensorRT
YOLOv12 TensorRT: end-to-end accelerated model inference and INT8 quantization implementation
☆14 · Mar 5, 2025 · Updated last year
ternaus / clip2onnx
Converts CLIP models to ONNX
☆11 · Jan 17, 2023 · Updated 3 years ago
kyegomez / Exa
Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min…
☆27 · Nov 11, 2024 · Updated last year
Danielhp95 / nix-stable-diffusion
Nix-friendly fork of: Optimized Stable Diffusion modified to run on lower GPU VRAM
☆10 · Sep 11, 2022 · Updated 3 years ago