Linaro / tinyBLAS
A fork of OpenBLAS with Armv8-A SVE (Scalable Vector Extension) support
☆17, updated 5 years ago
Alternatives and similar repositories for tinyBLAS
Users who are interested in tinyBLAS are comparing it to the libraries listed below.
- A faithful clone of Karpathy's llama2.c (one file inference, zero dependency) but fully functional with LLaMA 3 8B base and instruct mode… ☆141, updated 2 months ago
- Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆73, updated 11 months ago
- Inference RWKV v7 in pure C. ☆43, updated 3 months ago
- The Quasi Quantum Assembly Programming Language ☆36, updated last month
- GGUF implementation in C as a library and a tools CLI program ☆297, updated 4 months ago
- ☆62, updated last year
- 33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU ☆13, updated last year
- Editor with LLM generation tree exploration ☆81, updated 10 months ago
- ☆150, updated last week
- 1.58 Bit LLM on Apple Silicon using MLX ☆237, updated last year
- tiny code to access tenstorrent blackhole ☆61, updated 7 months ago
- General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆52, updated 10 months ago
- Inference of Mamba models in pure C ☆196, updated last year
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆53, updated 9 months ago
- webgpu autograd library ☆33, updated 7 months ago
- Train your own small bitnet model ☆76, updated last year
- asynchronous/distributed speculative evaluation for llama3 ☆39, updated last year
- CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning ☆294, updated this week
- A library for incremental loading of large PyTorch checkpoints ☆56, updated 2 years ago
- Thin wrapper around GGML to make life easier ☆41, updated 2 months ago
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆202, updated 3 months ago
- Lightweight C inference for Qwen3 GGUF. Multiturn prefix caching & batch processing. ☆21, updated 4 months ago
- Aana SDK is a powerful framework for building AI enabled multimodal applications. ☆55, updated 4 months ago
- WebGPU LLM inference tuned by hand ☆151, updated 2 years ago
- Samples of good AI generated CUDA kernels ☆99, updated 7 months ago
- A massively parallel, optimal functional runtime in Rust ☆31, updated last year
- Because it's there. ☆16, updated last year
- C API for MLX ☆158, updated this week
- noise_step: Training in 1.58b With No Gradient Memory ☆220, updated last year
- Inference Llama/Llama2/Llama3 Models in NumPy ☆21, updated 2 years ago