Linaro / tinyBLAS
A fork of OpenBLAS with Armv8-A SVE (Scalable Vector Extension) support
☆17 · Updated 5 years ago
Alternatives and similar repositories for tinyBLAS
Users that are interested in tinyBLAS are comparing it to the libraries listed below
- Inference of RWKV v7 in pure C. ☆42 · Updated 2 months ago
- The Quasi Quantum Assembly Programming Language ☆36 · Updated last month
- Editor with LLM generation tree exploration ☆80 · Updated 10 months ago
- Prepare for DeepSeek R1 inference: benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆73 · Updated 10 months ago
- tiny code to access tenstorrent blackhole ☆61 · Updated 6 months ago
- Aana SDK is a powerful framework for building AI enabled multimodal applications. ☆54 · Updated 3 months ago
- 1.58 Bit LLM on Apple Silicon using MLX ☆227 · Updated last year
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆53 · Updated 9 months ago
- A faithful clone of Karpathy's llama2.c (one file inference, zero dependency) but fully functional with LLaMA 3 8B base and instruct mode… ☆141 · Updated 2 months ago
- Onboarding documentation source for the AMD Ryzen™ AI Software Platform. The AMD Ryzen™ AI Software Platform enables developers to take… ☆88 · Updated this week
- CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning ☆225 · Updated this week
- Tenstorrent console based hardware information program ☆57 · Updated this week
- Inference of Mamba models in pure C ☆195 · Updated last year
- 33B Chinese LLM, DPO QLoRA, 100K context, AirLLM 70B inference on a single 4GB GPU ☆13 · Updated last year
- Lightweight C inference for Qwen3 GGUF. Multiturn prefix caching & batch processing. ☆19 · Updated 3 months ago
- noise_step: Training in 1.58b With No Gradient Memory ☆221 · Updated 11 months ago
- Samples of good AI generated CUDA kernels ☆94 · Updated 6 months ago
- Mistral7B playing DOOM ☆138 · Updated last year
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆85 · Updated this week
- High-performance FlashAttention-2 for AMD, Intel, and Apple GPUs. Drop-in replacement for PyTorch SDPA. Triton backend for ROCm (MI300X, … ☆117 · Updated this week
- ☆62 · Updated last year
- Train your own small bitnet model ☆75 · Updated last year
- PyTorch script hot swap: Change code without unloading your LLM from VRAM ☆125 · Updated 8 months ago
- Pivotal Token Search ☆135 · Updated this week
- WebGPU autograd library ☆33 · Updated 6 months ago
- a simplified version of Google's Gemma model to be used for learning ☆26 · Updated last year
- GGUF implementation in C as a library and a tools CLI program ☆296 · Updated 3 months ago
- ☆64 · Updated last year
- A massively parallel, optimal functional runtime in Rust ☆31 · Updated last year
- ☆141 · Updated this week