AMD-AIG-AIMA / AMD-LLM
☆187 · Updated 9 months ago
Alternatives and similar repositories for AMD-LLM
Users interested in AMD-LLM are comparing it to the libraries listed below.
- Code sample showing how to run and benchmark models on Qualcomm's Windows PCs ☆99 · Updated 8 months ago
- Docker-based inference engine for AMD GPUs ☆231 · Updated 8 months ago
- ☆196 · Updated last month
- Run and explore Llama models locally with minimal dependencies on CPU ☆190 · Updated 8 months ago
- Algebraic enhancements for GEMM & AI accelerators ☆277 · Updated 3 months ago
- Dead Simple LLM Abliteration ☆219 · Updated 4 months ago
- Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit … ☆351 · Updated last month
- An implementation of bucketMul LLM inference ☆217 · Updated 11 months ago
- Tensor library & inference framework for machine learning ☆77 · Updated last week
- ☆163 · Updated last year
- PyTorch script hot swap: change code without unloading your LLM from VRAM ☆126 · Updated 2 months ago
- This project collects GPU benchmarks from various cloud providers and compares them to fixed per-token costs. Use our tool for efficient … ☆221 · Updated 6 months ago
- ☆121 · Updated 3 weeks ago
- GPU-targeted, vendor-agnostic AI library for Windows, and a Mistral model implementation ☆58 · Updated last year
- Bayesian Optimization as a Coverage Tool for Evaluating LLMs. Accurate evaluation (benchmarking) that's 10 times faster with just a few l… ☆285 · Updated 3 weeks ago
- A copy of ONNX models, datasets, and code all in one GitHub repository. Follow the README to learn more. ☆105 · Updated last year
- Mistral 7B playing DOOM ☆132 · Updated 11 months ago
- Throwaway GPT inference ☆140 · Updated last year
- Online compiler for HIP and NVIDIA® CUDA® code to WebGPU ☆181 · Updated 5 months ago
- A minimalist implementation of a GPT-like transformer using only NumPy (<650 lines) ☆252 · Updated last year
- Stateful load balancer custom-tailored for llama.cpp 🏓🦙 ☆782 · Updated this week
- See Through Your Models ☆394 · Updated 3 months ago
- ☆340 · Updated this week
- Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference (EMNLP 2024) ☆180 · Updated last year
- Fully Open Language Models with Stellar Performance ☆231 · Updated last week
- Wang Yi's GPT solution ☆141 · Updated last year
- GGUF implementation in C as a library and a CLI tool ☆273 · Updated 5 months ago
- A CLI to manage, install, and configure llama inference implementations in multiple languages ☆67 · Updated last year
- A playground to make it easy to try crazy things ☆33 · Updated last week
- Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator ☆211 · Updated last year