joennlae / halutmatmul
Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator
☆209 · Updated last year
Alternatives and similar repositories for halutmatmul:
Users who are interested in halutmatmul are comparing it to the libraries listed below.
- Algebraic enhancements for GEMM & AI accelerators ☆275 · Updated 2 months ago
- Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024 ☆180 · Updated last year
- A thin, highly portable toolkit for efficiently compiling dense loop-based computation. ☆148 · Updated 2 years ago
- ☆241 · Updated last year
- This project aims to enable language model inference on FPGAs, supporting AI applications in edge devices and environments with limited r… ☆154 · Updated last year
- A Data-Centric Compiler for Machine Learning ☆82 · Updated last year
- High-Performance SGEMM on CUDA devices ☆90 · Updated 3 months ago
- Absolute minimalistic implementation of a GPT-like transformer using only numpy (<650 lines). ☆251 · Updated last year
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆259 · Updated this week
- PyTorch script hot swap: Change code without unloading your LLM from VRAM ☆123 · Updated 2 weeks ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆274 · Updated last year
- Inference of Mamba models in pure C ☆188 · Updated last year
- Fork of LLVM to support AMD AIEngine processors ☆138 · Updated this week
- DiscoGrad - automatically differentiate across conditional branches in C++ programs ☆202 · Updated 7 months ago
- Code sample showing how to run and benchmark models on Qualcomm's Windows PCs ☆96 · Updated 7 months ago
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆82 · Updated this week
- throwaway GPT inference ☆139 · Updated 11 months ago
- Nod.ai 🦈 version of 👻 . You probably want to start at https://github.com/nod-ai/shark for the product and the upstream IREE repository … ☆106 · Updated 3 months ago
- ☆297 · Updated this week
- ☆252 · Updated last year
- Open source machine learning accelerators ☆378 · Updated last year
- A library for incremental loading of large PyTorch checkpoints ☆56 · Updated 2 years ago
- An open-source efficient deep learning framework/compiler, written in Python. ☆698 · Updated 2 months ago
- An implementation of bucketMul LLM inference ☆217 · Updated 10 months ago
- The Riallto Open Source Project from AMD ☆77 · Updated 3 weeks ago
- Tiny ASIC implementation for "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" matrix multiplication unit ☆144 · Updated last year
- The missing pieces (as far as boilerplate reduction goes) of the upstream MLIR python bindings. ☆90 · Updated this week
- Attention in SRAM on Tenstorrent Grayskull ☆35 · Updated 9 months ago
- A copy of ONNX models, datasets, and code all in one GitHub repository. Follow the README to learn more. ☆105 · Updated last year
- A pure, low-level tensor program representation enabling tensor program optimization via program rewriting. See the web demo at https://g… ☆70 · Updated 10 months ago