joennlae / halutmatmul
Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator
☆207Updated 11 months ago
Related projects
Alternatives and complementary repositories for halutmatmul
- Deep learning accelerator architectures requiring half the multipliers☆263Updated 7 months ago
- A thin, highly portable toolkit for efficiently compiling dense loop-based computation.☆148Updated last year
- ☆234Updated 8 months ago
- Absolute minimalistic implementation of a GPT-like transformer using only NumPy (<650 lines).☆250Updated last year
- Exploring the scalable matrix extension of the Apple M4 processor☆134Updated 2 weeks ago
- DiscoGrad - automatically differentiate across conditional branches in C++ programs☆204Updated 2 months ago
- A Data-Centric Compiler for Machine Learning☆82Updated 10 months ago
- This project aims to enable language model inference on FPGAs, supporting AI applications in edge devices and environments with limited r…☆137Updated 6 months ago
- Tiny ASIC implementation for "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" matrix multiplication unit☆111Updated 7 months ago
- Code sample showing how to run and benchmark models on Qualcomm's Windows PCs☆87Updated last month
- An implementation of bucketMul LLM inference☆214Updated 4 months ago
- An open-source efficient deep learning framework/compiler, written in Python.☆652Updated last week
- A library for incremental loading of large PyTorch checkpoints☆56Updated last year
- A copy of ONNX models, datasets, and code all in one GitHub repository. Follow the README to learn more.☆106Updated 11 months ago
- throwaway GPT inference☆139Updated 5 months ago
- Sequential Logic☆95Updated this week
- Flash Attention in ~100 lines of CUDA (forward pass only)☆626Updated 7 months ago
- Fork of LLVM to support AMD AIEngine processors☆106Updated this week
- FlashAttention (Metal Port)☆386Updated last month
- Inference of Mamba models in pure C☆178Updated 8 months ago
- ☆162Updated 5 months ago
- A BERT that you can train on a (gaming) laptop.☆211Updated last year
- A Detailed Introduction to My Favorite Statistical Measure, Hoeffding's D☆95Updated 8 months ago
- A tool to analyze and debug neural networks in PyTorch. Use a GUI to traverse the computation graph and view the data from many different…☆270Updated 3 weeks ago
- Richard is gaining power☆176Updated 3 months ago
- ☆248Updated last year
- Nvidia Instruction Set Specification Generator☆216Updated 4 months ago
- A pure NumPy implementation of Mamba.☆216Updated 4 months ago
- Open source machine learning accelerators☆360Updated 7 months ago
- GGUF implementation in C as a library and a tools CLI program☆244Updated 4 months ago