Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator
☆215 · Dec 10, 2023 · Updated 2 years ago
Alternatives and similar repositories for halutmatmul
Users who are interested in halutmatmul are comparing it to the libraries listed below.
- 10x faster matrix and vector operations ☆2,519 · Oct 12, 2022 · Updated 3 years ago
- Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024 ☆185 · Apr 16, 2024 · Updated last year
- Algebraic enhancements for GEMM & AI accelerators ☆288 · Feb 28, 2025 · Updated last year
- Learning Accurate Decision Trees with Bandit Feedback via Quantized Gradient Descent ☆17 · Sep 8, 2022 · Updated 3 years ago
- BARVINN: A Barrel RISC-V Neural Network Accelerator: https://barvinn.readthedocs.io/en/latest/ ☆94 · Jan 5, 2025 · Updated last year
- Open Source Compiler Framework using ONNX as Frontend and IR ☆33 · Aug 17, 2022 · Updated 3 years ago
- A safe and efficient target language for functional compilers ☆20 · May 5, 2018 · Updated 7 years ago
- Heterogeneous Accelerated Computed Cluster (HACC) Resources Page ☆22 · Oct 7, 2025 · Updated 5 months ago
- FPGA acceleration of arbitrary precision floating point computations. ☆40 · May 17, 2022 · Updated 3 years ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆281 · Nov 3, 2023 · Updated 2 years ago
- NNgen: A Fully-Customizable Hardware Synthesis Compiler for Deep Neural Network ☆361 · Oct 17, 2023 · Updated 2 years ago
- Fun with wgpu: Simulating slime mold ☆24 · Aug 22, 2024 · Updated last year
- PyTorch Quantization Framework For OCP MX Datatypes. ☆16 · May 30, 2025 · Updated 9 months ago
- C++17 Wrapper for ScaLAPACK ☆11 · Oct 5, 2023 · Updated 2 years ago
- Minimax: a Compressed-First, Microcoded RISC-V CPU ☆224 · Feb 19, 2026 · Updated 2 weeks ago
- Linear algebra accelerators for RISC-V (published in ICCD 17) ☆66 · Oct 5, 2017 · Updated 8 years ago
- Proposed RISC-V Composable Custom Extensions Specification ☆70 · Jun 28, 2025 · Updated 8 months ago
- Rust-based Scheme Compiler, written in the Nanopass style ☆12 · Jun 12, 2018 · Updated 7 years ago
- Seamless Voice Interactions with LLMs ☆12 · Oct 28, 2023 · Updated 2 years ago
- Optimised multi-node MPI sorting algorithms in Julia ☆10 · Sep 25, 2024 · Updated last year
- A configurable RTL to bitstream FPGA toolchain ☆56 · Mar 2, 2026 · Updated last week
- HNSW implementation in Rust. Reference: https://arxiv.org/ftp/arxiv/papers/1603/1603.09320.pdf ☆243 · Nov 18, 2024 · Updated last year
- Simian Process Oriented Conservative JIT PDES from LANL ☆13 · Dec 12, 2025 · Updated 2 months ago
- C++ library for graph ordering ☆15 · Mar 20, 2020 · Updated 5 years ago
- Netlib Scalapack with robust CMake ☆14 · Feb 25, 2026 · Updated last week
- Multiple 1-stencil implementations using nvidia cuda. ☆13 · Dec 2, 2017 · Updated 8 years ago
- ☆11 · Jun 29, 2021 · Updated 4 years ago
- ☆15 · Jan 4, 2023 · Updated 3 years ago
- muSYCL, the SYCL musical! ☆13 · Aug 25, 2024 · Updated last year
- the inelegant parser ☆13 · Dec 28, 2021 · Updated 4 years ago
- Package manager and build abstraction tool for FPGA/ASIC development ☆1,391 · Feb 13, 2026 · Updated 3 weeks ago
- A simple transformer implementation without difficult syntax and extra bells and whistles. ☆54 · May 12, 2023 · Updated 2 years ago
- Sparse Boolean linear algebra for Nvidia Cuda, OpenCL and CPU computations ☆16 · Aug 19, 2022 · Updated 3 years ago
- Experimental plugin for scikit-learn to be able to run (some estimators) on Intel GPUs via numba-dpex. ☆16 · Feb 28, 2024 · Updated 2 years ago
- [alpha] Expose Julia functions to PyTorch ☆15 · Aug 9, 2019 · Updated 6 years ago
- Binary Neural Network-based COVID-19 Face-Mask Wear and Positioning Predictor on Edge Devices ☆12 · Jul 1, 2021 · Updated 4 years ago
- Universal Python binding for the LMDB 'Lightning' Database ☆13 · Nov 7, 2017 · Updated 8 years ago
- An open-sourced PyTorch library for developing energy efficient multiplication-less models and applications. ☆14 · Feb 3, 2025 · Updated last year
- ☆32 · Mar 31, 2025 · Updated 11 months ago