rejunity / tiny-asic-1_58bit-matrix-mul
Tiny ASIC implementation of the matrix multiplication unit from "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits"
☆173 · Updated last year
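The paper's core trick is that weights are constrained to {-1, 0, +1}, so the matrix multiply needs no multipliers at all. Below is a minimal software sketch of that ternary (1.58-bit) matmul, assuming integer-quantized activations; the function and variable names are illustrative and are not taken from the repository.

```python
# Minimal software model of a ternary (1.58-bit) matrix multiply:
# weights are restricted to {-1, 0, +1}, so every "multiplication"
# reduces to an add, a subtract, or a skip.

def ternary_matmul(activations, weights):
    """activations: M x K list of ints (e.g. int8-quantized values)
    weights: K x N list of ints, each entry in {-1, 0, +1}
    returns: M x N list of integer accumulator values"""
    M, K, N = len(activations), len(weights), len(weights[0])
    out = [[0] * N for _ in range(M)]
    for m in range(M):
        for k in range(K):
            a = activations[m][k]
            if a == 0:
                continue
            for n in range(N):
                w = weights[k][n]
                if w == 1:           # add path
                    out[m][n] += a
                elif w == -1:        # subtract path
                    out[m][n] -= a
                # w == 0: no work at all
    return out

if __name__ == "__main__":
    acts = [[3, -2, 5]]                    # 1 x 3 quantized activations
    w = [[1, 0], [-1, 1], [0, -1]]         # 3 x 2 ternary weights
    print(ternary_matmul(acts, w))         # [[5, -7]]
```

Because each product collapses to an add, a subtract, or nothing, a processing element only needs an adder/subtractor and an accumulator rather than a full multiplier, which is what makes such a unit attractive as a tiny ASIC.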
Alternatives and similar repositories for tiny-asic-1_58bit-matrix-mul
Users interested in tiny-asic-1_58bit-matrix-mul are comparing it to the repositories listed below.
- ☆119 · Updated 2 years ago
- Machine-Learning Accelerator System Exploration Tools ☆196 · Updated last week
- An AI accelerator implementation with Xilinx FPGA ☆79 · Updated last year
- The Riallto Open Source Project from AMD ☆85 · Updated 9 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆112 · Updated last year
- ☆37 · Updated 2 years ago
- A high-efficiency system-on-chip for floating-point compute workloads. ☆44 · Updated last year
- A survey on Hardware Accelerated LLMs ☆61 · Updated last year
- A new LLM solution for RTL code generation, achieving state-of-the-art performance among non-commercial solutions and outperforming GPT-3.5. ☆248 · Updated 11 months ago
- DNN Compiler for Heterogeneous SoCs ☆60 · Updated 3 weeks ago
- Run 64-bit Linux on LiteX + RocketChip ☆208 · Updated 3 months ago
- Research and Materials on Hardware implementation of Transformer Model ☆296 · Updated 11 months ago
- A mini 2x2 systolic array and PE demo (see the sketch after this list) ☆68 · Updated last month
- A minimal Tensor Processing Unit (TPU) inspired by Google's TPUv1. ☆194 · Updated last year
- Fully open-source spiking neural network accelerator ☆164 · Updated 2 years ago
- Attention in SRAM on Tenstorrent Grayskull ☆40 · Updated last year
- Ocelot: The Berkeley Out-of-Order Machine With V-EXT support ☆224 · Updated 2 weeks ago
- This project aims to enable language model inference on FPGAs, supporting AI applications in edge devices and environments with limited r… ☆170 · Updated last year
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs ☆229 · Updated last year
- Torch2Chip (MLSys, 2024) ☆55 · Updated 9 months ago
- ☆117 · Updated 3 weeks ago
- Open source machine learning accelerators ☆396 · Updated last year
- Universal Memory Interface (UMI) ☆157 · Updated this week
- Verilog evaluation benchmark for large language models ☆369 · Updated 6 months ago
- FREE TPU V3plus for FPGA is the free version of a commercial AI processor (EEP-TPU) for Deep Learning EDGE Inference ☆169 · Updated 2 years ago
- ☆163 · Updated 7 months ago
- Tenstorrent TT-BUDA Repository ☆314 · Updated 9 months ago
- Samples of good AI-generated CUDA kernels ☆99 · Updated 8 months ago
- Open-source software/hardware platform to build edge AI solutions deployed on FPGA or custom ASIC hardware. ☆286 · Updated this week
- An Open Workflow to Build Custom SoCs and run Deep Models at the Edge ☆104 · Updated 2 weeks ago
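Several of the entries above (the mini 2x2 systolic array and PE demo, the minimal TPU) are built around a systolic matrix-multiply datapath. The sketch below is a minimal cycle-by-cycle software model of an output-stationary 2x2 systolic array, included only to illustrate the technique; the function and variable names are illustrative and not taken from any of the repositories listed.

```python
# Cycle-by-cycle model of a 2x2 output-stationary systolic array.
# A values stream in from the left edge, B values from the top edge,
# each skewed by one cycle per row/column; every PE keeps its own
# accumulator and does one multiply-accumulate per cycle.

def systolic_matmul_2x2(A, B):
    """Multiply two 2x2 integer matrices with a 2x2 grid of MAC PEs."""
    N = 2
    acc = [[0] * N for _ in range(N)]     # one stationary accumulator per PE
    a_reg = [[0] * N for _ in range(N)]   # A operand currently held in each PE
    b_reg = [[0] * N for _ in range(N)]   # B operand currently held in each PE

    total_cycles = 3 * N - 2              # enough cycles to drain the input skew
    for t in range(total_cycles):
        # Shift existing operands one PE to the right (A) and downward (B).
        for i in range(N):
            for j in range(N - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        for j in range(N):
            for i in range(N - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # Feed the skewed edges: row i receives A[i][t-i], column j receives B[t-j][j].
        for i in range(N):
            k = t - i
            a_reg[i][0] = A[i][k] if 0 <= k < N else 0
        for j in range(N):
            k = t - j
            b_reg[0][j] = B[k][j] if 0 <= k < N else 0
        # Every PE performs one multiply-accumulate this cycle.
        for i in range(N):
            for j in range(N):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc

if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(systolic_matmul_2x2(A, B))      # [[19, 22], [43, 50]]
```

The one-cycle skew per row and column ensures that matching A and B elements meet in the correct PE, so after 3N-2 cycles each accumulator holds one element of the product without any partial sums ever leaving the array.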