rejunity / tiny-asic-1_58bit-matrix-mul
Tiny ASIC implementation of the matrix multiplication unit for "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits"
☆148 Updated last year
Alternatives and similar repositories for tiny-asic-1_58bit-matrix-mul
Users that are interested in tiny-asic-1_58bit-matrix-mul are comparing it to the libraries listed below
- ☆89 Updated last year
- An AI accelerator implementation with Xilinx FPGA ☆43 Updated 3 months ago
- Machine-Learning Accelerator System Exploration Tools ☆161 Updated 2 weeks ago
- A survey on Hardware Accelerated LLMs ☆51 Updated 4 months ago
- A new LLM solution for RTL code generation, achieving state-of-the-art performance in non-commercial solutions and outperforming GPT-3.5. ☆191 Updated 3 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆106 Updated 7 months ago
- ☆22 Updated last year
- The Riallto Open Source Project from AMD ☆77 Updated last month
- Verilog evaluation benchmark for large language models ☆259 Updated 3 months ago
- ☆27 Updated 2 months ago
- Run 64-bit Linux on LiteX + RocketChip ☆196 Updated 9 months ago
- An Open Workflow to Build Custom SoCs and run Deep Models at the Edge ☆77 Updated this week
- Spatz is a compact RISC-V-based vector processor meant for high-performance, small computing clusters. ☆108 Updated this week
- FREE TPU V3plus for FPGA is the free version of a commercial AI processor (EEP-TPU) for Deep Learning EDGE Inference ☆145 Updated last year
- 1.58-bit LLaMa model ☆81 Updated last year
- A high-efficiency system-on-chip for floating-point compute workloads. ☆30 Updated 4 months ago
- Fully open-source spiking neural network accelerator ☆146 Updated 2 years ago
- Ocelot: The Berkeley Out-of-Order Machine With V-EXT support ☆162 Updated 4 months ago
- PDPU: An Open-Source Posit Dot-Product Unit for Deep Learning Applications ☆40 Updated 2 years ago
- Universal Memory Interface (UMI) ☆145 Updated this week
- ☆39 Updated last year
- Torch2Chip (MLSys, 2024) ☆51 Updated last month
- Small-scale Tensor Processing Unit built on an FPGA ☆183 Updated 5 years ago
- Research and Materials on Hardware implementation of Transformer Model ☆259 Updated 2 months ago
- ☆42 Updated 3 weeks ago
- NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions ☆30 Updated last month
- Vector Acceleration IP core for RISC-V* ☆177 Updated 3 weeks ago
- Hardware design of a universal NPU (CNN accelerator) for various convolutional neural networks ☆119 Updated 2 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 Updated 7 months ago
- Experimental BitNet Implementation ☆64 Updated last year