SDA: Low-Bit Stable Diffusion Acceleration on Edge FPGAs
☆19 · May 23, 2024 · Updated 2 years ago
Alternatives and similar repositories for SDA_code
Users who are interested in SDA_code are comparing it to the libraries listed below.
- [FCCM 2023] PASTA: Programming and Automation Support for Scalable Task-Parallel HLS Programs on Modern Multi-Die FPGAs ☆14 · Jun 26, 2025 · Updated 11 months ago
- An efficient spatial accelerator enabling hybrid sparse attention mechanisms for long sequences ☆32 · Mar 7, 2024 · Updated 2 years ago
- ☆30 · Apr 26, 2019 · Updated 7 years ago
- Getting started with Ubuntu from scratch and setting up a deep learning environment (continuously updated) ☆12 · Apr 18, 2023 · Updated 3 years ago
- ☆14 · Aug 1, 2024 · Updated last year
- ☆15 · May 23, 2024 · Updated 2 years ago
- [TRETS 2025][FPGA 2024] FPGA Accelerator for Imbalanced SpMV using HLS ☆21 · Aug 24, 2025 · Updated 9 months ago
- ☆17 · Nov 20, 2022 · Updated 3 years ago
- ☆17 · Feb 3, 2022 · Updated 4 years ago
- Scraping repository of the most relevant topics with regard to Spatio-Temporal Neural Networks available in the arXiv archive. The repos… ☆19 · Jun 1, 2026 · Updated last week
- CNN SIMD-based accelerator using Vitis HLS ☆11 · Jul 15, 2022 · Updated 3 years ago
- A simple cycle-accurate DaDianNao simulator ☆13 · Mar 27, 2019 · Updated 7 years ago
- A Fast DNN Accelerator Design Space Exploration Framework. ☆46 · Aug 10, 2022 · Updated 3 years ago
- HLS-implemented systolic array structure ☆41 · Nov 13, 2017 · Updated 8 years ago
- Kratos: An FPGA Benchmark for Unrolled Deep Neural Networks with Fine-Grained Sparsity and Mixed Precision ☆12 · Jan 19, 2026 · Updated 4 months ago
- LLM-Aided FPGA Design for Signal Processing Applications ☆34 · Jun 4, 2025 · Updated last year
- (Verilog) A simple convolution layer implementation with a systolic array structure ☆13 · May 9, 2022 · Updated 4 years ago
- RigidFlow: Self-Supervised Scene Flow Learning on Point Clouds by Local Rigidity Prior (CVPR 2022) ☆18 · Oct 13, 2024 · Updated last year
- [SIGGRAPH Asia'22] The code for 3QNet: 3D Point Cloud Geometry Quantization Compression Network ☆17 · Feb 16, 2023 · Updated 3 years ago
- PyTorch implementation of PTQ4DiT https://arxiv.org/abs/2405.16005 ☆47 · Nov 8, 2024 · Updated last year
- High-level synthesis (HLS) implementation of Sparse Matrix Vector Multiplication ☆19 · Feb 17, 2022 · Updated 4 years ago
- Implementation of a convolution layer in different flavors ☆68 · Oct 8, 2017 · Updated 8 years ago
- FPGA implementation of an 8x8 weight-stationary systolic array DNN accelerator ☆18 · Feb 27, 2021 · Updated 5 years ago
- ☆26 · Nov 4, 2022 · Updated 3 years ago
- A Symbolic Emulator for Shuffle Synthesis on the NVIDIA PTX Code ☆16 · Mar 19, 2023 · Updated 3 years ago
- Training wide residual networks for deployment using a single bit for each weight - Official Code Repository for ICLR 2018 Published Pape… ☆36 · May 27, 2020 · Updated 6 years ago
- Implementation of the Winograd algorithm. ☆24 · Nov 6, 2018 · Updated 7 years ago
- ☆13 · Jul 2, 2016 · Updated 9 years ago
- Alveo Versal Example Design ☆67 · May 28, 2026 · Updated last week
- This is forked from Xilinx HLS-Tiny-Tutorial. I'm learning HLS and adding a Verilator testbench to verify the generated RTL ☆28 · Oct 4, 2021 · Updated 4 years ago
- ☆34 · Oct 2, 2023 · Updated 2 years ago
- A project for self-implementation of deep learning on FPGAs ☆17 · Aug 24, 2020 · Updated 5 years ago
- A Dataflow library for graph analytics acceleration ☆14 · Dec 15, 2015 · Updated 10 years ago
- Implementation of Input Stationary, Weight Stationary and Output Stationary dataflow for a given neural network on a tiled architecture ☆10 · Apr 19, 2020 · Updated 6 years ago
- A simple cycle-accurate template model for ASIC/FPGA hardware design. Includes a cycle-accurate FIFO design example. More designs are co… ☆17 · Sep 5, 2019 · Updated 6 years ago
- Vector acceleration unit ☆36 · Dec 1, 2020 · Updated 5 years ago
- Floating-point Morton-order comparison operator. ☆17 · May 1, 2024 · Updated 2 years ago
- ☆12 · Aug 26, 2022 · Updated 3 years ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆113 · Oct 15, 2024 · Updated last year