KohakuBlueleaf / HakuTPU
An AI accelerator implementation with Xilinx FPGA
☆26 · Updated 2 months ago
Alternatives and similar repositories for HakuTPU:
Users who are interested in HakuTPU compare it with the repositories listed below:
- Tiny ASIC implementation of the matrix multiplication unit from "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" ☆130 · Updated 11 months ago
- A comprehensive codebase for training and finetuning Image <> Latent models. ☆30 · Updated last month
- ☆87 · Updated last year
- ☆65 · Updated 3 months ago
- TerDiT: Ternary Diffusion Models with Transformers ☆69 · Updated 9 months ago
- (WIP) Parallel inference for black-forest-labs' FLUX model. ☆18 · Updated 4 months ago
- ☆12 · Updated 4 months ago
- Official repository for the paper "VQDM: Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization" ☆33 · Updated 6 months ago
- ☆46 · Updated 4 months ago
- Model code for inferencing T5 ☆62 · Updated 3 weeks ago
- ☆28 · Updated 7 months ago
- LoRA fine-tune directly on the quantized models. ☆27 · Updated 4 months ago
- Useful utilities for huggingface ☆17 · Updated this week
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆104 · Updated 5 months ago
- Implementation of layer diffuse inference using refiners ☆25 · Updated 11 months ago
- ☆22 · Updated 9 months ago
- Evolve diffusion models by merging. ☆13 · Updated 9 months ago
- a simple Flash Attention v2 implementation with ROCM (RDNA3 GPU, roc wmma), mainly used for stable diffusion(ComfyUI) in Windows ZLUDA en… ☆37 · Updated 7 months ago
- Repository with which to explore k-diffusion and diffusers, and within which changes to said packages may be tested. ☆55 · Updated last year
- PyTorch half precision gemm lib w/ fused optional bias + optional relu/gelu ☆57 · Updated 3 months ago
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation ☆47 · Updated 8 months ago
- ☆10 · Updated last year
- The official repository of Quamba ☆29 · Updated 4 months ago
- A demo for the Direct Ascent Synthesis: Hidden Generative Capabilities in Discriminative Models paper (https://arxiv.org/abs/2502.07753) ☆37 · Updated 3 weeks ago
- This repository shows how to use Q8 kernels with `diffusers` to optimize inference of LTX-Video on ADA GPUs. ☆15 · Updated 2 months ago
- ☆20 · Updated 2 years ago
- [WIP] Better (FP8) attention for Hopper ☆26 · Updated last month
- This package introduces a perceptual loss implementation based on the modern ConvNeXt architecture. ☆11 · Updated 4 months ago
- finetune your florence2 model easy ☆20 · Updated 8 months ago
- ☆11 · Updated last year