PyTorch half-precision GEMM library with fused optional bias and optional ReLU/GELU
☆78 · Dec 3, 2024 · Updated last year
Alternatives and similar repositories for torch-cublas-hgemm
Users that are interested in torch-cublas-hgemm are comparing it to the libraries listed below.
- Flux diffusion model implementation using quantized fp8 matmul & remaining layers use faster half precision accumulate, which is ~2x fast… ☆287 · Oct 12, 2024 · Updated last year
- [PACT'24] GraNNDis. A fast and unified distributed graph neural network (GNN) training framework for both full-batch (full-graph) and min… ☆10 · Aug 13, 2024 · Updated last year
- Low-Rank Llama Custom Training ☆23 · Mar 27, 2024 · Updated 2 years ago
- ☆18 · Mar 18, 2024 · Updated 2 years ago
- Faster Pytorch bitsandbytes 4bit fp4 nn.Linear ops ☆30 · Mar 16, 2024 · Updated 2 years ago
- ☆13 · Dec 22, 2024 · Updated last year
- ☆12 · Jan 4, 2024 · Updated 2 years ago
- Cuda extensions for PyTorch ☆12 · Dec 2, 2025 · Updated 6 months ago
- JAX bindings for Flash Attention v2 ☆108 · Feb 28, 2026 · Updated 4 months ago
- Supporting code for "LLMs for your iPhone: Whole-Tensor 4 Bit Quantization" ☆11 · Mar 31, 2024 · Updated 2 years ago
- https://hf.co/hexgrad/Kokoro-82M ☆14 · Jan 14, 2026 · Updated 5 months ago
- ComfyUI node that generates animated dotted waveform visualizations from audio input with multiple animation styles including teardrop-sh… ☆31 · Apr 9, 2026 · Updated 2 months ago
- ☆19 · Apr 23, 2025 · Updated last year
- ☆19 · Dec 4, 2025 · Updated 6 months ago
- ☆29 · Jan 26, 2026 · Updated 5 months ago
- ☆18 · Dec 2, 2024 · Updated last year
- FORA introduces simple yet effective caching mechanism in Diffusion Transformer Architecture for faster inference sampling. ☆56 · Jul 8, 2024 · Updated last year
- A lightweight design for computation-communication overlap. ☆241 · Jan 20, 2026 · Updated 5 months ago
- Code for reproducibility of the E-swish paper experiments ☆16 · Jan 23, 2018 · Updated 8 years ago
- ☆21 · Mar 3, 2025 · Updated last year
- ☆21 · Jun 6, 2024 · Updated 2 years ago
- Code for my workshop "Production-ready WebAssembly with Rust" presented at RustLab 2023 in Florence ☆16 · Nov 23, 2023 · Updated 2 years ago
- Graph model execution API for Candle ☆18 · Jul 27, 2025 · Updated 11 months ago
- FP8 flash attention implemented on the Ada architecture using the cutlass repository ☆82 · Aug 12, 2024 · Updated last year
- ☆148 · Jun 16, 2024 · Updated 2 years ago
- ☆33 · Nov 11, 2024 · Updated last year
- A repo to do interpretability of pre-trained acoustic models ☆15 · Oct 15, 2023 · Updated 2 years ago
- A ComfyUI node implementation for ByteDance's Sa2VA ☆96 · Dec 22, 2025 · Updated 6 months ago
- TensorRT Extension for Stable Diffusion Web UI (Enhanced) ☆15 · Feb 12, 2025 · Updated last year
- Triton-based implementation of Sparse Mixture of Experts. ☆278 · Oct 3, 2025 · Updated 8 months ago
- Creates prompts for Video Models by sequence analysis and prompting using Qwen2.5-VL models from Alibaba. ☆57 · Apr 2, 2025 · Updated last year
- A ComfyUI plugin that provides a user interface of StableStudio ☆23 · Aug 15, 2025 · Updated 10 months ago
- ☆17 · Dec 12, 2021 · Updated 4 years ago
- JoyTag is a state of the art AI vision model for tagging images, with a focus on sex positivity and inclusivity. It uses the Danbooru tag… ☆70 · May 22, 2024 · Updated 2 years ago
- Easily create video datasets with auto-captioning for Hunyuan-Video LoRA finetuning ☆15 · Apr 2, 2025 · Updated last year
- ☆18 · Apr 3, 2023 · Updated 3 years ago
- ☆45 · Oct 15, 2025 · Updated 8 months ago
- ComfyUI node to use the moondream tiny vision language model ☆110 · Aug 12, 2024 · Updated last year
- Utilities for Training Very Large Models ☆59 · Sep 25, 2024 · Updated last year