brycelelbach/autocuda

Skills and tools for automatically writing and optimizing CUDA kernels

☆83

Alternatives and similar repositories for autocuda

Users who are interested in autocuda are comparing it to the libraries listed below.

OpenMOSS / MOSS-Video-Preview
A real-time video understanding foundation model built on Llama-3.2-Vision, featuring comprehensively extended video processing and multi…
☆136 · Apr 13, 2026 · Updated 2 weeks ago
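moligod / DF-AutomatedTool
Delta Force automation tool with freely configurable settings. Supports a range of automated click operations, such as single-client market reselling of ammo and sniping gun skins, as well as dual-client reselling and related operations.
☆79 · Jan 31, 2026 · Updated 2 months ago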
Tes-bo / TesboCFD
Fully GPU-Accelerated CFD Solver. Star the repository to stay updated on the latest development features and news.
☆22 · Mar 9, 2026 · Updated last month
flashinfer-ai / cubloaty
A size profiler for CUDA binaries.
☆70 · Jan 15, 2026 · Updated 3 months ago
1runcacu / openclaude
This project converts Anthropic's @anthropic-ai/sdk into an OpenAI-style API interface, providing seamless compatibility for Claude-Code …
☆35 · Sep 4, 2025 · Updated 7 months ago
PINTO0309 / tflite-input-output-rewriter
This tool displays tflite signatures and rewrites the input/output OP name to the name of the signature. There is no need to install Tens…
☆14 · Dec 13, 2023 · Updated 2 years ago
PluralisResearch / AsyncPP
Asynchronous pipeline parallel optimization
☆21 · Feb 2, 2026 · Updated 2 months ago
juntang-zhuang / ACProp-Optimizer
Implementation of ACProp (Momentum centering and asynchronous update for adaptive gradient methods, NeurIPS 2021)
☆16 · Oct 11, 2021 · Updated 4 years ago
Visual-AI / SPTNet
[ICLR2024] SPTNet: An Efficient Alternative Framework for Generalized Category Discovery with Spatial Prompt Tuning
☆36 · Apr 9, 2025 · Updated last year
phpinto / options_dynamic_hedging
C++ implementation of a Dynamic Delta Hedging strategy for European Options. Delta Hedging is a great strategy for trying to create a neu…
☆15 · Aug 1, 2022 · Updated 3 years ago
christophmschaefer / miluphcuda
Smoothed particle hydrodynamics code
☆35 · Mar 16, 2026 · Updated last month
ReiherGroup / CoRe_optimizer
Continual Resilient (CoRe) Optimizer for PyTorch
☆12 · Jun 10, 2024 · Updated last year
mhyrzt / Simple-MADRL-Chess
A MADRL project that solves a chess environment using PPO with two methods: two agents/networks and a single agent/network.
☆21 · Apr 1, 2023 · Updated 3 years ago
sgl-project / DeepGEMM
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
☆22 · Updated this week
hao-ai-lab / cse234-w25
Website for CSE 234, Winter 2025
☆14 · Mar 24, 2025 · Updated last year
YibooZhao / cogvideox_vis_attention
☆10 · Nov 18, 2024 · Updated last year
agkphysics / tensorflow-wheels
Custom TensorFlow Python wheels.
☆11 · Sep 29, 2024 · Updated last year
xdhe1216 / ACNet
☆11 · Jun 6, 2021 · Updated 4 years ago
xlite-dev / flux-faster
A fork of flux-fast that makes it even faster with cache-dit, achieving a 3.3x speedup on NVIDIA L20.
☆24 · Jul 18, 2025 · Updated 9 months ago
Toseic / LLM-inference-arxiv-daily
🎓 Automatically updates LLM inference systems papers daily using GitHub Actions (updates every 12 hours)
☆12 · Updated this week
cocktailpeanutlabs / open-webui
☆10 · Jul 13, 2024 · Updated last year
HydraQYH / hp_rms_norm
High-performance RMSNorm implementation using SM core storage (registers and shared memory)
☆30 · Jan 22, 2026 · Updated 3 months ago
ISEEKYAN / verl_megatron_practice
(Best/better) practices for Megatron on veRL, plus a tuning guide
☆132 · Apr 22, 2026 · Updated last week
HydraQYH / expert_specialization_moe
Expert Specialization MoE Solution based on CUTLASS
☆26 · Apr 14, 2026 · Updated 2 weeks ago
zqOuO / GWT
☆13 · Apr 1, 2026 · Updated 3 weeks ago
ziplab / efficient-stable-diffusion
☆16 · Sep 12, 2023 · Updated 2 years ago
YellowAndGreen / Yolov5-OpenCV-Cpp-Python-ROS
Inference with YOLOv5, OpenCV 4.5.4 DNN, C++, ROS and Python
☆13 · Feb 12, 2023 · Updated 3 years ago
ModelTC / Qwen-Image-Edit-Causal
In our implementation of Qwen-Image-Edit, we employ block causal attention to improve inference speed.
☆50 · Feb 16, 2026 · Updated 2 months ago
NTT123 / cute-viz
CuTe layout visualization
☆37 · Jan 18, 2026 · Updated 3 months ago
spencer-luo / common_util
Common utility library for C++
☆12 · Apr 22, 2026 · Updated last week
cpuimage / minSDTF
Stable Diffusion v1.5 inference with PyTorch weights, plus more features like Stable Diffusion Web UI, in Keras 3.x
☆16 · May 28, 2025 · Updated 11 months ago
cognizant-ai-labs / aquasurf
Efficient Activation Function Optimization through Surrogate Modeling
☆11 · Oct 3, 2023 · Updated 2 years ago
jyanln / AlignReg
☆17 · Apr 17, 2024 · Updated 2 years ago
dimitreOliveira / stable-diffusion-textual-inversion-app
Custom Textual-Inversion for Stable-Diffusion models with Keras.
☆19 · Oct 21, 2023 · Updated 2 years ago
IST-DASLab / gemm-fp8
High Performance FP8 GEMM Kernels for SM89 and later GPUs.
☆21 · Jan 24, 2025 · Updated last year
Tencent / KsanaDiT
KsanaDiT: High-Performance DiT (Diffusion Transformer) Inference Framework for Video & Image Generation
☆50 · Mar 30, 2026 · Updated 3 weeks ago
bwangca / fast-map
A PyTorch implementation for computing mean average precision in parallel
☆16 · Jul 7, 2022 · Updated 3 years ago
OurBluePrint / easy_video
☆20 · Mar 3, 2025 · Updated last year
RedShift51 / fast-latent-decoders
Toward Lightweight and Fast Decoders for Latent Diffusion Models in Image and Video Generation
☆22 · Dec 26, 2024 · Updated last year