Skills and tools for automatically writing and optimizing CUDA kernels
☆83Apr 24, 2026Updated this week
Alternatives and similar repositories for autocuda
Users who are interested in autocuda are comparing it to the libraries listed below.
- A real-time video understanding foundation model built on Llama-3.2-Vision, featuring comprehensively extended video processing and multi…☆136Apr 13, 2026Updated 2 weeks ago
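- 三角洲 (Delta Force) automation tool with freely configurable settings; supports a range of automated click operations such as single-client market flipping of ammunition and sniping weapon skins, as well as dual-client flipping.☆79Jan 31, 2026Updated 2 months ago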
- Fully GPU-Accelerated CFD Solver. Star the repository to stay updated on the latest development features and news.☆22Mar 9, 2026Updated last month
- A size profiler for CUDA binaries☆70Jan 15, 2026Updated 3 months ago
- This project converts Anthropic's @anthropic-ai/sdk into an OpenAI-style API interface, providing seamless compatibility for Claude-Code …☆35Sep 4, 2025Updated 7 months ago
- This tool displays tflite signatures and rewrites the input/output OP name to the name of the signature. There is no need to install Tens…☆14Dec 13, 2023Updated 2 years ago
- Asynchronous pipeline parallel optimization☆21Feb 2, 2026Updated 2 months ago
- Implementation for ACProp (Momentum centering and asynchronous update for adaptive gradient methods, NeurIPS 2021)☆16Oct 11, 2021Updated 4 years ago
- [ICLR2024] SPTNet: An Efficient Alternative Framework for Generalized Category Discovery with Spatial Prompt Tuning☆36Apr 9, 2025Updated last year
- C++ implementation of a Dynamic Delta Hedging strategy for European Options. Delta Hedging is a great strategy for trying to create a neu…☆15Aug 1, 2022Updated 3 years ago
- smoothed particle hydrodynamics code☆35Mar 16, 2026Updated last month
- Continual Resilient (CoRe) Optimizer for PyTorch☆12Jun 10, 2024Updated last year
- MADRL project solving chess environment using PPO with two different methods: 2 agents/networks and a single agent/network.☆21Apr 1, 2023Updated 3 years ago
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling☆22Updated this week
- Website for CSE 234, Winter 2025☆14Mar 24, 2025Updated last year
- ☆10Nov 18, 2024Updated last year
- Custom TensorFlow Python wheels.☆11Sep 29, 2024Updated last year
- ☆11Jun 6, 2021Updated 4 years ago
- A forked version of flux-fast that makes flux-fast even faster with cache-dit, 3.3x speedup on NVIDIA L20.☆24Jul 18, 2025Updated 9 months ago
- 🎓Automatically updates LLM inference systems papers daily using GitHub Actions (updates every 12 hours)☆12Updated this week
- ☆10Jul 13, 2024Updated last year
- High-performance RMSNorm implementation using SM core storage (registers and shared memory)☆30Jan 22, 2026Updated 3 months ago
- (best/better) practices of megatron on veRL and tuning guide☆132Apr 22, 2026Updated last week
- Expert Specialization MoE Solution based on CUTLASS☆26Apr 14, 2026Updated 2 weeks ago
- ☆13Apr 1, 2026Updated 3 weeks ago
- ☆16Sep 12, 2023Updated 2 years ago
- Inference with YOLOv5, OpenCV 4.5.4 DNN, C++, ROS and Python☆13Feb 12, 2023Updated 3 years ago
- In our implementation of Qwen-Image-Edit, we employ block causal attention to improve inference speed.☆50Feb 16, 2026Updated 2 months ago
- Cute layout visualization☆37Jan 18, 2026Updated 3 months ago
- common util library for C++☆12Apr 22, 2026Updated last week
- Stable Diffusion V1.5 Inference With PyTorch Weights And More Features Like Stable Diffusion Web UI In Keras 3.x☆16May 28, 2025Updated 11 months ago
- Efficient Activation Function Optimization through Surrogate Modeling☆11Oct 3, 2023Updated 2 years ago
- ☆17Apr 17, 2024Updated 2 years ago
- Custom Textual-Inversion for Stable-Diffusion models with Keras.☆19Oct 21, 2023Updated 2 years ago
- High Performance FP8 GEMM Kernels for SM89 and later GPUs.☆21Jan 24, 2025Updated last year
- KsanaDiT: High-Performance DiT (Diffusion Transformer) Inference Framework for Video & Image Generation☆50Mar 30, 2026Updated 3 weeks ago
- A PyTorch implementation of computing mean average precision in parallel☆16Jul 7, 2022Updated 3 years ago
- ☆20Mar 3, 2025Updated last year
- Toward Lightweight and Fast Decoders for Latent Diffusion Models in Image and Video Generation☆22Dec 26, 2024Updated last year