☆72Mar 21, 2026Updated 2 weeks ago
Alternatives and similar repositories for agent-gpu-skills
Users that are interested in agent-gpu-skills are comparing it to the libraries listed below.
- Use tensor core to calculate back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instruction.☆13Nov 3, 2023Updated 2 years ago
- ☆49May 20, 2025Updated 10 months ago
- CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark☆34Jun 24, 2025Updated 9 months ago
- Source code of the IPDPS '21 paper: "TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs" by Yuyao Niu, Zhengyang…☆13Aug 12, 2022Updated 3 years ago
- An intelligent matrix format designer for SpMV☆10Oct 10, 2023Updated 2 years ago
- Mirror of http://gitlab.hpcrl.cse.ohio-state.edu/chong/ppopp19_ae, refactoring for understanding☆16Oct 20, 2021Updated 4 years ago
- Code and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs"☆10Dec 30, 2024Updated last year
- Everyone loves OS☆20Mar 3, 2026Updated last month
- Parallel SpMV using CSR representation, built in CUDA☆14Jun 27, 2020Updated 5 years ago
- ☆32Apr 2, 2025Updated last year
- ☆57Feb 24, 2026Updated last month
- Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention☆53Updated this week
- Source code of the SC '23 paper: "DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multipli…☆29Jun 18, 2024Updated last year
- A benchmarking tool for comparing different LLM API providers' DeepSeek model deployments.☆31Mar 28, 2025Updated last year
- a simple API to use CUPTI☆10Aug 19, 2025Updated 7 months ago
- Code and dataset for the EMNLP 2024 paper: GoldCoin: Grounding Large Language Models in Privacy Laws via Contextual Integrity Theory☆49Sep 26, 2024Updated last year
- NVIDIA cuTile learn☆166Dec 9, 2025Updated 4 months ago
- ☆19Feb 18, 2025Updated last year
- triton for dsa☆60Apr 2, 2026Updated last week
- FlashSparse significantly reduces the computation redundancy for unstructured sparsity (for SpMM and SDDMM) on Tensor Cores through a Swa…☆38Oct 5, 2025Updated 6 months ago
- Reproduction of the libsmctrl paper, with an added Python interface that can be called flexibly from Python to allocate compute resources☆12May 21, 2024Updated last year
- Inverse Rendering Toolkit☆14Feb 24, 2025Updated last year
- Optimize GEMM with tensorcore step by step☆37Dec 17, 2023Updated 2 years ago
- ☆24May 9, 2025Updated 11 months ago
- Ongoing research training transformer models at scale☆18Updated this week
- ☆152Mar 18, 2024Updated 2 years ago
- A Row Decomposition-based Approach for Sparse Matrix Multiplication on GPUs☆29Nov 29, 2023Updated 2 years ago
- ☆17Feb 24, 2026Updated last month
- Official implementation for Training LLMs with MXFP4☆123Apr 25, 2025Updated 11 months ago
- [SIGGRAPH 2025] Official Implementation of "Instant Self-Intersection Repair for 3D Meshes"☆41Mar 26, 2026Updated 2 weeks ago
- WaferLLM: Large Language Model Inference at Wafer Scale☆99Jan 7, 2026Updated 3 months ago
- Research-oriented physics-based animation infrastructure built on the Houdini HDK.☆13Apr 27, 2024Updated last year
- A Single .py File Sympy Extension to Generate Eigen C++ Code from the Symbols.☆12Dec 17, 2025Updated 3 months ago
- ☆11Nov 19, 2025Updated 4 months ago
- LoRAFusion: Efficient LoRA Fine-Tuning for LLMs☆26Sep 23, 2025Updated 6 months ago
- ☆21Jan 23, 2023Updated 3 years ago
- Code repository for ICLR 2025 paper "LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid"☆27Mar 2, 2025Updated last year
- ☆24Oct 13, 2024Updated last year
- Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation☆20Jun 11, 2025Updated 9 months ago