☆93Apr 7, 2026Updated this week
Alternatives and similar repositories for ai-infra-skills
Users who are interested in ai-infra-skills are comparing it to the libraries listed below.
- ☆16Jan 14, 2025Updated last year
- Gensis is a lightweight deep learning framework written from scratch in Python, with Triton as its backend for high-performance computing…☆36Jan 15, 2026Updated 2 months ago
- Rules for typesetting academic papers in LaTeX (national standard GB/T 7713.2—2022)☆24Mar 25, 2024Updated 2 years ago
- Benchmark tests supporting the TiledCUDA library.☆18Nov 19, 2024Updated last year
- ☆18Mar 4, 2025Updated last year
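- A walkthrough of the official transformers source code. In the era of large AI models, pytorch and transformer are the new operating system, and everything else is software running on top of them.☆16Sep 25, 2023Updated 2 years ago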
- Bjontegaard metric calculation. Includes BD-PSNR and BD-rate.☆13Sep 4, 2024Updated last year
- Quantized Attention on GPU☆44Nov 22, 2024Updated last year
- ☆17Aug 2, 2023Updated 2 years ago
- ☆38May 23, 2025Updated 10 months ago
- ☆22Apr 22, 2024Updated last year
- SGLang Kernel Wheel Index☆18Updated this week
- In this repository, I share some useful resources that you should know before pursuing your Master's or Ph.D. degree.☆24Jan 12, 2025Updated last year
- FuseFL: One-Shot Federated Learning through the Lens of Causality with Progressive Model Fusion (NeurIPS 2024 Spotlight)☆14Mar 31, 2025Updated last year
- CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025)☆73Jun 25, 2024Updated last year
- Bjontegaard metric computation in Python☆17Oct 9, 2020Updated 5 years ago
- TensorRT-in-Action is a GitHub repository that provides code examples for using TensorRT, with corresponding Jupyter Notebooks.☆15Jun 1, 2023Updated 2 years ago
- This project is about convolution operator optimization on GPU, including GEMM-based (Implicit GEMM) convolution.☆43Sep 29, 2025Updated 6 months ago
- ☆20Sep 28, 2024Updated last year
- Code repo for "CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs".☆16Sep 15, 2024Updated last year
- ☆16Feb 6, 2024Updated 2 years ago
- Official implementation of our paper "Exploring the Potential of Diffusion Large Language Models in Code Generation".☆20Oct 29, 2025Updated 5 months ago
- Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset.☆18Apr 22, 2025Updated 11 months ago
- Uniform Framework: multi-target localization, emotion recognition, and defect detection.☆15Dec 15, 2025Updated 3 months ago
- ☆40Dec 14, 2025Updated 3 months ago
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning☆174Nov 11, 2025Updated 4 months ago
- Some microbenchmarks and design docs before commencement☆11Feb 1, 2021Updated 5 years ago
- R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning☆36Feb 9, 2026Updated 2 months ago
- Example apps for LeapSDK☆60Updated this week
- the original reference implementation of a specified llama.cpp backend for Qualcomm Hexagon NPU on Android phone, https://github.com/ggml…☆38Jul 14, 2025Updated 8 months ago
- All-in-one benchmarking platform for evaluating LLMs.☆15Nov 12, 2025Updated 4 months ago
- Here are my personal paper reading notes (including machine learning systems, AI infrastructure, and other interesting stuff).☆174Updated this week
- [NeurIPS 2023] Token-Scaled Logit Distillation for Ternary Weight Generative Language Models☆18Dec 6, 2023Updated 2 years ago
- A set of examples around MegEngine☆31Dec 8, 2023Updated 2 years ago
- Flash Attention from Scratch on CUDA Ampere☆162Sep 1, 2025Updated 7 months ago
- AFPQ code implementation☆23Nov 6, 2023Updated 2 years ago
- ☆42Mar 28, 2024Updated 2 years ago
- A std::execution style runtime context and High Performance RPC Transport for using OpenUCX. Including CUDA/ROCM/... devices with RDMA.☆30Mar 26, 2026Updated 2 weeks ago
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer☆174Feb 11, 2026Updated last month