Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". THIS IS A FORK OF BAKITAS REPO. I AM NOT ONE OF THE AUTHORS OF THE PAPER.
☆66 · Nov 24, 2025 · Updated 7 months ago
Alternatives and similar repositories for libsmctrl
Users that are interested in libsmctrl are comparing it to the libraries listed below.
- Tutorials for NVIDIA CUPTI samples ☆68 · Nov 3, 2025 · Updated 8 months ago
- eBPF for GPU UVM offloading and scheduling in Linux kernel ☆59 · Apr 15, 2026 · Updated 2 months ago
- ☆31 · Apr 8, 2026 · Updated 2 months ago
- An interference-aware scheduler for fine-grained GPU sharing ☆163 · Nov 26, 2025 · Updated 7 months ago
- ☆12 · Nov 5, 2024 · Updated last year
- A GPU-accelerated DNN inference serving system that supports instant kernel preemption and biased concurrent execution in GPU scheduling. ☆43 · May 29, 2022 · Updated 4 years ago
- Automatic Parallelism Using LLVM ☆10 · Aug 2, 2014 · Updated 11 years ago
- ☆12 · Aug 17, 2022 · Updated 3 years ago
- ☆84 · Apr 18, 2025 · Updated last year
- An efficient storage system for concurrent graph processing ☆10 · Feb 1, 2021 · Updated 5 years ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS ☆33 · Feb 10, 2025 · Updated last year
- ☆14 · Feb 5, 2025 · Updated last year
- A NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library. ☆111 · Dec 17, 2025 · Updated 6 months ago
- High Performance FP8 GEMM Kernels for SM89 and later GPUs. ☆21 · Jan 24, 2025 · Updated last year
- Evaluation utilities based on SymPy. ☆22 · Dec 12, 2024 · Updated last year
- Composable and Embeddable Communication Runtime for Distributed AI Services ☆102 · Jun 5, 2026 · Updated 3 weeks ago
- a simple API to use CUPTI ☆10 · Aug 19, 2025 · Updated 10 months ago
- ☆12 · May 13, 2025 · Updated last year
- Hooked CUDA-related dynamic libraries by using automated code generation tools. ☆173 · Dec 12, 2023 · Updated 2 years ago
- Personal house automation system with a REST/Json interface ☆18 · Feb 20, 2024 · Updated 2 years ago
- Scaling Sparse Fine-Tuning to Large Language Models ☆19 · Jan 31, 2024 · Updated 2 years ago
- ☆38 · Jun 4, 2026 · Updated last month
- This serves as a repository for reproducibility of the SC21 paper "In-Depth Analyses of Unified Virtual Memory System for GPU Accelerated… ☆38 · Sep 25, 2023 · Updated 2 years ago
- A collection of benchmarks and tests for the Patmos processor and compiler ☆19 · Dec 2, 2024 · Updated last year
- A curated list for Efficient Large Language Models ☆11 · Mar 25, 2024 · Updated 2 years ago
- BGEMM-CUDA is a CUDA-based low-bit GEMM kernel library for efficient neural network inference. It implements optimized binary and ternary… ☆20 · Aug 30, 2024 · Updated last year
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models ☆17 · Nov 4, 2025 · Updated 8 months ago
- Open-source implementation of the CUDA API. ☆13 · May 5, 2012 · Updated 14 years ago
- ☆28 · Aug 19, 2022 · Updated 3 years ago
- We propose Bidirectional Evolutionary Search (BES), a search framework that couples forward candidate evolution with backward goal decomp… ☆160 · May 28, 2026 · Updated last month
- 2D and 3D Matrix Convolution and Matrix Multiplication with CUDA ☆10 · Jun 14, 2021 · Updated 5 years ago
- Potluck with different functions for different purposes that can be shared among C programs ☆14 · May 9, 2026 · Updated last month
- ☆34 · Sep 14, 2024 · Updated last year
- Repository to demo GPU Sharing with Time Slicing, MPS, MIG and others ☆65 · Oct 17, 2024 · Updated last year
- Artifacts for our NSDI'23 paper TGS ☆97 · Jun 10, 2024 · Updated 2 years ago
- [NeurIPS 2024] VeLoRA : Memory Efficient Training using Rank-1 Sub-Token Projections ☆22 · Oct 15, 2024 · Updated last year
- [ICML 2022] ShiftAddNAS: Hardware-Inspired Search for More Accurate and Efficient Neural Networks ☆15 · May 18, 2022 · Updated 4 years ago
- A test case for VFIO_PLATFORM currently based on the PL330 DMA controller. The effort on VFIO_PLATFORM has been partially funded by the S… ☆13 · Dec 12, 2022 · Updated 3 years ago
- Personal learn Linux system proc notes ☆14 · Dec 6, 2018 · Updated 7 years ago