stanford-mast / INFaaS
Model-less Inference Serving
☆94 · Nov 4, 2023 · Updated 2 years ago
Alternatives and similar repositories for INFaaS
Users who are interested in INFaaS are comparing it to the libraries listed below.
- Fine-grained GPU sharing primitives ☆148 · Jul 28, 2025 · Updated 6 months ago
- PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications ☆127 · May 9, 2022 · Updated 3 years ago
- ☆15 · Aug 15, 2024 · Updated last year
- Artifacts for our ASPLOS'23 paper ElasticFlow ☆55 · May 10, 2024 · Updated last year
- REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche… ☆104 · Dec 24, 2022 · Updated 3 years ago
- [ACM EuroSys 2023] Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access ☆56 · Aug 6, 2025 · Updated 6 months ago
- Website for Systems Research Seminar at UIUC ☆20 · Updated this week
- ☆38 · Jun 27, 2025 · Updated 7 months ago
- An Attention Superoptimizer ☆22 · Jan 20, 2025 · Updated last year
- BATCH: Adaptive Batching for Efficient Machine Learning Serving on Serverless Platforms ☆11 · Aug 7, 2021 · Updated 4 years ago
- Reading seminar in Harvard Cloud Networking and Systems Group ☆16 · Aug 29, 2022 · Updated 3 years ago
- Evaluating different memory managers for dynamic GPU memory ☆26 · Dec 16, 2020 · Updated 5 years ago
- Arya: Arbitrary Graph Pattern Mining with Decomposition-based Sampling ☆16 · Sep 27, 2023 · Updated 2 years ago
- Modified CUTLASS ☆15 · Oct 26, 2020 · Updated 5 years ago
- Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020 ☆137 · Jul 25, 2024 · Updated last year
- AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23) ☆93 · Jul 14, 2023 · Updated 2 years ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS ☆34 · Feb 10, 2025 · Updated last year
- ☆84 · Feb 5, 2026 · Updated last week
- An Efficient Dynamic Resource Scheduler for Deep Learning Clusters ☆41 · Oct 28, 2017 · Updated 8 years ago
- Serverless for all computation ☆42 · Feb 14, 2023 · Updated 3 years ago
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections ☆125 · Jun 23, 2022 · Updated 3 years ago
- A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description. ☆1,006 · Sep 19, 2024 · Updated last year
- Dorylus: Affordable, Scalable, and Accurate GNN Training ☆76 · May 31, 2021 · Updated 4 years ago
- Studying GPU Multi-tenancy ☆11 · Jan 11, 2019 · Updated 7 years ago
- ☆42 · Sep 8, 2023 · Updated 2 years ago
- ☆22 · Feb 18, 2025 · Updated 11 months ago
- Boost hardware utilization for ML training workloads via Inter-model Horizontal Fusion ☆32 · May 15, 2024 · Updated last year
- Lightweight and Parallel Deep Learning Framework ☆264 · Nov 26, 2022 · Updated 3 years ago
- Berkeley OS Prelim Reading Notes ☆15 · Sep 20, 2023 · Updated 2 years ago
- ☆53 · Dec 26, 2024 · Updated last year
- GPU-scheduler-for-deep-learning ☆210 · Nov 5, 2020 · Updated 5 years ago
- A schedule language for large model training ☆152 · Aug 21, 2025 · Updated 5 months ago
- Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24] ☆25 · Nov 21, 2024 · Updated last year
- ☆26 · Aug 31, 2023 · Updated 2 years ago
- Thousand Island Scanner: Scaling Video Analysis on AWS Lambda ☆13 · Oct 25, 2019 · Updated 6 years ago
- Slides from the 2021-12-15 talk "TVM Developer Bootcamp – Writing Hardware Backends" ☆11 · Jan 20, 2022 · Updated 4 years ago
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆135 · Feb 22, 2024 · Updated last year
- Reading notes on general systems research material (not limited to papers) ☆22 · Mar 17, 2021 · Updated 4 years ago
- Hardware tests for CPU, GPU, I/O, and memory bandwidth performance ☆25 · Sep 21, 2018 · Updated 7 years ago