☆47 · Jun 27, 2024 · Updated last year
Alternatives and similar repositories for melange-release
Users that are interested in melange-release are comparing it to the libraries listed below
- Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and … ☆36 · Aug 29, 2025 · Updated 6 months ago
- ☆12 · Oct 16, 2022 · Updated 3 years ago
- Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24] ☆24 · Nov 21, 2024 · Updated last year
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆135 · Feb 22, 2024 · Updated 2 years ago
- LLM Serving Performance Evaluation Harness ☆83 · Feb 25, 2025 · Updated last year
- Stateful LLM Serving ☆97 · Mar 11, 2025 · Updated 11 months ago
- ☆66 · Nov 4, 2024 · Updated last year
- A universal workflow system for exactly-once DAGs ☆23 · Jun 1, 2023 · Updated 2 years ago
- A benchmark suite for evaluating FaaS schedulers. ☆23 · Nov 5, 2022 · Updated 3 years ago
- ☆150 · Oct 9, 2024 · Updated last year
- ☆15 · May 2, 2023 · Updated 2 years ago
- Code for our ICLR Trustworthy ML 2020 workshop paper "Improved Image Wasserstein Attacks and Defenses" ☆14 · Apr 28, 2020 · Updated 5 years ago
- A throughput-oriented high-performance serving framework for LLMs ☆947 · Oct 29, 2025 · Updated 4 months ago
- Disaggregated serving system for Large Language Models (LLMs). ☆778 · Apr 6, 2025 · Updated 11 months ago
- A large-scale simulation framework for LLM inference ☆545 · Jul 25, 2025 · Updated 7 months ago
- CausIL is an approach to estimate the causal graph for a cloud microservice system, where the nodes are the service-specific metrics whil… ☆13 · Jul 3, 2023 · Updated 2 years ago
- Latent Large Language Models ☆19 · Aug 24, 2024 · Updated last year
- A low-latency & high-throughput serving engine for LLMs ☆482 · Jan 8, 2026 · Updated 2 months ago
- Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow" ☆77 · Oct 15, 2025 · Updated 4 months ago
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆143 · Dec 4, 2024 · Updated last year
- An Attention Superoptimizer ☆22 · Jan 20, 2025 · Updated last year
- Efficient and easy multi-instance LLM serving ☆528 · Sep 3, 2025 · Updated 6 months ago
- ☆17 · May 10, 2024 · Updated last year
- ☆20 · May 14, 2025 · Updated 9 months ago
- An Open-Source SCAlable Interface for ISA Extensions for RISC-V Processors. New Version: ☆17 · Feb 29, 2024 · Updated 2 years ago
- Artifacts for our SIGCOMM'23 paper Ditto ☆15 · Oct 17, 2023 · Updated 2 years ago
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆210 · Sep 21, 2024 · Updated last year
- This is the course project for CSCE585: ML Systems. Students will build their machine learning systems based on the provided infrastructu… ☆12 · Dec 15, 2020 · Updated 5 years ago
- ddl-benchmarks: Benchmarks for Distributed Deep Learning ☆36 · May 29, 2020 · Updated 5 years ago
- This repository stores the source code for the Mistral Hackathon 2024 in Paris ☆16 · Aug 23, 2024 · Updated last year
- A parallel VAE that avoids OOM for high-resolution image generation ☆85 · Aug 4, 2025 · Updated 7 months ago
- ☆19 · Jan 10, 2023 · Updated 3 years ago
- Example code to create and train a PyTorch model using the new C++ frontend. ☆17 · Mar 19, 2019 · Updated 6 years ago
- [ICML 2024] Serving LLMs on heterogeneous decentralized clusters. ☆34 · May 6, 2024 · Updated last year
- RISC-V ISA based 32-bit processor written in HLS ☆16 · Nov 7, 2019 · Updated 6 years ago
- PyTorch library for cost-effective, fast and easy serving of MoE models. ☆285 · Updated this week
- ☆131 · Nov 11, 2024 · Updated last year
- Source code for OSDI 2023 paper titled "Cilantro - Performance-Aware Resource Allocation for General Objectives via Online Feedback" ☆40 · Jul 6, 2023 · Updated 2 years ago
- All-in-one UI for merged LLMs in Hugging Face ☆25 · Jun 10, 2024 · Updated last year