Accelerating Deep Learning Training Through Transparent Storage Tiering (CCGrid'22)
☆19 Dec 13, 2022 Updated 3 years ago
Alternatives and similar repositories for monarch
Users that are interested in monarch are comparing it to the libraries listed below.
- ☆26 Dec 12, 2017 Updated 8 years ago
- Artifacts of VLDB'22 paper "COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression" ☆10 Aug 2, 2022 Updated 3 years ago
- Ginex: SSD-enabled Billion-scale Graph Neural Network Training on a Single Machine via Provably Optimal In-memory Caching ☆43 Jul 10, 2024 Updated last year
- ☆23 Jun 21, 2023 Updated 3 years ago
- ☆57 Jan 25, 2021 Updated 5 years ago
- ☆15 Jan 21, 2023 Updated 3 years ago
- Argobots bindings for the Mercury RPC library ☆27 Jun 22, 2026 Updated last week
- Ephemeral distributed filesystem built up from the local storage of several nodes. It is an evolution of AdaFS done inside the NGIO proje… ☆37 Feb 10, 2022 Updated 4 years ago
- A tracing tool to analyze the I/O behavior of a program. ☆12 Sep 25, 2019 Updated 6 years ago
- ML Input Data Processing as a Service. This repository contains the source code for Cachew (built on top of TensorFlow). ☆41 Sep 10, 2024 Updated last year
- A file system with the power of an object store. ☆28 Mar 6, 2019 Updated 7 years ago
- ☆38 Jan 15, 2021 Updated 5 years ago
- ☆32 May 28, 2024 Updated 2 years ago
- ☆20 Jul 26, 2021 Updated 4 years ago
- Beacon is a monitoring tool for HPC centers, and has been deployed on the current No.3 Sunway TaihuLight Supercomputer for over a year. W… ☆21 Dec 18, 2020 Updated 5 years ago
- A persistent key-value store that is embeddable and optimized for fast storage. ☆37 Jun 3, 2026 Updated 3 weeks ago
- Benchmarking tool for assessing LLM models' performance across different hardware ☆17 Dec 8, 2023 Updated 2 years ago
- DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces. ☆58 Aug 21, 2024 Updated last year
- ☆21 May 13, 2022 Updated 4 years ago
- BlindDB: an Encrypted, Distributed, and Searchable Key-value Store ☆10 Oct 10, 2017 Updated 8 years ago
- A Filesystem Semi-Microkernel. ☆48 Oct 24, 2023 Updated 2 years ago
- Notes and Examples to get started with Parallel Computing with CUDA. ☆13 Nov 1, 2019 Updated 6 years ago
- Official Implementation of APB (ACL 2025 main Oral) and Spava (ACL 2026 main). ☆37 Apr 6, 2026 Updated 2 months ago
- VaniDL is a tool for analyzing I/O patterns and behavior with Deep Learning Applications. ☆10 Jul 8, 2022 Updated 3 years ago
- Near-optimal Prefetching System ☆33 Nov 17, 2021 Updated 4 years ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning ☆25 May 12, 2025 Updated last year
- ☆18 Apr 25, 2025 Updated last year
- EPCC I/O benchmarking applications ☆12 Dec 15, 2021 Updated 4 years ago
- A high-performance system for customized-precision distributed deep learning ☆12 Dec 10, 2020 Updated 5 years ago
- UnifyFS: A file system for burst buffers ☆121 Sep 29, 2025 Updated 9 months ago
- Primo: Practical Learning-Augmented Systems with Interpretable Models ☆19 Dec 26, 2023 Updated 2 years ago
- IO-500 ☆37 Sep 29, 2020 Updated 5 years ago
- ☆16 Nov 2, 2022 Updated 3 years ago
- Thallium is a C++14 library wrapping Margo, Mercury, and Argobots and providing an object-oriented way to use these libraries. ☆16 May 4, 2026 Updated last month
- Herald: Accelerating Neural Recommendation Training with Embedding Scheduling (NSDI 2024) ☆23 May 9, 2024 Updated 2 years ago
- Bringing AI practically to science! ☆25 Jun 18, 2026 Updated 2 weeks ago
- SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training ☆36 Mar 1, 2023 Updated 3 years ago
- Write a simple file system from zero. ☆12 Apr 14, 2024 Updated 2 years ago
- Source code for the FAST '23 paper “MadFS: Per-File Virtualization for Userspace Persistent Memory Filesystems” ☆18 Mar 5, 2023 Updated 3 years ago