A low-cost, high-performance deep learning training framework that enables efficient 100B-scale model fine-tuning on a commodity server with a consumer-grade GPU and limited main memory capacity [ICDE 25]
☆23Mar 21, 2025Updated last year
Alternatives and similar repositories for LoHan
Users that are interested in LoHan are comparing it to the libraries listed below.
- CAM: Asynchronous GPU-Initiated, CPU-Managed SSD Management for Batching Storage Access [ICDE'25]☆19Mar 3, 2025Updated last year
- Cost-efficient Out-of-core GNN Training System on TB-scale Graph [ICDE 25]☆22Jan 6, 2025Updated last year
- Demystifying Datapath Accelerator Enhanced Off-path SmartNIC [ICNP24]☆59Dec 5, 2024Updated last year
- An awesome language and its compiler.☆35Jun 12, 2022Updated 3 years ago
- ☆49May 20, 2025Updated last year
- Reduction Server in Rust☆14Apr 9, 2024Updated 2 years ago
- Artifacts of EVT ASPLOS'24☆30Mar 6, 2024Updated 2 years ago
- A web-based platform that provides live streaming of classroom sessions at Zhejiang University.☆17Jan 3, 2026Updated 4 months ago
- Shuhai is a memory-benchmarking tool that allows FPGA programmers to demystify all the underlying details of memories, e.g., HBM and DDR4…☆117Jun 15, 2025Updated 11 months ago
- An open-source simulator framework for neural processing units☆39Mar 23, 2026Updated last month
- ☆14Nov 7, 2025Updated 6 months ago
- FpgaNIC is an FPGA-based Versatile 100Gb SmartNIC for GPUs [ATC 22]☆143Aug 17, 2023Updated 2 years ago
- ☆11Apr 10, 2024Updated 2 years ago
- C-like language compiler, the final project of ZJU Compiler Principle course☆43Oct 9, 2022Updated 3 years ago
- Automated Design of Agentic Systems☆10Sep 7, 2024Updated last year
- ☆11Aug 9, 2021Updated 4 years ago
- Centaur, a framework for hybrid CPU-FPGA databases☆28May 2, 2017Updated 9 years ago
- Does all kinds of cool stuff to make analyzing meta classes easier. Now featuring WRedLogger.py, the previous backend of NetDbg☆10Jun 7, 2023Updated 2 years ago
- Hill Space is All You Need☆17Jul 11, 2025Updated 10 months ago
- Implementation of the Darius IP (originally targeting PYNQ) for convolution and maxpool on Xilinx FPGA with SDK☆16Dec 2, 2018Updated 7 years ago
- ☆13May 11, 2026Updated last week
- Legacy Code of ZJU Campus App for iOS☆11Jan 31, 2024Updated 2 years ago
- Linux kernel module examples☆15Nov 18, 2019Updated 6 years ago
- GoldFinch and other hybrid transformer components☆13Dec 9, 2025Updated 5 months ago
- ☆12Aug 13, 2024Updated last year
- FPGA-based stochastic gradient descent (powered by ZipML - Low-precision machine learning on reconfigurable hardware)☆33Feb 10, 2020Updated 6 years ago
- This project is an implementation in PyTorch for ZO-AdaMU optimization: Adapting Perturbation with the Momentum and Uncertainty in Zeroth-…☆14Dec 12, 2023Updated 2 years ago
- ☆15Jan 4, 2026Updated 4 months ago
- Zig regex experiment☆13Nov 6, 2025Updated 6 months ago
- KANs and MLPs☆12Jun 7, 2024Updated last year
- Reproduction study of Grassmann Flows for sequence modeling (arXiv 2512.19428). Shows 22.6% gap vs claimed 10-15%, includes CUDA kernels …☆30Dec 26, 2025Updated 4 months ago
- In-kernel RDMA library☆13Nov 7, 2023Updated 2 years ago
- ☆18Dec 2, 2024Updated last year
- ☆12Jun 5, 2024Updated last year
- ☆36Jun 10, 2024Updated last year
- [EuroSys'25] Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization☆23Apr 13, 2026Updated last month
- A Python implementation of a delta debugging tool.☆26Feb 9, 2024Updated 2 years ago
- ☆10Apr 19, 2014Updated 12 years ago
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge.☆19Feb 9, 2026Updated 3 months ago