PyTorch-UVM on super-large language models.
☆17 · Dec 21, 2020 · Updated 5 years ago
Alternatives and similar repositories for cuda-uvm-gpt2
Users who are interested in cuda-uvm-gpt2 are comparing it to the libraries listed below.
- Tensors and Dynamic neural networks in Python with strong GPU acceleration ☆15 · Dec 21, 2020 · Updated 5 years ago
- ☆26 · Aug 19, 2022 · Updated 3 years ago
- ☆80 · Nov 16, 2020 · Updated 5 years ago
- Pannotia v0.9 is a suite of OpenCL graph applications ☆24 · Sep 13, 2017 · Updated 8 years ago
- HeteroSync is a benchmark suite for performing fine-grained synchronization on tightly coupled GPUs ☆31 · Sep 19, 2024 · Updated last year
- A fast, accurate, and easy-to-integrate memory simulator that models memory system performance with bandwidth-latency curves. ☆33 · Oct 18, 2025 · Updated 4 months ago
- ☆33 · Sep 9, 2020 · Updated 5 years ago
- MultiscaleGraphSignalTransforms.jl is a collection of software tools written in the Julia programming language for graph signal processin… ☆11 · Sep 13, 2025 · Updated 5 months ago
- The SEAL-CPU backend is a Reference backend engine for HEBench which is a shared library that implements the required functions specified… ☆11 · Mar 3, 2023 · Updated 3 years ago
- This serves as a repository for reproducibility of the SC21 paper "In-Depth Analyses of Unified Virtual Memory System for GPU Accelerated… ☆39 · Sep 25, 2023 · Updated 2 years ago
- A TensorFlow implementation of the paper "Feature Super-Resolution Based Facial Expression Recognition for Multi-scale Low-Resolution Ima… ☆13 · Nov 30, 2021 · Updated 4 years ago
- Open source taxi dispatch software; UI design mockups for a travel and ride-hailing app ☆14 · Dec 22, 2020 · Updated 5 years ago
- DELTA-pytorch: DELTA: Dynamically Optimizing GPU Memory beyond Tensor Recomputation ☆12 · Apr 16, 2024 · Updated last year
- A demo project demonstrating the performance improvement from a C++ extension wrapped with pybind11. ☆10 · Nov 16, 2021 · Updated 4 years ago
- ☆12 · Apr 30, 2024 · Updated last year
- The algorithms for multilevel evaluation of balance in signed directed networks ☆10 · Jul 4, 2024 · Updated last year
- ☆11 · Jan 5, 2022 · Updated 4 years ago
- Repo for PyChart 1.39, refs http://download.gna.org/pychart/ ☆10 · Sep 29, 2014 · Updated 11 years ago
- Automatic ReLU Reduction ☆15 · Dec 20, 2023 · Updated 2 years ago
- CR-LT KGQA Dataset Repository ☆10 · Jun 1, 2025 · Updated 9 months ago
- A project that patches the Xiaomi Linux system so that it can connect to ChatGPT with WebRTC and WebSocket ☆10 · Aug 29, 2025 · Updated 6 months ago
- Verilog RTL Implementation of DNN ☆10 · Jun 26, 2018 · Updated 7 years ago
- An EDM-enabled PHY + a rack-level network simulator ☆14 · Dec 11, 2024 · Updated last year
- SGQuant: Squeezing the Last Bit on Graph Neural Networks with Specialized Quantization ☆11 · Aug 12, 2020 · Updated 5 years ago
- Memory management simulator, using Hashed Page Table. Page Replacement Algorithms: Least Recently Used (LRU) and Second Chance. ☆10 · Apr 12, 2021 · Updated 4 years ago
- Secure Inference Resilient Against Malicious Clients ☆14 · May 3, 2022 · Updated 3 years ago
- ☆13 · Oct 6, 2024 · Updated last year
- including compiler to encode DGL GNN model to instructions, runtime software to transfer data and control the accelerator, and hardware v… ☆14 · Nov 19, 2023 · Updated 2 years ago
- Ripple: Accelerating Programmable Bootstraps for FHE with Wavelet Approximations ☆12 · Aug 8, 2024 · Updated last year
- ☆11 · Jul 2, 2024 · Updated last year
- Repo to hold HammerBlade PyTorch port. Based on PyTorch v1.4.0 ☆14 · Oct 4, 2022 · Updated 3 years ago
- Linear-Time Self Attention with Codeword Histogram for Efficient Recommendation ☆11 · Mar 23, 2021 · Updated 4 years ago
- ☆14 · Feb 5, 2025 · Updated last year
- ☆12 · Oct 25, 2022 · Updated 3 years ago
- PyTorch Codes for Haar Graph Pooling ☆11 · Feb 16, 2023 · Updated 3 years ago
- Evaluation Code repository for the paper "ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers". (2023… ☆13 · Dec 5, 2023 · Updated 2 years ago
- Multiple 1-stencil implementations using NVIDIA CUDA. ☆13 · Dec 2, 2017 · Updated 8 years ago
- ☆15 · Mar 26, 2025 · Updated 11 months ago
- Slowdown prediction module of Echo: Simulating Distributed Training at Scale ☆13 · May 17, 2025 · Updated 9 months ago