A pupil in the computer world. (Felix Fu)
☆254Jun 12, 2024Updated last year
Alternatives and similar repositories for README
Users that are interested in README are comparing it to the libraries listed below
- ☆14Dec 23, 2023Updated 2 years ago
- Sample examples of how to call collective operation functions on multi-GPU environments. A simple example of using broadcast, reduce, all…☆35Aug 28, 2023Updated 2 years ago
- Code for benchmarking the speed of DeepSeek R1 from different providers' APIs.☆16Mar 21, 2025Updated 11 months ago
- An easy way to use multi-GPUs to calculate multi-dimensional integration☆20Dec 8, 2022Updated 3 years ago
- A collection of compiler learning resources.☆2,684Mar 19, 2025Updated 11 months ago
- NeuroSpector: Dataflow and Mapping Optimizer for Deep Neural Network Accelerators☆21Mar 20, 2025Updated 11 months ago
- Notebook☆18Oct 22, 2021Updated 4 years ago
- ☆29Jun 18, 2014Updated 11 years ago
- [TRETS'23, FPT'20] CHIP-KNN: Configurable and HIgh-Performance K-Nearest Neighbors Accelerator on Cloud FPGAs☆18Apr 9, 2024Updated last year
- BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.☆918Dec 30, 2024Updated last year
- how to learn PyTorch and OneFlow☆485Mar 22, 2024Updated last year
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters☆57Jul 23, 2024Updated last year
- NCCL Tests☆1,441Feb 9, 2026Updated 2 weeks ago
- Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod…☆619Sep 11, 2024Updated last year
- ComScribe is a tool to identify communication among all GPU-GPU and CPU-GPU pairs in a single-node multi-GPU system.☆27Jul 6, 2023Updated 2 years ago
- Official repo for EMNLP'24 paper "SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning"☆29Oct 1, 2024Updated last year
- A fast, accurate, and easy-to-integrate memory simulator that models memory system performance with bandwidth-latency curves.☆33Oct 18, 2025Updated 4 months ago
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial☆353Dec 3, 2025Updated 2 months ago
- ☆10Dec 3, 2019Updated 6 years ago
- ☆10Jun 1, 2023Updated 2 years ago
- Some microbenchmarks and design docs before commencement☆12Feb 1, 2021Updated 5 years ago
- A fast communication-overlapping library for tensor/expert parallelism on GPUs.☆1,261Aug 28, 2025Updated 6 months ago
- Hybrid Mamba for Few-Shot Segmentation (NIPS 2024)☆42Oct 1, 2024Updated last year
- 《Machine Learning Systems: Design and Implementation》- Chinese Version☆4,764Apr 13, 2024Updated last year
- The source code to my book "The C++ Standard Library".☆41Mar 11, 2023Updated 2 years ago
- ☆84Dec 2, 2022Updated 3 years ago
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H…☆3,176Updated this week
- [NeurIPS 2024] Official implementation of "Visual Fourier Prompt Tuning"☆39Jan 17, 2025Updated last year
- HFAI deep learning models☆162May 25, 2023Updated 2 years ago
- A football video analysis task that uses YOLO to detect the people and the ball in the video.☆12Sep 5, 2020Updated 5 years ago
- rabitq rust implementation☆10Feb 4, 2026Updated 3 weeks ago
- A smartphone specs API powered with the most trusted phone information website gsm arena.☆16Feb 1, 2024Updated 2 years ago
- ☆13Updated this week
- A bot that runs automatic searches to earn points☆10Nov 2, 2023Updated 2 years ago
- derived from https://github.com/wilfredinni/python-cheatsheet☆10Nov 8, 2023Updated 2 years ago
- Automatically segment the vertebra from spinal X-ray images with UNet☆10Oct 3, 2023Updated 2 years ago
- A linter for the ruby language for VS Code☆11May 14, 2016Updated 9 years ago
- [AAMAS 2025] Privacy-preserving and Personalized RLHF, with convergence guarantees. The Code contains experiments for training multiple i…☆15Apr 16, 2025Updated 10 months ago
- Study books for growing as a software tester☆11Jan 12, 2021Updated 5 years ago